I have moved my blog to Wordpress at theunixgeek.wordpress.com. I will still be checking back periodically on this one as well, though. 19 April 2009

Thursday, October 30, 2008

How to Write a Programming Language

It's actually quite easy to write a programming language in C++, believe it or not. All you really have to do is follow Stroustrup's own example: write a program that translates your language's code into C, then compile the result. After all, that's how C++ started out! Here's a step-by-step walkthrough of a simple programming language in which the line printstr Hello, World! prints Hello, World! as its output. (Note: the system() call used in this code might not work on non-Unix systems.)

1. Include the following headers: iostream, fstream, vector, string, and cstdlib. Use the std namespace. Set up a simple main function with (int argc, char *argv[]) as arguments.

2. Let's call this language Examplang. We want the user to run the command as examplang file.expl, so let's make sure they always enter an argument following the program name:

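// argc counts the program name too, so anything below 2 means no file was given
if (argc < 2) {
    cout << "usage: " << argv[0] << " file.expl" << endl;
    return 1;
}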

3. Let's declare some basic, yet very important, variables: a vector to read the program's lines into, an input file stream to read the source code, an output file stream to write the generated C source code, and a string variable that holds the current line of the source code.

vector<string> source;
ifstream source_code;
ofstream c_code;
string current_line;


4. Open up the source code and push back all its lines onto the vector. Then, get the number of lines in the source code by getting the vector's size.

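source_code.open (argv[1]);
// stop early if the file could not be opened
if (!source_code) {
    cout << "could not open " << argv[1] << endl;
    return 1;
}
while (getline (source_code, current_line))
    source.push_back (current_line);
int source_code_size = source.size();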
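If you'd rather keep the file reading out of main, step 4 could be wrapped in a small function. This is only a sketch: read_lines is a name made up for this example, not part of the original program, and it assumes a C++11 compiler.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not in the original post): read every line of the
// file at `path` into a vector, throwing if the file cannot be opened.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

In main, this would be called as read_lines(argv[1]), and the vector's size() gives the number of lines just as before.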

5. Now, begin looking for keywords in the source code... this part gets tricky.

By convention, i is used as the index variable in the for loop. The loop reads each line of the source and stores it in the current_line variable. Each keyword name is stored in its own string before it's used, which makes it easy to get the keyword's length later on if the keyword is found.

The number of characters between the space after the keyword and the last character of the line is then calculated, and that value is used as the keyword requires. In the case of the printstr keyword, everything from the space following the keyword to the end of the line is printed.

int i;
c_code.open("c_output.c");
// start the generated C file: the stdio header and the opening of main
c_code << "#include <stdio.h>" << endl;
c_code << "int main(void) {" << endl;
for (i = 0; i < source_code_size; i++) {
    current_line = source.at(i);
    string printstr = "printstr";
    // compare() returns 0 on a match; check the keyword plus the space after it
    if (current_line.compare (0, printstr.length() + 1, printstr + " ") == 0) {
        c_code << "printf (\"";
        // number of characters after the keyword and its trailing space
        int until_rest_of_string = current_line.length() - printstr.length() - 1;
        string fstring = current_line.substr (printstr.length() + 1, until_rest_of_string);
        c_code << fstring << "\");" << endl;
    }
}
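One thing the loop doesn't handle: a quote, backslash, or percent sign in the printed text would break the generated printf call. Here's a rough sketch of how a single line could be translated with those characters escaped. Both function names are invented for this example, and unlike the walkthrough code it also appends a newline to the output.

```cpp
#include <string>

// Hypothetical helper: escape text so it can sit inside a C string literal
// that is passed to printf as the format string.
std::string escape_for_printf(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;  // backslash becomes two backslashes
            case '"':  out += "\\\""; break;  // quote becomes backslash + quote
            case '%':  out += "%%";   break;  // percent becomes a literal percent
            default:   out += c;      break;
        }
    }
    return out;
}

// Hypothetical helper: translate one Examplang line into one C statement.
// Returns an empty string for lines that do not start with "printstr ".
std::string translate_line(const std::string& line) {
    const std::string keyword = "printstr ";
    if (line.compare(0, keyword.length(), keyword) != 0) {
        return "";
    }
    const std::string text = line.substr(keyword.length());
    return "printf(\"" + escape_for_printf(text) + "\\n\");";
}
```

Inside the for loop, the result of translate_line(current_line) could be written to c_code whenever it isn't empty.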


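6. Finish the generated C source code, close the file streams, and compile the C file. Closing the output stream first makes sure everything has been written to disk before gcc reads it. Once that's done, end the main function.

// finish the generated main function
c_code << "return 0;" << endl << "}" << endl;
c_code.close();
source_code.close();
// gcc writes the executable to a.out by default
system ("gcc c_output.c");
return 0;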
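The same step could also be split into two small functions, one that builds the complete C file as a string and one that writes it out and runs the compiler. Again, this is just a sketch under my own assumptions: wrap_in_main and write_and_compile are made-up names, and the check on system()'s return value assumes a Unix-like shell where a successful command returns 0.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: wrap already-translated C statements in a main function.
std::string wrap_in_main(const std::vector<std::string>& statements) {
    std::string c_source = "#include <stdio.h>\nint main(void) {\n";
    for (const std::string& statement : statements) {
        c_source += "    " + statement + "\n";
    }
    c_source += "    return 0;\n}\n";
    return c_source;
}

// Hypothetical helper: write the C source to `c_path`, close the file, and
// only then run the compiler on it. Returns true when both steps succeed.
bool write_and_compile(const std::string& c_source,
                       const std::string& c_path,
                       const std::string& compiler = "gcc") {
    {
        std::ofstream out(c_path);
        if (!out) {
            return false;
        }
        out << c_source;
    }  // the stream is closed here, before the compiler reads the file
    const std::string command = compiler + " " + c_path;
    return std::system(command.c_str()) == 0;
}
```

Building the whole file in memory first means nothing is left half-written on disk if the translation fails partway through.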

That's it! After this, one can easily extend the language to have variables, conditional statements, loops, and a whole lot else. The code worked fine on OS X (I haven't been able to test it on Fedora yet), but if there were any problems in copying the code into the post or if it doesn't work on your system, please leave a comment, which will be followed up as soon as possible.
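As a first step toward extending the language, the keyword check could be turned into a lookup table, so that adding a keyword means adding one entry. The sketch below is only an illustration: printstr is the keyword from this post, while printline (and every function name here) is invented for the example. It skips escaping for brevity and assumes a C++11 compiler.

```cpp
#include <functional>
#include <map>
#include <string>

// An emitter turns the text after a keyword into one C statement.
using Emitter = std::function<std::string(const std::string&)>;

// Hypothetical keyword table. "printstr" comes from the post; "printline"
// is made up here and prints its text followed by a newline.
const std::map<std::string, Emitter>& keyword_table() {
    static const std::map<std::string, Emitter> table = {
        {"printstr",  [](const std::string& rest) { return "printf(\"" + rest + "\");"; }},
        {"printline", [](const std::string& rest) { return "puts(\"" + rest + "\");"; }},
    };
    return table;
}

// Split a line at its first space and dispatch on the keyword.
// Returns an empty string for unknown keywords or lines without a space.
std::string translate(const std::string& line) {
    const std::string::size_type space = line.find(' ');
    if (space == std::string::npos) {
        return "";
    }
    const auto entry = keyword_table().find(line.substr(0, space));
    if (entry == keyword_table().end()) {
        return "";
    }
    return entry->second(line.substr(space + 1));
}
```

Variables, conditionals, and loops need more than a line-by-line lookup, but a table like this keeps the simple keywords in one place.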

2 comments:

Anonymous said...

Writing a programming language/compiler is not this easy. For parsing a real language, you might want to learn yacc or boost.spirit.

Ralph said...

Pity this post comes in top 5 results when googled for "How to write a programming language" :|