I have spent some time studying and building compilers. It is a challenging and stimulating activity, since it draws on many areas of computer science.
The Woodpecker Lexical Analyzer project is a simple lexical analyzer that I wrote to better understand how this stage works.
Lexical analysis is the first phase of the compilation process. It reads the individual characters of the source code, breaks the character stream into a series of tokens, and discards white space and comments.
For example, given the input string:
program CreateAccount: begin end
The result is:
<program, keyword>, <CreateAccount, identifier>, <:, symbol>, <begin, keyword>, <end, keyword>
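To make the example concrete, here is a minimal sketch in Python of how such a tokenizer might work. It is not the Woodpecker implementation: the keyword set, the `tokenize` and `format_tokens` names, and the rule that any other non-space character counts as a symbol are all assumptions made for illustration. It also does not handle comments or numeric literals.

```python
import re

# Hypothetical keyword set; the real Woodpecker keyword list may differ.
KEYWORDS = {"program", "begin", "end"}

# Alternatives are tried in order: white space (skipped), then a word
# (letters, digits, underscore, not starting with a digit), then any
# single remaining character, which is treated as a symbol.
TOKEN_PATTERN = re.compile(
    r"\s+|(?P<word>[A-Za-z_][A-Za-z0-9_]*)|(?P<symbol>.)",
    re.DOTALL,
)


def tokenize(source):
    """Return a list of (lexeme, kind) pairs for the given source text."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        word = match.group("word")
        symbol = match.group("symbol")
        if word is not None:
            kind = "keyword" if word in KEYWORDS else "identifier"
            tokens.append((word, kind))
        elif symbol is not None:
            tokens.append((symbol, "symbol"))
        # Otherwise the match was white space, which is dropped.
    return tokens


def format_tokens(tokens):
    """Render tokens in the <lexeme, kind> notation used above."""
    return ", ".join(f"<{lexeme}, {kind}>" for lexeme, kind in tokens)


print(format_tokens(tokenize("program CreateAccount: begin end")))
```

Classifying a word as a keyword only after it has been matched as an identifier keeps the pattern small: adding a keyword means adding an entry to the set, not changing the regular expression.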
The Woodpecker Lexical Analyzer reads the source code in woodpecker.language and generates a list of tokens according to the lexical conventions of the Woodpecker language.
I would love to have some feedback, so feel free to share this code and suggest improvements.
The source code is available here.
You can read more about lexical analysis at https://en.wikibooks.org/wiki/Compiler_Construction/Lexical_analysis