Woodpecker: a Lexical Analyzer written in Python

I have dedicated some time to studying and building compilers. It is a very challenging and stimulating activity, since it involves many areas of Computer Science.

The Woodpecker Lexical Analyzer project is a simple lexical analyzer that I wrote to better understand how this stage works.

Lexical analysis is the first phase of the compilation process. It reads the individual characters of the source code, breaks the stream of characters into a series of tokens, and removes any white space and comments.

For example, given the input string:

program CreateAccount:
begin
end

The result is:

<program, keyword>, <CreateAccount, identifier>, <:, symbol>, <begin, keyword>, <end, keyword>
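
To make the example above concrete, here is a minimal sketch of a regex-based tokenizer in Python. It is not the actual Woodpecker implementation: the keyword set, the symbol set, and the function name tokenize are assumptions inferred only from the example, and comment handling is left out because the comment syntax of the language is not described here.

```python
import re

# Hypothetical keyword set, inferred only from the example above.
KEYWORDS = {"program", "begin", "end"}

# One named group per token class; alternatives are tried left to right.
# The symbol set is an assumption, not the real Woodpecker convention.
TOKEN_PATTERN = re.compile(r"""
    (?P<whitespace>\s+)
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<symbol>[:;,.()=+\-*/])
  | (?P<unknown>.)
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (lexeme, kind) pairs for the given source text."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind = match.lastgroup
        lexeme = match.group()
        if kind == "whitespace":
            continue  # white space separates tokens but is not a token itself
        if kind == "unknown":
            raise ValueError(
                f"unexpected character {lexeme!r} at offset {match.start()}"
            )
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"  # reserved words look like identifiers
        tokens.append((lexeme, kind))
    return tokens


if __name__ == "__main__":
    source = "program CreateAccount:\nbegin\nend"
    print(", ".join(f"<{lexeme}, {kind}>" for lexeme, kind in tokenize(source)))
```

Keywords are matched as identifiers first and reclassified afterwards, which keeps the pattern small and avoids treating a name such as "beginning" as the keyword "begin" followed by an identifier.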

The Woodpecker Lexical Analyzer reads the source file woodpecker.language and generates a list of tokens according to the lexical conventions of the Woodpecker language.

I would love to have some feedback, so feel free to share this code and suggest improvements.

The source code is available here.

You can read more about lexical analysis at https://en.wikibooks.org/wiki/Compiler_Construction/Lexical_analysis
