Archive for April 1st, 2011

Compiler Tutorial, part 2 – Lexing

In Part 1 of my compilers tutorial, I went over context-free grammars. Read that first, unless you only care about lexing.

Lexing, short for “lexical analyzing”, basically means “turn this string into something I can use”. Since I’m assuming you’ve programmed before, I’m sure you’ve had an occasion to take input from the user and turn it into smaller components you can use. Even if it’s a simple expression like “5+2”, you don’t get it as input in a nice format – but rather an obtuse string. If you’ve ever accepted input from the user and done anything other than print it right back out, you’ve written a lexer. Any sort of string-splitting or “understanding” the contents of a string requires a lexer. Even Java’s String.split(“, “) method is lexing, in a simple form.

Lexing is necessary for compilation, but also for many other programs. I’ve written too much input-handling code in my time, and I wish somebody had told me about lexing long ago. So this section of the tutorial is probably the most useful to areas other than compiler writing.

Let’s get started. Read more