Ten years ago I wrote a lexer myself for my own language, Q, because I didn't know about regular expressions.
It was actually quite powerful, but it was literally just a hand-implemented state machine. To be honest, I think there were a couple of things it did that a regular expression wouldn't have been able to do; see the sketch below.
However, for the most part, the "efficiency" argument is off base. You can COMPILE regular expressions into tight code.
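As a concrete illustration of the kind of thing a plain state machine can do that a single regular expression can't, here is a minimal OCaml sketch (invented here for illustration; the original Q lexer isn't shown in the thread): skipping nested comments by counting depth, a non-regular job.

    (* Illustrative sketch: skip a nested (* ... *) comment, assuming the
       opening delimiter has already been consumed.  The depth counter
       handles arbitrary nesting, which no single regular expression can
       express, since regular languages cannot count. *)
    let skip_comment (s : string) (start : int) : int =
      let n = String.length s in
      let rec go i depth =
        if depth = 0 then i
        else if i >= n - 1 then failwith "unterminated comment"
        else match s.[i], s.[i + 1] with
          | '(', '*' -> go (i + 2) (depth + 1)
          | '*', ')' -> go (i + 2) (depth - 1)
          | _        -> go (i + 1) depth
      in
      go start 1

Whether this matches what the Q lexer actually did is anyone's guess, but it is the standard example of a lexing job outside regex territory.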
> However, for the most part, the "efficiency" argument is off base. You can COMPILE regular expressions into tight code.
When I started programming in OCaml, I was surprised to find no string primitives for things like split, find, etc.; there was only regex. I didn't want to write a regex for a simple find or split, so I wrote a string library for my programs. My string library was consistently 4x faster than the regex equivalents and did exactly what I wanted, nothing more.
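For illustration, a hand-rolled split along those lines might look like the sketch below; the name and exact behavior are assumptions, not the commenter's actual library. (Modern OCaml ships String.split_on_char, but the stdlib had nothing comparable back then.)

    (* Sketch of a regex-free split: one left-to-right scan, cutting out
       the substrings between occurrences of [sep]. *)
    let split (sep : char) (s : string) : string list =
      let n = String.length s in
      let rec go start i acc =
        if i >= n then
          List.rev (String.sub s start (i - start) :: acc)
        else if s.[i] = sep then
          go (i + 1) (i + 1) (String.sub s start (i - start) :: acc)
        else
          go start (i + 1) acc
      in
      go 0 0 []

So split ',' "a,b,c" yields ["a"; "b"; "c"] with no pattern compilation or backtracking anywhere on the path, which is presumably where a 4x gap over Str-style regexes comes from.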
I was in the same boat just recently, except that I knew about regular expressions but was intimidated by them.
My first attempt used regular expressions, but I reached a point where I was making faster progress rolling my own lexer.
I was surprised at how much faster simple looping over characters was: about twice as fast as my regular-expression implementation.
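For anyone wondering what "simple looping over characters" amounts to, here is a hypothetical minimal lexer in that style (token set and names invented for illustration, not the poster's actual code):

    type token = Int of int | Ident of string | Plus | Eof

    (* Minimal hand-rolled lexer: a single pass that dispatches on the
       current character, with no regex engine anywhere in the loop. *)
    let lex (s : string) : token list =
      let n = String.length s in
      let rec go i acc =
        if i >= n then List.rev (Eof :: acc)
        else match s.[i] with
        | ' ' | '\t' | '\n' -> go (i + 1) acc
        | '+' -> go (i + 1) (Plus :: acc)
        | '0' .. '9' ->
            let j = ref i in
            while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
            go !j (Int (int_of_string (String.sub s i (!j - i))) :: acc)
        | 'a' .. 'z' | 'A' .. 'Z' | '_' ->
            let j = ref i in
            while !j < n &&
                  (match s.[!j] with
                   | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
                   | _ -> false)
            do incr j done;
            go !j (Ident (String.sub s i (!j - i)) :: acc)
        | c -> failwith (Printf.sprintf "unexpected character %c" c)
      in
      go 0 []

Each token costs a handful of character comparisons and one substring copy, so a 2x gap over a general-purpose regex engine is plausible.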
The lex program and its descendants are generally regarded poorly by Ken and Rob and others from their tribe of Unix systems programmers. The blog post suggests why. It is often faster and easier to write the lexer in C by hand. But they do tend to put great value on yacc. Once you wrap your head around LALR shift-reduce parsing, yacc is expressive in ways that are hard to replicate with a hand-written parser.
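To make the yacc point concrete, here is a sketch in ocamlyacc (yacc's OCaml descendant) of the classic ambiguous expression grammar, invented here for illustration. The two %left lines resolve all the shift/reduce conflicts; replicating the same precedence handling in a hand-written recursive-descent parser means one function per precedence level.

    /* arith.mly -- illustrative only.  The grammar is ambiguous as
       written; the %left declarations resolve the shift/reduce
       conflicts, with TIMES binding tighter than PLUS. */
    %token <int> INT
    %token PLUS TIMES LPAREN RPAREN EOF
    %left PLUS
    %left TIMES
    %start main
    %type <int> main
    %%
    main:
        expr EOF             { $1 }
    ;
    expr:
        INT                  { $1 }
      | expr PLUS expr       { $1 + $3 }
      | expr TIMES expr      { $1 * $3 }
      | LPAREN expr RPAREN   { $2 }
    ;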