Ten years ago I wrote a lexer myself for my own language, Q, because I didn't know about regular expressions.
It was actually quite powerful, but it was literally just a hand-implemented state machine. To be honest, I think there were a couple of things it did that a regular expression wouldn't have been able to do; see the sketch below.
However, for the most part, the "efficiency" argument is off base. You can COMPILE regular expressions into tight code.
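As a concrete illustration of the kind of thing a plain state machine can do that a single regular expression can't, here is a minimal OCaml sketch (invented here for illustration; the original Q lexer isn't shown in the thread): skipping nested comments by counting depth, a non-regular job.

    (* Illustrative sketch: skip a nested (* ... *) comment, assuming the
       opening delimiter has already been consumed.  The depth counter
       handles arbitrary nesting, which no single regular expression can
       express, since regular languages cannot count. *)
    let skip_comment (s : string) (start : int) : int =
      let n = String.length s in
      let rec go i depth =
        if depth = 0 then i
        else if i >= n - 1 then failwith "unterminated comment"
        else match s.[i], s.[i + 1] with
          | '(', '*' -> go (i + 2) (depth + 1)
          | '*', ')' -> go (i + 2) (depth - 1)
          | _        -> go (i + 1) depth
      in
      go start 1

Whether this matches what the Q lexer actually did is anyone's guess, but it is the standard example of a lexing job outside regex territory.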
> However, for the most part, the "efficiency" argument is off base. You can COMPILE regular expressions into tight code.
When I started programming in OCaml, I was surprised to find no string primitives for things like split, find, etc.; there was only regex. I didn't want to write a regex for a simple find or split, so I wrote a string library for my programs. My string library was consistently 4x faster than the regex equivalents and did exactly what I wanted, nothing more.
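For illustration, a hand-rolled split along those lines might look like the sketch below; the name and exact behavior are assumptions, not the commenter's actual library. (Modern OCaml ships String.split_on_char, but the stdlib had nothing comparable back then.)

    (* Sketch of a regex-free split: one left-to-right scan, cutting out
       the substrings between occurrences of [sep]. *)
    let split (sep : char) (s : string) : string list =
      let n = String.length s in
      let rec go start i acc =
        if i >= n then
          List.rev (String.sub s start (i - start) :: acc)
        else if s.[i] = sep then
          go (i + 1) (i + 1) (String.sub s start (i - start) :: acc)
        else
          go start (i + 1) acc
      in
      go 0 0 []

So split ',' "a,b,c" yields ["a"; "b"; "c"] with no pattern compilation or backtracking anywhere on the path, which is presumably where a 4x gap over Str-style regexes comes from.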
I was in the same boat just recently, except that I knew about regular expressions but was intimidated by them.
My first attempt used regular expressions, but I reached a point where I was making faster progress rolling my own lexer.
I was surprised at how much faster simple looping over characters was: about twice as fast as my regular-expression implementation.
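For anyone wondering what "simple looping over characters" amounts to, here is a hypothetical minimal lexer in that style (token set and names invented for illustration, not the poster's actual code):

    type token = Int of int | Ident of string | Plus | Eof

    (* Minimal hand-rolled lexer: a single pass that dispatches on the
       current character, with no regex engine anywhere in the loop. *)
    let lex (s : string) : token list =
      let n = String.length s in
      let rec go i acc =
        if i >= n then List.rev (Eof :: acc)
        else match s.[i] with
        | ' ' | '\t' | '\n' -> go (i + 1) acc
        | '+' -> go (i + 1) (Plus :: acc)
        | '0' .. '9' ->
            let j = ref i in
            while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
            go !j (Int (int_of_string (String.sub s i (!j - i))) :: acc)
        | 'a' .. 'z' | 'A' .. 'Z' | '_' ->
            let j = ref i in
            while !j < n &&
                  (match s.[!j] with
                   | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
                   | _ -> false)
            do incr j done;
            go !j (Ident (String.sub s i (!j - i)) :: acc)
        | c -> failwith (Printf.sprintf "unexpected character %c" c)
      in
      go 0 []

Each token costs a handful of character comparisons and one substring copy, so a 2x gap over a general-purpose regex engine is plausible.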
The lex program and its descendants are generally regarded poorly by Ken and Rob and others from their tribe of Unix systems programmers. The blog post suggests why. It is often faster and easier to write the lexer in C by hand. But they do tend to put great value on yacc. Once you wrap your head around LALR shift-reduce parsing, yacc is expressive in ways that are hard to replicate with a hand-written parser.
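To make the yacc point concrete, here is a sketch in ocamlyacc (yacc's OCaml descendant) of the classic ambiguous expression grammar, invented here for illustration. The two %left lines resolve all the shift/reduce conflicts; replicating the same precedence handling in a hand-written recursive-descent parser means one function per precedence level.

    /* arith.mly -- illustrative only.  The grammar is ambiguous as
       written; the %left declarations resolve the shift/reduce
       conflicts, with TIMES binding tighter than PLUS. */
    %token <int> INT
    %token PLUS TIMES LPAREN RPAREN EOF
    %left PLUS
    %left TIMES
    %start main
    %type <int> main
    %%
    main:
        expr EOF             { $1 }
    ;
    expr:
        INT                  { $1 }
      | expr PLUS expr       { $1 + $3 }
      | expr TIMES expr      { $1 * $3 }
      | LPAREN expr RPAREN   { $2 }
    ;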