Not to sound like a pretensions CS guy ... but how can you have a book on regula...

mekoka · on Dec 1, 2011

I'm not a CS graduate. I first learned of state machine from an article posted here on HN a few months ago.

I learned regular expression from the regular-expression.info site and from the first 4 chapters of Jeffrey Friedl's book over 5 years ago. I don't recall an introduction to NFA/DFA from them either. I'm pretty comfortable with regex and use them regularly (no pun intended) and so far, I don't feel that I've missed much.

I'm not disagreeing with your view, but I think it's worth pointing out from someone having developed an understanding of regex from a different perspective, that NFA/DFA might not be such a strong prerequisite and people should not feel like they're going in at a disadvantage.

bad_user · on Dec 1, 2011

An understanding of finite-state machines is useful, because that way you'll understand the limitations you have when using regular expressions, which have been grossly misused.

It also helps you write optimal regular expressions - for instance if your regular expression clearly describes a DFA, then it will probably perform well. Otherwise in many cases you can optimize it - just as you can turn any NFA into a DFA.

It isn't so clear and cut these days however, as the regexp capabilities provided by modern libraries exceed the capabilities of finite state machines - for instance the engine in Perl 5.10 has support for recursion, which means a regular expression in Perl 5.10 can describe a Pushdown Automaton (PDA) instead, so things get way more complicated.

Either way, I suggest you do some reading on Automata Theory, because it's fun and not that hard.

spp · on Dec 1, 2011

Chapter 4 of Jeffrey Friedl's "Mastering Regular Expressions" goes in depth about NFA/POSIX NFA/DFA. It even has car analogies!

mekoka · on Dec 2, 2011

It must have been the 3 first chapters then. At the time, they were enough for me to understand and "get going" with regex. Various other online resources have filled little gaps here and there over time. This thread makes me want to revisit the book and resume my reading of the next 3 chapters, a long overdue promise to myself. I'm glad that the book will cover the topic of FA then.

zedshaw · on Dec 1, 2011

Yep, I'm going at it from another direction where I teach the symbols like a language, and then it makes it easier to explain how they work later. I agree that state machines are a great way to understand them, but I don't think they're practical for using them and they just complicate getting capable in it.

furyg3 · on Dec 1, 2011

Just wanted to chime in, thanks a lot for making the guide with this approach. Sysadmins, specifically, will find this to be a great resource.

Many people start out tweaking config files, editing other's scripts, manipulating logs and output data, and then eventually sit down and say "Damn, it's about time I learned how to really write good [bash/perl/python/regex]". I find the theory behind it interesting, but it's a bit of a distraction for many.

burke · on Dec 1, 2011

I'm a CS graduate, and I read the Friedl book on regular expressions in high school. I don't seem to remember learning anything about automata from it, and I know for certain that my automata course in university didn't make me any more competent at understanding or applying regular expressions.

irahul · on Dec 1, 2011

Personally I don't find NFA/DFA helpful with learning regular expressions.

To a newcomer, even a naive implementation of NFA/DFA doesn't make sense. Consider this small gist https://gist.github.com/1414061

I have a feeling a newcomer will come out none the wiser even if he were shown this. And this is just a naive implementation - no nfa to dfa conversions, inefficient implementation etc.

I would root for state diagrams though. Generally diagrams help people see patterns mere text doesn't.

marchdown · on Dec 1, 2011

I know the theory pretty well, but writing actual regexes always felt so slow and frustrating that I really welcome this kind of rote hand-holding tutorial. I might finally learn the arbitrary DSL that everyone seems to have settled on.

kd0amg · on Dec 1, 2011

I imagine that sort of implementation issue is only relevant to the typical user if that user comes across a case where the potential inefficiency of NFA->DFA conversion (or NFA simulation) actually does happen. I'm not sure how common it is to encounter this.

I'd be more concerned about making sure there is some discussion of nonregular languages; this appears to be in the works (I'm curious to see what exactly it turns out to be).

dfc · on Dec 1, 2011

Do not worry, in order to sound "pretensions" you need to be able to spell pretentious correctly;)

leeoniya · on Dec 1, 2011

i learned regex inside-out before ever learning or implementing any sort of FSM - just sayin'.