The title here seems to be inaccurate. The title of the post--based on the Google cache--is "What’s in a parser? Attoparsec rewired (2/2)". It's not actually about building parsers so much as comparing the new Haskell attoparsec parser to C parsers.
I'm glad to see this improvement being made in the LL parsers in Haskell. They're much more natural to write than parsers with a bottom-up parser generator. I believe Happy is the major parser-generator for Haskell and since it uses YACC-like syntax for the grammar it starts to feel like I'm writing C into my Haskell code. Parser-generators always feel a bit ugly, but they usually yield the most efficient and fastest code.
isn't attoparsec a haskell parser? is it lazy? i am wondering if it's fast because it's not actually doing the parsing... (ie it's returning thunks that will require more evaluation before providing, say, an AST).
The article is written by Bryan O'Sullivan, author of the criterion[1] and text[2] packages and co-author of Real World Haskell[3]. If he's not the world's foremost expert on Haskell performance, he's certainly in the top 5. This benchmark undoubtedly takes laziness into account.
ok! (i didn't mean it as personal criticism - i am currently debugging/tuning my own recursive descent library (lepl) and am having a world of pain with memoisation, which is a similar problem...).
bryan, if you see this, more info on what makes that lib so fast would be greatly appreciated!
If you need to do any sort of benchmarking for Haskell libraries, I can't recommend Criterion (and Neil Brown's package Progression[1]) enough. It handles all the details for lazy evaluation (NF vs WHNF, pure vs IO, etc) quietly behind the scenes, and uses a well-designed statistics analysis library to calculate the final numbers. Don Stewart has written a short introduction, Modern Benchmarking in Haskell[2].
It's worth mentioning that Criterion can be used for anything that can be called via the FFI -- I'm using it for benchmarking Python libraries, for example.
My Haskell skills are practically non-existant, certainly not good enough to jump into that project and understand what's going on. I was hoping for a more high level explanation of what he was doing.
I'm glad to see this improvement being made in the LL parsers in Haskell. They're much more natural to write than parsers with a bottom-up parser generator. I believe Happy is the major parser-generator for Haskell and since it uses YACC-like syntax for the grammar it starts to feel like I'm writing C into my Haskell code. Parser-generators always feel a bit ugly, but they usually yield the most efficient and fastest code.