How to write a really fast parser without going insane

Periodic · on March 3, 2010

The title here seems to be inaccurate. The title of the post--based on the Google cache--is "What’s in a parser? Attoparsec rewired (2/2)". It's not actually about building parsers so much as comparing the new Haskell attoparsec parser to C parsers.

I'm glad to see this improvement being made in the LL parsers in Haskell. They're much more natural to write than parsers with a bottom-up parser generator. I believe Happy is the major parser-generator for Haskell and since it uses YACC-like syntax for the grammar it starts to feel like I'm writing C into my Haskell code. Parser-generators always feel a bit ugly, but they usually yield the most efficient and fastest code.

ecaron · on March 3, 2010

Google's cache of the article: http://74.125.95.132/search?q=cache:www.serpentine.com/blog/...

andrewcooke · on March 3, 2010

isn't attoparsec a haskell parser? is it lazy? i am wondering if it's fast because it's not actually doing the parsing... (ie it's returning thunks that will require more evaluation before providing, say, an AST).

jmillikin · on March 3, 2010

The article is written by Bryan O'Sullivan, author of the criterion[1] and text[2] packages and co-author of Real World Haskell[3]. If he's not the world's foremost expert on Haskell performance, he's certainly in the top 5. This benchmark undoubtedly takes laziness into account.

[1] http://hackage.haskell.org/package/criterion

[2] http://hackage.haskell.org/package/text

[3] http://www.realworldhaskell.org/

andrewcooke · on March 3, 2010

ok! (i didn't mean it as personal criticism - i am currently debugging/tuning my own recursive descent library (lepl) and am having a world of pain with memoisation, which is a similar problem...).

bryan, if you see this, more info on what makes that lib so fast would be greatly appreciated!

jmillikin · on March 3, 2010

If you need to do any sort of benchmarking for Haskell libraries, I can't recommend Criterion (and Neil Brown's package Progression[1]) enough. It handles all the details for lazy evaluation (NF vs WHNF, pure vs IO, etc) quietly behind the scenes, and uses a well-designed statistics analysis library to calculate the final numbers. Don Stewart has written a short introduction, Modern Benchmarking in Haskell[2].

It's worth mentioning that Criterion can be used for anything that can be called via the FFI -- I'm using it for benchmarking Python libraries, for example.

[1] http://hackage.haskell.org/package/progression

[2] http://donsbot.wordpress.com/2010/02/23/modern-benchmarking-...

BudVVeezer · on March 3, 2010

Pretty light on actual information...

pmjordan · on March 3, 2010

Yeah, I wish he'd explained what actually made it so fast.

johngunderman · on March 3, 2010

You could always check it out yourself: http://hackage.haskell.org/package/attoparsec

Although I'm not sure if he's pushed the most recent changes yet.

pmjordan · on March 3, 2010

My Haskell skills are practically non-existant, certainly not good enough to jump into that project and understand what's going on. I was hoping for a more high level explanation of what he was doing.

wglb · on March 3, 2010

Error establishing a database connection.