High-Performance Web Applications in Haskell

bobfunk · on April 6, 2011

Can attest to Haskell and Snap being excellent for small web services where you need a boost - especially when concurrency is involved.

Wrote two small services for http://www.webpop.com in Haskell, both based on Snap. One for dynamically resizing images on the fly and one for handling uploads of large files and streaming them straight to Cloudfiles.

In both cases I started by trying node.js (both cases deals with either fetching or sending files from Cloudfiles so IO operations is what takes time) since the problems seemed very suited for node's eventloop model. In both cases node.js really disappointed.

Ended up giving Snap a try and got really surprised at just how well it worked. Performance is amazing, it's not much more code than the node.js version and theres no endless nesting of callbacks. It handles high levels of concurrency incredibly well and both services ended up using around 8MB of memory each - even under load...

Main problem is the lack of libraries that you tend to take for granted in more webby languages, and obviously the learning curve is pretty steep (but also very rewarding).

pjscott · on April 6, 2011

I had a good experience writing a little web service in Haskell a couple days ago, an autocomplete server. You give it a prefix, and it returns some number of strings in its database starting with that prefix. I wanted it to be fast, and lighter on memory than the approach some people have been doing with Redis, where they store all prefixes in a sorted set:

http://antirez.com/post/autocomplete-with-redis.html

Sounds like a fun toy project, right? I figured I would store everything in a trie, so I could do really fast prefix searches, and then bolt on a REST interface that would send back JSON or JSONP. So I wrote some C code to handle the prefix searching in a JudySL trie (see http://judy.sf.net/ for details; it's a nice library) and bolted that onto Haskell and wrote some bridge code. What I ended up with was a really nice, easy server that took me only a few hours to write, and was every bit as memory-efficient as I'd hoped. Source code here, if anybody's curious:

https://github.com/PeterScott/acdaemon

The learning curve on Haskell, though, is really something. If you really want to see something ridiculous, try looking at the auto-generated library documentation for anything involving regular expressions. The type declarations in Text.Regex.TDFA are like something H. P. Lovecraft might write about.

mjcmirr · on April 6, 2011

It's a fair point about Haskell's learning curve. Haskell is different enough from the languages most people have grown up using that--at least in my experience--there are times that simply understanding a few lines of code can take minutes.

On the other hand, that fact has always seemed to me a mark of why it's worthwhile to persevere: as when you are learning mathematics, things can seem entirely opaque until they 'click', but then you've really made an advance in your understanding.

Probably most people are aware of it, but the O'Reilly book _Real World Haskell_ by Bryan O'Sullivan, Don Stewart, and John Goerzen, is a great way to get started. The book is on-line at

http://book.realworldhaskell.org/

Chapter 8 discusses (one of) the Haskell regexp implementation(s), and in particular addresses one of the puzzling aspects: polymorphism in return type. The whole book is excellent.

(I should also point out that another book _Learn You A Haskell_, has just been published, and is well-regarded. I think it assumes rather less programming experience than RWH, and also covers less of the "real-world" aspects. It also can be read on-line, at

http://learnyouahaskell.com/

and in fact there is a coupon code on that page now for 40% off.)

Also, the Haskell community is, in my experience, extremely helpful and welcoming. People should come and join!

pjscott · on April 6, 2011

The place where I've seen Haskell really shine is parsing. Parsers are remarkably easy to write in Haskell, thanks to Parsec and its various spinoff libraries. For example, here's a minimal HTTP request parser that Bryan O'Sullivan wrote by essentially translating part of the RFC into Haskell:

https://bitbucket.org/bos/attoparsec/src/tip/examples/RFC261...

It's 54 lines long, pretty trivial, and here's a really cool part: it parses incrementally on strict bytestrings, so you can just do raw socket recv calls and pass the chunks of input to the parser. Similarly, I wrote a parser for the subset of YAML that beanstalkd uses (as part of the hbeanstalk client library), and it was just 11 lines of code. It's very impressive.

(The concurrency and parallelism stuff is also notably slick, but I haven't had as much occasion to use it. Haskell's aversion to side-effects really pays off sometimes.)

By the way, you're right about the Haskell community being friendly and welcoming -- as you yourself demonstrate. :-)

alavrik · on April 6, 2011

The HTTP request parser doesn't look readable or trivial to me at all.

I'm not familiar with Haskell, so to me the code looks like a mix of high-level declarative and low-level specialized constructs (e.g. skipWhile, takeWhile) interleaved with syntax noise. And it seems to be using quite a lot of external libraries. Also, correspondence of the code with the HTTP spec is completely non-obvious.

In contrast, here's an HTTP response parser I wrote in OCaml using just one library: https://github.com/alavrik/piqi/blob/master/piqi-tools/piqi_... (see lines 26 - 218).

I've just noticed that the code you are referring to is an example. Well, looking at the example, I can hardly come to a conclusion that Haskell shines for parsing.

pjscott · on April 6, 2011

If you don't know Haskell, and you've never used attoparsec, then I'm not surprised that you don't find that code to be particularly clear or readable. Haskell has a steep learning curve, as I said earlier. This is not a very damning criticism of Haskell's suitability for writing parsing code.

By the way, that long import list is actually just importing some basic stuff from the standard library, and the attoparsec parsing library. One of the persistent minor annoyances of Haskell is writing long lists of module imports.

alavrik · on April 6, 2011

Fair enough. I didn't mean to criticize Haskell. In fact, I'm planning to learn it.

I was just surprised that this code was presented as a good example of Haskell's fit to the parsers domain. It would be interesting to see if my perception of this code changes once I get more familiar with the language.

pjscott · on April 6, 2011

Ah, okay, I misunderstood you. Yes, I think your perception of the code will change. In particular, the operators from Control.Applicative (the ones that look especially like line noise) are critical to making heads or tails of it. When I posted that link, I momentarily forgot that they're not obvious. They make sense after you've used them for a little while, though.

jamii · on April 6, 2011

I write a lot of ocaml code (for money, even!) and haven't touched haskell for about four years. I still find the haskell version here easier to read. I imagine what's tripping you up is the operator soup. Whilst ugly (one of the things that turned me off haskell) it is a lot easier to read once you are familiar with basic haskell typeclasses (applicative, functor etc). That said, the ocaml version could definitely be a lot nicer. Check out eg

Yojson (search for 'let positive_int') http://forge.ocamlcore.org/scm/browser.php?group_id=153

Mirage (using MPL for zero copy parsing) https://github.com/avsm/mirage/blob/master/lib/net/direct/mp...

alavrik · on April 8, 2011

Hi there! I do quite a bit of both OCaml and Erlang.

In fact, I borrowed Yojson's json parser for my project (piqi.org). Also, I'm familiar with MPL. It looks very nice, but from what I heard it is fairly immature. And you definitely can't parse HTTP this way :)

T-R · on April 6, 2011

I get the impression that the Haskell code was meant to be a direct transliteration of the spec, or had some other artificial restriction on the amount of abstraction. Code for "0 or more of any character" is repeated at least 3 times, as with the use of applicative functor operators to ignore separators (a lot of what looks like syntax noise). Things like numbers (takeWhile isDigit_w8) aren't abstracted, either. There's no real reason that these couldn't be abstracted away. Also, some of the verbosity (and most of the imports) come from using Strict 8bit ByteStrings instead of a type with more built-in abstractions. That said, I don't think you can be blamed for your impression.

tibbe · on April 7, 2011

It depends on what you compare to. Bryan aimed for speed got to 56% of the speed of the C parser he used as a benchmark, using 54 lines of code. The C parser is a 1,672 lines hand-rolled parser that's only does HTTP, while attoparsec is a general purpose library. Bryan wrote about it here: http://www.serpentine.com/blog/2010/03/03/whats-in-a-parser-...

alavrik · on April 8, 2011

That's definitely an interesting perspective! Thanks for pointing it out.

gtani · on April 6, 2011

shd point out therre's lots of other resources e.g. Graham book, school of Expression, etc (and for all the folks that have been doing scala, ocaml, erlang, F#, some of te concepts will be familiar

http://stackoverflow.com/questions/16918/beginners-guide-to-...

http://stackoverflow.com/questions/1012573/how-to-learn-hask...

jimmyjazz14 · on April 6, 2011

Haskell takes the complex and tames it with a wonderful type system and sensible abstractions; personally I have never been more productive using any other language.

acconrad · on April 6, 2011

Could someone comment on Haskell in comparison to Erlang, Clojure or Scala? I felt like when I started choosing between Ruby on Rails and Python/Django, they were so similar that choosing either one would be a fine choice. Now that I'm interested in functional languages too, it seems that comparing Haskell to Erlang is just not the same as Python to Ruby.

jimmyjazz14 · on April 6, 2011

The main thing that sets Haskell apart is its type systems, there is none better. In addition to the strong type system Haskell is purely functional which means the compiler can reason pretty strongly about an applications side-effects. I'm not sure Erlang or Scala can really compare on either of these things (though I am not an expert on either of those languages). The last big thing I can think of is that Haskell is "lazy" which means it only evaluates values as they are needed, this aspect has its upsides but also many downsides. Laziness can help one write clear concise code but you must be aware of it or things can get a little sticky.

jamwt · on April 6, 2011

They're so different and yet so similar it's probably best taking them one topic at a time. I've written nontrival things in all of those except clojure, but I think I still have a pretty good idea what it's about from a bit of playing around, reading the notes of others, time spent in PLT Scheme etc. I hope this is helpful for some people looking to dive in--remember, all of this "IMO/IME".

Type Systems:

Haskell and Scala are closest in that they're strongly, statically typed with Hindley-Milner type inference. Erlang is dynamically typed as is clojure. This tends to break the advantage/disadvantage scheme along the same lines as Java/Ruby wrt typing: the former languages generally give you more speed and safety, and the latter languages can do cool tricks with deciding typing at runtime, which can aid in expressiveness.

Type System Complexity/Power:

Haskell has the most sophisticated type system--by far, I think it's fair to say. This adds power but at the cost of a steeper learning curve for people unused to thinking so deeply about the contracts/promises code and data structures are making. Scala's is also fairly involved since it's basically a subset of Haskell's (well, plus traits and inheritance blah)--but not quite as consistent or clear. Clojure and Erlang have somewhat simpler type system that may be more tractable out of the gate--Erlang in particular is pretty straightforward.

"functional-ness", how sharp a break from OOP/imperative is imposed:

Scala is by far the least dogmatic about doing thing in a "functional" as opposed to an OOP or imperative way. Scala's heavy emphasis on seamless interop with Java makes it a sort of up to the programmer to place their style on the continuum between something like Java and something like ML. Haskell, with its purity-by-default, is probably the most opinionated of the group. You will be forced to approach problems very differently. I'd say clojure follows just behind, also having a strong bias toward doing functional-only styles. Erlang is also pretty interested in doing things in a particular way that could be construed as functional (not rebinding names, focus on immutability, for example), but its "big idea" is more tailored around its flagship library and virtual machine.

Syntax:

Erlang will probably feel the most foreign to a 2011 programmer. Erlang started as a modified Prolog, and the syntactic legacy of Prolog is not widespread in common contemporary languages. Haskell's syntax is rather clean and restricted--though the fondness of veteran Haskellers for custom operators can sometimes be overwhelming for a new programmer. Scala, while a hybrid OOP language that shares many idioms with Java etc, has so heavily dipped into the operator well, even for standard language constructs--the syntax can be a bit crowded. Imagine Java with twice the non-alphanumeric density. Clojure is a LISP, so it's very clean and clear syntactically, provided you are a parenthesis master or make use of SLIME, etc.

Speed (average job, single core, not scalability etc etc):

Haskell, Scala, Clojure, Erlang--descending. Though on certain workloads these players will switch places, as a general rule of thumb that's my aggregated observed performance order in "real world-ish" code scenarios. Clojure is sorta interpolated based on experiences others have relayed to me and online. From a memory standpoint, Haskell is substantially stingier than the rest. This impacts startup times and the practicality of shell tools, etc.

Compilation/VM + Concurrency:

Haskell is alone in this group in that it compiles to native code--well, at least, the most commonly-used configuration does: the compiler ghc. GHC is a beauty of a project. Very good native code generation, aggressive improvement projects (new I/O manager, LLVM backend) incredibly advanced concurrency capabilities (sparks, first class green threads, seamless async IO). Erlang uses a VM, beam, which also uses green threads (which erlang calls processes) and transparent async I/O--but, erlang goes even further to provide syntactic support for super-efficient message-passing between these processes, and facilities to transparently pass those messages between machines in a cluster. Clojure and Scala both use the JVM. It is what it is--mature, fast, but primarily designed for Java and things a lot like Java. Cool features like tail call optimization and green threads are simply not possible b/c the JVM does not (currently) support them. (I know Scala has Actors, Akka, etc, but JVM-based actors are not in the same league as what Erlang and Haskell have.)

All of these languages have a REPL, thank god.

Breadth of Libraries:

The JVM languages here obviously have a HUGE wealth of libraries available. The only concession is calling them can be somewhat awkward and "downgrade" you out of the functional idioms you'd rather be using, depending on the library. Haskell's library situation is pretty good (Hackage has libraries for most everything you need and is growing quickly every day), and Erlang is probably bringing up the rear, here. OTP is excellent, and there are many libraries related to large networked systems etc, but not as much "general" coverage.

Key Strength:

If you want to write high-uptime distributed systems, Erlang/OTP is the bees' knees. Everything about the language and runtime is tailored to that environment--supervisor trees, mnesia, good scheduling of hundreds of thousands of lightweight processes.

Haskell is great at correctness. Like its non-lazy cousin ML, if you satisfy the type system, a shockingly high percentage of the time your program is just flat out correct the first time you run it. Haskell is also pretty darn good at single-machine multicore concurrency.

Clojure is a great "modern Lisp". Many of the concerns, fairly or unfairly, leveled at Lisps in the past (no libraries, stagnant platform, closed development, etc) are non issues. Building on the JVM enables a very small developer core to make progress on the language while leveraging a mature platform that evolves independently and at scale.

Scala is a "better Java". While iterop with large Java-oriented frameworks is possible in Clojure, it's pretty much trivial in Scala. You can dip a toe incrementally into the functional world while keeping retaining your existing projects, favorite libraries, 3rd party frameworks, etc--transitioning as fast and far as appropriate.

Summary:

The good news is, all of these languages are pretty damn good, and all of them actually have healthy, thriving developer communities and razor-sharp maintainers.

If someone has never done functional programming before, I'd personally recommend trying something like clojure or Racket first to get the fundamentals down before digging into something truly mind-bending like Haskell. Scala is probably not a big enough leap to discipline yourself to grok functional just b/c it's so easy to relapse into non-functional patterns and Scala will happily comply.

technomancy · on April 7, 2011

Good overview; a couple nitpicks:

Technically Scala's inference is not Hindley-Milner since you still have to declare method argument types for JVM interop.

The speed difference between Scala and Clojure has less to do with the static/dynamic gap than the fact that Clojure relies upon reflection by default. However, once you identify your bottlenecks, adding type hints to remove reflection is pretty easy.

Good stuff.

Arrgh · on April 8, 2011

A minor nitpick-nitpick: it's not so much that a JVM requirement prevents Scala from using H-M type inference, it's that H-M doesn't work very well with subtyping, which Scala supports for its own O-O reasons. This comment from Martin provides some context: http://www.codecommit.com/blog/scala/is-scala-not-functional...

mahmud · on April 6, 2011

Erlang is the shitzness! Not weird at all, very palatable, and thankfully, different enough to be a new experience.

jiry · on April 9, 2011

> Haskell and Scala are closest in that they're strongly, statically typed with Hindley-Milner type inference > Haskell has the most sophisticated type system--by far, I think it's fair to say

HM is a type system first, and that means is a logic system. HM is a logic that is deliberately kept relatively weak in order to allow full inference. Scala is not HM and never has been - it's much, much more sophisticated and so Scala uses a form of type inference not based on HM. Haskell '98 is HM plus a tiny bit (and yes that tiny bit does mean that the occasional type annotation is required). GHC Haskell is HM plus a ton more and so there are a lot more GHC Haskell programs that need some type annotation. The people behind GHC Haskell have done a good job of carefully adding type system features that don't totally break the model, though, so you rarely have to annotate much unless you love playing in advanced type land.

I would hesitate before saying GHC Haskell is more sophisticated "by far". It's not at all obvious that that is so. I'd want a type theorist to spend some time with the two core logics to determine which one subsumes what parts of the other before making that call.

chalst · on April 6, 2011

For correctness, both Haskell and Erlang have Quick Check, which is simply fantastic. Quick Check is somewhat easier to use with stateless code, which favours Haskell over Erlang.

http://www.cse.chalmers.se/~rjmh/QuickCheck/

gtani · on April 6, 2011

For more correctness, scala has had interop hiccups

http://lampsvn.epfl.ch/trac/scala/ticket/2991

http://stackoverflow.com/questions/3313929/how-do-i-disambig...

(and does Orig. Commenter have examples of clojure hiccups? Interop is very important to both (and F#). Maybe a more meaningful compare/contrast is haskell, scala and F# (which I can't do)

http://matt.might.net/articles/best-programming-languages/

---------------

http://james-iry.blogspot.com/2010/05/types-la-chart.html

(his league table of "Most Powerful Type Systems": Agda, epigram, scala, haskell

seancribbs · on April 8, 2011

On the contrary, QuickCheck is brilliant at testing stateful systems, as long as you can model the actions that occur and how they modify state (statem in eqc helps here).

eru · on April 6, 2011

Thanks. Please consider putting this comparison on a blog or similar, it deserves a wider audience.

mononcqc · on April 8, 2011

Just adding to the list that Erlang has optional type specifications, used only for analysis. http://www.erlang.org/doc/reference_manual/typespec.html

astrofinch · on April 6, 2011

Hm, this isn't such a good recommendation.

http://snapframework.com/

(Right now there's an internal server error.)

Argorak · on April 6, 2011

Maybe you sent it the wrong type of request :).

how_gauche · on April 6, 2011

A system upgrade borked the server, sorry (i.e. libev got upgraded from libev.so.3 -> .4 and I had to relink the server).