A fast HTTP request/response parser for Common Lisp

eudox · on Oct 30, 2014

For context, Eitaro Fukamachi has been working on a whole bunch of things relating to the low-level parts of Common Lisp web development: A fast HTTP parser, a fast URI parser[0], a libuv-based web server[1] (That outperforms Node!), and he's also written Clack[2], which is Common Lisp's Rack/WSGI.

[0]: https://github.com/fukamachi/quri

[1]: https://github.com/fukamachi/woo

[2]: https://github.com/fukamachi/clack

pnathan · on Oct 30, 2014

Fukamachi has done some stellar work. I've used his code and have generally been very happy with it.

rgbrgb · on Oct 30, 2014

> (That outperforms Node!)

In what way? All of the benchmarks I'm seeing on that page indicate otherwise.

eudox · on Oct 30, 2014

When he first posted it, it did, it looks like recent changes have brought that down:

https://github.com/fukamachi/woo/commit/be4a0db688af05ee204e...

fukamachi · on Oct 30, 2014

Right. When I added default headers, the score went way down.

Probably libevent2 is the bottleneck and no way to make it faster without switching its backend (the cl-async's author is working on rewriting it with libuv now).

aidenn0 · on Oct 31, 2014

Out of curiosity did you try basic-binary-ipc? I would expect it would lose out to cl-async, but have never benchmarked the two against each other.

thirsteh · on Oct 31, 2014

It would be quite surprising if it did, since Node's HTTP parser isn't Node/JavaScript, but highly optimized C.

negativeview · on Oct 30, 2014

That's quite interesting! I tried to learn Lisp a while back and found the string handling to be horrible, and it's such a huge part of modern development.

It sounds like there's someone with a lot of skill working to fix exactly this.

Now, do I have time to look into Lisp again...

PuercoPop · on Oct 31, 2014

out of curiosity, what did you found horrible about string handling?

negativeview · on Oct 31, 2014

So I just looked up Common Lisp string handling and none of it looks familiar. It's possible what I was trying to use before wasn't Common Lisp. It's also possible I was trying to learn from a site that was just doing a horrible job of it. It's possible things have gotten much much better since I last tried to learn it.

Whatever the cause, I was under the impression that you essentially had to treat strings as lists of characters and that there really weren't any built-in functions to handle strings as strings. It seems that's wrong at least in modern day.

aerique · on Oct 31, 2014

Strings are sequences and functions that work on sequences work on strings as well. So getting substrings etc. is all possible.

I've had very little problems operating on strings in Common Lisp (all I missed was starts-with and ends-with functions), could you give some concrete examples so that I might help you?

negativeview · on Oct 31, 2014

I can't now because it was years ago I was having problems. I'm starting to think that it was a different Lisp and not Common Lisp I was looking at, too. Too long ago to remember for sure.

PuercoPop · on Oct 31, 2014

Strings are a simple-array of characters and functions that work sequences work on strings (ej. remove). They are complemented with a some string specific functions like string-equal, string-trim, string-upcase, etc. It has been that way for at least 20 years. It might have been that the site was doing a horrible job at explaining it.

bkurtz13 · on Oct 31, 2014

Try Clojure

TheMagicHorsey · on Oct 30, 2014

I've always assumed Clojure was the right choice for a Lisp to do web applications. Perhaps I should revaluate my assumptions.

apgwoz · on Oct 30, 2014

If you're assuming this based on some microbenchmarks, you should probably instead reevaluate your evaluation techniques.

(This isn't meant as a personal jab, but too many people focus on silly things like microbenchmarks when choosing tools, and not on other more important things like support avenues, tooling, libraries, etc.)

aroman · on Oct 30, 2014

I don't think it's fair to assume that TheMagicHorsey was basing his evaluation on the microbenchmarks themselves.

The real story here is that people are working on a number of new libraries/modules/tools for doing web development in CL — the benchmarks (as you correctly point out) are a minor detail.

apgwoz · on Oct 30, 2014

Perhaps you should read more carefully. "If you're assuming ..."

It's just not clear from the context we have. But, caution! Microbenchmarks should only be Microtrusted.

adrianm · on Oct 30, 2014

I think this is truly sagely advice.

My grand takeaway from Aphyr's Jepsen series (http://aphyr.com/tags/jepsen) is that the more sophisticated software becomes in the never-ending quest for features and performance, the more likely the fundamentals underlying it have been overlooked, glossed over, or ignored completely.

eudox · on Oct 30, 2014

When you compare the time fukamachi's spent on these libraries to the time spent on Clack, Caveman, etc., it's pretty clear there's obsession with performance.

As aroman said, "The real story here is that people are working on a number of new libraries/modules/tools for doing web development in CL".

adrianm · on Oct 30, 2014

I didn't mean to imply that the original post was not worthy in anyway whatsoever, in fact I think all of the work he is doing in the CL ecosystem is awesome and I look forward to playing around with it soon in my next CL project.

But apgwoz's general point about how we evaluate software is spot on. In general, I think benchmarks and the like have a sort of "woo"ing effect on people (in the James Randi sense), and we should probably take a few moments to reflect on that before making any major architectural decisions. :)

eudox · on Oct 30, 2014

I agree entirely, but at the same time, I think too often people make social rather than technical decisions. Lots of stars on GitHub and a well-designed website are usually indicative of a good project -- but people put too much faith on CSS, the same way they put too much faith on microbenchmarks taken in laboratory conditions.

PuercoPop · on Oct 31, 2014

This -> people make social rather than technical decisions

Leo Meyerovich did a study on the social factors that affect language adoption[0]. Vivek Haldar has a highlights the salient points[1]. More info at his site: [2]

[0]: http://lmeyerov.github.io/projects/socioplt/papers/oopsla201...

[1]: http://blog.vivekhaldar.com/post/56972066006/empirical-analy...

[2]: http://lmeyerov.github.io/

wes-exp · on Oct 30, 2014

I think too often people make social rather than technical decisions

Indeed. It's sad that as engineers (who theoretically espouse logic, reason, and evidence) we still end up making so many decisions based on herd behavior.

porker · on Oct 31, 2014

I would posit that engineers are human creatures rather than robots, and basing decisions on social factors is a natural part of being human.

For my anecdotal experience, apart from JS libs (where style over substance is frequently the norm) I find a well-designed website is usually a sign the project cares about its users, and is more likely to provide good documentation & a space to ask questions.

pnathan · on Oct 30, 2014

I've generally been happy with my use of Common Lisp in the web environment - it's had a long-standing presence there. I've personally not found deployment the most happy story, but I expect that with the rise of Docker, shipping a Common Lisp image in a Docker image will be a very popular method of deployment.

ska · on Oct 30, 2014

I'm curious, why would you assume that?

The primary benefit and cost, depending on your point of view, of clojure is that it runs on the JVM. I don't see how that would be a strong argument for or against using this tooling for any problem domain unless you were already committed to or against the JVM for other reasons.

PuercoPop · on Oct 31, 2014

Being able to run on the JVM is not a differentiating factor among lisps. Kawa Scheme and ABCL give you access to Java Libraries.

If you ask me the strengths of Clojure that it is built upon abstraction/interfaces (ie. conj instead of cons) and how it manages hygiene while retaining quasiquotes templates leveraging clj's namespaces. But then again you should probably ask a clojurian instead.

_pmf_ · on Oct 31, 2014

An advantage of Clojure over ABCL is the syntactically nicer Java interoperation and the mechanism to avoid reflection by using "warn-on-reflection" and introducing type hints.

147 · on Oct 30, 2014

Can somebody give arguments for using Racket or CL for web dev? Whenever I evaluate Lisps I end up choosing Clojure because to me it seems to be the most practical to develop web apps in.

jwr · on Oct 31, 2014

I don't think you should look for arguments. Just try it. Write something in every one of the environments you are considering and you'll learn more than you ever could from reading online discussions.

As a data point, I moved from Common Lisp to Clojure and never regretted it. But the reason I switched was not because I read something online, but because of Clojure's concurrency support, which means I can create concurrent software that is robust.

thu · on Oct 30, 2014

In the HTTP benchmark game, there is also Bryan O'Sullivan's attoparsec-based parser: http://www.serpentine.com/blog/2014/05/31/attoparsec/ The important point is that the parser is a few dozen lines of code (i.e. it relies on a general parsing library instead of being custom hand-written code).

Animats · on Oct 30, 2014

Is parsing HTTP headers a major bottleneck? HTML parsing, sure, especially with the horrors of HTML 5 error handling per the spec and the need to back up after guessing the character set. But HTTP headers are not that big or complex.

edsiper2 · on Oct 31, 2014

HTTP protocol grammar is pretty simple, but parsing correctly considering the server side and desired performance may be hard.

By nature parsing strings is expensive (not sure why they designed HTTP in string format, binary would save a ton of power in processing). So imaging you are receiving a request, most of the time due to Network latency (client or server side), you don't get the full Request at once, most of the time you need to perform a couple of read(2) operations.

Now, what do you do on each read ?, did you manage full state ?, how many parsing rounds ?, how to catch the end of the protocol block , etc etc etc.

So answering your questions I would say "YES", the HTTP protocol in a Server "may be" a bottleneck if its not well implemented. In our HTTP Server project[0] we are working in a new parser to improve performance, so for real, this is a very important topic.

[0] http://monkey-project.com

best

eudox · on Oct 30, 2014

From the Reddit thread: http://www.reddit.com/r/lisp/comments/2klhpn/quri_a_uri_libr...

>Right. When I was profiling Wookie, I found URL parsing was the third bottleneck (the first is HTTP request parsing, and the second is libevent2).

Animats · on Oct 30, 2014

Yes, that's really important for compatibility with Google Wave.

verytrivial · on Oct 30, 2014

This reminds me of John Fremlin's work back in 2009 http://john.freml.in/teepeedee2-c10k . (Hello, John!)

wspeirs · on Oct 30, 2014

Just curious what about this parser makes it faster? Can anyone give a basic explanation?

easytiger · on Oct 30, 2014

Well if you implemented the c parser with precisely the same algorithms the c one would become faster, so one can assume they have totoally different underlying implementations

haberman · on Oct 30, 2014

That is not necessarily true (and I say this as someone who heavily favors C for parsers). An implementation with a JIT will likely be able to optimize away work by noticing that the table of callbacks is empty.

I would be more impressed if the callback table was not empty (and the callbacks actually did something, like increment an integer) and Lisp was still winning.

EDIT: I see others saying that SBCL is AOT, not JIT. If so I'd be very interested to look at the disassembly side-by-side and see an explanation for why SBCL is winning.

muyuu · on Oct 30, 2014

SBCL is a compiled Common LISP (even in interpreted mode, it compiles every eval-ed lambda before calling it).

On the other hand there is nothing inherently stopping the optimiser from optimising the resulting code as well or better than GCC or any C compiler. Theoretically it could produce optimal machine code.

haberman · on Oct 30, 2014

Correct me if I'm wrong but I was under the impression that Common Lisp is memory-safe (implying guards like bounds-checks), dynamically typed (implying type checks) and garbage collected (implying GC overhead).

muyuu · on Oct 30, 2014

Yep but all these checks don't necessarily need to happen at runtime. A clever enough compiler can optimise all of it (or most of it) at compile time. In general this is a very hard problem but for something static enough (like a simple parser) the compiler can produce code and determine its safety without needing to carry checks on to the runtime.

Note that C code or machine code would also have to perform input checks at runtime when doing http header parsing. A Common Lisp compiler wouldn't theoretically need to produce code performing any more runtime checks than these.

_delirium · on Oct 30, 2014

You can also manually turn some checks off. In some Lisp systems, declaring an optimization setting of (safety 0) will turn off a bunch of checks, even ones that the compiler can't statically determine are safe to skip. Of course this should be used very carefully if at all (especially in code that's parsing arbitrary data from the internet!).

eudox · on Oct 30, 2014

SBCL (And other compilers) implement a type inference engine to reduce type checks, and you can also provide your own type declarations, i.e. gradual typing before it had that name.

samth · on Oct 30, 2014

CL type declarations are cool, but they aren't gradual typing. It doesn't either reject your program if you get them wrong, or dynamically enforce that everyone else lives up to them.

hajile · on Oct 31, 2014

That depends on the optimization levels you select. At it's highest optimization level, there is no type checking and you'd better have your code right.

http://www.sbcl.org/manual/#Declarations-as-Assertions

4.2.1 Declarations as Assertions

The SBCL compiler treats type declarations differently from most other Lisp compilers. Under default compilation policy the compiler doesn’t blindly believe type declarations, but considers them assertions about the program that should be checked: all type declarations that have not been proven to always hold are asserted at runtime.

Remaining bugs in the compiler’s handling of types unfortunately provide some exceptions to this rule, see Implementation Limitations.

CLOS slot types form a notable exception. Types declared using the :type slot option in defclass are asserted if and only if the class was defined in safe code and the slot access location is in safe code as well. This laxness does not pose any internal consistency issues, as the CLOS slot types are not available for the type inferencer, nor do CLOS slot types provide any efficiency benefits.

There are three type checking policies available in SBCL, selectable via optimize declarations.

Full Type Checks

All declarations are considered assertions to be checked at runtime, and all type checks are precise. The default compilation policy provides full type checks.

Used when (or (>= safety 2) (>= safety speed 1)).

Weak Type Checks

Declared types may be simplified into faster to check supertypes: for example, (or (integer -17 -7) (integer 7 17)) is simplified into (integer -17 17).

Note: it is relatively easy to corrupt the heap when weak type checks are used if the program contains type-errors.

Used when (and (< safety 2) (< safety speed))

No Type Checks

All declarations are believed without assertions. Also disables argument count and array bounds checking.

Note: any type errors in code where type checks are not performed are liable to corrupt the heap.

Used when (= safety 0).

aidenn0 · on Oct 31, 2014

I'm not familiar with gradual typing, but, SBCL with any safety level above 0 does both of those things. This isn't specified by the standard, of course.

muyuu · on Oct 30, 2014

Not necessarily true, although C compilers are usually very good.

PuercoPop · on Oct 31, 2014

Why do you think that? Is it because most languages compile to C? That is not the case with most CL implementations. (ej. CL & CCL).

Although I agree that it is generally the case, you can read here[0] for an example where CL would gain an advantage over C compilers. Albeit in an implementation specific way.

[0]: http://www.pvk.ca/Blog/2014/08/16/how-to-define-new-intrinsi...

patrickmay · on Oct 30, 2014

Possibly, but you might not be able to do that without implementing enough of Lisp to allow the algorithms to be implemented, as per Greenspan's Tenth: http://c2.com/cgi/wiki?GreenspunsTenthRuleOfProgramming

halayli · on Oct 30, 2014

parsed = http_parser_execute(parser, &settings_null, buf, strlen(buf));

assert(parsed == strlen(buf));

He's calling strlen(buf) twice here. There's no guarantee that this is optimized at compile time. Usually len of buf is determined from read/recv system calls and for a fair comparison strlen should be removed here.

fukamachi · on Oct 30, 2014

No difference even though I ran the benchmark again after removing the line.

halayli · on Oct 30, 2014

Interesting. I was expecting at least cpu branch improvements given the lack of assert.

halayli · on Oct 30, 2014

Also use gettimeofday() instead of clock(), because you don't know how far into the second you are at the time it's called.

jeffreyrogers · on Oct 31, 2014

Any good compiler will optimize that out. (Actually, even a bad compiler will optimize that out... it's a really easy optimization).

Ono-Sendai · on Oct 30, 2014

assert only runs in debug mode anyway.

haberman · on Oct 30, 2014

It runs unless NDEBUG is defined, which the author did not do according to his methodology. Anyway I looked at the assembly output and strlen() is optimized away.

webreac · on Oct 30, 2014

I really do not like when bad habits are encouraged by the optimizer.

halayli · on Oct 30, 2014

no, that's not correct.

http://pubs.opengroup.org/onlinepubs/009695399/functions/ass...

aidenn0 · on Oct 31, 2014

"Forcing a definition of the name NDEBUG, either from the compiler command line or with the preprocessor control statement #define NDEBUG ahead of the #include <assert.h> statement, shall stop assertions from being compiled into the program"

serbrech · on Oct 31, 2014

Hmm, I can't really write lisp, but, how maintainable is this?? https://github.com/fukamachi/fast-http/blob/master/src/parse...

adwf · on Oct 31, 2014

It could do with some comments and doc-strings certainly, but I'd give it a pass for now as it's a new project. As for the code, I imagine that in any language, highly optimised programs can start looking pretty odd after a while. Try understanding "grep" for example:

http://git.savannah.gnu.org/cgit/grep.git/tree/src/grep.c

As a guy not particularly proficient in C, that looks crazy to me. Although it does at least have decent comments, which I'd expect for such a mature project!

haberman · on Oct 30, 2014

I was not able to run the benchmark for Lisp. What Lisp implementation is expected? I tried CLISP and got:

    *** - READ from #<INPUT BUFFERED FILE-STREAM CHARACTER #P"benchmark.lisp" @1>: there is no package with name "SYNTAX"

eudox · on Oct 30, 2014

I'd recommend SBCL, since a high-performance JIT compiler is likely to give more representative results than a bytecode VM like CLISP. Plus, I'd expect some minor portability errors to arise when you push the envelope like this.

By the way, your error seems to be that the cl-syntax library is not being found, as opposed to a portability problem, so maybe you don't have Quicklisp installed and loaded?

gjm11 · on Oct 30, 2014

Pedantic correction: Unless something's changed very radically since I last looked, SBCL's compiler is AOT not JIT. (Though in interactive use it compiles things as you enter them, which I suppose is kinda JIT if you look at it from just the right angle and squint.)

eudox · on Oct 30, 2014

You're right, and I always make this mistake for the reason that AOT/JIT look the same from the outside.

haberman · on Oct 30, 2014

> By the way, your error seems to be that the cl-syntax library is not being found, as opposed to a portability problem, so maybe you don't have Quicklisp installed and loaded?

Indeed I don't, I didn't know to install it since it wasn't listed as a dependency. Thanks for the info!

eudox · on Oct 30, 2014

Quicklisp is CL's package manager, so that's necessary to install dependencies. You can install it with:

    $ curl -O http://beta.quicklisp.org/quicklisp.lisp
    $ sbcl --load quicklisp.lisp \
           --eval '(quicklisp-quickstart:install)' \
           --eval '(ql:add-to-init-file)' \
           --quit

_RPM · on Oct 30, 2014

Is Lisp a write-only language?

adwf · on Oct 30, 2014

I'll assume this is an honest question - not a troll (and hence don't deserve the downvotes), but no, Lisp isn't write-only.

It's an immensely flexible language that lets you code in a variety of different styles, so you can end up with code spaghetti. But that doesn't mean that you have to, or even that it's the most likely thing to happen.

If you compare it to idiomatic Perl, where a lot of the community seem to get obsessed with the famous one-liners, Lisp seems a lot better to me. And if you follow a decent style guide like:

https://google-styleguide.googlecode.com/svn/trunk/lispguide...

Then there's no particular reason why you should end up with disaster code anymore than in any another language. I'd rather read a single page of Lisp, than the equivalent numerous pages of Java or some other verbose language.

wnewman · on Oct 31, 2014

The old book _Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp_ contains a number of nontrivial programs (like an Othello player and a little symbolic math program). AFAIK, most people think the code is unusually clear, both a good example of how to write clear CL programs and a good example of clear programming period (independent of language). If you are seriously interested in clarity of CL programs, I recommend taking a look at that book.

aerique · on Oct 31, 2014

Coming from Common Lisp and having tried out Haskell recently I can ask the same about the latter while I find Lisp to be one of the most readable languages out there.

It's basically what you're used to and for Lisp it does not take long to get used to.