Palindromes (Clojure vs. Common Lisp)

_ivvf · on Dec 27, 2009

He's using vectors with clojure and lists with lisp. Why? It looks like he's concatenating vectors, which is O(n), with clojure, but cons'ing lists, which is O(1), with lisp. As much as I like seeing clojure's performance get critiqued, this comparison doesn't seem fair at all. Why aren't the implementations identical?

KirinDave · on Dec 27, 2009

You need to remember that clojure vectors are NOTHING like arrays. They're persistent ultra-wide tree structures. I was curious if you were right and vectors were the wrong choice, so I profiled the code.

I had the big-test function run several times to actually show the real numbers (you need to prime the JVM to get a real indication of its speed):

  ~/Projects/palindromes % 〕bin/run-tests 

  1000 X ' amanaplanacanalpanama ' 
  naive : (skipped)
  fast  : "Elapsed time: 449.397 msecs"
    "Elapsed time: 182.432 msecs"
    "Elapsed time: 149.721 msecs"
    "Elapsed time: 162.804 msecs"
    "Elapsed time: 159.086 msecs"

http://gist.github.com/264369

     CPU SAMPLES BEGIN (total = 205) Sun Dec 27 10:52:27 2009
     rank   self  accum   count trace method
        1  1.46%  1.46%       3 301336 clojure.template__init.load
        2  1.46%  2.93%       3 300339 java.lang.Class.forName0
        3  0.98%  3.90%       2 301445 clojure.lang.PersistentHashMap.containsKey
        4  0.98%  4.88%       2 301456 clojure.lang.RestFn.invoke
        5  0.98%  5.85%       2 301486 clojure.lang.PersistentVector$TransientVector.editableTail
        6  0.98%  6.83%       2 301487 clojure.lang.PersistentVector$TransientVector.conj
        7  0.98%  7.80%       2 301465 clojure.lang.PersistentVector.chunkedSeq
        8  0.98%  8.78%       2 301489 palindromes$ext_centers__96$fn__101.invoke
        9  0.98%  9.76%       2 301498 clojure.lang.PersistentList.first
        10   0.98% 10.73%       2 301482 clojure.lang.PersistentVector$TransientVector.conj

I do see a lot of vector activity, but no smoking gun. As other people have pointed out in this thread, SBCL is notorious for being very good at recursing down big data structures. It's also notorious for being totally terrible with lots of other common tasks. I suspect this is just a case where the algorithm runs in such a way that languages that compile down all the way have a big advantage.

I've asked some more experienced hands in #clojure on freenode to take a stab at it. There are a lot of recent optimization developments in the clojure-1.1.0 release that I don't know how to use well yet. Hopefully someone there can tackle it.

_ivvf · on Dec 27, 2009

Vectors may be implemented very differently from mutable arrays, but it is my understanding that they allow for efficient appends and updates, not efficient prepends. I would still switch to using lists, if only because when doing performance comparisons between languages and implementations you want the code and data structures to be as identical as possible.

jgrant27 · on Dec 27, 2009

Using lists instead of vectors brings the Clojure version down to around 10X the speed of the CL code.

  user=> (#'i27.palindromes/big-test)

  1000 X ' amanaplanacanalpanama ' 
  fast  : "Elapsed time: 183.748 msecs"

scythe · on Dec 27, 2009

SBCL is very impressive. Until recently[1], it was far and away the quickest implementation for a dynamic language. Considering we have yet to see invokedynamic or tail calls on the JVM, I'm rather unsurprised by SBCL's comfortable lead in performance.

[1] LuaJIT 2

lispm · on Dec 27, 2009

There are other Lisp compilers which are about as fast as SBCL. For example on my Mac, the 64bit LispWorks is for many tasks faster than SBCL. SBCL is faster for some numerics, but 'typical' Lisp programs are running faster in the 64bit LispWorks. Then there is Scieneer CL, which is also forked from CMUCL, like SBCL.

There are many more compilers for dynamic languages that people usually don't know. I would be careful to claim 'fastest', given that only few advanced compilers in that area are known to people.

jrockway · on Dec 27, 2009

I think SBCL's speed is orthogonal to invokedynamic and tail call elimination. Those features would not speed this benchmark up very much.

jules · on Dec 27, 2009

> invokedynamic or tail calls

Neither would improve Clojure's performance.

oconnor0 · on Dec 27, 2009

Tail calls would improve Clojure performance since you wouldn't have to trampoline.

Edit: Not that they would, necessarily, improve the speed of this benchmark that much.

pvg · on Dec 27, 2009

It seems easy to hit some implementation wart that makes performance abysmal - it's hard to believe that a JVM targeted language would be 20x times slower. Case in point - I tried the benchmark with a bigger input - the performance ratio between the 'fast' implementations remained the same. Incredibly, though, the CL implementation took nearly 2 minutes to construct the input string, something that is essentially instant in the Clojure version.

  time sbcl --noinform --load palindromes.lisp --eval "(progn (big-test) (quit))"

  30000 X ' amanaplanacanalpanama ' 
 
  fast : 
  Evaluation took:
  0.197 seconds of real time
  0.191206 seconds of total run time (0.186952 user, 0.004254 system)
  [ Run times consist of 0.006 seconds GC time, and 0.186 seconds non-GC time. ]
  96.95% CPU
  471,342,768 processor cycles
  13,802,832 bytes consed
  

  real	1m55.551s
  user	1m51.954s
  sys	0m0.778s

lispm · on Dec 27, 2009

Because the CL code to create the string is completely inefficient. It starts that he prints a string with format, using a format directive. Instead of (FORMAT s "~a" string) use (write-string string s). Also, if the result string length is known, one can just allocate a result string and copy the source string repeatedly into the result string...

pvg · on Dec 27, 2009

It's clear that it's inefficient and I'm no expert in either CL or FORMAT but it's not just inefficient, it's staggeringly inefficient. 2 minutes to make a few hundred k string on a 2+ghz CPU is hard to fathom, even if one assumes the method is imperfect, generates piles of garbage, etc, etc. It'd be interesting if someone could explain how this actually happens.

lispm · on Dec 27, 2009

I'd say it's a performance bug. Replacing (format stream "~a" string) with (format stream string) gets rid of the problem.

LispWorks has no performance problem with both forms, though.

The reason is the SBCL pretty printer. If

    *pretty-print*

 is NIL, then both forms run fast in SBCL...

pvg · on Dec 27, 2009

Interesting. For the massive hit, that better be the prettiest printing pretty printer in the galaxy. OpenMCL didn't choke on it either.

Edit: Well, the last bit is a lie, it did the moment i set

  *PRINT-PRETTY*.

lispm · on Dec 27, 2009

Even with

    *print-pretty*

set to T, LispWorks runs fine.

imd · on Dec 27, 2009

I don't know much about optimization, but...it looks like he's timing the Clojure version with Clojure already loaded, but timing the SBCL version with SBCL not loaded. Would that change the measurements much?

ced · on Dec 27, 2009

Nah, that's not what's happening. The (time ...) special form is what prints the timing results, and it's squarely in the (big-test) function. All is right under the sun.

astine · on Dec 27, 2009

Some context might help... which implementation of Common Lisp? SBCL?

pvg · on Dec 27, 2009

From the article -

"triton:palindromes jgrant$ sbcl --noinform --load [...]"

astine · on Dec 27, 2009

Thanks

rbanffy · on Dec 27, 2009

Would the SBCL be able to spread across the many cores a common desktop has today? How is threading implemented in CL?

varjag · on Dec 27, 2009

If I recall correctly, as in many other Lisps, it's a derivative of MP facility dating back to Lisp machines, and it runs over native threads.

rbanffy · on Dec 28, 2009

I am looking into the cl-cookbook at sourceforge. It looks very interesting and worth investigating more. This specific benchmark seemingly would benefit from multiple threads on multiprocessors, if I could read its implementations correctly.

mahmud · on Dec 29, 2009

cl-cookbook is mostly outdated. use bordeaux-threads; it works across all lisps :-) and it's multicore for sbcl linux and clozure everything else (dunno about the commercial lisps)