Hacker News new | past | comments | ask | show | jobs | submit login
Palindromes (Clojure vs. Common Lisp) (imagine27.com)
40 points by jgrant27 on Dec 27, 2009 | hide | past | favorite | 24 comments



He's using vectors with clojure and lists with lisp. Why? It looks like he's concatenating vectors, which is O(n), with clojure, but cons'ing lists, which is O(1), with lisp. As much as I like seeing clojure's performance get critiqued, this comparison doesn't seem fair at all. Why aren't the implementations identical?


You need to remember that clojure vectors are NOTHING like arrays. They're persistent ultra-wide tree structures. I was curious if you were right and vectors were the wrong choice, so I profiled the code.

I had the big-test function run several times to actually show the real numbers (you need to prime the JVM to get a real indication of its speed):

  ~/Projects/palindromes % 〕bin/run-tests 

  1000 X ' amanaplanacanalpanama ' 
  naive : (skipped)
  fast  : "Elapsed time: 449.397 msecs"
    "Elapsed time: 182.432 msecs"
    "Elapsed time: 149.721 msecs"
    "Elapsed time: 162.804 msecs"
    "Elapsed time: 159.086 msecs"

http://gist.github.com/264369

     CPU SAMPLES BEGIN (total = 205) Sun Dec 27 10:52:27 2009
     rank   self  accum   count trace method
        1  1.46%  1.46%       3 301336 clojure.template__init.load
        2  1.46%  2.93%       3 300339 java.lang.Class.forName0
        3  0.98%  3.90%       2 301445 clojure.lang.PersistentHashMap.containsKey
        4  0.98%  4.88%       2 301456 clojure.lang.RestFn.invoke
        5  0.98%  5.85%       2 301486 clojure.lang.PersistentVector$TransientVector.editableTail
        6  0.98%  6.83%       2 301487 clojure.lang.PersistentVector$TransientVector.conj
        7  0.98%  7.80%       2 301465 clojure.lang.PersistentVector.chunkedSeq
        8  0.98%  8.78%       2 301489 palindromes$ext_centers__96$fn__101.invoke
        9  0.98%  9.76%       2 301498 clojure.lang.PersistentList.first
        10   0.98% 10.73%       2 301482 clojure.lang.PersistentVector$TransientVector.conj
I do see a lot of vector activity, but no smoking gun. As other people have pointed out in this thread, SBCL is notorious for being very good at recursing down big data structures. It's also notorious for being totally terrible with lots of other common tasks. I suspect this is just a case where the algorithm runs in such a way that languages that compile down all the way have a big advantage.

I've asked some more experienced hands in #clojure on freenode to take a stab at it. There are a lot of recent optimization developments in the clojure-1.1.0 release that I don't know how to use well yet. Hopefully someone there can tackle it.


Vectors may be implemented very differently from mutable arrays, but it is my understanding that they allow for efficient appends and updates, not efficient prepends. I would still switch to using lists, if only because when doing performance comparisons between languages and implementations you want the code and data structures to be as identical as possible.


Using lists instead of vectors brings the Clojure version down to around 10X the speed of the CL code.

  user=> (#'i27.palindromes/big-test)

  1000 X ' amanaplanacanalpanama ' 
  fast  : "Elapsed time: 183.748 msecs"


SBCL is very impressive. Until recently[1], it was far and away the quickest implementation for a dynamic language. Considering we have yet to see invokedynamic or tail calls on the JVM, I'm rather unsurprised by SBCL's comfortable lead in performance.

[1] LuaJIT 2


There are other Lisp compilers which are about as fast as SBCL. For example on my Mac, the 64bit LispWorks is for many tasks faster than SBCL. SBCL is faster for some numerics, but 'typical' Lisp programs are running faster in the 64bit LispWorks. Then there is Scieneer CL, which is also forked from CMUCL, like SBCL.

There are many more compilers for dynamic languages that people usually don't know. I would be careful to claim 'fastest', given that only few advanced compilers in that area are known to people.


I think SBCL's speed is orthogonal to invokedynamic and tail call elimination. Those features would not speed this benchmark up very much.


> invokedynamic or tail calls

Neither would improve Clojure's performance.


Tail calls would improve Clojure performance since you wouldn't have to trampoline.

Edit: Not that they would, necessarily, improve the speed of this benchmark that much.


It seems easy to hit some implementation wart that makes performance abysmal - it's hard to believe that a JVM targeted language would be 20x times slower. Case in point - I tried the benchmark with a bigger input - the performance ratio between the 'fast' implementations remained the same. Incredibly, though, the CL implementation took nearly 2 minutes to construct the input string, something that is essentially instant in the Clojure version.

  time sbcl --noinform --load palindromes.lisp --eval "(progn (big-test) (quit))"

  30000 X ' amanaplanacanalpanama ' 
 
  fast : 
  Evaluation took:
  0.197 seconds of real time
  0.191206 seconds of total run time (0.186952 user, 0.004254 system)
  [ Run times consist of 0.006 seconds GC time, and 0.186 seconds non-GC time. ]
  96.95% CPU
  471,342,768 processor cycles
  13,802,832 bytes consed
  

  real	1m55.551s
  user	1m51.954s
  sys	0m0.778s


Because the CL code to create the string is completely inefficient. It starts that he prints a string with format, using a format directive. Instead of (FORMAT s "~a" string) use (write-string string s). Also, if the result string length is known, one can just allocate a result string and copy the source string repeatedly into the result string...


It's clear that it's inefficient and I'm no expert in either CL or FORMAT but it's not just inefficient, it's staggeringly inefficient. 2 minutes to make a few hundred k string on a 2+ghz CPU is hard to fathom, even if one assumes the method is imperfect, generates piles of garbage, etc, etc. It'd be interesting if someone could explain how this actually happens.


I'd say it's a performance bug. Replacing (format stream "~a" string) with (format stream string) gets rid of the problem.

LispWorks has no performance problem with both forms, though.

The reason is the SBCL pretty printer. If

    *pretty-print*

 is NIL, then both forms run fast in SBCL...


Interesting. For the massive hit, that better be the prettiest printing pretty printer in the galaxy. OpenMCL didn't choke on it either.

Edit: Well, the last bit is a lie, it did the moment i set

  *PRINT-PRETTY*.


Even with

    *print-pretty*
set to T, LispWorks runs fine.


I don't know much about optimization, but...it looks like he's timing the Clojure version with Clojure already loaded, but timing the SBCL version with SBCL not loaded. Would that change the measurements much?


Nah, that's not what's happening. The (time ...) special form is what prints the timing results, and it's squarely in the (big-test) function. All is right under the sun.


Some context might help... which implementation of Common Lisp? SBCL?


From the article -

"triton:palindromes jgrant$ sbcl --noinform --load [...]"


Thanks


Would the SBCL be able to spread across the many cores a common desktop has today? How is threading implemented in CL?


If I recall correctly, as in many other Lisps, it's a derivative of MP facility dating back to Lisp machines, and it runs over native threads.


I am looking into the cl-cookbook at sourceforge. It looks very interesting and worth investigating more. This specific benchmark seemingly would benefit from multiple threads on multiprocessors, if I could read its implementations correctly.


cl-cookbook is mostly outdated. use bordeaux-threads; it works across all lisps :-) and it's multicore for sbcl linux and clozure everything else (dunno about the commercial lisps)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: