More precisely, if you tell the SBCL compiler to trust that all data types are as declared and omit type checks, it gives you code that's faster than gcc with the options at the bottom of this page: http://shootout.alioth.debian.org/u32/benchmark.php?test=spe...
These are "inner loop only" compiler settings, at least the way I'd use it --- but it's still nice to see concrete demonstrations that you don't have to drop down to C code to get maximum performance.
EDIT: (declaim (optimize (safety 0))) also omits array bounds checks and checks for undefined variables.
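For anyone who hasn't seen SBCL declarations before, here's roughly what "trust the declared types" looks like in practice. This is a toy sketch of my own (sum-doubles is made up, not the code from the post or the Shootout); the point is just that with (safety 0) the compiler believes the declarations and drops the runtime checks.

    ;; Global aggressive policy: trust declarations, omit type and
    ;; bounds checks in code compiled after this point.
    (declaim (optimize (speed 3) (safety 0) (debug 0)))

    (defun sum-doubles (v)
      ;; Declaring the exact array type lets SBCL open-code the element
      ;; access and keep the sum in an unboxed double-float.
      (declare (type (simple-array double-float (*)) v))
      (let ((acc 0d0))
        (declare (type double-float acc))
        (dotimes (i (length v) acc)
          (incf acc (aref v i)))))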
I just started reading the thread linked from the blog post, and it felt like reading House of Leaves. Here are some choice quotes from various authors:
Re Clojure: "This is a 'babel' plot to destroy lisp."
"Pocket Forth is a free Forth interactive-interpretor that runs fine on my Macintosh "Performa 600" (68030-CPU) System 7.5.5."
"The Mac is a desktop-publishing 'appliance' --- considering
that you don't have a laser-printer, a Mac is about as useful to you as a bicycle is to a fish. Besides that, you don't seem like the desktop-publishing type of guy --- that is mostly a marketing-department girl thing."
"I really foresee the collapse of civilization. The majority of people in America are motivated entirely by hate, fear, greed and envy, and this situation can't continue indefinitely. This is what I describe in my book, 'After the Obamacalypse,' which is included in the slide-rule package on my web-page."
Another time I was sitting in my van in a parking lot. A skinny
Jew walked up to the van, peered inside, then tried to open the door
but discovered that it was locked, so he walked away. I got out and
walked over to him, and I said: "What the hell do you think you're
doing?" He also said that he thought it was his friend's van, but he
didn't apologize at all, but became prideful and belligerent. When I
said, "I think you're a thief," he said: "Look at the way you're
dressed; you're the thief!" (I was wearing a hoodie). He told me that
if I continued bothering him, he was going to call the police, and he
got out his cell-phone. When I said, "I think you were looking for
something to steal," he said: "There is nothing in your van worth
stealing!" I beat him thoroughly with my fists and left him face down
on the sidewalk in his own blood. Somewhat belatedly, he began to cry:
"I'm sorry! I'm sorry!"
It ends shortly after "Discussion subject changed to 'Whining (was Re: ordered associative arrays)' by John Passaniti."
One of the reasons I left comp.lang.lisp (I used to be part of that community) was that there were so many people there who would, at the slightest provocation, fly off the handle and explain to you their alternative theory of whatever. A lot of crazy people.
It seemed like for every Peter Seibel or Kenny Tilton, there were 8 people who had 10% of 100 projects done, were happy to tell you about the anti-lisp conspiracy, and also had alternative health advice.
I also enjoy comp.os.linux.advocacy for complete insanity. The fun part is that there's exactly 0 constructive conversation going on. It's all flames, all the time.
He specifically asked gcc to optimise for code size (-Os). For speed, he should be using -O3 only, but he used "-Os -O3". This invalidates the benchmark.
This is true for real software, not microbenchmarks. All of the shootout benchmarks will fit in their entirety in L1i cache -- which makes reducing the executable size pointless.
Incidentally, this is probably the largest reason why so many people still use -O3 -- it wins in exactly the kind of programs that are used as simple and common benchmarks. It solidly loses on almost everything else.
Do you have any data on that? Most CPU bound programs should have pretty good instruction locality, negating the effects of smaller code. But without some numbers this is pointless guesswork.
I've not timed anything, but asking gcc to optimise for size is the wrong thing to do when benchmarking for speed. I can think of lots of ways that this would cripple performance. Why not let gcc make its own decision?
The only justification for using -Os in a speed benchmark is "I tried it both with and without the flag, and it was faster this way". I don't see any such assertion.
> The only justification for using -Os in a speed benchmark is "I tried it both with and without the flag, and it was faster this way". I don't see any such assertion.
Really? It seems to me that this is quite enough:
> I’ve just re-run the C benchmark without -Os (only -O3) but the results are the same.
If this holds true, I'll concede this specific point.
As we know, however, benchmarking can often come down to tuning. If this most basic of compiler options has not been set to the obvious choice for speed, how can we have any confidence that the C code as written is written in an efficient way?
Are we comparing language against language here, or somebody's implementation in one language against somebody's implementation in another?
I note that there appear to be hand optimisations in the C code. Were these done well, or would the compiler have done a better job?
Of course we are comparing implementations; languages do not have a speed. My (purely hypothetical, unfortunately) language that builds on 'Principia Mathematica' may need a 10,000-page program to compute 1+1, but its compiler could, in theory, produce the same executable as C (or Fortran, or whatever) would from their one-liners that do the same thing.
Is this really still news? Yes, we know you can get great performance in some tasks with languages other than C. I swear, if I see ANOTHER article with the linkbait title of "X faster than C"...
The decent ones that get posted at least bother to do a comparison with several pseudo-representative tasks. This one just goes "hey, I played around with this ONE SPECIFIC TASK NOBODY GIVES A CRAP ABOUT and IT RAN 0.006 MILLISECONDS FASTER THAN IN C! WOOOOOOOOOOOO!"
"We beat C" is a claim that goes hand in hand with "we are viable for scientific computing", so I'm always interested in hearing it (although more benchmarks would be nice).
I recall at least one old FORTRAN guy wandering into comp.lang.c who had very few good things to say about C's handling of floating-point calculations...
Floating point is not the problem; it's mostly memory issues due to C allowing aliasing by default. C99 has the `restrict` keyword, so you can generally get identical object code from both languages. SSE intrinsics are only available from C; you will end up using either them or assembly any time you care a lot about the performance of tight kernels (very few nontrivial kernels are vectorized adequately by any of today's compilers).
As I recall, this had very little to do with this guy's complaints - it was a mixture of C allowing the use of x87 80-bit-wide doubles and not allowing sufficient reordering of operations.
That said, yes, restrict was added for this kind of thing.
That's true, but these posts aren't ever of the "language Y is actually as good as or better than C, always!" variety, are they? Instead what we get is the results of (in the best case) a couple of micro-benchmarks that happen to show comparable performance to C.
If someone could show me that "yes, your Python programs are now AS FAST AS C!" then of course I'd be ecstatic to hear that; but the posts letting me know that "Python is as fast as C when approximating solutions to problem X, for some X you've never heard of and never will" get kind of old after the 137th time I read them.
For me this is comparable to someone posting about yet another problem in NP that is REALLY FRICKIN' HARD, so probably P=/=NP. I know many problems in NP are hard - you're not adding anything to the discussion by showing me yet another one. Let me know when you have an actual proof that P=/=NP.
It is interesting to note that a beautiful python program from the "interesting alternative" category [1] beats the C program, and LuaJIT is always impressive [2] on these sorts of microbenchmarks (beating SBCL, with one third the source code).
If you are such an expert on what makes a benchmark representative of "real-world" problems, you're welcome to make a contribution to the Shootout. I'm sure they'll be glad to accept it, and everyone will be relieved to find out all the other benchmarks are worthless and everyone has been wasting their time.
Actually slavak, inner-loop C performance is all that's necessary to make many of these tools viable.
If I can write my entire program in LANGUAGEX and just compile the inner loop a magic way and voila, the program runs at 85% of C speed, we have a winner. We can use it in long-running programs that are heavily compute-bound.
This is an article explaining the magic way for a flavor of lisp.
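Concretely, the "magic way" doesn't even have to be image-wide; you can scope it to the hot function and leave the default (safe) settings everywhere else. A sketch, with a made-up dot-product rather than anything from the article:

    ;; Only this function is compiled with the aggressive settings; the
    ;; local DECLARE overrides the global policy for its body.
    (defun dot-product (a b)
      (declare (type (simple-array double-float (*)) a b)
               (optimize (speed 3) (safety 0)))
      (let ((acc 0d0))
        (declare (type double-float acc))
        (dotimes (i (length a) acc)
          (incf acc (* (aref a i) (aref b i))))))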
At least in the past the Shootout code wouldn't have explicit (declaim (optimize ...)) in the source files, but the command used to compile the files would have it. Did it really get removed from the command line?
Alexey Voznyuk wanted it removed: "My point is that obligatory (optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 0)) is totally wrong."
Thanks for the explanation. I think he's totally wrong though, and could have just overridden whatever setting he was unhappy with in his own program. It seems crazy to pessimize every other program for the sake of one solution, especially when the approach taken is considered cheating and the solution marked as "interesting alternative".
And no criticism implied on the Shootout maintainers, I'm sure that dealing with the submitters is like herding cats :-)
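For what it's worth, overriding it really is that easy: a (declaim ...) at the top of your own file sets a new policy for everything compiled after it, and a single function can re-raise safety locally. A hypothetical sketch (my-solution is made up, not a real Shootout entry):

    ;; File-level: restore a saner policy for the code that follows, even
    ;; if the build command declaimed something aggressive beforehand.
    (declaim (optimize (speed 2) (safety 1) (debug 1)))

    ;; Function-level: full checking for just this one function.
    (defun my-solution (n)
      (declare (optimize (safety 3) (speed 1))
               (type fixnum n))
      (loop for i below n sum i))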
I did years ago, and a few of them still seem to be around. But I won't be doing it again, both for reasons we have discussed before and because the implementations seem to have totally dived off the deep end of complexity by now, and don't really look like they'd be much fun anymore.
Well, for fun you might do what no one else has done and contribute a Lisp program for meteor-contest - solve as you please; there's no cat herding for that one.
After reading all these interesting and enlightening comments (no pun intended here; they are all really useful), the blog post should really be titled:
"A particular SBCL-compiled LISP-implementation of a specific algorithm gives comparable results to an analogous GCC-compiled C-implementation, when run on particular boxes."