
... wherein you learn how slow Python is, and that the author severely underestimates how fast optimized C can be.

Many of these questions are heavily dependent on the OS you're running and the filesystem used, and of course the heavy emphasis on Python makes it hard to make good guesses if you've never written a significant amount of it. I mean, I have no idea how much attention was paid to the development of Python's JSON parser; it's trivial to write a low-quality parser using regexes for scanning, OTOH it could be a C plugin with a high-quality scanner, and I could reasonably expect 1000x differences in performance.

Interpreted languages tend to have less predictable performance profiles because there can be a large variance in the amount of attention paid to different idioms, and some higher-level constructs can be much more expensive than a simple reading suggests. Higher-level languages also make elegant but incredibly inefficient implementations much more likely.
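
To make that concrete with a toy example of my own (not from the article): rebuilding a list with + and appending to it both read as one obvious line, but one of them is quadratic.

    # Sketch: two "equivalent" pure-Python loops with very different costs.
    import timeit

    n = 10_000

    def rebuild():
        out = []
        for i in range(n):
            out = out + [i]   # allocates and copies a brand-new list every pass: O(n^2) overall
        return out

    def append():
        out = []
        for i in range(n):
            out.append(i)     # amortized O(1) per element
        return out

    print("rebuild:", timeit.timeit(rebuild, number=10))
    print("append: ", timeit.timeit(append, number=10))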




Python's JSON parser will obviously create Python objects as its output. There is a limit to how much you can gain with clever C string parsing when you still have to create a PyObject* for every item that you parsed. Because of this, I don't think you can gain 1000x performance with C optimizations unless the parser is really horrible (unlikely, considering the widespread use of JSON).
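
A rough way to see that bound (my own numbers and document shape, not the article's): compare json.loads against a pure C-level scan of the same string, which gives a loose lower bound on "parsing without building any Python objects".

    import json
    import timeit

    doc = json.dumps([i * 0.5 for i in range(100_000)])  # every element becomes a fresh PyFloat on parse

    full = timeit.timeit(lambda: json.loads(doc), number=50) / 50
    scan = timeit.timeit(lambda: doc.count(","), number=50) / 50  # touches every character, stays in C

    print(f"json.loads: {full * 1e3:.2f} ms   raw scan: {scan * 1e3:.2f} ms")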


There are some speed comparisons of Python json parsers here http://stackoverflow.com/questions/706101/python-json-decodi...

Yajl (Yet Another JSON Library) seems to run about 10x faster than the standard library json module.
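
If anyone wants to reproduce that kind of number, here's a sketch of the comparison; I'm using ujson as the C-backed stand-in (pip install ujson) rather than the yajl binding from the link, and the ratio will depend on the document shape and Python version.

    import json
    import timeit

    import ujson  # third-party C extension, used here only as an example fast parser

    doc = json.dumps([{"id": i, "value": i * 0.5} for i in range(50_000)])

    for name, loads in [("stdlib json", json.loads), ("ujson", ujson.loads)]:
        t = timeit.timeit(lambda: loads(doc), number=20) / 20
        print(f"{name}: {t * 1e3:.1f} ms")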


Fwiw, I know of some benchmarks of Perl deserializers, some written in pure Perl, others in C (but still producing Perl data structures), and a speed difference of a factor of 100 is not uncommon.


Well, I'm not a python programmer, and I guessed 14/18 correctly. Some of the questions are hard to know without knowing details of the libraries - JSON, as you point out, for instance (it's also one I got wrong, expecting better throughput). But since each question lists the actual performance numbers after you answer, you can get a feel for how big the overhead is in trivial pure python.
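
One easy way to get that feel (my own snippet, not from the quiz): time the emptiest possible pure-Python loop and divide by the iteration count.

    import timeit

    n = 10_000_000
    seconds = timeit.timeit("for _ in range(n): pass", globals={"n": n}, number=1)
    print(f"{seconds:.3f} s total, ~{seconds / n * 1e9:.1f} ns per iteration")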

But yeah, some of the questions were poorly chosen. For example, on my machine, using gcc -O2 as the author specifies, the first program always executes instantly, since gcc optimizes that loop to nothing. That leads me to believe he may be running a Mac (IIRC those come with absurdly out of date gcc versions as a result of the GPLv3 dispute?) or something else fishy is going on.

One interesting thing to note is that the memory access latency on his machine is almost an order of magnitude faster than you might expect based on "Latency Numbers Every Programmer Should Know" (https://gist.github.com/jboner/2841832) - and that's not a coincidence; those 100ns have always been a bit conservative, and the latency number has slightly improved in the past two decades.


gcc is little more than a symlink to clang for recent versions of Xcode, so something else fishy is going on.



