Lots of people have spent years writing programs spanning platforms, servers, services, and languages. However, code that is both efficient and elegant is few and far between.
What code has stood out to you for being elegant and efficient, and why?
The original version of Google's MapReduce was, IIRC, only 3 C++ classes and just a few hundred lines of code. It outsourced much of the distribution, task-running, and disk-access work to other Google infrastructure, and focused only on running the computation, collecting results for each key, and distributing them to the reducers.
The current (as of ~2012, so not that current anymore) version of MapReduce is much faster and more reliable, but there's a certain elegance to starting a trillion-dollar industry with a few hundred lines of code.
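For flavor, here's a minimal single-process sketch (mine, not the Google code) of the map/shuffle/reduce shape described above, with word count as the canonical example:

    # A toy map_reduce; the real system distributed each phase across machines.
    from collections import defaultdict

    def map_reduce(records, mapper, reducer):
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):   # map phase: emit (key, value) pairs
                groups[key].append(value)       # "shuffle": collect values by key
        return {key: reducer(key, values) for key, values in groups.items()}

    docs = ["the quick brown fox", "the lazy dog"]
    counts = map_reduce(docs,
                        mapper=lambda doc: [(w, 1) for w in doc.split()],
                        reducer=lambda key, values: sum(values))
    print(counts)   # {'the': 2, 'quick': 1, 'brown': 1, ...}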
There was another doozy, also by Jeff Dean, in the current (again, as of 2012) MapReduce code. It was an external sorting algorithm, and like most external sorts, it worked by writing a bunch of whole-machine-RAM-sized temporary files and then performing an N-way merge. But how did it sort the machine's RAM? Using the STL qsort() function, of course! But how do you sort ~64GB of data efficiently using a standard-library function? He'd written a custom comparator that compared whole records at a time, using, IIRC, compiler intrinsics that compiled down to SIMD instructions, with some Duff's-Device-like unrolling to account for varying key lengths. It was a very clever mix of stock standard-library functions and highly optimized, specialized code.
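The external-sort shape is easy to sketch in miniature (mine, single-machine, nothing like the optimized original): sort RAM-sized chunks into temporary "run" files, then N-way merge them.

    # Sort chunks that fit in memory, write each as a sorted run, then merge.
    import heapq
    import tempfile

    def write_run(sorted_lines):
        f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
        f.writelines(sorted_lines)          # lines are assumed to end with '\n'
        f.close()
        return f.name

    def external_sort(lines, chunk_size=1_000_000):
        runs, chunk = [], []
        for line in lines:
            chunk.append(line)
            if len(chunk) >= chunk_size:    # a stand-in for "machine RAM is full"
                runs.append(write_run(sorted(chunk)))
                chunk = []
        if chunk:
            runs.append(write_run(sorted(chunk)))
        files = [open(r) for r in runs]
        try:
            yield from heapq.merge(*files)  # the N-way merge over sorted runs
        finally:
            for f in files:
                f.close()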
My memory's actually hazy over whether it was qsort or sort; my intuition is that it would've been qsort because QuickSort is what you'd use when you need an in-place sort with little additional RAM required, but it's been so long that I honestly don't remember.
Ah, good. I'd initially written std::sort in the comment and then went back and edited it because I was like "Isn't std::sort usually mergesort? That wouldn't work here because it takes extra space." It's been a while since I've written C++.
I'd read somewhere [1] that the built-in Python sort function has a lot of good/clever optimizations too, though maybe not of the same kind you describe, i.e. perhaps not at the machine-language level. Tim Peters did a lot of that, per what I read, though others may have too.
[1] Think I read it in the Python Cookbook, 2nd Edition, which is a very good book, BTW.
Different kinds of optimizations. Timsort looks for already-sorted runs and uses insertion sort to extend the short ones; it exploits the fact that much real-world data is already partially sorted to reduce the number of comparisons made. The MapReduce optimization instead exploits the fact that MapReduce keys are always strings in contiguous areas of memory, and are often fairly large, to compare them really quickly.
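You can watch Timsort exploit pre-existing order with a toy comparison counter (my own example, nothing to do with either codebase):

    import random

    class Tracked:
        count = 0
        def __init__(self, v):
            self.v = v
        def __lt__(self, other):            # sorted() only needs __lt__
            Tracked.count += 1
            return self.v < other.v

    def comparisons(data):
        Tracked.count = 0
        sorted(Tracked(v) for v in data)
        return Tracked.count

    n = 100_000
    print(comparisons(range(n)))                    # ~n-1: one long run detected
    print(comparisons(random.sample(range(n), n)))  # ~n*log2(n): no runs to exploit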
Since it's old and no longer relevant to Google, it would be really interesting to have that code in some kind of code museum, as I feel it gives an interesting insight into how Google did big data (the real thing, not the marketing term) back then. Not sure if that's feasible, but I guess it doesn't hurt to ask.
I wish, but it's unfortunately not my call to make. A few other companies have done this, e.g. Microsoft open-sourcing Altair BASIC about 30 years after it came out, or id open-sourcing DOOM. Maybe if I ever go back to work for them, I can propose it. For now, consider getting a job at Google if you want to peek into the VCS history.
On a larger scale than most suggestions so far, I’ve always been impressed with SQLite. The developers have managed to create a useful-in-the-real-world tool, small and efficient enough for even quite demanding applications like embedded systems work.
The original source code was solid and well documented in the parts I've seen, but the remarkable thing to me is the way they distribute it: it comes as a single ANSI C file and a single matching header, which they refer to as the "amalgamation". It therefore works just about anywhere, has no packaging or dependency-hell issues, can be incorporated into any build process in moments, and can be statically linked and fully optimised.
Joe Armstrong, the creator of Erlang, would tell his cluster of thousands of machines to change behaviour with it. The code is really just 5 lines.
My favorite Erlang program
21 Nov 2013
The other day I got a mail from Dean Galvin from Rowan University. Dean was doing an Erlang project, so he asked: "What example program would best exemplify Erlang?"
...
The Universal Server
Normally servers do something. An HTTP server responds to HTTP requests, an FTP server responds to FTP requests, and so on. But what about a Universal Server?
Surely we can generalize the idea of a server and make a universal server which we can later tell to become a specific server.
Here’s my universal server:
    universal_server() ->
        receive
            {become, F} ->
                F()
        end.
...then I set up a gossip algorithm to flood the network with become messages. Then I had an empty network that, in a few seconds, would become anything I wanted it to be.
A process in Erlang is nothing but a tail-recursive function; the moment it stops being one, it dies. So here it morphs into F, which can be passed in.
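A rough analogue in Python, for anyone who doesn't read Erlang (threads and Queues standing in for Erlang processes and mailboxes; the names are mine, and this loses all of Erlang's distribution):

    import queue
    import threading

    def universal_server(mailbox):
        tag, f = mailbox.get()              # block until told what to become
        if tag == "become":
            f(mailbox)                      # morph into the new server loop

    def adder_server(mailbox):              # a specific server to become
        while True:
            reply, a, b = mailbox.get()
            reply.put(a + b)

    mailbox = queue.Queue()
    threading.Thread(target=universal_server, args=(mailbox,), daemon=True).start()

    mailbox.put(("become", adder_server))   # the "become" message
    reply = queue.Queue()
    mailbox.put((reply, 2, 3))
    print(reply.get())                      # 5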
I love the story of the fast inverse square root. A bizarre piece of code from Quake 3 showed up on Usenet with a magic constant that calculates the inverse square root faster than table lookups and approximately four times faster than regular floating-point division. Inverse square roots are used to compute angles of incidence and reflection for lighting and shading in computer graphics. The author was unknown, but it was once thought to be Carmack.
I would say that the actual source code there is extremely ugly. It may be an elegant solution, but there's no way I would want to crawl around in that repo:
    float Q_rsqrt( float number )
    {
        long i;
        float x2 = number * 0.5F;
        float y  = number;
        i = * ( long * ) &y;                  // evil floating point bit level hacking
        i = 0x5f3759df - ( i >> 1 );          // what the fuck?
        y = * ( float * ) &i;
        y = y * ( 1.5F - ( x2 * y * y ) );    // one iteration of Newton's method
        return y;
    }
That has to be LuaJIT. The amount of knowledge hidden in that code base is mind-blowing. It contains a state-of-the-art tracing JIT compiler, optimized for at least 5 different platforms, and a custom assembler (DynASM) that is used to implement hand-optimized interpreter code for 6 different architectures. Additionally, it has one of the best FFI implementations I've ever seen: just parse a C header file at runtime and you can call into C code with zero overhead. All of that written by a single person (Mike Pall). If you haven't had a look, you should: https://github.com/LuaJIT/LuaJIT/tree/v2.1
I was going to say LuaJIT also. JIT compilers have a habit of looking rather like chicken scratch, but LuaJIT is pretty readable (for a JIT).
Specifically, the way it manages traces (lj_trace.c, lj_record.c) is really elegant. It's not much code, but it's the best available tracing JIT compiler and manages traces quite elegantly IMO.
I really love this blog post by Norvig. Last Christmas I was looking for a trivial project to try Clojure and had a lot of fun working through this. Highly recommended - https://github.com/prakhar1989/clj-spellchecker
The Solaris kernel source. I gained a deep appreciation for its elegance when I was working for Sun a long time ago. I can enjoy analysing that code base, which is the opposite of the feeling I get when looking at the Linux code.
It's not overly clever, but it's incredibly clear and easy to understand.
While I don't know exactly which parts were inherited, I have looked at parts of the code that definitely were not inherited from Bell Labs (features that are comparatively new) and they are also very nice to read.
Perhaps they just followed in the footsteps of the masters, but whatever the reason, the code is, in my opinion, incredibly nice to work with.
Doom 3 was touted[1] as having some "exceptional beauty." Naming, spacing of properties, consistency, and how multiple parameterized calls were formatted are quite nice. You can see an example on github[2], but for some reason the spacing is a bit off in the web view. I'd recommend cloning a local copy and taking a look.
Fast inverse square root[0] is something that I encountered in the mid-2000s in the Q3A source code. It took me a really long time to understand it, and I eventually had to show it to some professors before I really understood what was going on and why this worked.
That's really an example of how arbitrary human thought processes are. When you release the constraint that your code has to have some human-comprehensible analog, you might arrive at interesting results.
Most of that is fairly straightforward. It is using Newton's method to calculate the inverse square root. But to get there in one or two iterations, you need a good starting estimate. Taking a square root halves the floating point exponent; an inverse square root halves it and negates it. Knowing how the floating point number is packed, we know a right shift is equal to divide by two and negate it. What remains is how the shift affects the mantissa and whether some correction factor is needed. That could have been found by least-squares optimization to minimize the error.
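The Newton step itself is easy to check; a quick sketch (mine, in Python rather than the original C):

    import math

    # Newton's method on f(y) = 1/y**2 - x has the root y = 1/sqrt(x);
    # the update y - f(y)/f'(y) simplifies to y * (1.5 - 0.5 * x * y * y).
    def refine(x, y):
        return y * (1.5 - 0.5 * x * y * y)

    x = 2.0
    y = 0.7                       # a rough starting estimate, like the bit hack provides
    for _ in range(2):
        y = refine(x, y)
    print(y, 1 / math.sqrt(x))    # converges quickly given a good start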
> Knowing how the floating point number is packed, we know a right shift is equal to divide by two and negate it.
Shifting an IEEE754 floating point number does not have that effect.[0] The fact that it doesn't do that is the source of the "mystery" of fast inverse square root.
It has been a while since I looked at it, but I was impressed by Russ Cox's implementation of channels in the LibThread library of Plan 9. All channel operations, including blocking on several channels and selecting one that is ready, are described by Alt structures, which under the hood implement a simple but elegant algorithm that appears in Rob Pike's "The Implementation of Newsqueak." If you want to understand roughly how Go channels work, read that paper and look at LibThread's channel implementation.
A second choice would be not so much a piece of code as an algorithm: Thompson's construction for regular expressions.
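For anyone curious, here's a compact sketch of it (mine; regexes are taken in postfix form with '.' for concatenation, '|' for alternation, and '*' for star, in the spirit of Russ Cox's write-ups), plus the standard NFA simulation:

    class State:
        def __init__(self, char=None):
            self.char = char   # transition label; None marks a split or the match state
            self.out = []      # successor states

    def post2nfa(postfix):
        stack = []             # entries: (start_state, dangling_states)
        for c in postfix:
            if c == '.':                        # concatenation
                start2, out2 = stack.pop()
                start1, out1 = stack.pop()
                for s in out1:
                    s.out.append(start2)
                stack.append((start1, out2))
            elif c == '|':                      # alternation
                start2, out2 = stack.pop()
                start1, out1 = stack.pop()
                split = State()
                split.out = [start1, start2]
                stack.append((split, out1 + out2))
            elif c == '*':                      # zero or more
                start, out = stack.pop()
                split = State()
                split.out = [start]
                for s in out:
                    s.out.append(split)
                stack.append((split, [split]))
            else:                               # literal character
                s = State(c)
                stack.append((s, [s]))
        start, out = stack.pop()
        match = State()                         # the single accepting state
        for s in out:
            s.out.append(match)
        return start, match

    def matches(start, match, text):
        def closure(states):                    # follow epsilon (split) edges
            stack, seen = list(states), set(states)
            while stack:
                s = stack.pop()
                if s.char is None and s is not match:
                    for nxt in s.out:
                        if nxt not in seen:
                            seen.add(nxt)
                            stack.append(nxt)
            return seen
        current = closure({start})
        for c in text:
            current = closure({t for s in current if s.char == c for t in s.out})
        return match in current

    start, match = post2nfa("ab|*c.")           # postfix for (a|b)*c
    print(matches(start, match, "ababc"))       # True
    print(matches(start, match, "abab"))        # False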
For me, the memorable pieces of code are those that made me "want to become a programmer", to choose the path I chose, way back in the day.
Unsurprisingly, these were games:
1) A C64 "SnakeByte-like" game, whose exact name I forgot. It was written entirely in BASIC 2.0 (so you could list and read the source code), and with me having no C64 manual, and no English, it was a true revelation. So much fun and beauty emerging from such a concise, approachable program!
2) An ancient five-in-a-row implementation, I think in BASIC again or maybe Pascal. I remember the shock after seeing how simple the code was, compared to its (surprisingly good) playing strength and speed. My github user name, "piskvorky", is an echo of this old experience :-)
The underlying appeal seems to be a combination of simple, elegant rules giving rise to complex and fun behaviour. That, to me, is elegance.
> A Professor of Computer Science gave a paper on how he uses Linux to teach his undergraduates about operating systems. Someone in the audience asked 'why use Linux rather than Plan 9?' and the professor answered: Plan 9 looks like it was written by experts; Linux looks like something my students could aspire to write.
But see also the HN thread: "Code which every programmer must read before dying".
I worked for a networking software developer at some point in the past, and they were considering licensing a VPN stack from the SSH company [1]. That thing was in C, and the sample code included was stunningly beautiful, as was the documentation: well modularized, consistent, with good naming conventions, but above all concise. I think I still have a CD with the SDK demo; I can pull up some code from there if anyone's interested.
I was reading some of the Python standard library code and one piece stood out for me in the heapq.py module.
Merge sorted iterables using a min-heap (this is the older, Python 2 era version):
    def merge(*iterables):
        '''Merge multiple sorted inputs into a single sorted output.

        >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
        [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
        '''
        _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
        _len = len

        h = []
        h_append = h.append
        for itnum, it in enumerate(map(iter, iterables)):
            try:
                next = it.next
                h_append([next(), itnum, next])
            except _StopIteration:
                pass
        heapify(h)

        while _len(h) > 1:
            try:
                while 1:
                    v, itnum, next = s = h[0]
                    yield v
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except _StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            v, itnum, next = h[0]
            yield v
            for v in next.__self__:
                yield v
If I'm not mistaken, this was written by Raymond Hettinger, and it's always a pleasure to read/use his code.
I really like the code for Redux. It's only a few hundred lines of code, but it has spawned an ecosystem of plugins and already has over 13,000 GitHub stars.
Definitely the implementation of a (simplified) Scheme interpreter in Scheme that we wrote years ago in a functional programming course at university. It made me fall in love with functional programming.
It's all in how one defines "elegant", I suppose. It relies on a handful of almost-side-effects and the lucky confluence of a series of unrelated language choices. I see this and it reminds me of a dance: someone seeing the swinging pieces of the code and making a minor touch in just the right place, taking advantage of edge-effects that were never designed for exactly this but combine to make something emergent, simple, and easy. I see this as breathtakingly elegant.
> What's the most elegant piece of code you've seen?
Well, not the most elegant, but a very elegant piece of code that solves the Towers of Hanoi puzzle in a few ingenious lines (Python, as shown):
    def printMove(disk, src, dest):
        print("Move disk %d from %s to %s" % (disk, src, dest))

    def moveDisk(disk, src, dest, using):
        if disk >= 1:
            moveDisk(disk - 1, src, using, dest)
            printMove(disk, src, dest)
            moveDisk(disk - 1, using, dest, src)

    count = 3
    moveDisk(count, 'A', 'B', 'C')
Result:

    Move disk 1 from A to B
    Move disk 2 from A to C
    Move disk 1 from B to C
    Move disk 3 from A to B
    Move disk 1 from C to A
    Move disk 2 from C to B
    Move disk 1 from A to B
Turbo's code seems very beautiful to me. Before finding it, I saw web servers as either a big black box coded in a hard-to-grok systems language (Apache and Nginx), or as a testing tool written in a slow but readable scripting language (Python or PHP). Now I can dive into a fast, robust, and easy-to-read web server (and client) and see nearly every bit of functionality. Lua continues to impress me and strikes me as a great language for web development.
I have limited exposure, being a junior web dev, but I find Laravel to be very well structured and what some might describe as elegant. Jeffrey Way shared opinions on it that I agree with at last year's Laracon: https://youtu.be/mDotS5BDqRM?t=1539
In the sense that perfection is achieved when there is nothing left to take away, GO.COM as described in http://peetm.com/blog/?p=55. It doubles as a winning entry in 1994's IOCCC.
A quiz question from the old Imphobia demo diskmag: a single instruction to extend a byte value in al across a full 32-bit word. Impossible, right? Nope:
Some people might hate this answer, but sometimes the most elegant piece of code doesn't actually exist.
Code is a tool used to solve problems. Sometimes the most elegant solution to a problem doesn't involve writing code; it involves examining a business process and changing the workflow a bit to avoid needing to code something, or rewording an item in the user interface so that certain code isn't needed anymore.
Sometimes, rather than being a drone and just churning out whatever code you're told to, the bravest and most elegant solution to a problem is figuring out a way to avoid it altogether.
This is the hardworking lazy programmer's most elegant code.