This is one of my favourite questions, asked by idlewords: https://news.ycombinator.com/item?id=879101
But it's been a long time since then, and with a new set of programming languages, I believe there are new code bases worth studying and learning from.
So, which code-base should I be reading about to improve myself?
I think you get more benefit from reading code if you study something very close to what you are working on yourself: something in the same domain, perhaps in the same framework, or at least in the same programming language; ideally, something you are deeply involved in right now.
I never seem to get enough motivation to read deeply into random "grand" code bases like Lua or SQLite, but some months ago I got into the habit of studying a bunch of projects that use a given technology before I use it myself, and it has greatly reduced the time it takes me to reach an "idiomatic" coding style. So instead of diving in at random, I would recommend making research into existing code bases related to what you are currently doing an integral part of your workflow.
I really enjoy doing this as well. Are you aware of any resources or metrics (number of contributors on GitHub, etc.) to find which projects are "well crafted" in X framework/language?
Perhaps I overestimate how much risk there is in learning idiomatic practices from a project which is not actually all that idiomatic. I like the assurance that what I read is quality, especially with frameworks or languages I'm very new to where it can be tough to tell.
Slightly off topic, but Peter Seibel's take on the idea of code reading groups, and the idea of code as literature, is interesting: http://www.gigamonkeys.com/code-reading/
"Code is not literature and we are not readers. Rather, interesting pieces of code are specimens and we are naturalists. So instead of trying to pick out a piece of code and reading it and then discussing it like a bunch of Comp Lit. grad students, I think a better model is for one of us to play the role of a 19th century naturalist returning from a trip to some exotic island to present to the local scientific society a discussion of the crazy beetles they found."
The reason this is off topic is that it sounds like you were after interesting specimens anyway. I don't have any code examples as such, although if algorithms count I'm particularly fond of Tarjan's algorithm for finding strongly connected components in a directed graph, and the Burrows-Wheeler transform (as used in bzip).
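For anyone who wants to see the two algorithms mentioned above as code, here is a compact Python sketch (my own illustration, not taken from any particular codebase). Tarjan's algorithm is written recursively for clarity, so very deep graphs can hit Python's recursion limit. The BWT uses the naive "sort every rotation" construction, which is O(n² log n); real compressors such as bzip2 use much faster sorting. The `\0` end-of-text sentinel is an assumption of this sketch.

```python
from typing import Dict, Hashable, List


def tarjan_scc(graph: Dict[Hashable, List[Hashable]]) -> List[List[Hashable]]:
    """Strongly connected components of a directed graph (node -> successors)."""
    index: Dict[Hashable, int] = {}    # DFS discovery order of each node
    lowlink: Dict[Hashable, int] = {}  # smallest index reachable from the node's subtree
    on_stack = set()
    stack: List[Hashable] = []
    components: List[List[Hashable]] = []
    counter = 0

    def strongconnect(v: Hashable) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:            # tree edge: recurse, then propagate lowlink
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:           # back edge into the current component
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:        # v is a root: pop its whole component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return components


def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Undo the naive BWT by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))  # the original rotation
    return row[:-1]
```

For example, `bwt("banana", "$")` yields `"annb$aa"`: the repeated letters cluster together in the output, which is exactly what makes the transform useful as a first stage before compression.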
Fabien Sanglard (http://fabiensanglard.net) has some excellent code reviews on his website, particularly of games.
You could read some of the code-bases he reviews, and then read his review. You'll be able to compare and contrast your opinions with his, and if there's interesting variation you can blog about it ;)
The Architecture of Open Source Applications book[0] gives a high level overview on many open source projects. It's a good starting point to dive into the code of these projects.
Great series of books! There are three now, and a fourth, which is very relevant to this post's query, is being worked on. There isn't much to it yet, but its working title is 500 Lines or Less, and it aims to implement some kind of working application in 500 lines or less. There is a GitHub repo [0] where the project is being coordinated.
I've heard lots of people sing praises for the Redis source - https://github.com/antirez/redis. A cursory look into the source shows a very well documented code base. It's one of the top items on my to-read-some-day list. Salvatore is an excellent C programmer and takes great pains to write good documentation, despite his not-so-great English skills. A shout-out to him: thanks for setting an example.
To mix things up a bit, I'm going to give two very small examples of code that can be understood quickly, but studied diligently. Both are in JavaScript, which I notice you mention specifically in another comment:
[2] Bouncing Beholder. A game written in 1K of highly obfuscated code, which the author expands upon here. Worth it because it teaches some crazy optimisation techniques that are applicable to all programming, but it also includes plenty of JavaScript-specific trickery. http://marijnhaverbeke.nl/js1k/
Ron Jeffries attempts to create a sudoku solver – here, here, here, here and here. (You really ought to read these articles. They are ummm…{cough} …err…. enlightening.)
Those that are implemented in C are in the "Modules" folder at the top level.
I think you [OP] will get the most benefit from reading libraries that you use often because it will give you some direction.
If there are 3rd party libraries you use heavily those will be interesting too.
It is very, very difficult to just pick up megabytes of code, start reading, and find it useful. You will be able to pick up style and conventions, but not really high-level engineering decisions.
My suggestion is to take software that you use regularly and run it through a debugger. Since you use it you know the problem domain. And with the debugger the code you're reading gets real context. I've done this with Git for example.
And also look at the design documents for big OSS projects.
For C# I found the disruptor.net code to be very nice. It's mostly a direct port of the Disruptor from Java, but it was definitely pleasant to read through.
I have some experience with Yii and, coming from Django, I'm not impressed with the _functionality_. Maybe the code is really great, but why not study something like Symfony, which arguably achieves more and is therefore better at solving the same problem?
Agreed, but if function follows form then Yii is somewhat lacking. Maybe I'm just biased because, when I was in a tight spot, digging through the source and docs didn't get me where I wanted to be.
Looking at their GitHub now I must admit that the source code and comments look pretty good.
Erlang: Riak
https://github.com/basho/riak
Riak is actually a layering of a few different projects including Riak KV, Yokozuna (Solr), Riak Core, etc. It grew out of the Dynamo paper.
Haskell: Snap
https://github.com/snapframework/snap
Snap is another project built in layers (snap-server, io-streams, snaplets, snap-core). The 1.0 release makes some pretty massive structural changes behind the scenes with minimal breakage of the public API, and io-streams is a very nice API to work with.
JavaScript: Underscore.js
http://underscorejs.org/docs/underscore.html
Underscore is a utility library that gives a nice overview of various techniques in JS, such as how to handle equality, use of apply, ternary operators, etc. Many functions have fallbacks to ECMAScript 5 native functions.
Reading the Underscore and Lodash sources was hugely beneficial to me. If nothing else, it taught me how to reimplement what I could and drop the dependencies in my libraries. I find the whole "vanilla js" meme a bit snarky and elitist, but it really does make an impact if you are writing a lot of modules used by others.
Agreed, and after you've worked through the source to the canonical implementation of Lua, you can level-up by looking into the source of LuaJIT: http://luajit.org/download.html
Both are immensely awesome codebases, but the level-up is perhaps a bit too steep. Going from delightful ANSI C (vanilla Lua) to a lot of assembly + C (LuaJIT's interpreter) and runtime assembly generation (the JIT) is no slight task.
Please enjoy the source code of PostgreSQL (any version, but the latest is generally recommended) core. It is very well factored, and typically also very well commented. This community cares a great deal about code quality, because they are so clear on the relation between readability, diagnosability, and execution correctness.
Honestly, aside from learning to express a few extremely specific patterns in your language of choice concisely and elegantly and reminding yourself of the existence of certain libraries and utility functions so you don't accidentally waste time reinventing them, I think reading source code is a pretty useless exercise unless you also have a detailed record of how that source code came to exist in its present form. Until there is some revolutionary new tool for generating a human-understandable narrated history of large-scale design decisions from a source control history, your time will almost certainly be better spent reading textbooks that incrementally develop a piece of software over several chapters. Even that is cheating -- the authors know exactly where they want to end up and they won't include all the missteps they made when they first started writing similar programs. But it's still loads better than the alternative. Just as sitting in a law school library absorbing an encyclopedic knowledge of the law won't really train you to make arguments that will fly in front of a judge, reading a code base as a dead, unchanging document won't teach you what it is to live in that code.
It is possible to make missteps gracefully: that is, even when making mistakes you can mitigate the problems by writing clean code and trying to make it fit nice patterns you've already seen.
Sure, you can't copy directly, but you can get a feel for what is nice and what isn't by looking at lots of code that has reached its goal, and thus get an intuition for code smells.
Further, seeing how principles and patterns are used elsewhere helps take abstract concepts and turn them into real examples - for me anyway this is vital for my learning of anything.
Take a look at Redis sometime. You might want to actually work on it a bit to help internalize what you're reading. Here are a couple of articles that might help get you started:
It's a free cross-platform implementation of Apple's Cocoa, so there's a lot of stuff there. But the project is well organized, and almost everything is written in a minimalist, old-school Objective-C style.
I've looked at some other cross-platform frameworks, and they are often hard to understand because they have been developed by a large group of developers and include lots of complex optimizations and platform-specific code paths. Cocotron is not as finely tuned as Apple's CoreFoundation (for example), but much more readable.
Worth pointing out that the tool used to produce the annotated version is open source, and annotated by running it against its own source code: http://jashkenas.github.io/docco/
I have to say that the annotation is very nice. It would be a nice feature for something like GitHub, assuming that developers care enough to do the write-up. It might get messy for larger projects, though.
I learned a huge amount about how real operating systems are put together, and the compromises that get made, by reading the V6 Unix source via John Lions' Commentary (yes... I had a photocopy). It made exploring the BSD 4.2 and 4.3 source trees (another worthwhile exercise) much easier. I suppose if I were starting out today and not in 1985 I'd look at xv6 or Minix.
Also, can someone suggest the best way to approach code reading? When I open a library in Python, I am not sure where to start reading; it's just a bunch of files. Should I pick one file at random and start reading from there? Is there any common strategy?
I don't get much out of simply reading code. I have to get my hands in there and see how it works.
I load it into a REPL and play around. I see how changing little things affects the tests. I add print statements to give a narrative. I draw diagrams of the code flow, especially startup, shutdown and sometimes error/exception handling.
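In Python, the standard library's `sys.setprofile` hook can produce a similar call narrative without editing the code under study. A minimal sketch follows; `narrate_calls` is a hypothetical helper of mine, not part of any library, and it only records Python-level calls (built-ins implemented in C are skipped).

```python
import sys
from typing import Any, Callable, List, Tuple


def narrate_calls(func: Callable[..., Any], *args: Any) -> Tuple[Any, List[str]]:
    """Run func(*args) and return (result, narrative).

    The narrative is an indented list of Python-level calls and returns,
    roughly what you'd get by sprinkling print statements everywhere.
    """
    narrative: List[str] = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        name = frame.f_code.co_name
        if event == "call":
            narrative.append("  " * depth + f"-> {name}")
            depth += 1
        elif event == "return":          # arg is the value being returned
            depth -= 1
            narrative.append("  " * depth + f"<- {name} = {arg!r}")
        # c_call / c_return / c_exception events (C built-ins) are ignored

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        sys.setprofile(previous)        # always restore whatever was installed
    return result, narrative


if __name__ == "__main__":
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    result, lines = narrate_calls(fib, 2)
    print("\n".join(lines))
    # -> fib
    #   -> fib
    #   <- fib = 1
    #   -> fib
    #   <- fib = 0
    # <- fib = 1
```

On a real library the narrative gets long quickly, so it is most useful when pointed at a single entry point (say, one public function you already call) rather than at a whole program.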
Slight tangent to your question, but one thing I have noticed recently is that having to deal with really crap code inspires me to do my own better.
I inherited a colleague's work after she left, and it was horrible. But I thought about why it was horrible, and how to make it better. What would it look like if it was done well?
Even with my own code, if I look at something I did 6 months ago and it doesn't make sense straight away, then it can usually be improved.
It frightens me how my future replacement is probably going to think "gee, that guy was horrible" unless he knows all the historic context and reasons why the code was written the way it was written.
Not a specific codebase, but I went through "Code Reading"[0] many years ago, I found it interesting. Most reviews are not very positive though, so maybe it was just at the right point for me.
The most interesting things to read are those where a programmer has done something cleverly, but this only needs to happen when your language or libraries make it hard for you to begin with. Aside from low-level performance intensive functions, the best code is not interesting to read - it just reads like statements of fact.
Eric S. Raymond wrote a book The Art of Unix Programming [1] that has many "case studies" as well as recommendations of which software/RFCs are particularly worthy of study.
Except that Eric Raymond doesn't know what the hell he's talking about, and the main point of everything he writes is ideological warfare to tear down Richard Stallman's work and promote his own career as a pundit and batshit crazy racist right wing whack job.
If you really want to read something by Eric Raymond, you can start here, but you'll need to take a shower afterwards:
"In the U.S., blacks are 12% of the population but commit 50% of violent crimes; can anyone honestly think this is unconnected to the fact that they average 15 points of IQ lower than the general population? That stupid people are more violent is a fact independent of skin color." -Eric S. Raymond
3.0 out of 5 stars Autohagiography with some programming tips, December 24, 2003 By A Customer
The writing style of this book tends to hurt the reading experience, as Raymond trumpets his own minor achievements in the free software community. The work feels like it needed one more rewrite before being released to the public: some related sources Raymond hadn't yet read at the time of writing, and some of his advice gets repetitive.
The exposition itself is not up to par with The Elements of Programming Style. Raymond tries to give a list of programming rules or principles to follow, but it reads more like a list of slogans that should be taken as axioms. While The Elements of Programming Style itself had a list of rules, the rules were well woven with each other, well defended, and they were used as a means of conveying a larger story. In Raymond's case, he relies upon the slogans in absence of such a story.
Thus, the book ends up more like a list of random unrelated tips. Some are very profound, like his writings on threads (for which he acknowledges Mark M. Miller's help). Others are very shallow and pointless in a book that purports to be about "Art." Some of the pieces appear to function only as attacks on Windows, and sometimes the information about Windows is embarrassingly inaccurate.
One final criticism is that Raymond does not understand object-oriented programming very well and misses the point in several cases. You just need to see the popularity of Python, Java, C# (Mono), OO Perl and C++ in the Linux world to see that Raymond is off base calling OO a failed experiment. In fact, with almost any matter of opinion in the book you can feel Raymond's bias and be hit in the face with misinformation or dull false dilemmas.
Despite this book's many flaws, I rate it 3 stars instead of 2 because it also has valuable information from the many contributors, some of them Gods in the Unix world. These contributors often even disagree with Raymond, or point out other interesting tidbits. For these tips alone, it is worth checking out this book, though I would not recommend you buy it.
To get the true Unix programming philosophy, I recommend Software Tools, by Kernighan and Plauger. It's somewhat dated, and I recommend the Ratfor version of it, but that single book has become very influential as I've grown as a Unix programmer.
[...]
One of the problems with this book is the overly partisan tone it takes - one gets the impression that absolutely nothing Microsoft has ever done is of value, but the other major desktop PC OSes (Apple, Linux) represent different forms of perfection. (At home, I run Mac OSX, RedHat Linux and Windows, and have a reasonable sense of their relative strengths and weaknesses.)
So, be warned: Art of Unix Programming paints a one-sided picture. The author is a well-known figure in the open source community, one of its fiercest advocates, and one of Microsoft's most vocal critics, so it might seem strange to wish for less anti-Microsoft spin from this source. After all, the Raymond brand certainly carries with it an obligatory expectation of Windows-bashing, doesn't it?
One of the only Windows design decisions that Raymond doesn't condemn is the (now discontinued) .ini file format. Even the thoroughgoing support for object orientation in Windows is given short shrift: after explaining the many horrors of object-oriented programming (according to Raymond), Unix programmers are praised as "tend[ing] to share an instinctive sense of these problems." This section ([...]) is particularly illustrative of the one-sided approach that Raymond takes.
[...]
His comments about the Windows registry were a bit distressing, though -- not because they're negative, which I consider fine. Rather, it was obvious he'd never used it (comments like "there's no API for it") and it was also clear that he hadn't even bothered to research why it existed and what problems it was intended to solve. The comments were typical of what I'd expect of a Slashdot troll, but not of a bright, respectable person like ESR. I've programmed on both platforms extensively and only comment on what I have first-hand experience and knowledge of; I'd expect him to do no less, especially as an author.
It was also curious that several times he implied unit testing == XP == agile software development. For someone as tuned in as he seems to be to methodology work, missing the forest for a single leaf is a bit embarrassing.
[...]
Hate to be the one to burst the proverbial bubble..., March 13, 2004 By Bill Joy (Hamburg, DE)
...but ESR's theory and ideology is ridiculously flawed with misappropriated valuations. This is yet another advocacy campaign for gnu/Linux, mixed in with a UNIX context. Given the purpose of the book, it's a fair assessment to label the burdening bias as filler and firewood: filler for those who really just wanted an explanation of the single-purpose, POSIX illustrated, hows and whys of UNIX programming philosophy; and firewood for those who tend to buy into slanderous hype at the whim of suggestion.
Questions: What is the title of the book? What programmatic philosophies are portrayed? How many unices are open source? Of those, how many subscribe to the same opensource mentality as gnu/Linux? Answer: zilch. Question: Then what is the relevance of such topics to the objectives of the book, as depicted in the title? Answer: fudd-ala-mode.
Aside from these intertwined distractions, and a severe Napoleonic complex against anything new or different, ESR does adequately represent the real purpose of the book, and in those efforts place value in the read.
[...]
an interesting and often annoying read..., June 1, 2004 By A Customer
I suppose any book containing so many interesting quotes from so many UNIX luminaries cannot be overlooked. (I wonder if any of them would have co-authored this.) It also happens to contain a great many topics that are well worth writing about; my only wish is that someone less in awe of the contents of his own field of vision, and with greater depth and objectivity (not to mention humility), had had the opportunity to write this book.
Quality of discussion is varied as expected; Raymond is not quite the UNIX expert he thinks he is. In places, Raymond's tone encourages one to throw the book at the nearest wall and go out just to get some fresh air; He is condescending, hectoring, lecturing, and sometimes just misleading. Alas, I will still recommend it as worth reading (check your local library) with a nice grain of salt; just enough friction for thought is provided in this edition.
[...]
programming book without any code, May 11, 2013 By davez (LA, CA, United States)
If you expect to see any code, you'll be disappointed. The author gives ample examples, but mostly preaches the Unix philosophy or religion, often repeating similar points. 500 pages is a little long for a book of this kind. Having read other Unix books, I've heard many of the stories retold here, but I still learned a few new things. It reinforces many notions of Unix.
This is a book that I would recommend to someone new to Unix, but it is not a book that I would keep. After 10 years, the computing landscape has changed with the rise of mobile and cloud computing; Unix has evolved to support smartphones and tablets, on which the user interface has become essential.
One good example of the problem with his "technical" works and his lack of actual experience with the things he likes to talk about so much, is that he makes and promotes outrageously incorrect claims like:
Given ANY number of eyes, "all" bugs are NOT shallow.
Some bugs are NEVER "shallow".
And only a FEW eyes are qualified to see some bugs, while MANY eyes are totally unqualified, including his own:
His mouth is certainly not qualified to make sweeping generalizations about "all" bugs, given his lack of experience as a programmer, and his spectacular public failure at auditing code in his pathetic attempt to discredit the now-exonerated scientists whose code predicted global warming (described below).
Neither "enough eyeballs" nor "the right eyeballs" are a GIVEN, even for open source software.
"Not enough eyeballs" (or "ZERO eyeballs" as he loves to claim) are NOT a GIVEN for proprietary software, because you can license much proprietary source code, and some proprietary source code is available for you to read and audit for free, under licenses like Microsoft's "Shared Source" license.
And qualified eyeballs are NOT FREE, and are usually very busy being well paid to look at much more interesting things than poorly written buggy code like OpenSSL. I doubt that Eric Raymond has contributed any of the profits from his books or VA Linux stocks to Theo De Raadt or anyone else who actually takes the long time and tedious effort to actually audit code.
Wikipedia points out:
>In Facts and Fallacies about Software Engineering, Robert Glass refers to the law as a "mantra" of the open source movement, but calls it a fallacy due to the lack of supporting evidence and because research has indicated that the rate at which additional bugs are uncovered does not scale linearly with the number of reviewers; rather, there is a small maximum number of useful reviewers, between two and four, and additional reviewers above this number uncover bugs at a much lower rate. While closed-source practitioners also promote stringent, independent code analysis during a software project's development, they focus on in-depth review by a few and not primarily the number of "eyeballs".
And then there's the fact that Eric Raymond had the nerve to name and blame the "law" on Linus instead of taking "credit" for it himself.
And of course he also has the nerve to attempt to defend his "law", after we've just gone through three HUGE security holes in open source software that would have been discovered long ago, if only "Linus's Law" were true. On his blog, he constructs a straw man argument that "proprietary software is worse than open source software", which does not in any way support his claim about "all bugs being shallow".
Nor does he address many of the valid points that people raise, in the wikipedia article itself I just quoted, or that people raised in response to his blog posting.
To quote Theo De Raadt: “My favorite part of the “many eyes” argument is how few bugs were found by the two eyes of Eric (the originator of the statement). All the many eyes are apparently attached to a lot of hands that type lots of words about many eyes, and never actually audit code.”
The little experience Raymond DOES have auditing code has been a total fiasco and embarrassing failure, since his understanding of the code was incompetent and deeply tainted by his preconceived political ideology and conspiracy theories about global warming, which was his only motivation for auditing the code in the first place. His sole quest was to discredit the scientists who warned about global warming. The code he found and highlighted was actually COMMENTED OUT, and he never addressed the fact that the scientists were vindicated.
>During the Climategate fiasco, Raymond's ability to read other peoples' source code (or at least his honesty about it) was called into question when he was caught quote-mining analysis software written by the CRU researchers, presenting a commented-out section of source code used for analyzing counterfactuals as evidence of deliberate data manipulation. When confronted with the fact that scientists as a general rule are scrupulously honest, Raymond claimed it was a case of an "error cascade," a concept that makes sense in computer science and other places where all data goes through a single potential failure point, but in areas where outside data and multiple lines of evidence are used for verification, doesn't entirely make sense. (He was curiously silent when all the researchers involved were exonerated of scientific misconduct.)
Eric Raymond's standard technique is to stonewall and ignore valid criticism, while viciously attacking his critics. You can see that behavior consistently applied in most of his blog postings, comments, public statements and publications.
An archetypical example is his mean spirited name-calling defense of Russell Nelson, who was acting as President of the Open Source Initiative, a position that Raymond had just been kicked out of because his divisive in-fighting and life-long jihad against Richard Stallman was embarrassing them and damaging their reputation -- Russell was his replacement: OOPS!
After having been appointed President, Russell Nelson posted a blog entry entitled "Blacks are lazy", which of COURSE drew a lot of criticism, because it was obvious race baiting, and riddled with logical fallacies and racist bigoted presumptions. Russell expressed that it was "poor writing" and withdrew the blog posting, and resigned his position as President of the Open Source Initiative.
The original blog posting can be seen in the comments section of his wikipedia page, so you can draw your own conclusions:
Russell agreed with his critics that his blog posting was wrong -- although as to HOW wrong, he still disagrees with most critics, and he's generalized his argument to "everyone is lazy" and has characterized the criticism of him as "slander":
So, now that I've explained the background, I will demonstrate what I mean by how Eric Raymond typically constructs his arguments to support his political agenda, by ignoring valid criticism and turning it around on the critic by calling people names:
Even though Russell agreed the article was badly written, took it down, and voluntarily resigned, Eric demanded that OSI not only SUPPORT Russell, but waste their precious money, time, energy and reputation FIGHTING a BATTLE based on his own extreme right-wing libertarian "principles" against Russell's critics (who he called "FOOLS" and "THUGS"), which had NOTHING at all to do with open source software:
“The people who knew Russ as a Quaker, a pacifist and a gentleman, and no racist, but nevertheless pressured OSI to do the responsible thing and fire him in order to avoid political damage should be equally ashamed,” Raymond said. “Abetting somebody else's witch hunt is no less disgusting than starting your own.”
"Personally, I wanted to fight this on principle," Raymond said. "Russ resigned the presidency rather than get OSI into that fight, and the board quite properly respected his wishes in the matter. That sacrifice makes me angrier at the fools and thugs who pulled him down."
Since both Eric Raymond and Russell Nelson lost their leadership positions as President of the Open Source Initiative because of their bigoted, racist, divisive and very public beliefs, you can guess which side of the Brendan Eich controversy they came down on:
"My first thought on hearing of the resignation of Brendan Eich as CEO of Mozilla: Congratulations, gay activists. You have become the bullies you hate." -Eric Raymond
And this from the same self avowed "cheerful gun nut" who threatened Bruce Perens:
"Damn straight I took it personally. And if you ever again behave like that kind of disruptive asshole in public, insult me, and jeopardize the interests of our entire tribe, I'll take it just as personally -- and I will find a way to make you regret it. Watch your step."
Anyway, back to the criticism of TAoUP and Raymond's other technical claims to fame and self-aggrandizing autohagiography:
Everything he writes is deeply tainted with his one-sided partisan ideology and narcissistic self promotion, which is not just limited to Microsoft bashing, but to extreme right wing libertarian politics and guns. You can see that by the changes he made to the "Hacker's Dictionary", and you can see that in everything else he writes.
I've posted some typical reviews of the book in the other message.
Well thank you for taking the time to give critique. I was hoping to return to the book at some point and work through the suggested readings in more detail. I will definitely approach it with more caution a second time round and even more so when recommending it to others.
As an aside, how do you know so much about Mr Raymond? :) That's an awful amount of very specific detail.
It's pretty well known stuff in the Free Open Source Software Dramatic Political Soap Opera Scene, since he always tries so hard to get attention by writing outrageous bullshit on his blog and throwing tantrums on mailing lists, like this: http://www.redhat.com/archives/fedora-devel-list/2007-Februa...
I've known Eric Raymond and Richard Stallman since the 80's, and I can confirm that Raymond has always been that way, isn't anything like the "hacker" he claims to be, and instead of writing or auditing code, he has made his career by aggrandizing himself and tearing down Richard Stallman, who is and will always be a much better and more successful person than he is.
But then he got even crazier, and went off the deep end after 9/11.
His "many eyes" law that he shamelessly promotes has given many people and organizations a false sense of security in open source software, and that's led to many huge commercial corporations taking the free stuff without contributing any money or time back, and building all kinds of critical internet infrastructure on top of software like OpenSSL. And you know what that led to.
When he claimed that the mean old gays were bullying poor Brendan Eich, and his friend Russell Nelson took up and defended his argument, it made me recognize a pattern that explains their motivation very well:
Eric Raymond and Russell Nelson and Brendan Eich all served as the head of major free open source software companies: Eric Raymond was the first president of the Open Source Initiative, Russell Nelson was the second taking over when he resigned, and then resigned himself shortly thereafter, and Brendan Eich was CEO of Mozilla.
All three of them made bigoted statements and performed bigoted actions, and as a consequence of their own speech and actions, and of their high visibility leadership positions of free open source companies, they each felt compelled to resign from their jobs, and now feel very sorry for themselves because of how other people reacted, not because of how they acted.
It's disgusting how Eric and Russell are now whining about the mean "intolerant gay bullies" who didn't respect Brendan's right to be intolerant of oppressed minorities. They're just projecting from their own experiences as ousted bigots. The mean old community just couldn't tolerate their bigoted divisive beliefs which were embarrassing and damaging the free open source software movement.
That's their standard operating procedure: ignoring criticism and calling their critics names instead of responding with logical arguments. So Karl Popper is a baby thinker in Russell Nelson's mind.
So naturally Raymond and Nelson both feel sorry for Eich. They're birds of a feather, cut from the same cloth: they all lie down with the same dogs, and wake up with the same fleas.
So what does it say about Eich that two of his most sympathetic and vocal defenders were also kicked out of their leadership positions in a Free Open Source Software company, because of divisiveness and bigotry, just like he was?
Eric Raymond and Russell Nelson are far beyond redemption. And Brendan Eich finally put himself into the same boat as they are, and it's all his own damn fault, so of course those guys are his biggest advocates and defenders, and they all deserve each other.
And now Eich's legacy is tarnished not just by his donation to support Proposition 8, but by the fact that two of the most notorious douchebags in the free open source software community have come out of the woodwork screeching a full-throated defense of him.
I think it's wonderful that the community refused to be led or fooled by people like them, who would pay money to make TV commercials demonizing gays and destroying same-sex marriages, or race-bait and call the victims of their own bigotry bullies for standing up for themselves and exercising their right to free speech.
I had a read through the PCSX2 emulator recently, that was quite interesting: https://github.com/PCSX2/pcsx2 it's a complex project in what was surprisingly readable C++ code.
I'm fascinated by concurrent programming. I find that reading classes from Java's java.util.concurrent package gives me very good practical insights as to what goes into building a concurrent class. My all time favorite is ConcurrentHashMap :)
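One classic idea you meet when reading ConcurrentHashMap is lock striping: through Java 7 the map was split into independently locked segments, so writers touching different segments did not contend (Java 8 reworked this into CAS plus per-bin locking). Below is a minimal sketch of the striping idea transplanted into Go; it is not Java's API or implementation, and the type and method names (StripedMap, Add, Get) and the shard count of 16 are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// numShards is an arbitrary illustrative choice.
const numShards = 16

// shard is one independently locked slice of the key space.
type shard struct {
	mu sync.RWMutex
	m  map[string]int
}

// StripedMap spreads keys across shards so that writers to
// different shards never wait on the same lock.
type StripedMap struct {
	shards [numShards]*shard
}

func NewStripedMap() *StripedMap {
	sm := &StripedMap{}
	for i := range sm.shards {
		sm.shards[i] = &shard{m: make(map[string]int)}
	}
	return sm
}

// shardFor hashes the key (FNV-1a) to pick its shard.
func (sm *StripedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return sm.shards[h.Sum32()%numShards]
}

// Get takes only a read lock on one shard.
func (sm *StripedMap) Get(key string) (int, bool) {
	s := sm.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

// Add atomically adds delta to the key's value, roughly what
// map.merge(key, delta, Integer::sum) does in Java.
func (sm *StripedMap) Add(key string, delta int) {
	s := sm.shardFor(key)
	s.mu.Lock()
	s.m[key] += delta
	s.mu.Unlock()
}

func main() {
	sm := NewStripedMap()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				sm.Add("hits", 1)
			}
		}()
	}
	wg.Wait()
	v, _ := sm.Get("hits")
	fmt.Println(v) // 8000
}
```

All eight goroutines here hit the same key, so they share one shard; the payoff of striping only shows up when keys spread across shards.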
I made this tool: http://codingstyleguide.com to improve how I code in different languages without getting lost in too much programming information, and it's helping me a lot.
Reference based compression! We create a set of reference sequences and simply record the start location of each identified subsequence - of course this relies on the assumption we can correctly recognise all genes and their variants... ahem.
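A minimal sketch of the idea, assuming exact matches only: each sequence is stored as an (offset, length) pointer into a shared reference, and anything without an exact match falls back to being stored raw. Real tools also have to encode mismatches and variants (the "ahem" part), and would use an index rather than the linear strings.Index search used here. Record, Encode, and Decode are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Record stores a sequence either as a pointer into the reference
// (Offset, Length) or, if no exact match exists, as a raw Literal.
type Record struct {
	Offset  int    // start position in the reference; -1 if not found
	Length  int    // length of the sequence
	Literal string // set only when Offset == -1
}

// Encode replaces each sequence with its first exact match in ref.
func Encode(ref string, seqs []string) []Record {
	out := make([]Record, 0, len(seqs))
	for _, s := range seqs {
		if i := strings.Index(ref, s); i >= 0 {
			out = append(out, Record{Offset: i, Length: len(s)})
		} else {
			out = append(out, Record{Offset: -1, Length: len(s), Literal: s})
		}
	}
	return out
}

// Decode reverses Encode, given the same reference.
func Decode(ref string, recs []Record) []string {
	out := make([]string, 0, len(recs))
	for _, r := range recs {
		if r.Offset < 0 {
			out = append(out, r.Literal)
		} else {
			out = append(out, ref[r.Offset:r.Offset+r.Length])
		}
	}
	return out
}

func main() {
	ref := "ACGTACGGTTCAGGA"
	recs := Encode(ref, []string{"ACGG", "TTCA", "GGGG"})
	fmt.Println(Decode(ref, recs)) // [ACGG TTCA GGGG]
}
```

Here "ACGG" and "TTCA" become pointers (offsets 4 and 8), while "GGGG" is not in the reference and must be stored in full, which is where the compression gain disappears.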
The original source code to Zork in MDL. It doesn't matter if you don't know MDL. It's such beautiful code that it just explains itself to you. And if you've played Zork, it's like being invited to explore the underground backstage areas of Disneyland.
It's not great code (though I'm working to make it so), and perhaps not the intent of this question - but if you want to look at a 25+ year old codebase that's being refactored, check out LibreOffice, especially the VCL component:
For PHP, I've been very impressed by Phabricator's code (and the related libphutil library). It's worth looking at the git commits as well to see just how clean and structured commits can be.
I'm much more impressed by it than by any PHP framework code I've read (and I've read Zend, Symfony2, li3, and CodeIgniter, as well as custom frameworks).
I'm interested in this as well. Working on tooling to make this a more efficient process is something that I daydream about often, so it would be interesting to hear some perspectives.
Do you generate tags and hop around functions as you discover them? Do you do a high-level overview of the important bits first, and then dive in later? I am sure it is different for everyone, but I would still be interested to read about others' approaches.
I've had this idea for a while about building up a project methodically in its source control history, and adding new features in a similar manner, so that it would be easier to introduce a codebase to those starting out in it. The problem with lots of large codebases is that as time goes on, configuration details or special cases are added that obscure the meat of the program. If it were trivial to see the early version, and its evolution, it would be easy to pick out the mental model of what the code actually does.
In the .NET world, shanselman has a series of Weekly Source Code blog posts and most recently posted a list of seven 'interesting books about source and source code'.
When you read any C++ codebase, remember that there are many dialects, so just because something is a good idea in one doesn't mean it is in another, or that you should prefer that way without understanding the reasoning behind it.
I'm not sure why this is being downvoted, other than the fact that they are JavaScript codebases. jashkenas did a wonderful job on these two libraries, and they are incredibly well documented.
I highly recommend checking out http://voxeljs.com for some beautifully factored JavaScript npm packages, that implement a lot of Minecraft and more in the browser.
Max Ogden's talk (the first video on that page, also here: https://www.youtube.com/watch?v=8gM3xMObEz4 ) about how voxeljs and browserify work is inspirational, and his energy, motivation, deep understanding and skill, thirst for learning, reading other people's code, building on top of it, and sharing what he built and learned, is extremely contagious!
You may want to pause the video frequently and take notes -- there is so much great information in there, and he covers a hell of a lot of amazing stuff.
And the source code is really nicely broken up into lots of little npm modules that you can plug together to make all kinds of cool stuff.
This stuff is a great fun starting point for teenagers or students to learn how to program and create their own games and web applications, or master programmers to learn the node.js / npm ecosystem and idioms. There are some great ways for new and non-programmers to get into it.
He says "Everyday I work on it I get more motivated to work on it" -- and you will too!
What you will be benefitting from by watching his video and reading his code, is the fact that he actually did a survey of a HUGE amount of code, and took the best, read it, learned from it, rewrote it, and built on top of it.
"So many people have written voxel stuff that I should just copy them." He used GitHub search, searched for minecraft, filtered by JavaScript, and went through ALL 23 PAGES of projects! He cloned ALL the repos he found, read the ones that seemed promising, got them running, and understood how they worked.
A lot of them were the classic genius programmer projects, really impressive visually, super hard to understand, a giant lib folder with 50 files, everybody writing their own 3d engine.
Then he found out about three.js, learned it, and combined all the stuff he had seen on top of it, including a PhD project in computational geometry that showed how to efficiently implement Minecraft with three.js, with techniques such as removing interior faces.
So he learned from and built on top of all that great stuff, and made voxel.js and an insane number of demos. Now the community has written a whole bunch of nice modular node.js npm modules and demos, which browserify can combine into a package that runs in the browser.
My only trivial beef with it is that their style guide says not to use trailing semicolons! That makes emacs very irritated and it breaks out in a rash.
But other than that, the code is very clean and modular and comprehensible, and opened my mind to a lot of stuff that I didn't realize was possible.
Just pick one and force yourself to use it to the exclusion of other editors. Future you will thank you later, because you'll still be using it 20 years from now. "We are typists first, programmers second" comes to mind. You need to be able to move chunks of code around, substitute things with regexes, use marks, use editor macros, etc.
https://www.tarsnap.com/download.html How to write C. Study the "meta," that is, the choice of how the codebase is structured and the ruthless attention to detail. Pay attention to how functions are commented, both in the body of the function and in the prototypes. Use doxygen to help you navigate the codebase. Bonus: that'll teach you how to use doxygen to navigate a codebase.
You're not studying Arc to learn Arc. You're studying Arc to learn how to implement Arc. You'll learn the power of anaphoric macros. You'll learn the innards of Racket.
Questions to ask yourself: Why did Racket as a platform make it easier to implement Arc than, say, C/Golang/Ruby/Python? Now pick one of those and ask yourself: what would be required in order to implement Arc on that platform? For example, if you say "C," a partial answer would be "I'd have to write my own garbage collector," whereas for Golang or Lua that wouldn't be the case.
The enlightenment experience you want out of this self-study is realizing that it's very difficult to express the ideas embodied in the Arc codebase any more succinctly without sacrificing its power and flexibility.
Now implement the four 6.824 labs in Arc. No, I'm not kidding. I've done it. It won't take you very long at this point. You'll need to read the RPC section of Golang's standard library and understand how it works, then port those ideas to Arc. Don't worry about making it nice; just make it work. Port the lab's unit tests to Arc, then ensure your Arc version passes those tests. The performance is actually not too bad: the Arc version runs only a few times slower than the Golang version if I remember correctly.
== Matasano crypto challenges ==
http://www.matasano.com/articles/crypto-challenges/ Just trust me on this one. They're cool and fun and funny. If you've ever wanted to figure out how to steal encrypted song lyrics from the '70s, look no further.
== Misc ==
(This isn't programming, just useful or interesting.)
Don't fall in love with studying theory. Practice. Do what you want; do what interests you. Find new things that interest you. Push yourself. Do not identify yourself as "an X programmer," or as anything else. Don't get caught up in debates about what's better; instead explore what's possible.
> How to write C. Study the "meta," that is, the choice of how the
> codebase is structured and the ruthless attention to detail. Pay
> attention to how functions are commented, both in the body of the
> function and in the prototypes.
I just had another look at the tarsnap source code, and while I know
Percival is a great guy, and I can't imagine him suing over "mis-use",
the bulk of the code is under a pretty restrictive license:
"Redistribution and use in source and binary forms, without modification,
is permitted for the sole purpose of using the "tarsnap" backup service
provided by Colin Percival."
That is, except the code under "libcperciva" which appears to be under a
traditional 2-clause BSD license.
So, for example using the bsdtar.c file to teach yourself how to handle
command line arguments might be a bit dicey, as it's entirely unclear
which part of that file is under the BSD license, and which part you're
not allowed to distribute.
It's one of the reasons why I wish Percival would make the entire thing
available under a dual license (eg: BSD) and simply require people to
use the official client for the tarsnap service.
Then again, I'm not used to auditing closed source software, and therefore
probably extra scared of what might happen if I accidentally learn
something from reading said source... ;-)
On the other hand, Percival does publish quite a lot of stuff that's
[ed: entirely free], such as spiped https://www.tarsnap.com/spiped.html .
Actually, it's an important point. One way to learn a given coding technique is to copy-paste it into your own project, modify it until it works, and then delete it and re-implement it yourself. (The "delete and then reimplement it yourself" is the important part. Don't just copy-paste code!) That way you can build up an understanding of the code as you go, without having to understand it in its entirety from scratch. Understanding it in its entirety from scratch is the better way, because it gives you a deeper understanding, but people learn in different ways.
Anyway, illegal is illegal, and copy-pasting from Tarsnap would be illegal. I didn't know Tarsnap was under a restrictive license. It's a shame that people will have to be so careful when learning from Tarsnap, as it's a paragon of modern C best practices, but maybe the license is shrewd.
Though I wish the mentality of "my competitors might steal the code!" will die its deserved death, since evidence thus far suggests it's just paranoia. For example, the codecombat guys open sourced everything and have been fine: https://news.ycombinator.com/item?id=7015126 (But of course that's easy for me to say when I have no competitors to worry about!)
> Though I wish the mentality of "my competitors might steal the code!" will die its deserved death, since evidence thus far suggests it's just paranoia.
Well, Tarsnap is online backup for the truly paranoid... ;-)
Seriously though: I never thought this was likely to be a problem, but given that I was spending a significant chunk of my life on this, I wanted to eliminate obvious risks to my livelihood. I like not starving.
But I've done my best to put the "reusable for a purpose other than building Tarsnap" code into the separate libcperciva tree. And I'm sure that merely reading code to learn from it is not a copyright violation -- there are no laws about copying information into your brain, thankfully. So please, read the code, and email me if you see any bugs; I wouldn't be offering bug bounties if I didn't want people to look at the code!
I suspect that your suggested use of the code would fall under the fair use statute in 17 U.S.C. § 107 and the guidelines from Folsom v. Marsh since it seems explicitly for pedagogical or research purposes.
IANAL, but if this were me it seems safe enough that I'd be willing to do it and fight Colin in court if need be. :)
When I first started programming, a lot of the licensing rhetoric seemed like much ado about nothing. "Look at all these people squabbling about who's allowed to do what with text! It's text! Sheesh!"
But one quickly realizes that the concept of software license restrictions is a fundamental reason for the strength and momentum of open source software. Therefore, it must be true that a central tenet of being a good developer is to respect software licenses. Not merely to respect them to the letter of how they're written, but to respect their spirit as well. If an author wishes you wouldn't do something with their code, then you don't do it. There are plenty of reasons why this should be the case, but the most compelling for me is that it'd be lame to ignore the author's wishes while using their work, even if the author won't ever know about it.
If people are still feeling tempted to lift code, well... Remember that you only get to destroy your reputation once.
> Do not identify yourself as "an X programmer," or as anything else.
Yet every single programming job post these days is basically looking for someone with X years of experience in a long enumeration of specific technologies. The usual reason being "They need to be productive. Now."
The rest of us just say that we are an 'X' programmer when applying for a job that requires the 'X' programming language, and then during the interview mention that we also know a bunch of other languages and know how to use the correct tool for a problem ;)
Not at first. But work with people you like and respect, and learn from them. Then ask them when looking for your next job.
Any place worth working isn't looking for the absolute best candidate for X at any cost. A good fit with someone with a good work ethic who wants to learn always works out better than just raw expertise. And a reference from someone you both respect is the fastest, most reliable way to find them.
Whenever I hear this, I find I don't recognize the world the poster lives in. Is it a thing to tell coworkers when you're looking for a new job? If not, how many jobs do you need to have and leave before you have a big enough network for this to be a viable strategy? How do you get those jobs?
It really depends on the co-worker and the culture of the company. Sometimes I would tell co-workers, sometimes I wouldn't say anything because I was concerned they would leak that to management and I would be asked to leave before I was ready.
I have found user group meetings a great place to network, though. Everyone is helpful when someone says they are looking, even if it is that person's first time at a meeting.
> Any place worth working isn't looking for the absolute best candidate for X at any cost
That's so true. If they absolutely need the best candidate, they are about to run a very risky project.
Good management means, among other things, ensuring that new people have the opportunity to learn and to "get into" the project. If management requires the "best" developers, it really means that the management is bad and the developers are supposed to make up for that.
(Having said that, there are also people who are simply a sunk cost. That is, managing them requires more resources than the value they produce. However, that's the other end of the scale, and a separate topic.)
When you're reading this code, be aware that it's a bit of a mongrel -- the code in /libarchive/ and most of /tar/* (but not /tar/*/*) is from the libarchive project and is not mine. So if you're looking to learn "how Colin codes", look at /libcperciva/, /lib/, and /tar/*/.
> There are at last count something like twelve thousand people who have reached out to us for our free crypto challenges [...] Every damn one of those people is an email exchange that me, Sean, or Marcin had to have directly, on our own time, with no compensation.
Maybe the first one or two challenges could be available online, and requests should be accompanied by the answers to those. Maybe that would reduce the volume somewhat.
Response time for me on getting sets 1-3 has ranged up to months.
It got better when I remembered we're a Matasano client so I shouldn't feel so guilty about harassing them. But it's an awesome free thing they're doing, so I still do feel a little guilty.
I meant specifically the "discussions", as mentioned in the 'course information' page:
>>> "Please use Piazza to discuss labs, lectures and papers. We will look at Piazza regularly and answer questions (unless one of you answers first); the entire class can see and benefit from these exchanges."
I don't think you'll be able to access these. My university uses Piazza as well and you need a university .edu email address to get access to the Piazza boards.
Fortunately, if the way MIT uses Piazza is anything like the way my school does, you aren't missing much. Usually people just use Piazza to clarify something that was glossed over in lecture or to get help on some aspect of the homework.
Wow, if you've done all that, I want to see your code :) Particularly the distributed systems in Arc. Mind sharing a link to any code you have online (even if it's not that)?
I took SICP a long time ago... all the pure functional stuff was great and elegant. I never really got how you can program "real world" / stateful stuff like distributed systems in Lisp dialects. When I look at Lisp code with hash tables and so forth it just looks horrible to me.
Well, the point is to self-study, so I'm reluctant to share my code because people will wind up crippling themselves if they fall back on whatever solutions I came up with. If they read my code before attempting their own solution, then they risk overlooking a simpler solution or one better suited for their own purposes. But... I guess I'll share a side-by-side go/arc comparison of lab 1. Lab 1 itself is pretty simple; most of the work was to port some of go's primitives to arc, like channels.
(Note: These HTML pages are rendered incorrectly on mobile. In particular, the spacing is wrong because vimscreenshot inserts actual spaces instead of &nbsp; entities. Sorry about that. Try viewing these links from a desktop computer instead.)
I was interested in seeing how much code can be eliminated by using Arc instead of Go. Turns out: quite a lot. The server is almost 200 lines of Go code, but about 65 of Arc. The client is almost 120 lines of Go, about 35 of Arc. The unit tests are pretty similar in length (~480 vs ~410) but the Arc version is easier for me to read since there's less visual noise.
OT question: what's up with programmers and "Zen and the Art of Motorcycle Maintenance"? I usually finish every book I start reading, and this was one of the exceptions. After an idea along the lines of "we can't define quality, ergo everything is quality," I had to put it down; nothing made any sense, and it all seemed to be a grandiose philosophical scheme built on random half-baked ideas.
I think you're interested in PuTTY and that's reason enough to study it. :) Copy what you like, discard what you don't. But do whatever is fun. I wish I'd emphasized fun more, since having fun is one of the best ways to maintain motivation and interest.