You Don't Read Code, You Explore It (dadgum.com)
158 points by doty on April 18, 2014 | 56 comments



It is an interesting observation. I expect it comes down to a difference in expertise, though. The author compares reading code snippets in Dr. Dobb's (which probably dates them by 10 or 15 years) to reading stories.

Here is the thing: everything you read in a story is meant to convey imagery of things you may have already experienced. They are already internalized, so you "see" them when you read, as if you were there.

Allow me to use another popular space as an example: music. When you are first reading sheet music, you see notes on a stave, key signatures, and different shapes representing different durations. At first you mechanically take that understanding and laboriously turn it into actions on your instrument. But after a while, if you do it enough, the shapes become recognizable as rhythms, the notes on the staves become tones rather than symbols, and then you stop "reading" music: you look at it and you can hear what it will sound like. And by that time you can make your instrument do whatever you hear.

Coding is not entirely different: at some point you don't see syntax, you see algorithms, you see the inter-relationships of data structures, you see flow. After a number of years of coding I got to the point where I could see what code was doing pretty easily (except for obfuscated code, which is always jarring on first look). I stop seeing loops and start seeing iterative processing; if statements become branches on a path.
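
To illustrate "seeing the algorithm rather than the syntax", here is a small Haskell sketch (the names are mine, purely hypothetical): with enough exposure, both definitions below read as the same thing, a running total over a list, rather than as "explicit recursion" and "a fold".

    -- Explicit recursion: the "loop" spelled out step by step.
    totalRec :: [Int] -> Int
    totalRec []       = 0
    totalRec (x : xs) = x + totalRec xs

    -- The same iterative processing, recognized and named as a pattern.
    totalFold :: [Int] -> Int
    totalFold = foldr (+) 0

    main :: IO ()
    main = print (totalRec [1, 2, 3], totalFold [1, 2, 3])  -- (6,6)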

Anything in words or symbols is code for something else. Whether it's a murder mystery, a symphony, or a sorting algorithm, the words and symbols are there to express an idea so that, inside your head, you can understand it. I think it is all reading, though :-)


Code is, however, considerably more tree-like than stories or sheet music. Sheet music has something resembling minimal control structures, to be sure, and there are examples of kinds of stories that are more non-linear than usual (and are usually considered mind-benders for that reason).

I think that's what the GP is getting at. Even with domain knowledge, really understanding code almost always requires diverging into other parts of the 'text'. You can defer it if your thought process is structured enough, but you still have to do it eventually.


Sheet music sounds like assembly.


Yes, once you know more about a domain, you see larger, more organized chunks when you look at data in that domain. There were studies showing that amateur chess players see individual pieces where chess masters see arrangements of pieces. Etc.

I would still say that programs are quite different from any natural language text in that they require a sustained effort to learn "what really happens here" as opposed to "what general kinds of things are being done in the different parts of the program".

A murder mystery is a genre where "one little detail" is often said to turn the whole direction of the story. But that's usually a single discrete gotcha, added for theater. A program often consists entirely of such things, added not for theater but because computation can't help but work that way.


Have you learned Haskell yet? I think you'd find it's a very rich way to see and think about code if you did.

Starter guide here for any interested: https://gist.github.com/bitemyapp/8739525


I've got the book 'Learn You a Haskell for Great Good' and I've gone through it. But to be honest, I have always resonated more with structural languages than functional ones.


LYAH is slow and doesn't communicate the compelling parts of Haskell at all. There is no functional/structural dichotomy. Structural isn't a category of programming languages. There are no programs without structure. You may be mistaking syntax ( {} blocks vs. s-expr ) for having something to do with semantics.

It's also possible somebody has told you Haskell is a declarative language a la Prolog. It is not.

Try this: https://gist.github.com/bitemyapp/8739525

Specifically: http://www.seas.upenn.edu/~cis194/lectures.html


I think it goes for any FP/declarative language, especially Lisp.


Nope. I'm an ex-Common Lisp and Clojure user. Especially prolific in Clojure at work and in open source.

Do not pass go, do not collect $200, go straight to Haskell.


This feels very much like the original motivation for design patterns: raising the level of communication and recognition. It's a shame that patterns were largely sneered at as stating the obvious, when that was sort of the point.


The difference is you don't read sheet music to look for errors (tonal imbalance, wrong notes, what have you); you play it and listen to whether it sounds "right".

In much the same way, you can't just read code (code review) to look for errors; you'll miss most of them. You need to run and test the code to really find the errors.

I think that is the takeaway from the OP.


I've pretty much come to the conclusion that you don't understand code by reading it. Understanding someone else's code is almost always a reverse engineering exercise. It's often necessary to actually run the code repeatedly to understand it. This should really come as no surprise. It's likely that the guy that wrote the code didn't write it all at once, but rather wrote it in increments testing along the way. If he couldn't write it without testing every little bit, it's unlikely that you can read it without doing the same.


Good observation, and I completely agree. On a slight tangent, I've been spending time thinking about what I would do if I had the ability to change the high school curriculum. So far the thought with the most appeal to me is to introduce "reverse engineering" as a core subject. I don't mean reverse engineering in the software or hardware sense; I mean it in the sense of a collection of logical techniques for figuring out the logic behind some phenomenon or problem. In some sense, "forward engineering" requires reverse engineering from the conceptual finish of a product back to its current nonexistent state. Science is the reverse engineering of the laws of nature. But rather than teach any of those skills, schools teach encyclopedic knowledge. Knowledge is useful, but without the skills to analyse and use it, it's just wasted effort and brain space.


I think it's entirely possible to write readable code, i.e. code that you don't have to single-step through to understand. But that takes a conscious effort, and is in any case more difficult and more work than writing code that is merely correct.


You Don't Write Code, You Develop It.


Completely agree with this! One very rarely comes across a completely self-contained piece of code such as sorting or searching. The code you are reading or exploring is usually a piece of a larger scheme of things, held together by scaffolding, so to speak. And unless one actually runs the system (not just that specific code) end to end, it's really difficult to understand.

I've also found it very useful to slightly modify the code and compare my expectation with the actual result of a test run.


And this is one of the main reasons I massively prefer type annotations, statically analyzable languages and IDEs over dynamically typed languages, very loose languages and plain text editors.

Not being able to see what types code deals in, jump to definition, and find usages makes me feel crippled when exploring a new codebase.

One of my wishes for programmers everywhere is that tools like GitHub and Bitbucket start analyzing projects and letting you navigate better. I think that could save thousands of engineer hours.

This topic alone is a huge reason I started using Dart and eventually joined the team. Trying to figure out how a very large JavaScript codebase works is so incredibly painful, doing so in Dart is incredibly easy. This is also where very reflective libraries like Guice go wrong, and why, in my opinion, meta-programming should be used very carefully and sparingly.


Code navigation is hard in plain text editors, but definitely doable. I think the cost of setting up code navigation and jump-to-definition in Emacs is lower than the cost of dealing with how slow and bloated IDEs are. But, hey, to each his own.

Although I think Light Table is trying to be what you're describing: incredibly liberating in its ability for code navigation. But it's still young.

As for type annotation and static analysis ... you don't need to have static analysis for what you're talking about. Yes, static analysis works best for static languages, but you can have code auditing in dynamic languages. An auditor can dig through and profile your stuff while tests are running and learn about interactions between various entities in your code. And if you're doing anything in a dynamic language properly you're doing a lot of tests.


Having to analyze a running program increases the work and complexity of indexing code by an _incredible_ amount. When you can simply design a language in such a way as to be analyzable in the first place, I don't see why you'd want to require so much more work out of your tools, and especially your tool writers.

Take my wish for GitHub to be more navigable. In a statically analyzable language they can run the analyzer and update their index on every commit. If they have to run tests, then they basically have to add an entire continuous integration service, which is a _lot_ more work, resources, and security risk. Why make it harder than necessary?


You're right. I guess I'm just a huge sucker for the whole dynamic languages with a REPL thing. I mean, Ruby is just so goddamn neat. The stuff that takes me 2 lines in Ruby can take me 10-15 in Java or C, and the difference only gets bigger as the program gets larger.

Maybe I should try out static languages "done right" (type inference, etc.) before I give up on them.


One of my ideas I've been kicking around for the last year or two has been to write a book on learning to read code properly. I would submit that one of the greatest weaknesses of your typical programmer is that they don't know how to read other peoples' code.

On another note, being able to read other peoples' code is one of the strengths of Haskell. The Functor/Monad/Monoid/etc stuff becomes a way to know, based on a common vernacular, exactly what kind of interface is being exposed and what sort of data structures you're working with.
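
A minimal Haskell sketch of that common vernacular (the function names here are hypothetical, not from the parent comment): the type-class constraints in a signature already tell a reader which interface is in play before any function body is read.

    import Data.Monoid (Sum(..))

    -- 'Functor f' promises we can only re-label the contents; the shape of
    -- the structure is never inspected or changed.
    relabel :: Functor f => (a -> b) -> f a -> f b
    relabel = fmap

    -- 'Foldable t, Monoid m' promises the result is just the elements'
    -- measures combined with an associative (<>), starting from mempty.
    summarize :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
    summarize = foldMap

    main :: IO ()
    main = print (getSum (summarize Sum [1, 2, 3 :: Int]))  -- prints 6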


I know one book about code reading: "Code Reading: The Open Source Perspective". I have just read the first chapters but it's mostly C/C++ I think.


I've read a lot of code (both source and otherwise) and found that in many cases one of the biggest barriers to understanding the system as a whole is actually abstraction/indirection -- or more precisely, the often-excessive use of such. Following execution across multi-level-deep call chains that span many different files in different subdirectories feels almost like obfuscation, and all the pieces that are required to understand what happens in a particular case (important when e.g. looking for a bug) are scattered thinly over the whole system such that it takes significant effort to collect them all together.

In my experience, the majority of existing codebases I've worked with tends to be this way, although there are exceptions where everything is so simple and straightforwardly written that reading them is almost an enlightening experience.


Program comprehension seems like a fascinating research field if you enjoy figuring out why we do the things we do as software developers. It seems that we do indeed explore code, but we do so in somewhat systematic and predictable ways, which in turn might give us ideas about how to write our code to be more readable.

If anyone is interested, I suggest the publications of Anneliese von Mayrhauser and A. Marie Vans from the mid-90s as a possible starting point. They did a lot of work to reconcile earlier theories and paint a more unified big picture. Václav Rajlich is another name to search for, with several interesting publications in the early 2000s.


This is related to something I pondered yesterday. In mainstream languages, imperative programs require you to read the whole thing, while functional programs allow you to look at a function and derive its meaning from its implementation alone: there can't be another influence on the outcome, as there are no side effects and the types are, in ML/Haskell family members, precise. So referential transparency, in a sense, helps you read code depth-first, only looking at the bits that are relevant right now.

As soon as you branch off from basic type systems and add in, for instance, subtyping or type classes or existential quantification, this ability is weakened. Now, in order to understand what a piece of code does, you need to understand some context: 'which implementation is it?', 'what do the possible implementations have in common?'. The effects of these small complexities add up until there's too much information to be kept in biological memory at the same time, and the oldest thunk of information gets purged.

I do think that typed and pure languages have big advantages here: the information that I need is available immediately from the type of the reference. If the types aren't funky, I can assume that the code will terminate, throw no exceptions, and not suffer from data races; I can think exclusively about what it does, not how. (By the way, I don't have one specific reference language in mind right now; I'm just thinking about what would be possible.)
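
A minimal sketch of that point in Haskell (hypothetical names, and no particular reference language implied by the parent): the pure function can be understood depth-first from its own body, while the IO in the second type is the signal that outside context has to be consulted.

    import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

    -- Pure: everything that can influence the result is in the arguments,
    -- so the definition can be understood in isolation.
    discount :: Double -> Double -> Double
    discount rate price = price * (1 - rate)

    -- Effectful: the IO in the type warns the reader that the result also
    -- depends on, and changes, state living outside this function.
    applyDiscount :: Double -> IORef Double -> IO ()
    applyDiscount rate ref = modifyIORef' ref (discount rate)

    main :: IO ()
    main = do
      print (discount 0.1 200)   -- 180.0, derivable from the body alone
      ref <- newIORef 200
      applyDiscount 0.1 ref
      readIORef ref >>= print    -- 180.0, but only after tracking the IORef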


Here's a challenge for somebody: create a code review/pull request tool that helps you really understand the code changes. Some IDEs do an ok job of this for static code (IntelliJ for Java, Emacs haskell-mode), but I've never seen a good tool for giving insight into how a diff is changing a program at the structural level.


Might I suggest looking at https://sourcegraph.com? Currently it's just code search, and doesn't do code review. But it does understand the code at a function/class/module level. And that makes it fundamentally different from a lot of other code reading tools out there that just treat code as text. For code search, that means it can do things like show you how other people actually call a given function. For code review, it would make it possible to reason about changes to functions/classes/packages instead of lines of text. (Disclaimer: I'm one of the creators of Sourcegraph; would love to hear your feedback)


"For code search, that means it can do things like show you how other people actually call a given function."

I just tried it, and that is a pretty big claim.

e.g. https://sourcegraph.com/code.google.com/p/go/symbols/go/code...

Am I wrong, or isn't this pretty much what IntelliJ calls "Usages"? At least I don't see the difference; could you please explain it?

General feedback: the website is too slow. It takes me a couple of seconds to load each page, rendering the site pretty much useless.


Thanks for the feedback; we're working on site speed as a top priority. The examples are similar to the usages feature from your IDE, but 1) the examples are drawn from all the open-source code we've indexed online and 2) we support non-statically-typed languages like Python, JS, and Ruby.


Instead of commit messages, commit line-annotations.


I think for most codebases, this is far too granular and would take far too much time to read and write to be of any practical benefit.


Part of my difficulty reading large code bases is there's often no particularly good entry point to start from. Towards the end of my time in OO languages I was struck by Reenskaug and Coplien's DCI [1] - many people just don't model a system's use cases as first class concepts. The most important part of a system - what it actually does! - doesn't get explicitly mentioned in a lot of people's code.

[1] http://en.wikipedia.org/wiki/Data,_context_and_interaction


I rewrite code to explore and understand it. This actually started as a bad habit - "Oh, I can't believe this person did X, I'm going to change that to Y". Do that enough and you quickly figure out exactly why they did X, and they were either smarter than you or much more familiar with the problem domain. But now you know a little more, you have a little more respect for the code, a little more humility, etc..

Nowadays I take it as a given that I'm probably wrong, but I start rewriting anyway. Worst case (and most common case) I have to toss the code. But I learn. Plus there's a different place your brain goes when you feel like you control the code vs. looking at it behind glass.


I see rewriting code to understand it as basically the ultimate form of taking notes while reading something, since you can run your "notes" and verify that they're actually correct, and you can't get away with just glossing over things you don't get.


This is exactly how I work as well. My feeling is that the only way to truly understand a piece of code is to try to solve the problem yourself, not simply read someone else's solution. But half the battle is then understanding _every problem the code is solving_. Often this seems undocumented. Further, even the simplest high-level problems often have a bunch of corner cases that are only documented in a solution's implementation, and it's on encountering these corner cases in your own solution that it becomes obvious why the code was written in its original manner. Coming to a full understanding of the problem and solution often leaves me happy to throw away my own attempt. Sometimes I'll simply end up adding some comments to the code about the problems it's solving.


I'm baffled by the absence of tools for diagram generation from code and things like contextual highlighting in, well, every code editor. Even code folding seems pretty primitive. Bret Victor and the Light Table team are proposing some significant innovations, but most IDEs make me feel like I'm trying to explore a room through a keyhole.


I find it surprising that in 2014, most of the tools we use to read code treat it mostly like any other text. With the exception of some IDEs for some languages (e.g., Eclipse and Java), very few apps for browsing code actually understand its structure, i.e., the hierarchy of symbols and namespaces and the implicit graph defined by module imports, function calls, type references, etc. We're relying more and more on external, often open-source libraries, and are therefore spending more and more of our time reading through other people's code. Yet the tools don't seem to have caught up.


Code is easy to read if you understand the domain.

Code is nearly impossible to read if you don't understand the domain.


My CS teacher has a similar saying: "Easy is what you know"


Yup. This is really all that needs to be said. We need tools that can help bridge this gap. Good code visualization could help here. I'm surprised there's so little activity in this area.


My technique for reading code is to find out where execution starts and go from there, following what the code does at runtime.

If you do it any other way, it won't necessarily make sense. This is really the only way to do it. (Though I'd be interested in hearing other perspectives.)

This was a hard-won lesson for me because we programmers tend to make the control flow of our programs start at the bottom of source files.


There are many things that you read that are neither enjoyable nor easy to understand, especially on a cursory read. That doesn't make the word "read" less appropriate, nor does it make the word "explore" any more appropriate. I don't explore a quantum physics textbook, nor do I explore a journal article on tubulins. I read, I jot down notes, and I read some more.


Don't professional readers (i.e. writers, lit critics, or just close readers) also read literature differently than casual readers?


I'm not in any of those professions, so I can't say for sure, but at the very least I assume they understand the medium itself at such a deep level that they automatically break down any story they read into different layers. For example: the deeper, structural level; the broader social context the work was created in; the literary context (other works explicitly and implicitly referenced); and so forth.

I know something like that happens with me now when I look at art after going through art school. Michael Parsons's "Talk about a Painting: A Cognitive Developmental Analysis" is an excellent paper on the topic. AFAIK it's only available behind a paywall though:

http://www.jstor.org/discover/10.2307/3332812?uid=3738736&ui...


I like Tim Daly's viewpoint that we shouldn't just be writing code, but we should write manuals - giving high level overviews of our problems and specifying their implementation details inside the manuals. His talk "Literate Programming in the Large"[1] covers why.

The example he gives in his talk is the Axiom algebra system[2], which was revised to use literate programming style - the source code, with usage examples, is contained entirely within the books.

[1]: https://www.youtube.com/watch?v=Av0PQDVTP4A, [slides]: http://daly.axiom-developer.org/TimothyDaly_files/publicatio...

[2]:https://en.wikipedia.org/wiki/Axiom_%28computer_algebra_syst...


I wrote this a couple of years ago to help read and explore code:

http://sherlockcode.com/demos/jquery/

It's recently seen a spike of interest and I've started working on it again. The beta sign up link is still active if you are interested in getting updates.


Yes, yes, yes! I agree wholeheartedly. Exploratory programming is my new favourite weapon for learning about new codebases, new languages, new everything.

I just started a new job at a really interesting agency. I got put onto a 12-month-old project, a huge web application, that started life overseas, moved back here to Australia, and, according to git-blame, has since passed through the hands of nearly 15 developers, a solid 70% of whom don't work here anymore (most were contractors).

So, the codebase is a mess. But, with Xdebug and a neat client for it that gives an interactive console when you hit a breakpoint, two weeks later I'm already understanding the twists and turns far better than I ever hoped for!


I agree, I think we need better tools for exploring code. [1]

I'm currently envisioning (and trying to build) something where the types/func definitions are hyperlinks, and they jump to definition in an overlaying window similar to when you navigate in Spotify (the web based player). So you can quickly explore something without losing context.

[1] https://twitter.com/shurcooL/status/156526541214457856


RapGenius for reading code?


I've heard a lot about it, but I never knew what it was. I just looked it up. Pretty neat. Yeah, definitely a similar idea to some degree.


For someone who has been in the field for more than thirty years, this author speaks too easily about some important figures. I stop 'reading' or 'exploring' and start writing: all the CS world is a stage, and MS and Int merely players. However, I downvoted the article.


I think there should be a university course that teaches how to read code; the good, the bad and the ugly.


You Don't Read Code, You Debug It.


How to explore a bunch of code really depends on how much time and effort one can pour into it:

- To grep it

- To debug it

- To read over it

- To rewrite it


We know. It's just an expression.


The road to fail is paved with goto intentions.




