I don't think most researchers ever expect anybody to read their code. Woe to th...

ogrisel · on Nov 3, 2010

That must change. Science must be reproducible. Other researchers should be able to dive into each other's code quickly to understand the impact of implementation details.

Avshalom · on Nov 3, 2010

Well, yes and no. If they're doing their job right, they describe the method in such a way that you don't need their code to reproduce their results.

Code should not be Documentation.

Further, nobody trusts anybody's code anyway unless it's just a couple of trivial calls to a pre-vetted software package like IRAF, AIPS (to name some astronomy-related ones), or LAPACK. So generally they don't want your code. The exception is grad students trying to apply your old work to new data because they aren't in a position to be trusted with completely original research yet.

Yes, it'd be nice if everyone had great, readable code and handed over the 2 terabyte data sets that it needs without batting an eye. But in practice, code quality is pretty low on the ladder of "things that get in the way of collaboration".

njharman · on Nov 3, 2010

>Code should not be Documentation.

Code is for humans to read; that it compiles/interprets to a program is a side effect. Otherwise we'd all be passing around binaries (or byte encoded files) with our thick stacks of documentation.

mfukar · on Nov 3, 2010

What's your take on the multitude of software that you buy together with all the README files, Word or PDF documents describing how to use the software, what it does, and all that jazz? Do we (humans) get to view all that code and see what Microsoft Office Word 2007 can do for us?

njharman · on Nov 9, 2010

> multitude of software that you buy

Why do you assume I buy any software?

The claim "code is for humans to read" does not logically lead to the claim "code is the only thing for humans to read". There are different kinds of humans: programmers, maintainers, end-users, and idiots are some. You're a member of the latter.

xiongchiamiov · on Nov 3, 2010

> Do we (humans) get to view all that code and see what Microsoft Office Word 2007 can do for us?

Well, we should be able to, but no, we can't, precisely because we don't get code - we get binaries.

mfukar · on Nov 4, 2010

The hypothetical universe in which 'code' is interchangeable with 'natural language' does not concern me because, as explained, we don't live in it.

Or maybe not just yet.

leot · on Nov 3, 2010

Python, unlike any other language I've dealt with, lends itself very nicely to producing stuff that's reusable and easy to understand. I chalk this up to:

* lack of elitism in documentation (e.g. there are always plenty of examples)

* lack of elitism in conventions for code use: everything "just works", generally without any boilerplate

* installing libraries is a snap, and the whole module organization system is intuitive and elegant

* assumption that anything that's not a script is a library

* documentation conventions (doctests, e.g., are a nice stepping stone to good documentation _and_ code testing)

* the "there's only one way to do it" attitude

* large standard library
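To make the doctest point concrete, here is a minimal sketch using the standard library's `doctest` module. The `mean` function is a hypothetical example, not taken from any project discussed in this thread; the point is only that the examples in the docstring serve as both documentation and tests.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    The examples below are read by humans as documentation and
    executed by doctest as tests.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    # Run every example found in this module's docstrings and
    # report any whose actual output differs from the documented one.
    doctest.testmod()
```

Running the file directly checks the examples; it prints nothing when they all pass, and `python -m doctest -v` on the file lists each example as it runs.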

On the other topic: you are describing the way research works "today", which is actually pretty poorly (why, e.g., does all data need to be surrounded by so many words of introduction and discussion? why can't I just add something to someone else's work like I can add to an open source project?). This model of research will change, at one point or another, to resemble the much more efficient, effective, and fun, open source project model.

ogrisel · on Nov 3, 2010

If your system is complicated enough (which can be the case for complex machine learning or NLP algorithms), an 8-page paper (a common limit for many conferences) cannot describe all the implementation details, but those implementation details might be very important for reproducing the results.

Hence code should be published, well documented, and readable.

Avshalom · on Nov 3, 2010

Fair enough. When I think of scientific uses of Python I think astronomy, atmospheric physics, finite element analysis and linear systems using existing techniques...

Existing techniques in general, really. Fields where the interest is the data and the implications of the data. In fields like ML and NLP, where the algorithm/technique is the thing of interest, then yeah, sure, the code is important.

ogrisel · on Nov 3, 2010

I agree publishing datasets is very very important too. Often more important than code.