Why Python rocks for research (washington.edu)
210 points by agconway on Nov 3, 2010 | 98 comments



Python is awfully close to being a superior (and free) replacement for Matlab, but there are a few annoyances that keep preventing me from switching forever. Unfortunately, these are mostly not bugs but bad design that is believed to be correct by the core developers, so it is unlikely to ever change:

- Matrices are a pain. The r_[] and c_[] operators could be a reasonable replacement for Matlab's elegant matrix construction syntax, but they do not work as expected (as smart hstack and vstack), instead doing something completely different and inconsistent for vectors and matrices.
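The inconsistency described above is easy to demonstrate (a minimal sketch with plain NumPy; the values are arbitrary):

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

# For 1-D arrays, r_ concatenates end-to-end (hstack-like)...
print(np.r_[a, b].shape)     # (4,)

# ...but for 2-D arrays it stacks rows (vstack-like):
A = np.array([[1, 2]])
B = np.array([[3, 4]])
print(np.r_[A, B].shape)     # (2, 2)

# And c_ silently promotes 1-D inputs to columns:
print(np.c_[a, b].shape)     # (2, 2)
```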

- Tensors are a bigger pain. Matlab has a very well-defined semantics for operations like permute and reshape; in NumPy these operations sometimes create just a view, at other times they reshuffle the memory contents. I know the idea was to "protect" the user from having to know the memory layout of data, but this idea is bad.
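The view-vs-copy ambiguity shows up with reshape alone (a small sketch of standard NumPy semantics):

```python
import numpy as np

x = np.arange(6)
v = x.reshape(2, 3)        # contiguous data: reshape returns a view
v[0, 0] = 99
print(x[0])                # 99 -- writing to the reshaped array hit x

t = np.arange(6).reshape(2, 3).T   # transpose: a non-contiguous view
f = t.reshape(6)           # cannot be expressed as a view, so NumPy copies
f[0] = -1
print(t[0, 0])             # 0 -- the silent copy absorbed the write
```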

- IPython is great in every way except when it comes to reloading parts of your program. After any tiny change to your code, the only safe thing to do is to quit IPython and start it again. All the other options (run, reset, reload...) make some secret and wrong assumptions about what you want to reload. In contrast, this works flawlessly in Matlab.


In the end, it's all about the ecosystem. Perl wins for bioinformatics because there are boatloads of scientists already using it, with all the neat libraries and resources that brings. Equally, Python wins for, say, prototyping in robotics because of libraries, support and so on.

There's nothing intrinsically science-apt about Python/Perl, but Ruby and friends can't compete when it comes to the programming environment; that's what counts.


As a language I prefer Ruby but the Python ecosystem for this kind of thing is definitely a huge advantage. You can actually do quite a bit with Ruby + GSL but it's still not really competitive.


Yep, same with me. I do prefer Ruby, but it's just not feasible at the moment. Languages that get behind in a certain area enter a terribly vicious circle: There's no scientific community backing Ruby, so why would I develop something that'd make Ruby more competitive?


I'm in the same boat, and it seems like several others are as well. I'm fine moving to Python for the moment, but I wonder how the Ruby community will ever know whether there's enough demand for scientific tools to merit their development.


I prefer Ruby generally, but honestly Ruby and Python are sufficiently similar that I'd rather smart programmers put their time to good use doing something other than reimplementing Python's science libs.

I think that making some tools available in a totally different language (maybe something functional) would be much more useful, because it would allow for a very different approach to the problem if needed.

In a perfect world, we could also have the option of using light wrappers around OpenCL matrix libraries, and push the linear algebra to GPUs that eat matrices for breakfast.


Agreed. There's only so much skilled labor available for this kind of thing.

This is exactly why I'm focusing on Python lately. Ruby is a great language but I don't want to be pigeon-holed as a web guy forever. I've already done over 10 years of web dev and I'd like to try out a couple of new problem domains before I kick the bucket.


I'll have to agree as well. The only problem for me is: if not on the web, how are we going to make GUIs that aren't severely limited to our platform? Wasn't web design supposed to solve the "platform question"?


There are a number of cross-platform GUI toolkits. For instance, Qt is pretty nice wherever you put it, and KDE's been putting a lot of work into making their libraries and such work on Windows.


I'm not a huge fan of Qt simply because I can't get it to feel natural on Gnome, my desktop of choice. I've looked into Gtk+, but I'm not so sure I want to commit to it yet.

Are there plans to make Qt feel more natural on Gnome and OS X?


Python is very accessible to casual programmers, so this might be one of the reasons it has been adopted by the scientific community.


This article makes some fairly convincing arguments that Python is a more flexible tool than Matlab or Perl, but I can't help but come away with the sense that the author hasn't tried many other languages.

There are an awful lot of languages that provide iterators, a powerful set of data structures, extensive libraries and facilities for structuring and maintaining large codebases. .Net languages (maybe F# would be good for this?), Java or most of the emerging languages for the JVM stack, Ruby (which is generally considered to be "different but equivalent" to Python), and so forth.


I only scanned the article very briefly, but my impression is that the important comparison is vs. Matlab. The other players aren't really in the author's game. It's a question of the use case and the community and the library support.

In theory, .Net could displace Matlab or Python as the canonical platform for scientific researchers. And in theory Python could displace PHP as the canonical platform for classic CRUD web apps. In practice neither is likely to happen, no matter how much we might or might not wish it to.


I've been a Matlab user for 10 yrs, recently switched to Python because of the increased flexibility. Never going back to Matlab ever again. Python math and science libs can be a little rough around the edges, but the flexibility more than makes up for it for me.


I'm a Matlab user myself, but I try to use Ruby whenever possible if for nothing else but to just get away from all the god damned matrices. I didn't realize Python had similar capabilities, I look forward to trying it out. But does Python have "Index exceeds matrix dimensions." errors? I just can't imagine life without seeing a few of those every day. That said, the workspace is really handy, is there some equivalent GUI in Python?


Check out EPD: http://www.enthought.com/products/, particularly the ETS framework. It contains almost all the popular scientific/numeric libraries in python and is free for academic use (and much of it is open-source).

(Disclaimer - I work for Enthought.)


As infinite8s points out, EPD from enthought is really quite nice.

Personally, I rather like Python XY (http://www.pythonxy.com/), though. It is totally free, open source, and for small little scripts I am a huge fan of the Spyder IDE. It lacks some of the features of bulkier IDEs, so I also use Eclipse from time to time. But Spyder is light and much faster than Eclipse, with every feature I would want when working on small projects that can be contained in just one or two files.


When I use certain tools, I just want them to be popular enough and couldn't care less about the "canonical" way of doing things.


It's all about Numpy. Mayavi, IPython, mlab & friends are great for when I need to plot things or look at data, but Numpy is the workhorse I keep coming back to day in and day out. And also, the thing I wish for most when I have to use other languages. The combination of the speed of C and the elegance of, well, anything that is not C, is hard to beat. Once you get down the basics of array broadcasting, types, etc., it's possible to do some amazingly elegant things in Numpy, and quickly too. The numpy library has seemingly every array function I have ever wanted. If I were Matlab I'd be scared :-)
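A quick illustration of the array broadcasting mentioned above (values are arbitrary):

```python
import numpy as np

# Arrays with compatible shapes combine without explicit loops or tiling:
row = np.array([0.0, 10.0, 20.0])   # shape (3,)
col = np.array([[1.0], [2.0]])      # shape (2, 1)
grid = col + row                    # broadcasts to shape (2, 3)
# grid == [[ 1., 11., 21.],
#          [ 2., 12., 22.]]

# A typical one-liner: standardize every column of a matrix.
m = np.array([[1.0, 2.0], [3.0, 4.0]])
standardized = (m - m.mean(axis=0)) / m.std(axis=0)
```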


> There are an awful lot of languages that provide ...

Still, how many of them have a fast interactive interpreter ("command line") with decent usability? How many of those provide good libraries for numerical as well as symbolic math? With an API that is easy to write, to understand and to extend?

Python may not be the only language with those qualities, but there aren't many languages (and ecosystems around them) which can compete on all those areas.

Python seems to be one of the few "best fits" for scientific applications.


There are OCaml, F# and Clojure, with some combination of great tools, speed (Clojure addressed this recently, I think, but I only have visual experience with Clojure), light syntax, books and documentation, a REPL, an excellent platform, wide library choice and/or decent interop with C. Also, F# is doing some really cool stuff with dataset awareness in the language.

Haskell seems a perfect fit for mathematical use, and while I haven't used it in a couple of years, I would hesitate to suggest it due to a lack of mature library options, difficulty of FFI and perhaps a steep initial learning curve.

Scala is a good language for an entire application but provides too much scaffolding for scientific applications.

R is fairly widely used but is also itself very quirky.


I'm a long time Python programmer and I am currently using it for scientific programming (for my PhD).

I've shopped around a lot for alternatives to Python and looked at Ocaml, F#, Boo, Clojure, Common Lisp, Haskell and Scala.

There are a couple of things still keeping me with Python:

* The REPL, especially IPython,

* Numpy & Scipy,

* Networkx & igraph,

* jpype for fairly seamless JVM integration (this way, I can interact with Cytoscape), rpy for fairly seamless R integration,

* ZODB,

* IPython's parallel processing framework.

The .net environment probably comes the closest to providing everything here, but the REPLs need a lot of work and the graph libraries have more complicated interfaces which make them a pain to use on the command line.

Python is not without its warts but when I recently had to spend time with Matlab again after a few years, I was reminded of just how nice the Python ecosystem is in comparison.

Update: I should add that if you rely on one of Matlab's toolboxes, you might not find any decent alternatives outside of Matlab. You can always use a bridge like MLabWrap to access Matlab from Python.


I am with you on Haskell.

It ticks the boxes for Freedom, Readability, Documentation System (including lhs2tex, which will turn your "integrate f 0 a" into $\int_0^a f\,dx$), High-level vs low-level, Standard library (including hackage/cabal), Data structures, Module system, Calling syntax, Default arguments (currying), and Multiple programming paradigms (there is a saying that Haskell is the best imperative language). It partially ticks most other points.

Myself, I chose Haskell for my research project, as it was the best language on the (expressiveness times safety) scale. A strong type system certainly helps sweep out errors.


I have been using OCaml for about 5 years and earning money with it for the last 2 years. I love the language, but for scientific work it's hard to beat Python. OCaml just doesn't have the libraries or the community support to even be on the radar.


SciPy, NumPy, Matplotlib, Cython, Sphinx, PIL. The article should've also mentioned NLTK.

IMHO, languages matter less than the available libraries, and in my experience only Java matches the depth of the Python ecosystem.

That Python is a nice language to work with, that's just a bonus.


I agree about Python's libraries. Normally, I'm a Ruby programmer, because I like Ruby's syntax and the Rack/Rails/Sinatra web stack.

But if I need to do scientific or linguistic programming, Python is absolutely amazing. SciPy, NumPy, Matplotlib and NLTK support a rich and deep ecosystem. And it's a vastly nicer programming environment than Matlab and Octave.

(Of course, GNU R is also pretty useful if you're doing pure statistics.)


Which is why we use Python. We're a web app (Django) doing big data aggregation/processing (Numpy).


Same here. I used to do a lot of Python programming, but moved to Ruby because of Rails. When I had to work on a machine learning paper, I spent weeks looking for an equivalent to NumPy or Matplotlib in the Ruby world, but nothing comes even close!


Also, (basic) GPU operations with cudamat (or even finer, gnumpy) at your fingertips.


Do they all have equivalent libraries to Python's SciPy and Matplotlib? I think that is why the author could move to Python, as these provide a pretty large subset of what Matlab has and make the transition less problematic.

Java probably does and .NET may have something like this but I don't know of them or their amount of documentation.


.NET has various things in this space, but I think a lot are commercial products. .NET is not used as much in academia, so most things are targeted at professionals. With that said, you can probably get free/cheap licenses for many/most of the commercial products.

But I think Python probably has the most active community as its been used as glue for a long time in this space.


None of the languages you mention is easy enough to be grasped in a week by someone who has never done programming in their life. Believe me: most of the people who suddenly have to do data analysis for their PhD or postdoc projects hardly know how to use Excel, so ease of use is essential.

Also, you want a non-compiled language. Most of the time you do interactive programming and change parameters on the fly, according to the result of the analysis.

Finally, matplotlib, one of Python's most complete graphics libraries, is a breeze to use. Making graphs in an interactive way with Java or .NET is simply impossible.

I am a neuroscientist and most people in the field use Matlab. I use python (in fact I use ONLY open source software, by choice). It's amazing how many advantages python gave me on my daily life.


actually, he doesn't make any assertions that Python is more flexible than Perl (which would be rather doubtful), only that it is more readable (which, as a perlista, I'm sad to say is probably true).

but I also get the sense that this is the first time he's seriously delved into a dynamic programming language. much of what he's saying about Python is exactly what bioinformaticists were saying about Perl in the late 90s / early aughts.


The thing about readability of Perl vs. Python ... I dunno. I don't think it's the syntax itself that causes Perl to have some readability issues. Stuff like `$foo =~ s/bar/baz/g` or `join '-', split( q{ }, $a_str)` is very readable.

Context and corner cases maybe makes readability suffer. For example, what does `m///g` do in scalar vs. list context? I don't recall.


you might say that Perl's lack of (obvious, first-pass) readability is a two-headed dog from hell:

- it's a (very) good thing, in that there's -a- -lot- of expressive power hiding in those corner cases (esp list v. scalar context). in fact perl is nearly unique in the degree to which it embraces, rather than seeks to quash the potential for nuance and flexibility to be found in the various eddies and whirlpools that lie "between" non-whitespace elements of code. this is something that goes to the deepest intellectual roots of the language (to be more like the way the human brain thinks, rather than the way machines think).

- it's a (very) bad thing, in that the same "power" for expressiveness just represents little more than an endless stream of banana peels to most first time users (understandably sending many of them running into the arms of the other major dynamic programming languages). and that it just makes it too damn easy to write (barely) functional, pothole-laden frontline scripts (such that Perl is partly responsible for "scripting" being such a dirty word, in many quarters).

another aspect people don't like to talk about is its nearly coprophilic fondness for not just nuanced and context-sensitive, but intentionally obtuse syntax (the impossible-to-remember "special" variables such as $[, $;, $' etc being probably the worst examples).

this also (sadly) has a lot to do with the community's deference to the aesthetics of its original authors (and is also plenty understandable, given the extent to which unabashed ugliness -- as personified by makefiles, shell languages and macro-laden C and C++ -- ruled the day at the time).

unfortunately it also blinds a lot of Perl folk to the fact that their beloved muse just happens to look awful darn cluttery (or worse) to many reasonably intelligent people who come from more "modern" programming backgrounds (or who at least came onto the scene comparatively recently).


And the corollary: Why do researchers never respect the PEP8 when they write python code?

Yes I am a bit overreacting since the blog post is very well written and I actually agree 100% with the content. But please people: respect the PEP8 [1]. It makes your readers feel at home while reading your code. It is very important if you want to get new contributors to your project. See [2] for instance.

[1] http://www.python.org/dev/peps/pep-0008/ [2] http://www.dataists.com/2010/10/whats-the-use-of-sharing-cod...


I wasn't aware of pep8 when I started, most science people arrive at python from a different path. What I mean is, for a long time I knew much more about numpy than about python itself.

there are some things in pep8 that are bad for science: the spaces around operators, and also the 80 chars to a line... scientific expressions are often long and complicated. yes, you can do it while adhering to pep8, but it's kind of a PITA


The 80 chars limit has a justification: expressions that don't fit on 80 chars (or two lines using parens) are not readable anyway. In such a case temporary variables with meaningful names would both help respect the 80 chars constraint and make the expression easier to understand by the reader.
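For example (hypothetical physics-flavored names, purely for illustration):

```python
# Hypothetical variable names, purely for illustration.
mass, g, height = 2.0, 9.8, 5.0
vx, vy, vz = 1.0, 2.0, 2.0
mu, normal, distance = 0.3, 19.6, 4.0

# One long expression that blows past 80 columns...
energy = 0.5 * mass * (vx**2 + vy**2 + vz**2) + mass * g * height - mu * normal * distance

# ...vs named intermediate quantities that each fit comfortably:
kinetic = 0.5 * mass * (vx**2 + vy**2 + vz**2)
potential = mass * g * height
friction = mu * normal * distance
energy2 = kinetic + potential - friction
```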

Furthermore having 80 chars is great to have vertically split editors with the code on one panel and the tests or the documentation on the other panel.


> The 80 chars limit has a justification: expressions that don't fit on 80 chars (or two lines using parens) are not readable anyway

I disagree. There are many cases, especially after 2-3 levels of indentation, where 80 characters is an unreasonably narrow space. I don't have a strong preference for reading code within 80 characters. And I'd much rather comments use 80 characters plus indentation rather than worry about whether I've got my screens vertically split.

> Furthermore having 80 chars is great to have vertically split editors with the code on one panel and the tests or the documentation on the other panel.

Having lines here or there that go beyond 80 characters doesn't completely prevent you from doing this, and having an entire statement on a single line of code makes line-based tools like grep or kill-line more effective.

Limiting to 80 characters is a good idea, but it's easy to see that there are a few significant tradeoffs, and that someone trying to get something done is not going to want to bother.


> > The 80 chars limit has a justification: expressions that don't fit on 80 chars (or two lines using parens) are not readable anyway

> I disagree. There are many cases, especially after 2-3 levels of indentation, where 80 characters is an unreasonably narrow space.

Some people would say that after 2-3 levels of indentation you should be looking at refactoring your code. Probably to pull something into a separate function/method.


I don't think many people would say 2-3 levels is the threshold for refactoring in Python. 3 levels is 1 class, 1 def, and 1 other control structure. Then you have 68 characters left.


I'm not saying there aren't reasons to have short lines.

But in science, we have longer and more complex expressions in general.

Our priorities are different.


Because researchers have never heard of pep8, and in general don't give a shit about domain specific politics unless it's their domain.


From PEP 8:

> The preferred place to break around a binary operator is after the operator, not before it.

I'd be interested in hearing the justification for this rule. I think that leading a continuation line with the binary operator makes it super-clear that it is a continuation line. What is the benefit of the preferred style? Compare:

  if (the_result_of_this_function(on_this_arg) == 10
      and this_overly_descriptive_boolean):
      do_stuff()

  if (the_result_of_this_function(on_this_arg) == 10 and
      this_overly_descriptive_boolean):
      do_stuff()
To me, the first one is quite clearly a continuation line (no statement can start with "and"). The second requires closer inspection.


I would write:

  if the_result_of_this_function(on_this_arg) == 10 \
  and this_overly_descriptive_boolean:
      do_stuff()
Indenting the second line of the if statement would, at first glance, indicate that it's part of the block instead. Then again, it depends. If it was the header of a def statement, I would follow the PEP, e.g.

  def __init__(self, width, height,
                     color='black', emphasis=None, highlight=0):
On a side note, I once did the "Art & Logic challenge" [http://www.artlogic.com/] and they use guidelines that apply to several languages, e.g. you would use the same formatting style for C++ and for Python, if at all possible. Much of it flies in the face of PEP 8.


I agree on this specific case but consistency with conventions shared across projects is more important.

But I don't think anybody will complain if you use either of them, whereas 160-char-long expressions with no spacing between operators and funkyCamelCasing all over the code are just show-stoppers when I want to contribute a patch to a project.


PEP8 is wrong on several counts. It even acknowledges this: the first section (after the introduction) is "A Foolish Consistency is the Hobgoblin of Little Minds", which is about the spirit of PEP8 (readability and consistency) and explains some situations in which you should violate PEP8.


I don't think most researchers ever expect anybody to read their code. Woe to the graduate student who years later actually needs to use the code.


That must change. Science must be reproducible. Other researchers should be able to dive into each other's code quickly to understand the impact of implementation details.


Well, yes and no: if they're doing their job right, they describe the method in such a way that you don't need their code to reproduce their results.

Code should not be Documentation.

Further, nobody trusts anybody's code anyway unless it's just a couple of trivial calls to a pre-vetted software package like IRAF or AIPS (to name some astronomy-related ones), or LAPACK. So generally they don't want your code. The exception is grad students trying to apply your old work to new data, because they aren't in a position to be trusted with completely original research yet.

Yes, it'd be nice if everyone had great readable code and handed over the 2-terabyte data sets it needs without batting an eye, but in practice code quality is pretty low on the ladder of "things that get in the way of collaboration".


>Code should not be Documentation.

Code is for humans to read; that it compiles/interprets to a program is a side effect. Otherwise we'd all be passing around binaries (or byte-encoded files) with our thick stacks of documentation.


What's your take on the multitude of software that you buy together with all the README files, Word or PDF documents describing how to use the software, what does it do, and all that jazz? Do we (humans) get to view all that code and see what Microsoft Office Word 2007 can do for us?


> multitude of software that you buy

Why do you assume I buy any software?

The claim "code is for humans to read" does not logically lead to the claim "code is the only thing for humans to read". There are different kinds of humans: programmers, maintainers, end-users, and idiots, to name some. You're a member of the latter.


> Do we (humans) get to view all that code and see what Microsoft Office Word 2007 can do for us?

Well, we should be able to, but no, we can't, precisely because we don't get code - we get binaries.


The hypothetical universe in which 'code' is interchangeable with 'natural language' does not concern me because, as explained, we don't live in it.

Or maybe not just yet.


Python, unlike any other language I've dealt with, lends itself very nicely to producing stuff that's reusable and easy to understand. I chalk this up to

* lack of elitism in documentation (e.g. there are always plenty of examples)

* lack of elitism in conventions for code use: everything "just works", generally without any boilerplate

* installing libraries is a snap, and the whole module organization system is intuitive and elegant

* assumption that anything that's not a script is a library

* documentation conventions (doctests, e.g., are a nice stepping stone to good documentation _and_ code testing)

* the "there's only one way to do it" attitude

* large standard library
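On the doctest point above, a minimal sketch of the documentation-plus-testing combo (the function and its name are made up for illustration):

```python
def mean(xs):
    """Return the arithmetic mean of a non-empty sequence.

    The example below is documentation *and* an executable test:

    >>> mean([1, 2, 3])
    2.0
    """
    return sum(xs) / len(xs)

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # runs every >>> example in this module
```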

On the other topic: you are describing the way research works "today", which it does pretty poorly (why, e.g., does all data need to be surrounded by so many words of introduction and discussion? why can't I just add something to someone else's work like I can add to an open source project?). This model of research will change, at one point or another, to resemble the much more efficient, effective, and fun open source project model.


If your system is complicated enough (which can be the case for complex machine learning or NLP algorithms), an 8-page paper (a common limit for many conferences) cannot describe all the implementation details, but those implementation details might be very important for reproducing the results.

Hence code should be published, well documented and readable.


Fair enough. When I think of scientific uses of python I think astronomy, atmospheric physics, finite element analysis and linear systems using existing techniques...

Existing techniques in general, really - fields where the interest is the data and the implications of the data. In fields like ML and NLP, where the algorithm/technique is the thing of interest, then yeah, sure, the code is important.


I agree publishing datasets is very very important too. Often more important than code.


I've worked on projects where different people used slightly different coding styles, and I did not find it getting in the way too much -- you just match the style of the code you are working with. I am surprised there are people who will not contribute to a project because of this.


This is one of the many reasons I love Go, gofmt takes care of almost all the silly style issues, and there is no need to learn any style guide, just run your code (or anyone's code) through gofmt, and you are done.


I've wasted most of my professional life tweaking various unix software to make it work. However, the typical scientific python setup proved to be too frustrating to install on OSX. The recommended solution is to just buy the Enthought distro. If I'm paying for software anyway, why is Enthought better than Matlab?


Disclaimer: I work for Enthought.

I did my whole phd in matlab.

EPD is much cheaper and is free for academics

even if it weren't free, I would use it anyway.

but it really isn't "why is EPD better than matlab", it's "why is python better than matlab". matlab is a domain specific application with a domain specific language. It doesn't work well with things outside of its domain.

python is a general purpose language (And as such, has good general purpose constructs) but it happens to have excellent scientific and mathematical libraries. This is useful when you actually have to apply your research and build an application.

numpy is also better for large data, because slicing arrays does not create copies of them (you can make it do so if you want to, but it doesn't by default). in matlab, slicing large arrays can cause you to run out of memory.
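A sketch of that view behavior (array sizes shrunk for illustration):

```python
import numpy as np

big = np.zeros((1000, 1000))

# Basic slicing produces a view: no data is copied.
window = big[100:200, 100:200]
window[:] = 1.0
print(big[150, 150])      # 1.0 -- the write went through to `big`

# You only pay for a copy when you ask for one explicitly:
snapshot = big[100:200, 100:200].copy()
snapshot[:] = 2.0
print(big[150, 150])      # still 1.0 -- `big` is untouched
```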

Cython makes it really easy to start out with python, and then optimize your code down into C.

with python you can run your calculations over a massive compute grid. Use messaging libraries like PyZMQ to distribute your data and result, and build real time GUIs to consume the final results.

- a matlab cluster is quite expensive

- chacko - another enthought python library which is free and open source is great for real time datavisualization, matlab does not have anything equivalent.

- python has a large number of messaging libraries, with matlab I think you're stuck with MPI.

Matlab always made me feel limited. I would work on a problem, and then reach a point where Matlab could not do what I needed to do.

That rarely happens to me with python.


Thank you, that is the kind of response I was looking for. I will take a look at Enthought.


you WILL get frustrated by some things - some of the matrix concatenation operations are less convenient, some of the libraries are less polished - but it's been worth it for me. msg me if you need help.

use IPython, not just python shell for interactivity.

also checkout 3d datavisualization with mayavi, that stuff is really awesome.


> use IPython, not just python shell for interactivity.

or bpython


Is there a mayavi tutorial somewhere? It looks pretty interesting, but I could never figure it out.


http://conference.scipy.org/scipy2010/tutorials.html

there was a mayavi tutorial, and the files are available at the link


> - chacko - another enthought python library

Ah, you mean "Chaco" -- it's easier to find with the correct spelling. :) http://code.enthought.com/projects/chaco/


Is there something similar to Simulink available for Python yet? That's pretty much the only killer feature of Matlab for me these days. Any other number crunching I do in Python.


not that I know of


Thanks for the reply, I am checking out the academic enthought lib. now


I've never had any problems putting a research-grade python setup on OSX - just use the .dmg installer files available for python (2.6.X for compatibility), numpy, scipy, and matplotlib. But I agree that the Enthought Python Distro is also a good alternative (if a little bloated for my needs), and it's also free for academics.


When I tried to use easy_install with the EPD, it tried to take me to their repository, for which we don't have the appropriate permissions. So I sucked it up and installed some from .dmg's and others from source (matplotlib was difficult). And I found out that just because it's Python doesn't mean it's portable - for instance, NumPy/SciPy is hardly ever installed on other machines (it's not trivial, as is written here many times), and I find that administrators won't update their Python installation, so I've had to re-write bits of my code to accommodate Python 2.3-2.4 at times! But all said, it's a nice tool, and hopefully these installation/version issues will work themselves out.


For a basic setup on OS X, I found scipy superpack to be an excellent choice: http://stronginference.com/scipy-superpack/


I've found macports provides a ton of python libraries, like numpy, scipy, and matplotlib, all easily installed from the command line.


Another option is to just install Sage. It "just works" to install. Though it is less about numerical computation than symbolic computation. (Though both are targets, there isn't really equal focus, in my opinion.)


I learned Python on the fly specifically for research. I used Django to build out an enterprise reporting/analytics system to support a customer experience (survey) program for the cost of time (huzzah open source). We had bids on this project upwards of 80k. We generate ~100k surveys / month and are able to get targeted, meaningful, automated insights directly to front-line management. Python FTW.


Python + Django FTW.


Python rocks, but Python + R + bash rocks way harder for research


R is definitely powerful and a good part of any scientific data analysis toolkit.

I use python, ipython, matplotlib, numpy and R. I call my R scripts directly from python using rpy.


Agreed. I use Python for heavy shell-scripting and text-processing (though R surprisingly does have respectable facilities for all but the most overwhelming of these tasks) and R for the rest of the analysis. I've thought about switching to NumPy/SciPy, as it's part of Python, to integrate everything, but R's data frame, factors, and the reshape, plyr, and lattice packages make you think very differently about how to approach the data - and it's hard to go back to lower-level manipulation of arrays; not to mention all the stats/graphics packages, which are very easy to install and apply. And the documentation of its functions is superb.


I feel this article is somewhat unbalanced in its single-minded rejoicing over a certain tool/environment. So in the same spirit, here are a couple of reasons not to switch from Matlab to Python, all stemming from my experience when I decided to try to switch from Matlab to Python/C:

- installing all these packages on (any) system is painful. Different versions don't play together or don't work (yet) on some platform and/or architecture. This stems from my own experience of getting a version of Python to work with numpy, scipy, matplotlib, opencv and PIL on Windows, Mac, and Linux machines. No 100 percent success yet on any platform.

- central and consistent documentation. Even for very simple cases, I got a bit of a headache. I encountered the Python print statement for the first time, which obviously differs somewhat from its C printf cousin. I googled "python print syntax" only to find that the first xx hits, including the official documentation, do not cover the full specification of this statement. I fear the moment I might actually need detailed information on something less trivial.

- Numerical integration is more accurate in Matlab.

- Visualization capabilities of matlab are more powerful. But who knows, perhaps there is yet another package floating around :-)

- Matlab may not have advanced data-structures, but it is a rapid prototyping tool, for testing ideas. If I need to write an actual application, I will use a tool and language geared for that task.
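On the numerical-integration point: "more accurate" is hard to judge without a concrete case, but for reference, SciPy's adaptive quadrature returns an error estimate alongside the result. A minimal sketch, assuming SciPy and NumPy are installed:

```python
import numpy as np
from scipy.integrate import quad

# integrate sin(x) over [0, pi]; the exact answer is 2
result, abs_err = quad(np.sin, 0, np.pi)
print(result, abs_err)
```

quad's second return value is its own estimate of the absolute error, which at least makes accuracy comparisons against Matlab's quad concrete.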


> central and consistent documentation. Even for very simple cases, I got a bit of a headache. I encounter a python print statement for the first time that obviously differs somewhat from its c printf cousin. I google "python print syntax" only to find that the first xx hits, including the official documentation, do not cover the full specification of this statement. I fear the moment I might actually need detailed information on something less trivial.

http://docs.python.org/reference/simple_stmts.html#the-print... lists everything about `print` - it takes stuff in, converts 'em to strings, and sticks them on stdout. It doesn't refer to printf-like formatting because that's for strings in general. If you weren't aware of this, you probably should have been going through a basic Python tutorial, rather than just jumping into the middle of things.
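To make the separation concrete, a minimal sketch: printf-style conversion is the string % operator, and print merely writes out the resulting string (the parenthesized form of print is used here so it runs on Python 3 as well):

```python
# printf-style formatting belongs to strings (the % operator),
# not to the print statement itself
line = "pi is roughly %.3f" % 3.14159
print(line)  # pi is roughly 3.142
```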

Python's documentation is the best I've encountered so far, and I find good docs to be an important value in the community, as well. I guess YMMV, though.


install is painful - the Enthought Python Distribution does make it pretty painless, but it's not free for non-academics

agreed on documentation

actually I think python's visualization capabilities are more powerful, have you looked at mlab? the 3d capabilities there are insane
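Trying mlab requires a full Mayavi install; for the 2-D case, the matplotlib workflow mentioned elsewhere in the thread takes only a few lines. A minimal sketch, assuming matplotlib and NumPy are installed ("sine.png" is an arbitrary output name):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt
import numpy as np

# plot one period of sin(x) and save it to disk
x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.legend()
plt.savefig("sine.png")
```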

I use python because I can do rapid prototyping, and turn it into a full application with the same code base.

did you ultimately go back to matlab?


Well, I only got it all to work (on OS X) an hour or two ago, and am currently happily learning & exploring.

I wanted to venture beyond Matlab because, for what I am currently doing, the environment and language are too limited, yet I do not wish to prototype in C++. Python together with some libraries seemed to be a match made in heaven. I also thought it would function as a better stepping stone towards an actual application.


Installation of the PIL pre-built packages on OS X is notoriously difficult, since it relies on system C libraries to do some processing.

Moving your development to a linux machine will clear up all of those issues.

Also 'easy_install' should get you all of the packages you want.


Python is good, but you could also consider Maxima.

A single example:

f(x) := x^2 + 3*x + 7;

Maxima provides: symbolic computation, BLAS and LAPACK integration for numeric linear algebra, a 500-page manual in several languages, and complete libraries for statistics, differential equations, calculus, and series. Graphics with matplotlib. Also, the Maxima language is not much more complicated than Python:

for i in range(10): print i versus for i:0 thru 9 do print(i);

[i*2 for i in range(10)] versus makelist(i*2, i, 0, 9)

But Matlab's libraries are more extensive than Python's or Maxima's.
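For reference, the Python halves of the comparisons above are runnable as-is (using the function form of print so it also runs on Python 3):

```python
# loop: print the integers 0 through 9
for i in range(10):
    print(i)

# comprehension: doubles of 0..9, matching makelist(i*2, i, 0, 9)
doubles = [i * 2 for i in range(10)]
print(doubles)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```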


Maxima rocks for symbolic math. I prefer it to Mathematica, which is saying a lot. In contrast, octave always feels like "almost-Matlab" and I still prefer the latter.

Also see wxMaxima, which will (among many other things) produce LaTeX for you.


If you haven't already, you might also want to check out Sage (http://www.sagemath.org/) It uses python to "glue" together other free math tools (e.g. Maxima) into a unified system with a nice interface.


Python has sage and sympy for doing symbolic math. Although admittedly they're quite primitive compared to maple and mathematica (haven't used Maxima, so I can't really compare)


Sage actually includes Maxima, for ultimate convenience.


Often languages are chosen based on already existing tools used in a specific research project. That's why it's good to be able to quickly pick up new languages. For example scripting languages integrated in some frameworks, like Scheme in the Festival speech synthesis system. In the end, this often results in projects involving things like Python, Scheme, bash, R and a bit of tcl ;-).


> In MatLab everything is flat – all functions are declared in the global namespace. However, this discourages code reusability by making the programmer do extra work keeping disparate program components separate. In other words, without a hierarchical structure to the program, it’s difficult to extract and reuse specific functionality.

I completely disagree. Reusable Matlab code has been my holy grail for the last couple of years. The key is to break out specific functionality as subfunctions. When these are abstracted and generally useful elsewhere, then they become new tools for the toolbox. The subfunctions also make great starting points for repurposing code. This layout results in much less work.


It's absolutely true that using more functions makes your code more maintainable. It's also absolutely true that nearly every other language on the planet does this better than Matlab.


Yes; in particular, Matlab's restriction of ONE outside-callable function per file is a huge pain.
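The contrast is easy to see in Python, where a single module can expose any number of importable functions. A minimal sketch (module and function names here are made up for illustration):

```python
# mytools.py - one file exposing several reusable functions,
# unlike Matlab's one-public-function-per-file restriction
def normalize(xs):
    """Scale a sequence so its values sum to 1 (assumes a nonzero sum)."""
    total = float(sum(xs))
    return [x / total for x in xs]

def moving_average(xs, n):
    """Trailing moving average over a window of n points."""
    return [sum(xs[i - n + 1:i + 1]) / float(n) for i in range(n - 1, len(xs))]

print(normalize([1, 1, 2]))             # [0.25, 0.25, 0.5]
print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

Any of these can then be imported individually (e.g. from mytools import normalize) without the file-per-function ceremony.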


For those interested in scientific frameworks and existing packages: Scientific track of EuroSciPy 2010 [1]. Everything from seismology to visual programming.

http://www.euroscipy.org/track/870


One more powerful combo is R+Java+C. UI, integration and massive data processing in Java, R for prototyping, plotting and model fitting, and C if you have numerical simulations.



