
What's with the Matlab hate?

Sure, it's closed source, expensive, and can be clunky, but it is not a horrible tool for many jobs that involve a mix of signal processing and other data analysis.

Scipy is often a viable alternative, but if the company already has a bunch of Matlab code, rewriting things in Python probably wouldn't be worth it...




>Sure, it's closed source, expensive, and can be clunky

And slow. Don't forget slow. Closed-source, expensive, clunky, difficult to read, and slow.

(The first time I took machine-learning in grad-school, our assignments were in Matlab, and testing them required me to stay up until 04:00 in the morning at least once a month. A validation run just would not take anything less than five to six hours. The second time I took the class, I took it in the computer-science faculty, and the professor gave the assignments in scipy/numpy. Validating and debugging that was easy.)

(Just this year, I actually went and rewrote a probabilistic program in Haskell from a probprog package based on Python. Interpreted languages are fucking slow for numerical jobs.)


I'm a little skeptical. Yeah, interpreted languages are slow, but under the hood numpy / Matlab call out to BLAS/LAPACK. There's a constant overhead per function call, but if you "vectorize" your computations to work on arrays instead of single values, the difference is not that great.
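To make the per-call overhead point concrete, here is a minimal sketch using only the Python standard library. It is a stand-in, not a numpy or Matlab benchmark: the built-ins `sum`, `map`, and `operator.mul` play the role of "one call into compiled code", while the explicit `for` loop pays interpreter overhead on every element. The function names `dot_loop` and `dot_builtin` are made up for illustration, and the timings will vary by machine.

```python
import operator
import timeit


def dot_loop(xs, ys):
    """Dot product with an explicit loop: one interpreted step per element."""
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """Same dot product, but the per-element work runs inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


if __name__ == "__main__":
    n = 100_000
    xs = [float(i) for i in range(n)]
    ys = [2.0] * n

    # Both versions compute the same value; only where the loop runs differs.
    assert dot_loop(xs, ys) == dot_builtin(xs, ys)

    t_loop = timeit.timeit(lambda: dot_loop(xs, ys), number=5)
    t_builtin = timeit.timeit(lambda: dot_builtin(xs, ys), number=5)
    print(f"explicit loop: {t_loop:.4f} s, built-ins: {t_builtin:.4f} s")
```

With numpy or Matlab the same idea goes further, since a whole-array operation can be handed to BLAS/LAPACK in one call instead of being walked element by element in the interpreter.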

Seriously, Matlab = MATrix LABoratory. It's a DSL for matrix math, and it's really pleasant to use for that.

Also, my Matlab code is a joy to read. It's not Matlab, it's that most people who write Matlab code aren't professional programmers / don't care about making it pretty.


Yeah, I'm not sure I believe that either--it's calling the same libraries python calls. It is possible to write matlab code that runs incredibly slowly--using eval to make anonymous function calls to (badly-strided) individual elements of an array--but decent matlab and numpy code should be pretty comparable.

> It's not Matlab, it's that most people who write Matlab code aren't professional programmers

This, 1000x, this. There's also the time-honored tradition of a quick one-off script "to see if it works" that somehow becomes permanent.


Eh. It's more that Matlab code can either call out to fast linear-algebra libraries or go through an interpreter, and you have to memorize which language constructs do which. I'm definitely a professional programmer, but I still can't write fast Matlab code for that reason.


The slowness and the pan-everything global namespace for the primary libraries are the two biggest problems. The interface for performance-targeted optimizations (mex files) is awful and doesn't hold a candle to Cython. These are downsides that would prevent me from using an open source and free tool ... so when you add that it's closed source and expensive, it just simply does not ever make sense.

If a place finds themselves having a lot of MATLAB code, it means that earlier they didn't refactor and retool when the choice was cost effective to do so. That's strong evidence that for whatever they are working on right now they also will not refactor it to at least try to avoid the future costs of current poor designs. That's a huge red flag.

I guess if they paid an insanely high salary or gave you some other type of assurance that you personally valued, you could trade it off against the red flag evidence. But generally these places also tend to just hire maintenance engineers, since they know they can't offer interesting work. It's best just to avoid and work for places that bend over backwards to refactor and evolve their tooling over time specifically to prevent this problem.


> If a place finds themselves having a lot of MATLAB code....That's a huge red flag.

Serious question: do you feel that way about other older languages too? Would you balk at a C++ shop (one that isn't doing high performance or low level stuff)?


I would only balk at a couple of things, MATLAB, VBA, and extensive server-side Javascript being the main red flags of poor tech culture.

I have some additional skepticism about (a) shops that very quickly adopted Go and then display a cult-like dogma about how super perfect and awesome at all things networking Go is and everything else isn't; and (b) shops that use Scala or Clojure purely as "better Java" and often have thin, entirely unfunctional layers wrapping bad legacy Java code -- in such places, the Scala and Clojure often exist specifically because they had a hard time recruiting people to be legacy Java maintainers, and having them do it through one layer of indirection, such as Scala, let them tap into a more vibrant labor market with a lot of people who just naively believed that e.g. usage of Scala == adherence to good quality standards.

The age of the language has nothing to do with it, and I actually quite enjoy C programming. I'm not a big fan of C++ code bases in which some of the more esoteric language features are greatly abused, but generally feel like C++ is a great tool and would in fact enjoy the chance to dig more into it in a future job.

I also quite like code maintenance and legacy code extension, because when the company takes it seriously and values that kind of work, it involves a lot of cool abstraction, working with interfaces, design tests a la the Feathers book on testing in legacy code. I think that stuff is quite fun, and a healthy culture of refactoring makes it interesting to work on legacy systems.

While it's not an iron-clad rule, certain kinds of legacy choices, however, are not really compatible with a spirit of quality, legitimately valuing maintenance or refactoring, etc. MATLAB and Excel VBA are by far the strongest indicators that it's not a good situation.

The items above are just language choices, too. There are also many other things to worry about. For example, when a place uses MATLAB, it often means some of the work they do is scientific / quantitative prototyping. Do they enforce good code standards even at the prototype stage so as to minimize the distance between research prototypes and production? Failing to do this is a seriously gigantic red flag. It can mean that the management chain values the domain science more than the quality of the implementation, and often this means that domain scientists are allowed some of the following:

- to manage their own working environments (leading to lots of awful "but it works on my machine" errors)

- to write everything as giant, messy, linear scripts that start off with 200 lines of boilerplate data loading and model setup code that should be factored into a library but is instead copy/pasted, and finish off with 200 more lines of custom plotting code that also should be factored into its own internal library for standardizing reports and charts

- to turn things into short-term fire drills for production programmers solely by virtue of them being "someone who uses MATLAB, not <some real language> that we use in production", and get management support for this.

- to never bother to learn object oriented or functional programming principles -- google a design pattern and then just paste some ill-conceived bastardization of it wherever they feel, and then argue with production engineers that they can't refactor it "because it's a design pattern"

- ... I could go on.
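As a rough illustration of the "factor it into a library" point in the list above, here is a minimal sketch in Python using only the standard library. The helper names `load_columns` and `summarize` are hypothetical, and the CSV-with-header input format is an assumption made for the example; the idea is only that each analysis script imports these instead of pasting its own copy of the loading and reporting code.

```python
import csv
import io
import statistics


def load_columns(text):
    """Parse CSV text with a header row into {column name: list of floats}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


def summarize(columns):
    """Return {column name: (mean, min, max)} so every report uses the same summary."""
    return {
        name: (statistics.mean(values), min(values), max(values))
        for name, values in columns.items()
    }


if __name__ == "__main__":
    # A one-off script shrinks to a few lines once loading and reporting are shared.
    data = load_columns("height,weight\n1.5,50\n1.8,80\n")
    for name, (mean, low, high) in summarize(data).items():
        print(f"{name}: mean={mean} min={low} max={high}")
```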

When I see places that heavily use MATLAB or R for research systems, I break out in a cold sweat, since those languages are entirely unsuitable for professional software design. They are good for ad hoc linear algebra, optimization, model fitting, plotting, and statistics. But the software design underlying how those things are carried out in MATLAB and R does not translate at all to a system where the scientific code is maybe 1% and the business reporting code, customer-facing services code, etc., are the 99%. And putting the scientific code at 1% is generous even for a company that is solely about scientific computing or quantitative services.

Anyway, there is just a large difference between organizations that work from a systems engineering perspective first, build that way, and mercilessly require non-programmer PhDs to get up to speed on actual, principled software development even for doing ad hoc domain specific work, and organizations that don't.


> management chain values the domain science more than the quality of the implementation

That is the crux of the issue. Many Matlab and R users don't see themselves as producing software per se. They are producing "recommendations" or models or papers; the code is just a means to that end, and so who cares if it is terrible? No one will do this exact thing again anyway... Obviously, that is pretty myopic. If you have good "infrastructure" code, it makes writing the one-off parts easier, faster, and less buggy--which in turn makes it easier to turn the one-off code into infrastructure.

Out of curiosity, how could I convince you that joining our group (which does have a ton of matlab code) wouldn't be awful? Or what would convince you to hire someone whose last job was predominately matlab?


Oh really. Is this Haskell program anywhere online?


If the resultant paper gets published, the repo on github will go public and the link will be in the paper.


I ask since I work on probabilistic programming in Haskell and am always looking for more use cases.

Good luck and I hope the work is published.


Hey, can you tell me what probprog system for Haskell you work on? I might have been meaning to write a sampler for you.



