
I've been considering Clojure recently for data science. In your experience, could you tell me how Clojure stacks up against the likes of R and Python?

I guess the performance should not be much better (thanks to JVM), and the learning curve is a bit steep. Oh and one question about Lisp macros: in your line of work, have you ever confronted a situation where a macro saved the day?

Thanks.




I am not the addressee of your question, but I also use Clojure for DS, and I thought I'd expand on a few aspects of my sibling post.

- Performance 1: It's really fast iff you avoid boxing, reflection and so on. Even without these optimizations it's mostly fast enough. However, about 1 time in 10 it gets too slow and you need to change a few things in the code to speed it up. That's easy and never takes long (after the first time :P)

- Performance 2: Memory consumption is sometimes quite high; in these cases you should use arrays and records instead of vectors of maps. Also, JVM args (e.g. -Xmx8g to raise the maximum heap size) are your friend.

- Macros: I never use them.

- Filtering, aggregating, etc. is a breeze and usually takes a line of code each. E.g.

  (->> a-vector-of-records
       (filter #(> (:id %) 100))          ;; all ids above 100
       (remove (comp #{:a :b} :category)) ;; without categories :a or :b
       (map :value)                       ;; take the value
       (reduce + 0))                      ;; and sum it

- Tools: I use incanter.stats (for statistical things I haven't implemented myself yet), incanter.charts for visualization and incanter.optimize for linear models. Other stuff (GLM, FFT, ...) I either implemented myself or get directly from the libs incanter uses.

- For report generation (notebook-like) I typically use clj-pdf; the report is usually the first deliverable of every DS task I have.

- For learning: I came from Javaland, and I benefited heavily from the 4clojure koans for the practical stuff and "The Joy of Clojure" for the "clojure way of thinking".

- For using it: I had to "learn emacs" (and paredit and so on) at the same time, which complicated everything and was also a drain on my motivation. It helps if you have a colleague who already uses the setup; alternatively, YouTube has many videos on this. Today, I'd never switch back, because using Eclipse (or something else) makes me feel that I need 10s to execute a thought instead of a keyboard shortcut. Once you're there you also might want to switch to i3wm (if you don't have it already).
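
Going back to the two performance bullets above, here is a minimal sketch of what "avoid boxing and reflection" and "arrays and records instead of vectors of maps" can look like. The names `str-len`, `sum-doubles` and `Reading`, and the sample data, are made up for illustration.

```clojure
;; Tip: evaluating (set! *warn-on-reflection* true) at the REPL makes the
;; compiler print a warning for every reflective interop call.

;; The ^String hint lets the compiler resolve .length at compile time
;; instead of looking it up reflectively on each call.
(defn str-len ^long [^String s]
  (.length s))

;; The ^doubles hint keeps aget and + on primitive doubles (no boxing).
(defn sum-doubles ^double [^doubles xs]
  (areduce xs i acc 0.0 (+ acc (aget xs i))))

;; A record with primitive fields is more compact than a hash map per row.
(defrecord Reading [^long id category ^double value])

;; Hypothetical sample rows.
(def readings
  [(->Reading 101 :c 1.5)
   (->Reading 102 :a 2.0)
   (->Reading 50  :c 4.0)
   (->Reading 103 :d 2.5)])

;; Copy one column into a primitive array, then sum it.
(def total
  (sum-doubles (double-array (map :value readings))))
```

Records still behave like maps for keyword lookup, so the `->>` pipeline from the filtering bullet works on them unchanged.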


Sure,

RE performance: it's JVM vs R interpreter vs Python interpreter. JVM takes it by far.

RE ETL: lists and maps are basically the main data structures in Clojure, and as a result you have a very large collection of functions for working with them efficiently, which generally makes ETL easy. I don't think either Python or R can compare in terms of expressiveness and LOC efficiency. Further, lazy sequences make it really easy to work with massive datasets because the whole dataset never has to be held in memory at once. Working in parallel is really easy in Clojure too. In particular fold, pmap and async channels are extremely easy to use and understand.
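
A small sketch of the laziness and fold points, using only clojure.core and clojure.core.reducers. The var names and the numbers are illustrative, not taken from a real ETL job.

```clojure
(require '[clojure.core.reducers :as r])

;; Lazy pipeline over an infinite sequence: only as many elements as
;; `take` asks for are ever realised, so this terminates.
(def lazy-total
  (->> (range)            ;; 0, 1, 2, ... without end
       (map #(* % %))     ;; squares
       (filter even?)     ;; 0, 4, 16, 36, 64, ...
       (take 5)
       (reduce +)))

;; Parallel reduction: r/fold splits the vector into chunks, reduces the
;; chunks on a fork/join pool, and combines the partial sums with +.
(def fold-total
  (r/fold + (r/map inc (vec (range 100000)))))
```

`pmap` has the same shape as `map`, so swapping it in for an expensive per-item step is usually a one-word change.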

RE analysis: Clojure is not vectorised like R for example. Meaning syntax is not as terse for sub-setting, aggregating, etc. It doesn't support formulae as first-class citizens like R does.

E.g. filtering in R vs Clojure.

X <- D[D$type == "Seismic",]

vs

(def X (dt/where (comp #(= % "Seismic") :type) D))

E.g. aggregating in R vs Clojure.

X <- aggregate(count ~ type, data=D, sum)

vs

(def X (dt/aggregate :type #(reduce + (map :count %)) D))

Having said that, it's functional and doesn't require one to loop or mutate; it's just not as terse.
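
The `dt` namespace above is not identified in this comment. For comparison, here is a sketch of the same two operations in plain clojure.core, assuming `D` is a vector of maps. The sample rows and the names `seismic` and `count-by-type` are made up.

```clojure
;; Hypothetical data set: one map per row.
(def D
  [{:type "Seismic" :count 3}
   {:type "Gravity" :count 5}
   {:type "Seismic" :count 4}])

;; Filtering: keep only rows whose :type is "Seismic".
(def seismic
  (filterv #(= "Seismic" (:type %)) D))

;; Aggregating: sum :count per :type into a map of type -> total.
(def count-by-type
  (reduce (fn [acc row]
            (update acc (:type row) (fnil + 0) (:count row)))
          {}
          D))
```

Unlike R's `aggregate`, this returns a plain map rather than a table, which is part of the terseness trade-off described above.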

Clojure is good for stats (because of Anglican) and good for ML (because of Java and https://github.com/thinktopic/cortex).

RE viz:

Clojure sucks at viz. There's no compelling notebook solution (Gorilla REPL...) that supports mod cons like exporting or easy sharing.

However, I really like orgmode + babel. It makes it easy to grab data from Clojure and push it into an R block which generates graphs for example and it exports into everything.

RE macros: I haven't written any custom macros so far tbh. If I do in the near term it'll probably be for logging or timing. I find that plain functions are extremely flexible.
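
For what it's worth, a timing macro of the kind mentioned could look roughly like this. It is a sketch, and `timed` is a hypothetical name, not something from a library.

```clojure
(defmacro timed
  "Evaluates body and returns [result elapsed-milliseconds]."
  [& body]
  `(let [start#  (System/nanoTime)
         result# (do ~@body)]
     [result# (/ (- (System/nanoTime) start#) 1e6)]))

;; Usage: the first element is the value of the body, the second is the
;; elapsed wall-clock time in milliseconds.
(def timed-sum
  (timed (reduce + (range 1000))))
```

This is one of the few cases where a function won't do: a function would receive its argument already evaluated, so the work would be finished before the clock starts.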

RE general: Clojure has a steep but short learning curve. As in, you cry for a few days but then suddenly understand loads. I also think the way of thinking it encourages is more mathematical. If you stay away from swap!, refs, agents, etc. (which you can), then your world is immutable, which makes testing so much easier. True, error messages suck, but as soon as you get a handle on how to write code interactively with the REPL, it's pretty easy to track down what causes an issue.



