
I tried switching to Clojure, and while I love the purity, simplicity, and logic of it, two things always bothered me:

1) Immutable data structures are always going to be slower than their mutable cousins.

2) "Clojure code is beautiful" should be changed to "Your OWN Clojure code is beautiful". When I finished writing a compact piece of logic or data transformation, I was often struck with the beauty of it. When I tried to read someone ELSE's Clojure code, however, I couldn't even begin to make sense of it.

I am ever open to being proved wrong, however. Any Clojure programmers reading this, please reply with some code that is readable, elegant, and performant, to provide a counterpoint to my pessimism.




> Immutable data structures are always going to be slower than their mutable cousins.

Not always. It depends on usage. Mutable data structures can actually entail more work if the data needs to be accessed from more than one place. Immutable data structures can be shared and reused much more freely. (For example, adding an item to an array you got from somebody else means either the caller or the callee needs to have copied the whole array, while adding an item to an immutable list you got from somebody else means creating a single cons cell.)
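
To make the structural-sharing point concrete, here's a tiny sketch (the names are just illustrative):

    ;; `base` is a list you got from somebody else
    (def base '(2 3))

    ;; "adding" an item allocates a single cons cell; `base` is
    ;; untouched and remains shared
    (def extended (cons 1 base))

    extended   ;; => (1 2 3)
    base       ;; => (2 3), never copied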

At any rate, even if a destructive solution would be faster, the functional solution will probably still be faster in Clojure than the destructive solution would be in Ruby. And if not, you can still do something destructive — it's just not the default.

> "Clojure code is beautiful" should be changed to "Your OWN Clojure code is beautiful". When I finished writing a compact piece of logic or data transformation, I was often struck with the beauty of it. When I tried to read someone ELSE's Clojure code, however, I couldn't even begin to make sense of it.

I believe this is largely a matter of experience (not completely — some code just is hard to read!). Most people get that feeling when they're working with a relatively unfamiliar language. I find that the more familiar I become with Clojure, the better all other Clojure programmers seem to become.


> I find that the more familiar I become with Clojure, the better all other Clojure programmers seem to become.

That could be. However, I believe it has more to do with the fact that idiomatic Clojure encourages continual definition of nested layers of abstraction, which makes the code beautiful from a certain viewpoint. But it's also very similar to essentially writing a new language and adding features to it. If you buy this comparison, then reading your own Clojure code is like reading code written in a language you wrote, one tailored to your own ideas about aesthetics and interfaces.

Which is great. However, that would then mean that in order to read anyone else's Clojure code, you would have to learn a new programming language every single time. I would buy that this gets easier the more you do it, because my second programming language was definitely easier to learn than my first.


That's true to some degree, but for well-written code, I don't think it's necessarily as severe as that description makes it sound. Any Ruby on Rails codebase is also chock-full of pseudo-DSLs (controllers, scopes, formats, etc.), but you don't hear the same complaints about Rails code being unreadable very often. It's possible to go overboard and turn your codebase into the Clojure equivalent of Negaposi, but from my (somewhat limited) experience, it's not exactly endemic.


> That's true to some degree, but for well-written code...

This is sounding painfully close to a No True Scotsman argument. Also, for the number of times I've heard Clojure programmers say this, you'd think I would have at least ONCE been shown some of the programmer's actual code, in an effort to provide a concrete example of how elegant Clojure is.


> This is sounding painfully close to a No True Scotsman argument.

Well, excuse me for trying not to make over-broad claims. Would you like me to go find some Negaposi to prove that other people's Ruby code is unreadable? The thing about "No True Scotsman" that makes it bad is that it makes a categorical claim, but it defines the category in an arbitrary way. I'm not really making a categorical claim — I'm just saying that, much like any other language, Clojure authors can write readable or unreadable code. Clojure gives you slightly more powerful tools, but overly "clever" use of the tools available in Ruby can produce unreadable code just as well.

> Also, for the number of times I've heard Clojure programmers say this, you'd think I would have at least ONCE been shown some of the programmer's actual code, in an effort to provide a concrete example of how elegant Clojure is.

Take a look at any of the more popular Clojure projects (https://github.com/languages/clojure is a good place to start). I find that they tend to be fairly readable.


As someone who has breathed Clojure for the past three years, I very rarely come across code that is truly difficult to understand. It does tend to be more information-dense than other languages (especially Java) but that just means I only have to understand a couple hundred lines split into 3 files instead of thousands of lines across dozens of files.

That said, there is definitely some nasty Clojure code out there, but I'm not sure that's avoidable in any language. Fortunately, the conventional wisdom to only use macros when necessary has started to catch on and that's made the situation a good deal better.


> As someone who has breathed Clojure for the past three years, I very rarely come across code that is truly difficult to understand.

I'm not really arguing that Clojure code is "truly" difficult to understand, if that is taken to mean reasoning about its performance and effects. Rather, I'm talking about the fact that idiomatic Clojure code encourages writing functions upon functions upon functions, which continually builds up layers of abstraction that inherently make code more and more difficult to read. Example:

console.log("Hiya.");

console.log("How's it ");

console.log("going?");

vs.

printFirstString();

printSecondString();

printThirdString();

function printFirstString() { ...etc }


I don't know anyone who writes code like that. Unnecessary wrapping of functions is an antipattern.

Idiomatic Clojure for what you just wrote is:

    (println "Hiya.")
    (println "How's it ")
    (println "going?")

If a Clojureist wanted to get fancy and build an abstraction (as they probably would), they'd probably write a debug function instead of using 'println' directly:

    (log/debug "Hiya.")
    (log/debug "How's it ")

Which is, granted, an abstraction, but isn't any harder to read.


Or define a function called `printlns`. Then call...

    (printlns "Hiya." "How's it " "going?")


Actually the default `println` already does that. ;)


`println` inserts a space between its arguments. I was showing we could define one that inserts a newline instead.
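
For what it's worth, a minimal sketch of such a `printlns` (one possible definition, not a standard function):

    ;; like println, but prints each argument on its own line
    (defn printlns [& args]
      (doseq [arg args]
        (println arg)))

    (printlns "Hiya." "How's it " "going?")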


I could say the same thing about Java. Clojure/Lisp code just tends to have functions that do mostly what their names indicate. Java code tends to be broken up into billions of classes, and understanding an algorithm tends to involve trying to collect together all the pieces of it that are spread out as methods in a bunch of classes.

Take something like code generation from an AST. In Clojure, you might do it in a single file with multimethods. In Java, the code generation algorithm would be spread out over dozens of different AST classes, each with a visitor method.
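
As a rough sketch of what that single-file approach can look like (the :op tags and node shapes here are invented for illustration):

    ;; dispatch code generation on each AST node's :op tag
    (defmulti emit :op)

    (defmethod emit :const [{:keys [value]}]
      (str value))

    (defmethod emit :add [{:keys [left right]}]
      (str "(" (emit left) " + " (emit right) ")"))

    (emit {:op :add
           :left  {:op :const, :value 1}
           :right {:op :const, :value 2}})
    ;; => "(1 + 2)"

Adding a new node type is one more defmethod in the same file, rather than a new visitor method on every class.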


Clojure has some very clever immutable data structures based on Phil Bagwell's work [1] that help mitigate the slowdown.

I think you are totally right with regard to reading other people's Clojure. Sometimes it's easy and pleasant, sometimes inscrutable.

I think the issue at hand is that it is so easy to go the DSL route in Clojure. I call that an issue because when you invent a DSL to solve your problem, you've essentially created a new language that only you understand.

So now, if I'm reading your code, I have to understand raw Clojure, and then I have to try to figure out your DSL.

[1] http://lampwww.epfl.ch/papers/idealhashtrees.pdf




> Clojure has some very clever immutable data structures based on Phil Bagwell's work [1] that help mitigate the slowdown.

This is true, and I've read some of the papers describing said structures and was blown away by the cleverness. However (and I don't have a source here, so please forgive me if I'm just totally wrong) I have watched many of Rich Hickey's hour+ long talks, and in one of them, I distinctly remember him saying that his vectors were "Within an order of magnitude of mutable vectors performance-wise, which is good enough for me".

I was... put off by that.

EDIT: Now that I think about it, it might have been Maps that he was talking about. I are not good at remember.


So that was a pretty early talk. I don't have benchmark numbers in hand, but I'm reasonably sure that the performance gap isn't that wide any more.

Also, there's a new(er) feature called transients that can allow you to use mutable data structures where needed for performance without changing the structure of your code: (http://clojure.org/transients).

And you can drop down to mutable Java data structures, if you need to: they are fully supported by Clojure, just not the default.
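
For example, plain Java interop is all it takes (a trivial sketch):

    ;; a mutable java.util.HashMap, fully usable from Clojure
    (def m (java.util.HashMap.))
    (.put m "k" 1)
    (get m "k")   ;; => 1; Clojure's get works on any java.util.Map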

In general, Clojure's approach is to be thread safe and purely functional by default, but it's not too difficult at all to drop out of that for particular parts of your code you've identified as bottlenecks.


Honestly, one of the things that bothers me the most is how Clojure enthusiasts get hand-wavey when people talk about performance. They will often "prove" its performance characteristics instead of providing solid real-world examples, and will often dismiss other peoples' benchmarks as having something to do with the quality of the Clojure code on which the benchmark is run.


Whoa. I didn't dismiss any benchmarks, and I'm happy to back up any of what I've said with real-life experiences.

For example, I was working on a system a few weeks back that involves loading several-hundred GB CSV files. I wrote it in a functional style, with several map and filter operations over each of the rows. I was able to program the logic viewing the entire CSV file as a single lazy sequence to transform.

After profiling I ended up using Java arrays instead of Clojure vectors as the primary data format for records in the system: given the fact that this app was IO-bound, this resulted in about 20% improvement for that particular use case.

However, because Clojure is capable of treating arrays as sequences polymorphically, the change required very few modifications to my code and I was able to maintain a functional style.
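
A small sketch of what makes that swap cheap (the data here is made up):

    (require 'clojure.string)

    ;; a record stored as a Java array instead of a Clojure vector
    (def row (into-array Object ["alice" "42" "active"]))

    ;; arrays participate in the seq abstraction, so the same
    ;; map/filter pipeline works unchanged
    (first row)                           ;; => "alice"
    (map clojure.string/upper-case row)   ;; => ("ALICE" "42" "ACTIVE")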


Also, here's a microbenchmark I just threw together in a REPL, since you ask: The operation is to create a map data structure, then, ten million times, add/assoc a key value. The key is the .toString() of a random integer between 1 and 100,000. The value is the integer. Therefore, the resulting map will end up with a size of 100k elements, and be modified 10m times.

Except for the map implementation used, the code is identical.

I ran each test six times, discarded the first three (to allow the JVM to warm up), and took the average time of the remaining three.

The results on my 2010 MBP:

    java.util.TreeMap:                    8573ms
    java.util.HashMap:                    3243ms
    Clojure immutable hashmap:            7909ms
    Clojure immutable treemap:           21248ms
    Clojure hashmap (using transients):   5113ms

Data!
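
For reference, here's a sketch of roughly what such a benchmark looks like for the immutable case (my reconstruction, not the original code):

    ;; assoc 10m random entries into an immutable map; the random keys
    ;; collide down to ~100k distinct entries
    (defn bench-persistent []
      (loop [m {}, i 0]
        (if (== i 10000000)
          m
          (let [v (inc (rand-int 100000))]
            (recur (assoc m (str v) v) (inc i))))))

    (time (count (bench-persistent)))   ;; => 100000, plus the elapsed time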


I know that, in the Java world, you guys probably don't really understand this fact, but 3x slower is not only bad, it's so bad as to be completely unworkable in the real world.

Source: I was a Google engineer for 5 years.


Mutable vectors are about as fast as it gets (sequential writes to contiguous memory), so within an order of magnitude is still pretty fast. And for the default, catch-all data structure, I'm not convinced that being as fast as possible is ideal. The C++ STL takes this approach, going to great lengths to be as fast as C arrays, but I think it suffers a lot for it in terms of usability.

There is a cultural distinction to be aware of here between Lisp programmers and C/C++ programmers. C/C++ programmers tend to be very "layer oriented." They assume everything will be built on top of built-in constructs like arrays, so they try to make it as fast as possible. But Lisp has historically been more of a "bag of stuff." It offers lists as the default "go-to" data structure, which has a lot of positive qualities. Mutable arrays are a separate data type that you can use if you need the speed. I think Clojure inherits some of the same thinking. Raw JVM arrays are always there if you need them, but the defaults are optimized for things other than raw speed.


Clojure's immutable data structures allow you to scale out across many cores without locking semantics as rigorous as those a destructively modified version of the same work would need; copy-on-write presents no contention until you want the new value to be the canonical value.

Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

Single threaded execution is a computational dead end. If you want to go faster, you have to parallelize, be it on a single system or on a cloud service. Clojure's persistent data structures ease this. That the persistent data structures have canonical serializations ALSO eases this.


Give me a break, we're talking about vectors here.

I like Clojure, spent some time messing around with it a couple years ago and will one day actually use it for something, probably involving complex configuration where code-as-data really shines along with concurrency/performance.

But if you're talking about working a ho-hum vector with 100-10k entries, a linear scan over a mutable, contiguous array will typically be faster than the most clever multithreaded code you can come up with, and take up less of the CPU while it's working. 10 cores are a Bad Idea for that kind of work.

Amdahl's law tells us we should look at larger units of concurrency in our architecture rather than getting all excited about some auto-parallelized map function. At that point, it starts being important how fast the (single-threaded!) individual tasks run.
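
For reference, Amdahl's law: with a parallelizable fraction p of the work and n cores, the best-case speedup is

    speedup(n) = 1 / ((1 - p) + p/n)

so even at p = 0.9 the ceiling is 10x no matter how many cores you add; the serial parts start to dominate very quickly.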


Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to do it more than once.

Break the data into blocks smaller than the CPU cache size, and perform multiple stages on each block.

Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.


Do immutable data structures really let you scale out so easily? I thought that was something of a myth...


"so easily" is poorly defined, but when you are talking about collection data subject to concurrent modification you have a few options for correctness:

1. Read lock on the data for the time a thread is using it. This ensures that it is not destructively modified while being iterated over. This is a terrible option if you're using any kind of blocking operation during the lifespan of the read lock. The thread that obtains the lock runs quickly without any kind of penalty, but obtaining the lock might have taken an extremely long time, especially if some OTHER thread was blocking with a read lock held.

2. Read lock long enough to copy the data, then work with your copy in isolation. You have to incur the cost of the linear copy; this might or might not be less than the cost of performing the work you actually want to do, but if it's close, your run time just went up 2x.

- Brief caveat: Any explicit locking scheme subjects you to risks from deadlocking. This is why complex multithreaded applications develop explicit locking orders, and what can make large codebases difficult to work in.

3. Don't read lock, hope for the best. This can work better if you have a means to detect concurrent modification and restart. You might even get the correct behavior for 99% of cases.

4. Work with immutable data structures that cannot be destructively modified out from under you. Immutable data is computationally cheaper to read in the face of parallelism than every other option; it is more expensive to write to. What do your read and write loads look like? (See the sketch after this list.)

- Also, please keep in mind that while Clojure provides immutable persistent data structures out of the box and its reader generates them for all the canonical serializations, it does not take away Java's destructively modifiable types.
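
A minimal sketch of option 4, assuming an atom holds the canonical value (the names are illustrative):

    ;; the canonical value: an immutable map inside an atom
    (def canonical (atom {}))

    ;; readers never lock; the value they deref can never be
    ;; modified out from under them
    (defn snapshot-size []
      (count @canonical))

    ;; writers build a new map (structurally sharing the old one)
    ;; and swap it in as the new canonical value
    (defn record! [k v]
      (swap! canonical assoc k v))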


> Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

And this is, in my opinion, the best possible argument in favor of using Clojure. Off the top of my head, I remember Rich Hickey saying something about how he developed Clojure due to his irritation at working on a huge project that wrote code to handle thousands upon thousands of interconnected nodes. That makes perfect sense.

However... writing a web app with Clojure, at least to me, doesn't.


> However... writing a web app with Clojure, at least to me, doesn't.

Why not? I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism? I mean, sure at small loads you don't need it, but what about scaling up?

Also, I find using compojure for web app development to be an absolute dream. I might need to get out more (my day job is java web apps), but I love the ability to iterate rapidly in the repl on a web app without having to restart my JVM.


> I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism?

For the same reason you don't do parallel by default for every loop you write in Java. It's overkill. You can successfully argue that multiple service calls across a network need to be parallel, but that has relatively little to do with a desire to serve many clients and a lot more to do with responsiveness (a noble goal).

> but I love the ability to iterate rapidly in the repl on a web app without having to restart my JVM

HTML and mixed services > (any combination of java technologies you can dream up to serve web apps)


I totally understand. One thing to watch out for though is "premature optimization" as I've heard it called here on HN.

It may very well be that you need that order of magnitude, but if you don't, then you might be missing out on something you would have really enjoyed otherwise.


> Immutable data structures are always going to be slower than their mutable cousins.

True in general, but only if you just care about single core performance. A data structure that allows your algorithm to scale gracefully to more cores is actually more performant than a data structure that's faster for a single thread, but is extremely hard to scale in concurrent settings. Clojure sacrifices single-thread performance for multithreaded scaling, and it's the right tradeoff given current hardware trends (for example, consider the design of Clojure's reducers, and how it focuses on multi-core scaling).
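
For instance, a small sketch with reducers (clojure.core.reducers ships with Clojure; the workload here is made up):

    (require '[clojure.core.reducers :as r])

    (def v (vec (range 1000000)))

    ;; r/fold splits the vector into chunks, reduces the chunks in
    ;; parallel on a fork/join pool, and combines the partial sums
    (r/fold + (r/map inc v))   ;; => 500000500000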

My main concern with immutable data structures (and most data structures designed to work in concurrent settings, actually), is their heavy reliance on garbage collection, and on old-gen garbage collection in particular. Old-gen GC still has some ways to go, at least in HotSpot.


Reading other Clojure code is weird. I agree that it can be hard to get an eye for other people's code, but the flip side is that everything is just functions and data. When I decided to go look at Ring or Compojure, it was incredible how simple the code really is. I've looked at plenty of other Clojure libraries and I have this revelation every time. It's awesome.

This is in comparison to reading an OO library in a language I'm much more familiar with but where inheritance / mixins mean you have to dig through many files (often not obvious which ones) to understand a piece of code.


I also struggle reading Clojure code that isn't my own. It reminds me of a quote by Brenton Ashworth on the subject: http://goo.gl/yCIWI


It seems like that mentality would make building a community of any kind difficult, as programming communities (companies, shared libraries, etc) are all about being able to read and improve one another's code.


I interpret this more as saying that it is ok if there is a barrier to entry. If it takes you six months of programming Clojure before you can easily read idiomatic code written by other people, for example, maybe he's ok with that.


Interesting. One of the things I've come to love most about Clojure is that I find it easy to read and reason about code I did not write. Your comment demonstrates that is not universally true.


Have you used transients in Clojure? http://clojure.org/transients

They allow mutation as long as it's localized.
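
A quick sketch of that localized mutation (the standard transient/persistent! pattern):

    ;; mutation stays inside this function; callers only ever see the
    ;; frozen persistent map returned by persistent!
    (defn build-map [n]
      (persistent!
        (reduce (fn [m i] (assoc! m (str i) i))
                (transient {})
                (range n))))

    (build-map 3)   ;; => {"0" 0, "1" 1, "2" 2}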


> 1) Immutable data structures are always going to be slower than their mutable cousins.

This would be OK if immutable data structures weren't also much harder to reason about, implement, and actually use in your algorithms. Nothing is stopping you from doing functional programming and using immutable data structures in an imperative language (the reverse is very much not true), but why would you bother if it's easier to think about (and prove properties of, if you're into that sort of thing) mutable data structures?

People recommend Okasaki's Purely Functional Data Structures all the time, but the main result of that dissertation isn't the clever immutable data structures; it's the fact that Okasaki was the first person to come up with a somewhat workable way to do asymptotic runtime analysis of immutable data structures. IMO it's not pretty.


I don't understand why you think that immutable data structures are more difficult to reason about and prove properties of. Perhaps you would enlighten me.


Likely at least in part because they've been studied far less than mutable/imperative data structures, and thus there are significantly fewer tools for reasoning about them. Plus, the most efficient functional data structures rely on lazy evaluation, adding a layer of complexity to reasoning about their behavior. In fact, in his 10-year PFDS retrospective, Okasaki noted exactly that:

> In 1993, I ran across an interesting paper by Tyng-Ruey Chuang and Benjamin Goldberg, showing how to implement efficient purely functional queues and double-ended queues. Their result was quite nice, but rather complicated, so I started looking for a simpler approach. Eventually, I came up with a much simpler approach based on lazy evaluation, leading to my first paper in this area. (When I say simpler here, I mean simpler to implement—the new structure was actually harder to analyze than Chuang and Goldberg's.)

or

> [...] I went into a frenzy, playing with different variations, and a few hours later I came up with an approach again based on lazy evaluation. I was sure that this approach worked, but I had no idea how to prove it. (Again, this was a case where the implementation was simple, but the analysis was a bear.) [...] The whole way home I thought about nothing but how to analyze my data structure. I knew it had something to do with amortization, but the usual techniques of amortization weren't working.


That's the passage I was thinking of. Compare the implementation and analysis of queues in Purely Functional Data Structures with any introductory data structures book if you don't believe me.


He means reason about performance/memory properties.


Immutable data structures are harder to implement yourself, not to reason about.

Anyone who has spent a few days with Clojure realizes how much easier it is to reason about them and to work with them.

Immutable data structures combined with the software transactional memory that Clojure provides are a godsend.

In a lot of cases they're also much faster than the good old copy-on-write, and because they're immutable you get lockless concurrency. This is huge.

But it gets better: if you really find a performance bottleneck due to the use of immutability, you can fall back to mutability / place-oriented programming.



