While I definitely agree Rust is a much faster language than Clojure, I'd be interested to see benchmarks showing just how much faster your Rust code was on the same data.
I also noticed that you mentioned avoiding lazy sequences is not idiomatic in Clojure. I disagree, since using transducers is still idiomatic. I wonder if you would have seen some speed improvements by moving your filters/maps to transducers. Though I doubt this would get you to Rust speeds anyway, it might just be fast enough.
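A minimal sketch of the kind of change I mean (the data and step functions here are made up):

    ;; lazy: builds two intermediate seqs before reducing
    (->> data (filter even?) (map #(* % %)) (reduce + 0))

    ;; transducers: one pass over data, no intermediate seqs
    (transduce (comp (filter even?) (map #(* % %))) + 0 data)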
I just moved a medium-sized codebase from clojure transducers to JS, and after having used clojure for 7+ years, and done so professionally, I don't wanna go back, ever. The JS solution is shorter, faster, and easier to understand.
I'm thankful for the insights into reality and programming that clojure has provided, but highly optimised clojure is neither idiomatic nor pretty: you end up with eductions everywhere.
Combine that with reaaaallllyy bad debuggability from all those nested, inside-out transducer calls (the stack traces have also gotten worse over the years, I don't know why) and a splintered ecosystem (lein, boot, clj-tools), and I'd pick rust and deno/js any day over clojure for a greenfield project. Sadly.
Yup, it's like the leadership is actively hostile towards community building.
* Prismatic Schema, immensely popular, was "replaced" by spec, which is not yet complete and still in the research phase
* leiningen (some of the best language tooling out there) was "replaced" by the Clojure CLI, which can't do half of what leiningen can
* transducers (a brilliant concept) are not easy (as in close at hand) because the code is quite different from normal lazy-sequence-based code (I wrote a library [1] to address this; see the sketch below)
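To illustrate that last point (placeholder names): the lazy version threads the collection through each step, while the transducer version builds the pipeline as a standalone value and applies it separately:

    ;; lazy-seq style
    (->> coll (map f) (filter p?) (take 10))

    ;; transducer style: the pipeline is a first-class value
    (def xf (comp (map f) (filter p?) (take 10)))
    (into [] xf coll)    ; eager
    (eduction xf coll)   ; reducible/iterable view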
I still prefer Clojure for all my side projects, but it is very clear that the community is tiny and fragmented.
Schema and spec do not target the same functionality.
In general I agree that it would be best if a small community did not spread itself too thin, but on the other hand, can't Hickey, Miller et al. work on what they are interested in? They have published rationales for why this "splintering" work is of interest, and they make sense. It seems incongruous to happily use a tool that was born out of an opinionated design, and then complain when the authors keep pushing their opinionated PL design.
I have limited exposure to Clojure transducers but I spend most of my time writing JS/TS and I've found thi.ng/transducers[0] a pleasure to work with and super elegant for constructing data processing workflows.
That's a very common trajectory for Clojure users to go through, and it's one of the reasons it has so many abandoned/unfinished libraries (although 7 years was a lot). After they have gathered all the insights they can from Clojure and its ecosystem (which is a worthy endeavor IMO), they go back to their big-ecosystem mainstream programming language because of all the benefits you get from it, even if that language is worse. It also doesn't help Clojure that JS in 2020 is way better than JS in 2010, and that you can easily bring all your Clojure insights/concepts to JS.
Every time I need to use another language outside of Clojure, it feels like most other languages are... confused. A bad case of "designed by committee" experience.
> After they have gathered all the insights they can from Clojure and its ecosystem (which is a worthy endeavor IMO)
Just out of curiosity, what do you mean by this? I've never used Clojure, but have done a fair bit of hacking in other Lisp dialects. Do you (or anyone with an opinion) think there's some insight benefit to Clojure specifically vis-a-vis Racket/Scheme/etc?
Production implementations of persistent data structures in an industrial VM, plus abstractions for state management, polymorphism, concurrency, the sequence abstraction, etc... It just gives you more things for day-to-day programming. While Scheme gives you good foundations, you have to build a lot of stuff yourself; it's too primitive (I haven't followed Scheme since R5RS). But it's really mostly about the literal data structures and leveraging them anywhere you can to represent information: it's maps everywhere. "Data-oriented" is the common term used in the community. This answer by one of the Clojure maintainers sums it up better than I can: https://news.ycombinator.com/item?id=25377022
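For a concrete flavor of the "maps everywhere" style (a toy example): every update returns a new value and the original stays intact, so you can pass data around without defensive copies:

    (def user {:id 1 :name "Ada" :roles #{:admin}})
    (def user2 (assoc user :name "Ada L."))

    user   ;=> {:id 1, :name "Ada", :roles #{:admin}}  (unchanged)
    user2  ;=> {:id 1, :name "Ada L.", :roles #{:admin}}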
Racket extends Scheme with useful stuff for everyday programming too, but Clojure's immutable data structures, with their big library of functions for manipulating them in a nice, abstract, generic way, plus the fact that it runs on the JVM, give it a big edge for "real world" programming IMO. You do need strong knowledge of Java and the JVM for critical services.
Almost all your knowledge of Racket and Scheme will transfer and be valuable for Clojure, so you already know most of it and have a big head start if you plan to learn it.
Also, one of the amazing aspects of the JS ecosystem is TypeScript: a structural type system on top of an open object system is such a flexible and pragmatic tool.
Last time I used Clojure (probably 5 years ago, to be honest) the lack of static typing combined with the functional nature made complex imperative code (which you're sometimes forced to write, and there are examples of such code in the standard library) almost impenetrable.
How is the experience with transducers? I haven't tried it, but I fear you'd first see the transducer being constructed as a heap of anonymous functions and clojure implementation details, and only afterwards could you step into how it gets executed on each element of the stream. I have had unpleasant experiences with clojure debugging... so I feel what your parent post is saying.
Perhaps it's just a different way of doing the same thing, but I never feel the need to reach for a debugger in the presence of the REPL and tools like Timbre. Just wrap what you want to see in (timbre/spy ...) and you're good to go.
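A minimal sketch of that workflow (assuming Timbre is on the classpath):

    (require '[taoensso.timbre :as timbre])

    (->> (range 10)
         (map inc)
         (timbre/spy :debug)   ; logs the intermediate value, passes it through
         (filter even?))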
"Debugging clj / cljs is hardpartly due to transducers" -> "I can step though my code with cursive" -> "how does cursive handle transducers?" -> "you don't need a debugger"
I doubt it will bring much. If properly implemented, there is nothing that makes generator-like laziness slower than transducers, and since it is pretty central to clojure I doubt you will see much speed gain by using transducers.
In Scheme, the SRFI 158-based generators are slower than my own transducer SRFI (SRFI 171) only in implementations where set! incurs a boxing penalty and where the eagerness of transducers means mutable state can be avoided.
Now, I know very little clojure, but I doubt they would leave such a relatively trivial optimization on the table. A step in a transducer is just a procedure call, which is the same for trivial generator-based laziness.
When I was reading the article, I thought the author of the post was probably pointing more in the direction of Clojure's immutable data being slower, rather than laziness specifically.
IME (admittedly in a different context, doing UI development) Clojure's seqs and other immutable data can be a huge performance drag due to the additional allocations needed. If you're in a hot loop where you're creating and realizing a sequence immediately, it's probably much faster to bang on a transient vector. Same with creating a bunch of immutable hash maps that you then throw away; better to create a simpler data structure (e.g. a POJO or Map) which doesn't handle complicated structural-sharing patterns if it's just going to be thrown away.
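A rough sketch of the "bang on a transient" pattern (illustrative names):

    ;; builds the vector in place, then freezes it once at the end
    (defn squares [n]
      (loop [i 0, v (transient [])]
        (if (< i n)
          (recur (inc i) (conj! v (* i i)))
          (persistent! v))))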
Transducers would help in the author's first case: they'd take the map/filter piped through the `->>`, which is going to do two separate passes and realize two seqs, and combine it into one pass.
I stand by what I said (even though I am partial to transducers): there is no reason for lazy sequences' overhead to be much more than a procedure call, which is exactly what a step in the reduction is in the case of transducers. At least when implemented as in clojure or srfi-171.
I understand that there might be some overhead to creating new sequence objects, but removing intermediate sequence object creation should be simple, at least for the built-in map-filter-take et al.
Edit: I seem to be wrong. People are recommending transducers over clojure's lazy map-filter-take et al. because of overhead, which to me seems off, but there might be something about clojure's lazy sequences that I haven't grokked.
I probably didn't make it any clearer in my reply. Transducers don't win based on allocations IME, but because they reduce the number of iterations required.
Take the case: (->> (for ,,,) (map ,,,) (filter ,,,))
`for`, `map` and `filter` will all create lazy sequences, each holding a reference to the previous one. The problem here is that when the `filter` seq gets realized, it will first realize the `map` seq, which will realize the `for` seq. Each sequence waits for the one beneath it to realize before iterating over it. So in this case it will iterate twice: once for the `map`, and then again for the `filter`.
As you know, transducers combine these steps together so that you only iterate over the initial collection once.
My other comment was making the point that the author has conflated "laziness" with "immutable data", AFAICT. The lazy seqs in the first example they give will be slower w/o transducers, but the other problem is that the overhead from all the allocations required for creating a bunch of immutable hash maps that are then destroyed immediately after is also non-negligible, and seems to be a source of the author's performance problems.
I think what bjoli is getting at is that in an ideal world with lazy sequences you don't have any iteration over any of the intermediate "collections" until some step occurs that requires the entire collection. I put collections in quotes because they aren't really collections, they're generators that produce an element on demand.
So you never really iterate over more than one collection; you only have one iteration where you successively ask each generator to produce a single element and then apply a function to it. For example, if you only asked for the first element of a lazy sequence formed by a series of maps, you would (in theory, in practice see my note about chunks) never iterate over any of the intermediate sequences and only ever examine the first element of each of them.
However, the act of asking a generator to produce an element (that is unwrapping a thunk) has overhead of its own and that's the overhead that a transducer would be removing (not iteration itself in the case of lazy sequences). This can have far more overhead than a procedure call because of bad cache locality (in the absence of fusion optimizations you're pointer chasing and potentially generating garbage for each thunk). Clojure tries to get around that by often (but not always) chunking its thunks, in which case we do have multiple rounds of iteration on smaller chunks, but never the entire sequence (unless the entire sequence is smaller than a chunk).
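You can see the chunking at the REPL (illustrative; `range` happens to produce chunked seqs):

    (first (map #(doto % print) (range 100)))
    ;; prints 0 through 31 (one 32-element chunk), then returns 0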
What I am saying is that lazy sequences in my world should mean you don't have to realize any intermediate collections. In the case of the srfi-158 generator (gmap square (gfilter odd? (list-generator big-list))) the overhead for getting one element from the generator would be 3 procedure calls. Without any intermediate steps. The same transducer would have one procedure call less, but would be in the same vicinity.
Do clojure's sequences not work similarly? That seems like a very lax definition of laziness.
Yes and no. There are real production cases where even default Clojure can result in the same or faster performance than Java. One hypothetical case is a system that does a lot of complex in-memory reads and a few writes to persistent data structures. That kind of system could be faster than the Java equivalent, out of the box, in Clojure.
A write-heavy system would benefit from Java's mutable collections out of the box. Clojure can get pretty close to Java with transients and a good dose of type hinting in all the right places.
When we say "faster" or "slower" it's equally important to specify "faster" or "slower" when and where. It's a complex question with no easy answer.
> One hypothetical case is a system that does a lot of complex in-memory reads and a few writes to persistent data structures. That kind of system could be faster than the Java equivalent, out of the box, in Clojure.
Why would Java be slower in a read-mostly regime? Your hypothetical is not convincing. Btw, you mention "real" and then move on to "hypothetical" as an example.
Are there actually OSS "production" cases of this subset of systems where Java lags behind Clojure in performance?
> When we say "faster" or "slower" it's equally important to specify "faster" or "slower" when and where. It's a complex question with no easy answer.
These sorts of subtle distinctions only matter to language wars and debates like this. For actual systems that need to do work and need to be maintained, we can in fact have metrics on efficiency.
That said, fundamentally, Java affords much greater facilities to "optimize" and approach white hot performance than Clojure.
> Why would Java be slower in a read-mostly regime? Your hypothetical is not convincing. Btw, you mention "real" and then move on to "hypothetical" as an example.
Because of Clojure's default persistent in-memory data structures which allow you to obtain a stable reference to your in-memory data even when the data is being updated live. With Java's default mutable data structures, you'd have to use locking or copying to obtain a stable reference, not to mention the huge complexity of those solutions. I said real because I've built similar solutions in Clojure. Substitute "hypothetical" with "example", sorry for the word confusion.
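A sketch of what I mean by a stable reference (hypothetical names): a reader takes a snapshot once and is untouched by concurrent writers, with no locks or copies involved:

    (def db (atom {:rules {}}))

    (let [snapshot @db]                                    ; stable view for a long read
      (future (swap! db assoc-in [:rules 42] {:op :add}))  ; concurrent writer
      (count (:rules snapshot)))                           ;=> 0, regardless of timing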
> That said, fundamentally, Java affords much greater facilities to "optimize" and approach white hot performance than Clojure.
Taken to the extreme, the resulting Clojure would not be idiomatic (it would effectively be Java-in-Clojure), but Clojure has all the facilities that Java has, by definition.
In read-mostly regimes, the performance considerations are typically systems-level concerns. Data locality, page swaps, and cache-line misses are the typical concerns of high-performance systems engineering.
Regarding "complexity", not sure what you mean. IIRC, someone, possibly even me ;), may have pointed out to Rich Hickey in the early days that Java code could also use those (Java or was it Scala) STM libraries. So, there is your "complexity" behind an API, just like Clojure.
> Clojure has all the facilities that Java has, by definition.
Ergo, a situation where Java can -not- be made as fast or faster than Clojure code seems unlikely.
I think all we have here is your claim of "complexity". I remember Rob Pike making a comment along the lines of "prefer libraries over language semantics", or something like that, in the context of Go language design. I find it a very compelling argument, based on my overall experience to date.
Transducers are definitely idiomatic. They generalize over "similar things to transform in steps" (including sequences, messages and so on), so you can apply them to collections ("I have all the data in advance") or channels ("I get the data piece by piece") and so on.
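For example (a small sketch; the channel part assumes core.async is available), the exact same transducer works on both:

    (def xf (comp (map inc) (filter odd?)))

    (into [] xf (range 10))          ;=> [1 3 5 7 9]  (whole collection)

    (require '[clojure.core.async :as a])
    (def ch (a/chan 16 xf))          ; same xf, applied to values as they arrive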
Another idiomatic way to improve performance is transients[0]. From the outside your function is still a pure function, but on the inside it's cheating by updating in place instead of using persistent data structures. See the frequencies function for a simple example[1].
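That frequencies example is short enough to sketch here (roughly what clojure.core does internally):

    (defn frequencies* [coll]
      (persistent!
       (reduce (fn [counts x]
                 (assoc! counts x (inc (get counts x 0))))
               (transient {})
               coll)))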
Clojure and Rust are both very expressive languages and even though they both can be considered niche, they have _massive_ reach: Clojure taps into JVM and the JS ecosystems, Rust can also compile to WASM or be integrated with the JVM via JNI.
The big difference between the two, and why I think they complement each other nicely, is that Clojure is optimized for development and does its best at runtime, while Rust is optimized for runtime and tries its best at development. (There's a similar take in the article.) In other words: they both achieve their secondary goal well, but resolve trade-offs by adhering to their primary one in the vast majority of cases.
i like how in the end the system was replaced by a database solution
every language needs easy access to a query-able database
many problems are a lot simpler when solved declaratively, as queries against a query-able database
the relational model is functional, and is a very good solution to a wide range of problems
i think the Sqlite engine should be integrated into the standard library of every language; either use sql, or the language can provide a native sql alternative in the language itself, or we can create a new standard language (because yes, sql can be improved upon)
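for a taste of how close to "built in" this already feels from clojure (a sketch using next.jdbc plus the sqlite-jdbc driver, not the standard library; table and file names are made up):

    (require '[next.jdbc :as jdbc])

    (def ds (jdbc/get-datasource {:dbtype "sqlite" :dbname "rules.db"}))
    (jdbc/execute! ds ["select field, count(*) as n from rules group by field"])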
I think Chris Date's D language could be a place to start for investigating SQL alternatives, or as a language that can be more easily emulated in other languages.
Also, as I recall, the notion of RelVars made it sound functional.
Each relational operator returned a RelVar, so you were passing RelVars between operators until you got the result you wanted, which was itself a RelVar.
Anyway, this is from memory, so I may be wrong on many things... but again from memory, D (or Tutorial D) sounded a lot less DSL-ish and a lot more functional than SQL, which was an improvement in my mind.
A bit sad there was no profiling done, or at least the article doesn't mention it. Maybe optimizing the Clojure wouldn't have been that hard; it could be that only a few places needed tweaking. In any case, Rust is obviously targeting high performance in a way Clojure isn't. Rust is faster than Java, and Clojure can only ever match Java in performance, not exceed it. But still, it's not clear whether the author tried to optimize the Clojure version or not.
> Rust is faster than Java, and Clojure can only ever match Java in performance, not exceed it.
The Rust vs Java question translates to the age-old C++ vs Java argument, where the counterpoint is that Java can be faster because the JVM has no significant disadvantage in code generation while JIT and GC can be faster than AOT and malloc, and then there are many back-and-forth arguments and nobody changes their mind.
In another sense, the ease of use and HLL properties of a language can in practice give performance advantages. Given the same amount of time, the programmer of a more expressive high-level language might have more time to iterate and do algorithmic work, which ends up having much bigger effects than the relatively small differences in compiler code generation.
(The word performance of course also has meanings other than code execution speed...)
> and then there are many back and forth arguments and nobody changes their mind.
except that people routinely rewrite Java code in C++ in 2020 and run circles around the Java code, even when tuning GCs etc etc, à la https://www.scylladb.com/2020/10/06/c-scylla-in-battle-royal... or Minecraft Bedrock edition (C++) vs the original Java Minecraft
How many times have there been rewrites from C++ to Java that ended up being faster?
> How many times have there been rewrites from C++ to Java that ended up being faster?
A couple of times as far as language implementations go (e.g. JRuby vs CRuby, even before the whole Graal/Truffle thing). However, that's a niche thing, and those JVM implementations have yet to replace the standard runtimes.
As a rule of thumb, I agree that C++ should generally be considered the 'faster' language.
I would be surprised if successful C++-to-Java rewrites yielding better speed were rare, as we move to multicore and correctness & safety become the bottleneck problems there. But I'm not a Java programmer and don't follow the scene closely.
Also the narrower niche of C++ means that it often makes sense to rewrite a small part of a Java app in C++, but it rarely makes sense to write a small part of a C++ app in Java. This domain difference can also explain the relative frequency of public "made it faster with C++" posts, since those small self contained uses make good posts that aren't entangled with the bigger application.
Multicore & multithreaded architectures only prove even more problematic when you do context switching and have synchronous I/O. That's precisely why it's not just the language (C++); it's also moving to a highly async, shared-nothing, non-blocking architecture to get the best advantage out of your multi-core, multi-CPU machines. (Disclosure: I work at ScyllaDB.)
I agree with you, but I think the assumption here is that we're comparing two code bases that are both trying to be performant. The Rust defaults will probably start off more performant, and the ceiling will be higher as well.
A lot of the performance comes from the different paradigms though, so it's not always an apples-to-apples comparison. But I also think that's an assumption being made when talking about a Clojure vs Rust implementation. In the latter, you're most likely implying the use of mutable collections, fixed-size structs, primitive types, and a tighter memory allocation surface. And not surprisingly, those are the same changes you'd make to your Clojure code base to speed it up (most likely).
You're technically correct, but the typical Java program making heavy use of threads has inefficiencies (and incorrectness) that would be avoided with Clojure's higher-level async APIs. Just as it's easier to write idiomatic, performant C than the "faster" ASM.
Not very clear what the diff tool is attempting to do.
Just looking at the Clojure code, I feel there's better approaches in Clojure to achieve the same or better results.
More clarity would be helpful.
Also, transducers are a big performance win for long sequences of values.
You cannot even begin to guess where your performance problems are until you use something like YourKit (https://www.yourkit.com), which is an excellent tool. With very little effort and a few type hints you can sometimes more than double your Clojure performance.
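A quick sketch of the cheapest win (illustrative): turn on reflection warnings, then hint the hot call sites they point at:

    (set! *warn-on-reflection* true)

    ;; unhinted: reflective call on every invocation
    (defn shout [s] (.toUpperCase s))

    ;; hinted: compiles to a direct method call
    (defn shout' [^String s] (.toUpperCase s))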
I agree that Rust has some really great concepts. The example code makes heavy use of traits for example, which are very ergonomic and provide a dynamic feel.
The context is a rewrite, AKA runtime optimization, so the desired result is already understood. That's a great use case for Rust: top-down implementation.
Also the code doesn't show any of the more painful cases. From the article:
> There are some inefficiencies visible here, and they're probably the most important spots for performance improvements. But they're still there as fixing them was too hard and/or time-consuming for me.
Resolving these "inefficiencies" is where Rust really shines, because it _can_ resolve them, and does so in an internally consistent way on top of that. But this is also where you really _slow down_ in development and need to think about the more complex and intricate concepts such as lifetimes and borrow semantics.
As someone who has been writing a lot of Rust, I think that's only apparent... there are lots of things you can't do in Rust that you can in a GC-based language. The restrictions Rust imposes on you (mostly around the borrow checker, especially when you have mutable values) make it much harder to write code... if you make it easier by cloning happily, you might end up with worse performance than in the GC-based language. If you don't, I guarantee your code won't look pretty with all those lifetime annotations.
Great post! I like articles where the author has practical experience in two languages and compares them. Helps me make better decisions for future projects.
> After switching to Rust, I had to implement more complicated logic that resolved dependencies between the rules I was diffing. This became complicated enough that even with a static type system I could only barely understand it. I doubt I would have been able to finish it in Clojure.
I have no idea what the author's trying to say here.
Sure, but having used static vs. dynamic languages extensively on gnarly problems, it's pretty clear what they're saying. There comes a point where dynamic languages can be quite hard to debug compared to languages with well-designed types, especially when working on tricky problems with many constructs and elements. Fewer rules does not equal greater productivity.
No it's not clear, this is another rehash of the static vs. dynamic debate, and you're welcome to your side. It's ludicrous to claim that highly complex systems cannot be written in a maintainable fashion with a dynamically-typed programming language. Anyway this gets to what bugged me about the post, the author is plainly not comfortable with Clojure and wants it to be Haskell-y. So of course they're going to be happy in a language like Rust.
I think it comes down to the specific codebase for the Clojure version vs. the Rust version. In this case, after rewriting the code into Rust it was easier for the author to understand and reason with.
Not that you can't do it in Clojure or any given language.
In the end, they're happy with Rust. I really like what little Rust I've done. I still reach for JS/Node first only because I can get something working faster often because of what's in the box and in npm. It's far from the most performant, but often fast enough. C# tends to be my second level, though as I become more familiar with Rust, it may displace this.
You can't be an expert in everything and sometimes a given language will lend itself to the way you think in idiomatic terms.
The article is an anecdote on a personal experience, not an assault on Clojure or the ecosystem. Even if it is critical at a couple points.
Okay, why is there this attitude that dynamic vs. static typing can't be debated? As if it's an inherently subjective question, not worth discussing? How is it ludicrous to say that for big programs, rules that allow the computer to automatically verify things for you are useful?
Modern statically typed languages are not the obtrusive, verbose dinosaurs of yesteryear. They can often be used without type annotations, and have almost no expressiveness limitations compared to dynamic languages.
I love clojure. It's almost my favorite language; the syntax is amazing, and the experience of using paredit and the many bespoke control-flow structures that make your problems easy to model makes it feel like God's language. But yeah, debugging is massively painful. Stack traces are useless, and the lack of types makes errors pop up in places you wouldn't expect.
That's not a subjective thing. It's a weakness of the language. There's no drawback to having optional static guarantees. Can we stop pretending this debate is not worth having?
I love clojure and I wholeheartedly agree with what you say, for what it's worth.
I don't understand, for example, why stacktraces don't fold everything from clojure.core by default. Can't we assume that the language implementation is right for 99.999% of the world's stacktraces, and only show functions from other namespaces? In case someone some day finally triggers an error in clojure itself, you toggle a parameter and get the full stack.
The tooling does support customizable stack trace filtering pretty well I think (at least in Calva and CIDER, probably Cursive too?).
I wouldn't want to filter out clojure.core frames because you can click on those and see the line of code in the stdlib that is calling back into your code when you use higher order functions, it's useful.
This should be a default at the language level, not require tooling to filter out verbose noise.
Apart from the principle of the language not being dependent on external tools, in practical terms: if you get a stacktrace in a production log, or on a terminal, you don't have that tooling available, for example.
This is precisely what mistersys was denouncing. This is an objective issue, one that pushes newbies away, that consistently gets discussed in the annual survey, on HN, and in basically every conversation about clojure. Spec has somewhat improved the error messages, but core functions are still not specced officially, and the error messages are still hugely verbose and expose implementation details that should not be visible. We should at least acknowledge it instead of pretending it's solved, or that it's not an issue thanks to tooling workarounds.
On your point about seeing the clojure.core stacktrace: the implementation of a core function shouldn't be something you need to inspect to know how it's working; that should be clear from the public interface. That's part of the clojure philosophy: have small functions that are obvious and compose easily in your head, so you understand every piece. In any case, I've been using clojure on all my side projects for about 8 years now and I've never had to do what you describe. Could you give an example of when you would need this?
I'm just arguing about the default case. If you need to introspect the standard library that can be a flag / global variable toggle / whatever, but that's not typical.
Sometimes the architecture/algorithm matters, and sometimes the architecture/algorithm needs to align with the language. Absent seeing the broader code base [1], I'm inclined to think that the author's larger design led to these expensive functions existing as they did [2].
Pure speculation on my part, but if one has a lot of experience with imperative, mutable languages, one might design a system that ends up being not so great when written in a functional, immutable language. If so, then seeing improvements when directly porting to an imperative, mutable language might be not so surprising.
Tangent: Regarding the power and importance of code structure, I highly recommend watching "Solving Problems the Clojure Way" by Rafal Dittwald at Clojure North 2019 [3].
[1] I didn't see a link, but if it's available, I'd love to take a look.
[2] The `rule-field-diff` function, for example, seems to be burdened with some odd choices, e.g., taking in two "rules" as arguments (which seem to be collections of rules keyed by field), then using two hard-coded "operations" (also keyed by field), and yielding a map whose values are sequences by field (I think). Off the top of my head I don't see why this fn needs to work across multiple fields in the first place (i.e., any field-specific "loop" should be in a surrounding context). Ditto for `diff-rules-by-keys`.
That Rafal Dittwald video is excellent. It gives a small but illustrative comparison of procedural, OOP, and finally functional... and using javascript (thereby making it accessible to non-lispers).
It's a fair comparison because many developers, including the author, have to make a choice between languages, some of which are GC'd while others are not.