Programmers should know their trade-offs. It's not so much that STM should be avoided, but that the use cases it was designed for don't often present themselves in production. Same with agents, core.async, metadata, etc.
So I also get a bit twitchy when I hear calls for wholesale use of Timbre, Component, Schema. Every one of these libraries has tradeoffs. And those tradeoffs should be understood before adopting them. As an example, if you have a need for Schema, perhaps your data model needs some refining. Maybe you need Schema at the edges of your API, but you should define why you need a library before you just adopt it wholesale.
Likewise with Timbre, often we need tight integration with Java tools, so perhaps my project really does need a "native" java logging library.
Perhaps "understand all the trade-offs" should be the mantra of programming, in any language.
> Perhaps "understand all the trade-offs" should be the mantra of programming, in any language.
Yes, yes, a thousand times yes! (Perhaps too close to James Joyce, but that's not the point.) Understanding cost/benefit and making good decisions is the basis of winning a lot of games, the game of programming included! A programming shop needs to be good at the task of optimizing its programming, obviously, and as in any optimization task, cost/benefit comes into play.
A lot of his points say to offload to services such as a database or queue. It makes sense that if all you are doing in your Clojure application is some simple business logic then you won't need the more sophisticated tools for concurrency and shared state, but if you were trying to write those services in the first place, or if you need higher performance so that sending everything over a socket to a database isn't realistic, then you may need them.
This really only makes sense for service-oriented software. Not all uses of clojure attempt to saturate use of available resources, and here STM is a win.
But the OP implicitly assumed everyone was writing "production" software, presumably as part of a commercial service offering.
Although he didn't reiterate it in the STM section, up front he was pretty explicit about what he meant by "production" software, really a specific kind of production software. So the recommendations might not apply if you write a different kind.
To make my biases explicit, I mostly write webapps and data analysis on servers in the cloud. If you use Clojure in significantly different applications, these recommendations might not apply to you.
Global atoms provide a good way to confine state into a single place. In my applications they serve as an in-memory database while everything persistent is written to disk in the form of a log (Kafka, file append, you name it).
Agents provide a good way to serialise events akin to a transactor. Not every problem requires the parallelism or the concurrency gained by smearing it out into a nondeterministic pile of communicating processes that is a nightmare to test.
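To make the "serialise events" point concrete, here is a minimal sketch. The `event-log` agent and `record-event!` helper are made-up names for illustration, not anything from the commenter's code.

```clojure
;; Hypothetical event log; the agent applies queued actions one at a time.
(def event-log (agent []))

(defn record-event!
  "Queues an append; actions sent from the same thread run in the order sent."
  [event]
  (send event-log conj event))

(doseq [event [:created :paid :shipped]]
  (record-event! event))

;; Block until the actions sent from this thread have been applied.
(await event-log)
```

After `await` returns, dereferencing `event-log` shows the events in the order they were sent, which is what makes this style easy to test.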
In those cases having determinism and centralised state is a godsend. I see agents as the atom's bigger brother.
In cases where you need some concurrency or parallelism futures are fine. They provide a simpler interface and can even be canceled as first class citizens, unlike CSProcesses which potentially linger around indefinitely without having a first class handle to kill them or see their status.
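A small sketch of what "first class handle" means for futures, using only `future`, `future-done?`, `future-cancel` and `future-cancelled?` from clojure.core. The sleeping task is just a stand-in for real work.

```clojure
;; A long-running task standing in for real work.
(def slow-task (future (Thread/sleep 60000) :finished))

;; The future itself is the handle: inspect its status, then cancel it.
(def done-before-cancel? (future-done? slow-task))
(def cancel-succeeded?   (future-cancel slow-task))
(def cancelled?          (future-cancelled? slow-task))
```

Note that cancellation works by interrupting the underlying thread, so a task that never blocks or checks for interruption may keep running anyway.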
Core.async is heavily overused. If your application doesn't contain a multitude of moving parts that all need to execute concurrently, I wouldn't touch it with a ten-foot pole. The go macro is a Clojure-to-continuation-passing-style compiler, and don't get me wrong, it's a marvel of engineering, but it's also a sausage machine that makes code impossible to debug. The fact that processes aren't first-class citizens makes it impossible to build Erlang-style resilient systems, and creates tons of zombie processes when using figwheel. People immediately grab it when building cljs applications even though a log/queue would give them much more reliable performance guarantees, much better debuggability and better tool compatibility. (Re-Frame, for example, ditched core.async for their own queue implementation.)
(If somebody is sharing that sentiment and looking for a kafka style log for cljs, I wrote one and it's been used in production for a while :) https://gitlab.com/j-pb/franz)
Last but not least, transducers make it onto my list of things to avoid. I'm not sure if they or core.async are Clojure's million-dollar mistake.
They are incredibly complex, even brought the spawn of the devil, volatile, into the language. Just for a bit of speedup and core.async integration.
> Global atoms provide a good way to confine state into a single place.
I think it's simpler to pass state as an argument to your functions instead. That's less complex and easier to test.
> Last but not least, transducers make it onto my list of things to avoid... They are incredibly complex, even brought the spawn of the devil, volatile, into the language. Just for a bit of speedup and core.async integration.
Transducers are not just for performance and core.async. They're a hugely useful tool that allows the various collection functions to be applied to sources that are not easily seq-able.
For instance, say I have some source of events, and I can register a callback to receive events. With transducers I can still use all the standard collection functions, even though there isn't an obvious collection to use them on.
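One way this could look, as a rough sketch: `callback-with` is a hypothetical helper (not part of clojure.core) that turns a transducer plus a side-effecting sink into a plain callback. It ignores early termination via `reduced`, so it is only suitable for transducers like `map` and `filter`.

```clojure
;; Hypothetical helper: wraps a transducer around a side-effecting sink.
(defn callback-with [xf sink]
  (let [rf (xf (fn
                 ([] nil)
                 ([acc] acc)
                 ([acc event] (sink event) acc)))]
    (fn [event]
      (rf nil event)
      nil)))

(def seen (atom []))

;; Standard collection functions, composed without any collection in sight.
(def on-event
  (callback-with (comp (filter :important?) (map :payload))
                 #(swap! seen conj %)))

;; Stand-in for an event source invoking the registered callback.
(doseq [e [{:important? true  :payload 1}
           {:important? false :payload 2}
           {:important? true  :payload 3}]]
  (on-event e))
```

Here `seen` ends up holding only the payloads of the important events.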
> I think it's simpler to pass state as an argument to your functions instead. That's less complex and easier to test.
I completely agree with that sentiment. But I prefer to execute those functions with swap! on my application state.
That way I can always inspect it in the repl during development and production. Which is not so easy when it's just bound in the local variable of some thread somewhere.
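The two positions combine nicely. A minimal sketch, with `add-item` and `app-state` as made-up names: the function is pure and takes state as an argument, and the atom only appears at the edge.

```clojure
;; Pure function: takes the state as an argument and returns a new state.
(defn add-item [state item]
  (update state :items conj item))

;; Hypothetical application state, held in one inspectable place.
(defonce app-state (atom {:items []}))

;; The pure function is applied to the atom at the edge of the program.
(swap! app-state add-item "first")
```

`add-item` can be tested with plain maps, while `@app-state` stays inspectable from the REPL.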
As for transducers, tbh I haven't encountered the use case you describe yet.
>> I think it's simpler to pass state as an argument to your functions instead. That's less complex and easier to test.
James,
I was wondering what the best course of action is in one particular situation. I am using your excellent libraries to create an HTTP API that needs to have state. I usually have an init function and use the ring init feature (see below). How would you avoid having an atom that holds, let's say, the DB connection info for a function that renders an HTML page displaying the version of the database, accessing the DB info with @dbinfo (which was updated at init)? Would it be possible to pass the state in to this function? Is it any different if I call @dbinfo in the actual function that renders the page or in the calling function (router, or app definition)?
Agents are effectively atoms jammed next to an unbounded queue or unbounded thread pool, depending on whether you use `send` or `send-off`. In cases where you have enough contention to actually need formal serialization, you don't want either of those.
If you have low contention, use an atom. If you have high contention, use atoms and/or pieces of java.util.concurrent.
Like you said, an agent is an atom with a queue bolted onto it when only using send. ;P
Btw I think your libraries did the job core.async tries to do a lot better :) and to this date I can't figure out why they didn't receive the same amount of attention, so thanks for those!
Stateful things are always more complex, and I'd say their use is also niche (for example, stateful transducers, where you can really shoot yourself in the foot since they grow infinitely in memory).
It's sad to see that STM is not recommended. It is hugely important in Haskell. I'm inclined to believe the difference is an artifact of type-driven STM, but it may be something else instead.
I've recently moved from Component to Mount. It is less powerful, but it vastly reduces the syntactic overhead.
Absolutely avoid metadata. The worst thing ever is using it like how Reagent does in ClojureScript where it actually provides important program semantics. This is a Terrifically Bad Idea.
Schema is wonderful. Schema is terrible. It is nothing resembling a replacement for static types. The more you use it the more obviously it is deficient for such things.
Absolutely just use Clojure.Test. I don't even understand why this is a question. What is up with cutesy testing APIs?
> Absolutely avoid metadata. The worst thing ever is using it like how Reagent does in ClojureScript where it actually provides important program semantics.
Eh? Reagent does not require any metadata, ever. Much less metadata that provides "important program semantics". So I'm pretty puzzled by your comment. Either you have never used Reagent, or our definitions of metadata are really different, which would be odd, given ClojureScript is pretty clear on what metadata is and isn't.
Optionally, as a convenience, Reagent allows "key" to be provided as metadata, but that's really a React thing (and available the same way in Om), and that's because it allows React to supply better performance in some cases. Even then you can supply key via means other than metadata if somehow that offends you. If this is somehow "the worst thing ever" I'd suggest you have lived an overly sheltered life :-)
You can avoid its use through create-class, but it's freely supported and suggested to use it to access the lifecycle callbacks.
Given that there is a supported and suggested pathway to achieving core functionality of the library which involves the use of metadata to define the semantics of your components I stand by my statement. I'm happy to roll back on "the worst thing ever" if a bit of exaggeration is intolerable, but the fact that this exists in the API is a huge wart even given the fact that it's not required.
Weavejester below suggests that use of metadata like this isn't even such a big deal for Clojure, but I cannot agree. In a language with primarily immutable data structures it's just silly to trust metadata as semantics-bearing given that it's invisible, ubiquitous, and often discarded.
> Given that there is a supported and suggested pathway to achieving core functionality of the library which involves the use of metadata to define the semantics of your components I stand by my statement
Sorry, but you are just completely wrong. Have you ever used Reagent? Reagent doesn't require the use of metadata AT ALL, EVER. Much less in some fundamental way, as you claim.
I don't think I've ever run across a situation where I've needed STM. Most of the time you can just use pure functions, and when you absolutely need state, an atom is usually the best tool for the job. STM strikes me as a very specialised tool, at least in the context of Clojure.
I don't think metadata is a bad idea. It's data about data. As long as you don't assume something will have the same metadata as the data it's derived from, you should be fine. I don't necessarily like the way Reagent uses it, but it's not out of character for Clojure to use metadata semantically.
Haskell noob here. How is STM used in Haskell? What is type-driven STM?
In Clojure, I don't recommend it because it's Just Another Place To Stick State. It's cool, but provides little semantic value over compare-and-set! on an atom.
If you're using an atom, you're already sticking state in a place; the problem with atoms is choosing granularity. If one atom has everything in it, it will become a point of contention.
If your compare-and-set returns false, what do you do?
STM resolves these issues by letting you coordinate updates to more granular, finer-scoped refs
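For readers who haven't used it, a minimal sketch of a coordinated update across two refs. The account names and `transfer!` are illustrative only; `ref`, `dosync` and `alter` are the standard clojure.core STM functions.

```clojure
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer!
  "Moves amount between two refs; both changes commit together or not at all."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)
```

No other thread can observe a state where the money has left `checking` but not yet arrived in `savings`.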
The drawback here is that the STM (at least the Clojure implementation a few years ago when I tested it) is orders of magnitude slower than a simple atomic reference.
It actually does not have to be, but this is how it is (was) implemented. I built a simple STM (in Java) as part of my bachelor thesis and its performance was comparable to the atomic reference.
> but provides little semantic value over compare-and-set! on an atom
Can you compare and set on multiple atoms at the same time? If not, how do you make updates atomic without using either STM, a global lock, or re-implementing STM yourself with an incremental locking and rollback system.
Nope, atoms are based on a simple CAS (Compare And Swap) operation that replaces the reference value. They are pretty much sugar around AtomicReference in Java (in the case of the JVM implementation, of course).
It works pretty much like this:
1. Dereference the variable and keep the old reference.
2. Create a new, modified value.
3. Use CAS on the old reference and the new value to update the reference to point to the new value.
In case of failure (some other thread made the update first), go to 1.
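The retry loop in steps 1 to 3 can be sketched directly in Clojure with `compare-and-set!`. The helper name `cas-update!` is made up for illustration; in practice the built-in `swap!` does essentially this for you.

```clojure
(defn cas-update!
  "Applies f to the atom's value, retrying until the CAS succeeds."
  [a f]
  (loop []
    (let [old-val @a            ; 1. dereference and keep the old value
          new-val (f old-val)]  ; 2. build the new, modified value
      (if (compare-and-set! a old-val new-val) ; 3. CAS old -> new
        new-val
        (recur)))))             ; another thread won the race: go to 1

(def counter (atom 0))
(cas-update! counter inc)
```

Because `f` may run more than once under contention, it should be free of side effects.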
But the version of STM implemented in Clojure is very similar to CAS, except that you can apply it to multiple variables, i.e. it is a multi-variable CAS.
It has the same drawbacks and advantages as atomic references. Your code does not depend on lock-taking order (locks are managed by the transaction manager), and you avoid deadlocks because the locks are always taken in the same order before the update is applied. On the other hand, two different threads can start a transaction to update the same variable, but only one of them can commit first; the other has to be cancelled and retried.
In principle you can get away with a simple atomic reference (which is much faster) when the set of updated variables is always the same: you make a pair of them and update them together.
Nah, you can't update two atoms and make sure they both change at the same time. That being said, 99% of the time you can have one atom, a map, with two keys, and that'll do it.
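A quick sketch of the "one atom, a map, with two keys" approach. The `balances` atom and `move` function are made-up names.

```clojure
;; Both values live in one map, so a single swap! changes them together.
(def balances (atom {:alice 100 :bob 0}))

(defn move
  "Pure function over the whole map: subtracts from one key, adds to another."
  [m from to amount]
  (-> m
      (update from - amount)
      (update to + amount)))

(swap! balances move :alice :bob 30)
```

Readers of `@balances` always see both keys from the same version of the map.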
Solid advice (I've been writing Clojure for the last 6 years or so and I agree with pretty much everything). I would like to add, however:
* before you decide to use Component, take a look at Mount and see which one suits you better (I settled on mount for both Clojure and ClojureScript and I'm happy so far),
* use Timbre: yes. Definitely. In fact, pretty much anything by Peter Taoussanis will be a good choice (for example, Sente is a wonderful tool for writing client+server apps). If you use Timbre in a mixed Clojure+ClojureScript application, consider sending logging from client-side to server side via Sente. I did and it's fantastic to be able to log something in ClojureScript code and have it appear in your server logs. Just don't overdo it, or you'll be flooded with logs.
* avoid lein plugins: yes, in general. But I still use one, lein-git-version, which names output artifacts using `git describe --tags` and places a version.txt file in resources/. This is difficult to do in any other way.
We don't use Component in production, because it's way too heavy for us. Our web store uses Datomic, PayPal and Stripe, and a few AWS services, and none of these need to be started or stopped.
But we do take a tip or two from stuartsierra's "Reloadable" workflow (which he based Component on), specifically that there should be no global state, and all dependencies should be passed into functions that need them as a parameter.
In our case, we call it `env` since it's the "environment" that a function runs in or can affect, and contains things like the database connection, email service, etc.
Some of these are records which implement a protocol, so that you can have e.g. LiveEmailService in production and MemoryEmailService in development or while running tests.
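Roughly, that pattern might look like the sketch below. The record names come from the comment above; the protocol, its method, the `:email` key and `welcome-user!` are my own guesses at a plausible shape, not the poster's actual code.

```clojure
;; Assumed protocol; the method name and arguments are illustrative.
(defprotocol EmailService
  (send-email! [service to body]))

;; Stand-in for production; a real one would call out to a mail provider.
(defrecord LiveEmailService [api-key]
  EmailService
  (send-email! [_ to body]
    (throw (UnsupportedOperationException. "not wired up in this sketch"))))

;; Development/test implementation that just remembers what was sent.
(defrecord MemoryEmailService [outbox]
  EmailService
  (send-email! [_ to body]
    (swap! outbox conj {:to to :body body})))

;; Functions receive their dependencies through env; nothing is global.
(defn welcome-user! [env user]
  (send-email! (:email env) (:address user) "Welcome!"))

(def test-env {:email (->MemoryEmailService (atom []))})
(welcome-user! test-env {:address "a@example.com"})
```

Swapping `LiveEmailService` in for production is then just a matter of building a different `env` map at startup.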
All in all, I like and have used some of Stuart's ideas, but not necessarily Component itself. There's strong mindshare for it in the Clojure community right now, and it's tempting to look at it as The Solution™ for almost everyone's use-cases because of that. And the ideas it has are good, but that doesn't always mean it's the best solution for each person.
You are right. Params can be references to state, either as an atom or ref, or to a function which accesses external state (e.g. a database). The idea that params somehow obviate the need for state overlooks that.
no, they're just data being passed in. at least thats what it sounds like to me.
obviously this data has ways to manipulate the world, because an app that did nothing would be pointless. but it's no different than any other param.
what would be bad, which is pretty much the opposite of this situation, is if you had global resources that weren't passed in as params and caused side effects and you had to access a global variable to get to it (because it wasn't passed in).
Difficult? I have a few lines in a namespace that grabs `git describe --tags` and a few other things and places them in a json file. This is served up internally by all http-speaking applications as `GET /_internal/build-info`.
No, but only because load-file doesn't play well with all of my dependencies. However, that still doesn't seem like a job for a plugin, given that this works out of the box with Leiningen:
The article has some fairly reasonable rules of thumb to follow. For whatever reason I rarely find refs or agents to be useful, and lexical scoping is more limited and predictable than dynamic scoping.
That said, I feel it's a little too harsh in places. Metadata is fine so long as you don't expect it to persist to derived data structures. If you think of metadata as being about a particular piece of data and nothing else, then you won't run into problems. One place I find metadata particularly useful is annotating event data, which could allow, for example, treating events from different sources with different priorities.
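A small sketch of annotating event data this way. The keys `:source` and `:priority` and the `priority` helper are invented for the example.

```clojure
;; The event's value is the map; where it came from rides along as metadata.
(def event
  (with-meta {:type :reading :value 42}
             {:source :sensor-a :priority :high}))

(defn priority
  "Reads the priority from metadata, defaulting when none is attached."
  [e]
  (:priority (meta e) :normal))

;; A structure rebuilt from the event does not carry the metadata along.
(def rebuilt (into {} event))
```

Metadata never takes part in equality, so `event` still equals the bare map, and `rebuilt` falls back to the default priority.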
I find that there are a lot of specialised tools in Clojure. Most of the time it's good advice to avoid them, but in the rare cases where they are needed, you're glad to have them. Keyword inheritance, for instance, is something I've been using recently, yet I've rarely had need of it before.
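For anyone who hasn't run into keyword inheritance, a minimal sketch using `derive`, `isa?` and a multimethod. The `:shape/...` keywords and `describe` are made up for illustration.

```clojure
;; Declare that :shape/circle is a kind of :shape/any in the global hierarchy.
(derive :shape/circle :shape/any)

(defmulti describe :kind)

;; Only the parent has a method; children fall back to it via isa?.
(defmethod describe :shape/any [_] "some shape")

(def circle-is-shape? (isa? :shape/circle :shape/any))
(def description (describe {:kind :shape/circle}))
```

Multimethod dispatch uses `isa?`, so a value tagged `:shape/circle` is handled by the `:shape/any` method.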
Pretty good list that I mostly agree with. I think the only point of contention I may have is with using timbre--having tried it in the past, I found it didn't do much that clojure.tools.logging couldn't do for most use-cases, and added configuration complexity while not sparing me the trouble of dealing with logback XML configs and whatnot--Java logging seems to rear its ugly head no matter what you do, especially when using a fair amount of interop as our project does (and as any big project probably does to some extent). Seems like it's best to simply suck it up and learn how Java logging works. But, it's probably worth giving it the benefit of the doubt, and perhaps I should give it another shot--it's been a little while since I last tried it so maybe it's improved, or maybe I simply didn't understand well enough how to set it up effectively.
Otherwise, lots of great points. One other thing I will say about Schema (https://github.com/plumatic/schema): while it's a lifesaver in a lot of ways, its mere existence really does expose some of Clojure's deficiencies when it comes to the type system (or lack thereof). I'm probably in the minority in the Clojure world but I really wish that it had better static typing and a more sophisticated type system sometimes. I've mostly made my peace with it but once in a while I look longingly at ML-family languages from afar...
Author here. I agree, Timbre is new, and has some rough edges, but in my projects, I've found them worth putting up with. In my current project (18k LoClojure), I think the only serious dependency I have that uses j.u.logging or SLF4J is Datomic. Timbre has an interop library that will pipe java logs through timbre.
I agree knowledge of how Java logging works is still useful, but I'm not willing to be part of the problem anymore :-)
Agreed on Schema, and how it illustrates deficiencies in the type system. I still want core.typed, or something like it, but it's too immature for production use.
Totally understood re: timbre and not being a part of the problem of the Java logging ecosystem. I'll play with it again next time I have a chance and see if the interop lib can't solve some of the issues I had before.
> One other thing I will say about Schema (https://github.com/plumatic/schema): while it's a lifesaver in a lot of ways, its mere existence really does expose some of Clojure's deficiencies when it comes to the type system (or lack thereof).
What are good examples of situations where static typing is better than schemas?
I agree, both in the "mostly agree with" and disagreeing with the list regarding timbre. Adding yet another logging library to the mix doesn't fix the mess of Java logging libraries. Especially given that timbre seems to share the logging philosophy which makes Java logging such a mess in the first place; e.g., bundling an e-mail appender.
I'd personally add the potentially controversial "prefer transducers to lazy sequences." Lazy seq laziness is a big source of errors for newbies, and even for old hands since 1.7 Iterable-backed lazy seqs have surprising chunked realization behavior. Transducers take a bit more up-front effort to gain familiarity, but then yield fewer surprises.
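To show the two styles side by side, a short sketch (the var names are arbitrary):

```clojure
;; The transformation is defined once, independent of any collection.
(def xf (comp (map inc) (filter even?)))

;; Transducer version: runs eagerly, right here, into a concrete vector.
(def eager-result (into [] xf (range 10)))

;; Lazy version: nothing is computed until something consumes the seq.
(def lazy-result (->> (range 10) (map inc) (filter even?)))
```

Both produce the same elements; the difference is when the work happens, which is exactly where the laziness surprises tend to come from.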
I am not sure about this article. Having used Clojure for ~6 years, I know one thing for sure: use whatever gets the job done for you. "Use X, don't use Y" is just another way of saying you do not need to get to know the tool you are using. Offering very little insight into why each decision was made is just not really appealing to me as an engineer who likes to know the internals of the tools he uses, whether it is the Linux kernel, a firmware, or a high-level programming language.
I agree with most of the recommendations in this article (thanks for publishing it!), but the advice about futures should be taken with a grain of salt. You need to be careful with futures because they swallow exceptions. However, the solution given (setting a global exception handler), while sound advice in general, doesn't actually solve the problem of hidden errors, as futures catch all exceptions and return them (so the caller can handle the failure in whatever way it prefers).
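A minimal sketch of that behaviour: the exception thrown inside the future only surfaces when someone dereferences it, wrapped in a `java.util.concurrent.ExecutionException`. The var names are arbitrary.

```clojure
;; Nothing is reported at the point where this throws.
(def failing (future (throw (ex-info "boom" {:step 1}))))

;; The failure only shows up on deref, wrapped in an ExecutionException.
(def outcome
  (try
    @failing
    (catch java.util.concurrent.ExecutionException e
      {:failed true :message (.getMessage (.getCause e))})))
```

If nobody ever derefs `failing`, the error is never seen, and an uncaught exception handler is not invoked for it either.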
I think brevity is far more important than being explicit all the time. When we speak, we use a lot of context to infer what is said - no reason code shouldn't look like that.
The problem with that is that if `put!` or some other function is used inside `map` or some other lazy context, it may get realized outside the context of `with-read-txn`. Making `binding` part of your public API breaks referential transparency.
So what would you recommend that function look like? I think it is bad taste to say (with-read-txn db-name (put! db-name)) - no reason you should specify again what db you are dealing with.
There's plenty of reason: you can put any code you like inside of `with-read-txn`, including closures that are evaluated lazily. If you dislike the duplication, create a syntactic form that allows less, like this:
  (transaction-> db-name
    (put! ...)
    ...)
This doesn't lend itself to every kind of action, but it is narrower. Alternately, create a variadic version of `put!` which guarantees eager evaluation inside of a transaction.
The nightmare scenario here is not that the sequence will lazily evaluate outside of `with-read-txn`, because that at least will throw an error. Rather, it's that it will be evaluated inside a different transaction, without anyone ever realizing it. By leaving that possibility open, you're doing a huge disservice to the users of your library.
Ok I see your point. I am curious how that would happen though - essentially you'll need to bind the txn to a different transaction from the one intended and the whole thing locks up.
I am curious how to guarantee eager evaluation - dorun, doall and run! are all sequence-oriented right?
It's not that hard to imagine: map a `get` over a series of keys inside a transaction, return that lazy sequence, and use the results to do another series of operations within a different transaction.
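Here is a toy reproduction of that scenario, with a dynamic var standing in for the transaction. `*txn*` and `get-key` are hypothetical names, not the API of the library under discussion.

```clojure
;; Stand-in for the library's dynamically bound transaction.
(def ^:dynamic *txn* nil)

(defn get-key
  "Pretend read: records which transaction was bound when it actually ran."
  [k]
  {:key k :txn *txn*})

;; The lazy seq is created inside the first transaction but not realized.
(def leaked
  (binding [*txn* :txn-1]
    (map get-key [:a :b])))

;; It is first consumed inside a different transaction.
(def realized
  (binding [*txn* :txn-2]
    (doall leaked)))
```

Every element of `realized` reports `:txn-2`, even though the code that produced the sequence sat lexically inside the `:txn-1` block, and no error is raised.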
Libraries typically can't guarantee eager evaluation, since that's a property of the top-level execution. That's why libraries shouldn't use `binding`.
I like the write-up, but I disagree with the reasoning to avoid STM. It sounds a lot like "avoid ConcurrentHashMap in Java". It's true that web applications mostly persist their data in a database or some other persistent structure, and obviously an STM or a ConcurrentHashMap would not do you any good for that. However, if you are working with in-memory data, then STM is a very valuable tool, just like ConcurrentHashMap is.
That was going to be my point as well. My “production” application runs light shows for electronic music events. The only state that matters is in-memory in the STM. There is no database. STM works wonderfully in the rare cases where there are multiple values which need to be changed in a coordinated way.
The main feature that Clojure provides for concurrency is immutability: not just the possibility of using existing data structures in an immutable manner, but an entire suite of immutable data types and data structures. This makes concurrency with Clojure's data structures, even using Java's primitives like Thread (which was the original way Clojure was intended to be used anyway), a win over using Java's own if you want safe, predictable behaviour in a concurrent program.
Thanks. Skimming through Clojure's website, it mentions that "Clojure simplifies multi-threaded programming in several ways. Because the core data structures are immutable, they can be shared readily between threads", which is exactly what you said. I thought its concurrency features were emphasized more in its reference types (refs, agents) or in futures, delays and promises. Also, reading that Clojure being hosted is a feature, I guess leveraging the host (using Thread or other concurrency APIs) is idiomatic.
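A small sketch of plain Java threads sharing an immutable Clojure vector. The `worker` helper and var names are made up; Clojure functions implement `Runnable`, so they can be handed to `Thread` directly.

```clojure
;; Immutable vector: safe to hand to any number of threads without locks.
(def readings [3 1 4 1 5 9 2 6])

;; The only mutable thing is the atom collecting the results.
(def summary (atom {}))

(defn worker
  "Builds a plain java.lang.Thread that stores (f readings) under key k."
  ^Thread [k f]
  (Thread. (fn [] (swap! summary assoc k (f readings)))))

(def workers
  [(worker :sum #(reduce + %))
   (worker :max #(apply max %))])

(doseq [^Thread t workers] (.start t))
(doseq [^Thread t workers] (.join t))
```

Neither thread can corrupt `readings` for the other, because nothing can modify it in place.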
Can you elaborate on why using Thread was the intended way to use Clojure for concurrency?
Most of these are excellent pieces of advice, and practices I generally try to adhere to. I don't use schema as much as I used to, but timbre is fantastic, clojure.test is great, and all of the core points are excellent. I've never had cause to use agents or STM, and atoms are best used when enclosing some stateful function.
It would be great if the author of the post expounded on the good libraries to use. In the couple of false starts I have had with clojure I spent most of my time evaluating libraries instead of writing code.
Also: Seeing what clojure considers bad parts makes me laugh since I spend most of my day writing python and javascript.
More language orthogonality more problems I guess..
I would love to see the front-end equivalent. Recommendations for what to use and avoid in ClojureScript. Om, or Reagent, or something else? Global front-end state or not, and so on. I am much more befuddled there than in pure Clojure.
I agree that using binding to avoid passing arguments is a bad design. That is not the only use of binding or more generally dynamic/special vars.
Clojure has naive implementations of pmap, send, send-off & future. They're useful in the REPL, but you need to understand their shortcomings before using them in would-be production code. Promise & delay have limitations, but are not so hamstrung as the former functions.
Even if you don't expect any function to preserve meta, metadata is still useful. E.g., I've used it on a compiler to provide compiler tracing and decompilation.
Component is not essential. It helps if you're accustomed to OO languages. It's arguably better than the procedural style of with-redefs & bindings.
As amazing as core.async is, I am not a fan. I'd rather use queues, logs or an actor library.
Sometimes it's for Java/Javascript interop or access to the large variety of libs in those languages, but other possible reasons you might pick Clojure instead of another Lisp are:
1. immutable by default
2. strong focus on concurrency/parallelism
3. well-thought-out design of core library (one of the most cohesive I've ever seen)
4. largest single Lisp community (just going by GitHub, Clojure has ~18k repos, CL has 10k, Racket has 5.5k, Scheme has 7.5k)
5. unified syntactic sugar for common data structures like maps/arrays
The above are just factors that distinguish it from other Lisp-based languages. If you're coming from Java/C/C++/Ruby/Python/PHP/etc, there's way more reasons than that.
Clojure scripts take several seconds to start for me, and even jars take almost a full second. Am I doing it wrong? Do people actually find this tolerable, when they are piping several of these scripts together, with the later stages being started anew for each input?
I must be doing it wrong. Are these clojure build tools supposed to run as daemons, that take requests from... bash scripts?
The Thread/setDefaultUncaughtExceptionHandler code! I read that and went, Oh! Is that how you do it! Why is this not the default? I've found the silent exception raising in futures to be one of the warts of Clojure. I had resorted to a macro that I used instead of future that caught the exception and reported it.
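Something like the following is one way such a macro could look. This is a guess at the idea, not the commenter's actual macro, and `logged-future` is a made-up name; it reports to stderr and rethrows so that deref still fails as usual.

```clojure
(defmacro logged-future
  "Like future, but reports any Throwable to stderr before rethrowing it."
  [& body]
  `(future
     (try
       ~@body
       (catch Throwable t#
         (binding [*out* *err*]
           (println "Exception in future:" (.getMessage t#)))
         (throw t#)))))

;; Behaves like a normal future when nothing goes wrong.
(def ok-result @(logged-future (+ 1 2)))

;; Reports the failure immediately, even if nobody ever derefs it.
(def failed (logged-future (throw (ex-info "boom" {}))))
```

In a real project the `println` would presumably be replaced with whatever logging library is already in use.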
Metadata was a really nice idea and I was excited about it back in 1.0. Sadly, because of Clojure's hosted nature, it was really hard to introduce. There is too much native stuff that cannot handle it very well. It's sometimes nice, but not often.
If you're doing a new language, you should look into it.
If you're just looking for a starter kit with everything included, consider Luminus. It gathers a bunch of libs together and shows you how they interoperate to handle web requests.