Rules for Writing Clojure Programs

Raphael_Amiard · on July 26, 2010

My naive 2 minutes solution :

    (reduce #(assoc %1 %2 (inc (get %1 %2 0)))
            {}
            "abcaaabcccaabrrcb")

EDIT: While it's shorter and (in my opinion) a lot clearer than the OP's solution, it's also consistently faster. Functionnal code and composition is cool, but i think the author got trapped into making his code beautifull, or maybe just into the wrong algorithm for the job. It happened to me before in clojure and other functional languages

Lozzer · on July 27, 2010

Or as mentioned in the comments

    (frequencies "abcaaabcccaabrrcb")

with some optional sorting if that's required.

KirinDave · on July 26, 2010

I and many other experienced people on #clojure@freenode looked at this.

Heard comments such as:

    “Huh.”
    “I can't believe -> and ->> are being demonized now.”
    ”I think if you're calling things rules you should be somewhat careful about how you phrase them...”
    “Someone is wrong on the internet.”

sqrt17 · on July 27, 2010

The real reason why you want ->> can be seen in the code example for rule #7: instead of mapping the result of partitioning, it (wrongly) reuses a (i.e., the original list).

If bugs result in irregularity, then making regularity explicit (by using ->> or other macros/higher-order functions) can be a good thing.

KirinDave · on July 27, 2010

Good catch!

That's a very good point, and a very hard bug to see when you "get lazy" and use a, b, c as step names.

alnayyir · on July 27, 2010

Sounds about right.

Since you're apparently familiar with both Clojure and Erlang, what are your feelings regarding the concurrency model in each?

KirinDave · on July 27, 2010

They are both good for different things. :)

The clojure model means state is easy to move around. This is awesome. The Erlang model makes the server-like nature of each component very explicit, which can significantly reduce the complexity of some types of systems.

alnayyir · on July 27, 2010

Cool, thanks for your thoughts.

I always felt like Clojure was attempting a permissive system, whereas Erlang really is pretty up-front about the environment it's designed to work in.

I keep meaning to fiddle with Reia but never got around to it.

shasta · on July 26, 2010

I was going to complain that parts of the article were disjointed and contradictory (Use -->, don't use -->, JK use it) until I noticed the name of the blog is "Two Guys Arguing."

Solution from the comments: (apply merge-with + (map (fn [c] {c 1}) “abcdaabccc”))

ataggart · on July 26, 2010

The problem with the latter solution is that apply isn't lazy. The more correct solution would be a simple reduce:

  (reduce #(update-in %1 [%2] (fnil inc 0)) {} "abcdaabccc")

praptak · on July 27, 2010

Apply itself is lazy. This: (apply f infinite-lazy-list) does not force evaluation of the infinite-lazy-list, if the f itself does not force it. My favorite example is (apply concat ...), which can be used to flatten an infinite sequence of sequences (concat is lazy wrt to its argument list, so this works.)

runevault · on July 26, 2010

I wonder if update-in or merge-with would be more efficient.

Another thing to consider is if you want to get fancy (persistent! (reduce #(update-in %1 [%2] (fnil inc 0)) (transient {}) "abcdaabccc"))

Still no REPL but this should work and take advantage of transients if you're doing it against a bigger dataset (which I assume you care about since you also worried about laziness)

runevault · on July 26, 2010

It always amazes me how many people either never noticed merge-with or simply don't use it. Having done something similar with a list of words (created via regex parsed over a file) I adore what you can do with it. Especially if you use transients.

Unless of course they just didn't think of the fact strings are seqs in clojure by default.

edit: I could be crazy, but one problem with using apply is you force the realization of your object, so while in THIS case it's fine, using larger amount of data would cause a problem. IIRC more space sane way to do this is use (reduce {} (merge-with + %) list)

and man I MUST be rusty, I can't remember if the default value comes before or after the function, and no repl handy. Hopefully the point is gotten across :)

jared314 · on July 26, 2010

Looking from the outside in, clojure has too many useful/amazing functions to learn in an afternoon, weekend, or a month. So, not using something you don't understand is perfectly valid in my opinion.

runevault · on July 26, 2010

Replying to myself. Just googled it and I had the code wrong shame

(defn count-words [coll] (reduce #(merge-with + %1 {%2 1}) {} coll))

praptak · on July 26, 2010

A tradeoff to be aware of: the introduction of sort raises the complexity of the solution from O(n) to O(nlogn).

aphyr · on July 26, 2010

I saw that and flipped out. It also requires an extra two scans through the list, in addition to the memory overhead for the temporary structures. Happily, the merge-with suggestion is much more efficient.

amalcon · on July 26, 2010

The enormous irony here is that this is one of the rare cases that you could actually speed up using a non-comparison sort, like a counting-sort... except that this defies the point, because the first half of counting-sort is solving the problem we originally set out to solve.

d0m · on July 26, 2010

Don't take it harsh, but if it takes all this for this simple theoretical problem, imagine for a real world problem. In my opinion, a pure iterative approach is really simpler:

(pseudo-code) :

<code> myset = {}

foreach c in mystring: if myset[c]: myset[c] += 1 else: myset[c] = 1

myset.sort(fn x,y: myset[x] < myset[y])

</code>

As I said, it's all about opinion (and I know there are million ways to simplify that or make it different, i.e. myset.get() or whatever), but I find the imperative approach way easier to gasp at first glance.

Compare that to:

(let [a (sort str) b (partition-by identity a) c (map (juxt first count) a)] (sort-by second > c))

First, I don't really like a,b,c as name variable (to be fair, myset and mystring are terrible too). Second, I find there are so many extra step to resolve this simple problem.. we need: partition-by, we need identity, we need juxt (wtf is juxt?). Third, wtf are first, count and second..?

I know high-level function are kewl, and working with lists are even cooler and using all of this in a 4 lines in a pretty clean way is again even cool-cooler. However, when I read this, even if there are less lines than the imperative way, it still takes more time to understand what the code is doing.

mquander · on July 26, 2010

Sorry, but I think we'll have to agree that this just depends on background. I can figure out what the second is doing significantly quicker than the first.

When I think about this problem, my mind immediately thinks "well, I want to take all the characters, group them into buckets, one per letter, and then count how many are in each bucket." I don't think "I want to make an empty map, starting with each character mapping to zero, iterate over every character and increment an associated counter in the map, and then read out the counters at the end." If I have to write the second -- because my language has bad list libraries or no higher-order-functions -- I have to "translate" the first into the second in my head.

The Clojure code expresses the computation in exactly the same way it lives in my head. The imperative code expresses that like a computer does it, and to even understand it I have to "step through" it in my head a little bit to make sure the computer is doing the right thing.

sanderjd · on July 26, 2010

I totally get this argument. I also think it's an old-dog-new-tricks problem more than anything else. Things like "foreach", "in", "myset[c]" are part of our existing vocabulary and things like "let", "partition-by", "identity", "map", "juxt" are not (and also, "wtf are first, count, and second..?" is pretty dramatic if you ask me - these are just what they sound like and not at all confusing). You have to learn a new vocabulary to be effective in these languages - maybe the size of the vocabulary is bigger, or the scope of each "word" is larger, but we deal with abstraction all the time; you have to trust that a method or a syntax feature does what the documentation says it does (or read the code for it yourself) regardless of the language.

s1rech · on July 26, 2010

well, you take more time because you are not used to it. It is a bit like Linq in C#, at first it looks very foreign but if you use it for a while you start replacing classical loops with this list based constructs. At the very least, it is a good way to get another perspective.

bitsai · on July 26, 2010

A friend and I are working through Project Euler as part of our effort to learn Clojure. My code is littered with the pattern of (map #(list % (f %)) coll) all over the place... Very glad that I read this article (and the comments), and I think it's time for me to start exploring the clojure and clojure-contrib API's in earnest.

herdrick · on July 26, 2010

(frequencies "abcaaabcccaabrrcb")

Avshalom · on July 26, 2010

as a note, the source is:

  (defn frequencies
  "Returns a map from distinct items in coll to the number of times they appear."
  {:added "1.2"}
  [coll]
  (persistent!
   (reduce (fn [counts x]
             (assoc! counts x (inc (get counts x 0))))
           (transient {}) coll)))

which I think is pretty much just Raphael's solution.

WilliamLP · on July 27, 2010

Can someone convince me what is gained by stressing out over making an easy solution to a trivial problem concise and elegant like this? You have a perfectly good imperative solution which is the way almost any child would do it (go through things one by one and keep counts of every bucket), so what is gained by messing with that? Is it not time to move on to the rest of the program?

derefr · on July 27, 2010

You're thinking like someone who most often writes code. Think like someone who most often reads code—a maintenance programmer, say—and the benefits become obvious. We get away in this industry with writing things that, upon being encountered again, immediately provoke "throw it away and start again" comments. This sort of code will not provoke those sentiments.

(That's not to say that it's actually good code—the article itself points out the problems—but those problems can actually be fixed by modification, not replacement, of the code, which is what we should all be striving for.)

gruseom · on July 27, 2010

This sort of code will not provoke those sentiments.

But it has provoked those sentiments in nearly everyone who's commented. Including the very comment you're replying to!

I don't see anything about the original example that makes it less prone to being rewritten. Nothing's going to stop a programmer who's hell-bent on rewriting something. And "modification" and "replacement" are not that different anyway, especially when the scale is small.

WilliamLP · on July 27, 2010

Are you really going to defend that this isn't trivial to read:

    str = "abcaaabcccaabrrcb"

    dict = {}
    for ch in str:
        if ch not in dict.keys():
            dict[ch] = 0
        dict[ch] = dict[ch] + 1

    sortedItems = sorted(dict.items(), key=lambda x: x[1], reverse=True)

_ivvf · on July 27, 2010

No need to pick on poor python here :). With the pysistence library, you can write code that mirrors other solutions presented in this thread:

reduce(lambda d, c: d.using({c:d.get(c, 0)+1}), "abcaaabcccaabrrcb", make_dict())

I don't know enough about HN to know why it doesn't display the two *'s that need to be be right after d.using(. I also couldn't figure out how to do keyword arguments in python where the key is a variable, not a constant, so if any python expert knows how, you could remove 4 characters from the above solution! :)

WilliamLP · on July 27, 2010

Yes, very cute and concise, and you know Python and I don't:) My point is to ask why to devote yourself to "rules" and optimizing conciseness of a trivial problem here. I don't think "readability" is a defense, because I believe my program (not being a Python programmer) is nearly completely free of guile, still fairly concise in terms of symbols, and instantly readable even to a new programmer.

So where's the payoff?

gruseom · on July 27, 2010

It's interesting and revealing that you're not getting a good answer to your question. In these types of conversations, most of the back-and-forth is really just name-calling -- where the names are things like "readability" instead of "poopoohead" -- on top of statements of personal preference. It's easy for us to debate reduce vs. merge-with vs. foreach, but difficult to have a meaningful conversation about why we prefer what we prefer. I happen to have a few minutes between finishing my coffee and getting on with things, so here are some reflections.

A lot of people who are into FP simply find the concise style more satisfying. That's not regarded as a legitimate reason for preferring one choice over another, so this kind of discussion often hits a what-you-can't-say dead end. We've all seen too-clever code that was just awful, and we all keep saying that cleverness for its own sake is bad, and that position dominates the discussion in a heavyhanded way right now. Perhaps that's because it's necessary to keep the pseudo-clever crowd in check. I doubt it, though, because nothing keeps them in check.

Like it or not, the truth is that one factor driving what programmers do is intellectual satisfaction. This has always been and will always be the case, no matter how many schoolmistresses (most of them, of course, programmers trying to master their own temptations) beat us with the cane of "business value is the only thing that matters" and "no one cares about your stupid clever code".

(I'm by no means arguing that the impulse to make things "cute and concise" ought to take precedence or have free rein, but rather that this factor needs to have a seat at the table before we have a chance of working out a balanced approach.)

The mathematicians have one over on us here. In their world too, the most important thing is getting the job done, but elegance is a close runner-up: it's esteemed for its own sake, and more importantly it's well understood that there's an intimate dialectic between elegance and getting the job done. If we started thinking that way, we might finally make some progress out of these language game dead ends.

Switching gears to the technical matter of what the concise functional approach buys you (apart from elegance, for those who find it elegant), the trouble is that there's no way to answer this question at the level of a code snippet because all approaches yield reasonable snippets in intelligent hands. The real question is: what are the impacts of these varying approaches on building whole systems? That question, as far as I can tell, remains completely unaddressed! Except for the inevitable platitudes about side-effect free code being good for concurrency and so on. Yet that is the question that matters, and our code snippet debates are just the drunk looking for his car keys under the street lamp, even though he didn't drop them there, because the light's better.

Unless you believe that the choice of programming language is irrelevant (which is a little like saying an artist's medium is irrelevant), it's clear that different languages have different classes of programs that are easy to write in that language, and thus different classes of programs that will tend to evolve in it. Each version of an evolving program conditions the next version, so this influence gets compounded over time. (Code snippets are at the rudimentary extreme of that process, which is why they have so little comparative value; what we really need to see is how those snippets evolve into much larger systems.) The language itself and the current version of the program written in it condition the very stream of ideas about what you might do next. So there's a profound influence here, but we don't know how to study it. Failing that, the best approach is to let a thousand flowers bloom and see what systems emerge, but even there historical accident plays a huge role in where the development energy goes and thus what systems ever have a chance to emerge.

Personally I find that the imperative style is easiest in the small and the functional style is easiest in the large, and the trick is to find a design that allows the two to be fused with some regularity. But then I work in Common Lisp so I would say that.

Excuse the length, no time to make shorter.

WilliamLP · on July 27, 2010

Thank you for the post, it was interesting and made me think. I especially liked this point: "but rather that this factor needs to have a seat at the table before we have a chance of working out a balanced approach."

I agree that I would like to see how functional approaches can benefit large systems, in concrete terms, since almost all of the examples I've ever seen involve micro-code like this article. I'm especially skeptical that it can make a dramatic difference for the kind of messy business code that today is usually solved by large object-oriented systems and modular libraries.

koenigdavidmj · on July 27, 2010

    import operator
    
    str = "abcaaabcccaabrrcb"
    
    dict = {ch: str.count(ch) for ch in str}
    sorted_items = sorted(dict.items(), key=operator.itemgetter(1), reverse=True)

The dictionary comprehension syntax is new in Python 2.7 and 3k, but you can say

    dict((ch, str.count(ch)) for ch in str)

instead.