Purely Functional Data Structures (1996) [pdf] (cmu.edu)
364 points by debanjan16 on May 30, 2023 | 96 comments



There's also a nice addendum on cstheory.stackexchange, "What's new in purely functional data structures since Okasaki?" - https://cstheory.stackexchange.com/questions/1539/whats-new-...


Since Okasaki's work there have been several advancements and new developments in the field (source: MirrorThink.ai):

1. PaC-trees: Supporting Parallel and Compressed Purely-Functional Collections - 2022: This paper introduces PaC-trees, a purely functional data structure that supports parallel and compressed collections. PaC-trees are designed to be efficient in terms of space and time complexity while maintaining the benefits of functional data structures.

2. Proving tree algorithms for succinct data structures - 2019: This paper discusses the development of tree algorithms for succinct data structures, which are compact representations of data that support efficient query and update operations.

There is some work in other fields too:

1. CyBy2: a strongly typed, purely functional framework for chemical data management - 2019-12-30: This paper presents CyBy2, a purely functional framework for managing chemical data. The framework is designed to be strongly typed and enforce referential transparency through the type system.

2. chemf: A purely functional chemistry toolkit - 2012-12-20: This paper introduces chemf, a purely functional chemistry toolkit that provides a set of algorithms and data structures for working with chemical data in a functional programming context.


I'm not so sure about CyBy2; the persistence is based on "conventional" relational DBs. Judging by the paper, the main goal wasn't developing a specialised functional data structure to store molecules.


A few discussions:

What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=11056704 - Feb 2016 (42 comments)

What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=7081191 - Jan 2014 (17 comments)

What's new in purely functional data structures since Okasaki? - https://news.ycombinator.com/item?id=1983461 - Dec 2010 (2 comments)

What's new in purely functional data structures since Okasaki - https://news.ycombinator.com/item?id=1713594 - Sept 2010 (1 comment)


I would also recommend Koen Claessen's simplified finger trees (https://dl.acm.org/doi/abs/10.1145/3406088.3409026) and Zip trees (https://arxiv.org/pdf/1806.06726.pdf) for purely functional skip lists.


RRB trees in particular allow you to benefit from some of the cache efficiency of regular vectors, while supporting operations like immutable update. They are used in languages like Clojure and Scala.
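
For a feel of what that buys you, here's a minimal Scala sketch (Scala's standard Vector is a closely related radix-balanced trie; full RRB trees add fast concatenation and splits on top): `updated` rebuilds only the short path to the changed slot, so old and new versions coexist and share almost all of their 32-wide leaf chunks, which is where the cache friendliness comes from.

    val v1 = (0 until 1000000).toVector   // a large persistent vector
    val v2 = v1.updated(500000, -1)       // new version; only the path to slot 500000 is rebuilt

    v1(500000)   // 500000 -- the old version is untouched
    v2(500000)   // -1     -- the new version sees the update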


I'm reading this book right now. It's really great so far!

I've been working a lot with Trees in Clojure, and have been hitting serious limitations of my understanding.

I also found this YouTube video from a Clojure conference that reviews some different strategies for tree traversal in Clojure: https://youtu.be/YgvJqWiyMRY

I thought that learning a Functional Lisp would make it really easy to traverse trees, since that is what the language is doing to actually execute its code.

Turns out all the stuff I want to do is simply hard.

Functional data structures are really awesome though, it just seems to take a bit of up front investment.


This has been a topic I've wanted to get into for a few years now, specifically because of Clojure! So if you have any additional recommendations I'd appreciate it.

I really enjoyed Friedman's book `Scheme and the Art of Programming` because it filled in some pieces missing from "The Little Schemer" (and "Seasoned Schemer"). Building stuff like `kons`, `map` and all that `letrec` stuff.

But the big difference between Scheme and Clojure is that in Scheme, while it's "functional by concept," you get to escape with `(set! ...)` whenever you want to have a more traditional ds/algo (like say doing the n queens problem).

In Clojure you can kind of do that by either escaping to Java, using protocols (with atoms), or using transients, but I often feel like there's a "more functional way to do stuff that hasn't been taught to me."

I've opened up either the Okasaki dissertation or book or both, but I've always had trouble reading it, and then sticking with it. And some stuff like looking at Rosetta code and saying "to reverse a linked list in a lisp is easy... because it's a cons cell" seems like cheating. Almost like showing up to an interview, saying your "linked list" is implemented in an array structure and then calling `reverse()` on it.

Will watch that talk from 2014, must not have seen it before.

I guess, conceptually, day-to-day things in Clojure do feel pretty natural, even easier, and I think I have a decent understanding of it. But then when I look at leetcode-type problems, or something more involved, it takes a lot of mental effort to translate to it. Especially things like big O get thrown away in my mental model. I get it, persistent data structures and all that, but there's still a mystery there.


I feel like they are.. not so awesome: they are grossly inefficient due to all the pointer chasing and are pretty much guaranteed to be slower than the alternative since they trash your cache.


I wish I could remember the exact book; I think it's _Writing Solid Code_. There was a long example about the Excel evaluation engine back in the day when it was shipped on CDs and had to be perfect. The approach was: use one big, dumb, slow, but understandable and correct implementation; in parallel, use the lightning-fast, super-whiz-bang new implementation. Since both shared the same interface, they could be swapped out, or both run at the same time.

I think there is real value in starting with the pure functional versions, then swapping out when needed. One problem that seems fairly common in large codebases is using an object as a hash key. But as the code grows, that object sort of gets passed around and eventually somebody updates it without rehashing. That's a tough bug to find. They are for me, anyway.

This is one of those rare cases where you can actually make it faster later without trashing the overall design.

I'd encourage starting with the pure functional version first, every time. I'd go further and say, leave some opportunity to flip back to the pure functional version in dev builds for isolating heisenbugs.

Blah blah, grain of salt, free advice, your mileage may vary. Games have different requirements than webpages; everything is contextual.

This is one rare case where having the capability to swap back and forth is always worth it. Just use it, and thinking hard before giving it up is a really good default.
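
A minimal Scala sketch of that "same interface, two implementations" setup (the names are invented for illustration): a stack with a dumb-but-obviously-correct persistent implementation and a fast mutable one behind the same trait, so dev builds can flip back to the pure version (or run both and cross-check).

    trait IntStack {
      def push(x: Int): IntStack
      def pop: (Int, IntStack)
    }

    // Dumb-but-obviously-correct reference: a persistent list.
    final case class FunctionalStack(xs: List[Int] = Nil) extends IntStack {
      def push(x: Int) = FunctionalStack(x :: xs)
      def pop = (xs.head, FunctionalStack(xs.tail))
    }

    // "Whiz-bang" version: mutable buffer hidden behind the same interface.
    final class FastStack extends IntStack {
      private val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
      def push(x: Int) = { buf += x; this }
      def pop = { val x = buf.remove(buf.length - 1); (x, this) }
    }

    // Dev builds can flip this to chase heisenbugs with the reference version.
    def newStack(useReference: Boolean): IntStack =
      if (useReference) FunctionalStack() else new FastStack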


I wanted to verify the book reference, and I think you could be correct.

There is a website for it: https://writingsolidcode.com/

The Amazon page it links to includes a few Index pages in the preview. I saw these under E:

Excel number formatting code 163

Excel's dialog handling code 113


You feel this way but if you do the math and benchmarks you will be surprised.

You might not be running GHC's RTS on a resource-constrained target platform but you can generate the C code that could run on that platform from Haskell, leveraging the guarantees you get with using Haskell's advanced type system.

Update: the point is that GHC can optimize code, and some of those optimizations can avoid pointer chasing! But like most optimizing compilers, sometimes GHC misses them, and you have to do a bit of work to tell GHC where it can avoid chasing unnecessary pointers (or where it should or should not inline, give it some help to know where it can do fusion, etc.).


Most of the problems I'm trying to solve require those pointers anyway.

Tracking diffs and versions of data over time: version control systems and RDBMSs just don't cut it.

Bitemporal databases are interesting to me as well.


There are entire fields of research dedicated to studying cache oblivious persistent data structures, and they're almost always trees.

One of the key points of immutable datastructures is that they're easy to reason about given that the invariants are upheld. Designing an efficient memory representation that upholds those invariants is an entirely separate problem domain. Not every pointer dereference is slow, and not every tree is represented with pointers.


They're only grossly inefficient if you wouldn't otherwise need to frequently make copies of large mutable data structures. It all depends on what you're trying to do.


I agree that it's a great book; but I wouldn't recommend it as an entry point into understanding purely functional data structures.

Okasaki fleshes out a couple of examples and explains his thought processes and heuristics for developing such data structures quite well; but they're very much still notes on the early-stage exploration of the concept.

From a historical perspective it's still a fascinating read though, and I definitely recommend it if you want to read up on the origins of immutable data structures.


This is the canonical guide to reasoning about amortized runtimes. The exponentially growing ArrayBuffer (O(1) amortized append) is the classical data structure used to teach amortized runtimes, but the O(1) amortized functional queue Okasaki presents here gives a much better intuition for what amortized runtimes are all about.
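
For reference, here's a minimal Scala sketch of the simplest two-list version of that queue (the thesis refines it with lazy evaluation so the bound also holds when old versions are reused): enqueue conses onto the rear list, dequeue pops the front list, and the rear is reversed into the front only when the front runs dry. Each element is reversed at most once, so the occasional O(n) reverse amortizes to O(1) per operation.

    // Simplest two-list queue: front holds elements to dequeue,
    // rear holds newly enqueued elements in reverse order.
    final case class Queue[A](front: List[A], rear: List[A]) {
      def enqueue(x: A): Queue[A] = check(front, x :: rear)
      def dequeue: (A, Queue[A]) = front match {
        case x :: f => (x, check(f, rear))
        case Nil    => throw new NoSuchElementException("empty queue")
      }
      // Invariant: front is empty only if the whole queue is empty.
      private def check(f: List[A], r: List[A]): Queue[A] =
        if (f.isEmpty) Queue(r.reverse, Nil) else Queue(f, r)
    }
    object Queue { def empty[A]: Queue[A] = Queue(Nil, Nil) }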


There's a book which looks like it's based on this dissertation, which probably is a better source to read if you are interested in this topic: https://www.amazon.com/Purely-Functional-Data-Structures-Oka...


For C++ check this one out - https://github.com/arximboldi/immer and this talk from the author - https://www.youtube.com/watch?v=_oBx_NbLghY (CppCon 2018: Juan Pedro Bolivar Puente “The Most Valuable Values”)


It's weird that there are so many claims in here that the data structures and algorithms are perfectly performant yet there isn't even one look at generated assembly or any acknowledgement of the underlying system that is supposed to run the code. Proving things are Big O performant is neat, but at some point the code has to hit hardware.


> It's weird

It's not weird; this is standard in academic computer science. It would be weird to do otherwise. In a theoretical dissertation/paper like this you can't just randomly bring up compiled assembly; it's completely and utterly off topic, no more on topic than bringing up whether the code was run by an interpreter, or the JVM, or transpiled to Haskell, or run on a GPU, etc.


"Computer science is no more about computers than astronomy is about telescopes."

-- Edsger W. Dijkstra

However, I do believe that astronomers put references to the actual instruments they used in their publications.


So do most algorithms papers that have benchmarks in them. It's not always useful though, because information about how this algorithm compares to some other one on a PDP-10 doesn't necessarily translate to modern machines.


I can't speak for other fields, but in statistics and machine learning, it's not uncommon to see at least some practical experiments or simulations alongside theoretical results, or at least some discussion of practical considerations.

There are definitely plenty of pure theory papers as well, and pure theory is still a contribution. However one would at least hope to see subsequent work that does focus on the practical aspect.


> at some point the code has to hit hardware

Yes, but that's not a concern of a computer scientist. Implementation and execution of the algorithm are up to the reader.

It's like complaining that engineers don't do enough novel computer science research; of course they don't! It's not their job.


Which profession is expected to make progress on actual software performance?


Computer scientists (and of course engineers) both care about real-world performance, but some computer scientists just care about the theory.


That's a very naive view of computer science. That is the attitude of people who have given up and decided that computers are too big for science now.


You're right, there are plenty of papers focusing on real-world performance. I chose not to capture the nuance because I wasn't sure how to express it succinctly.


How I see it: computer science as an academic field is roughly split between theory and systems. The theory folks tend to evaluate progress through proofs. The systems folks tend to evaluate progress through experiments.


I’ve vouched for this, because it brings up an interesting point (albeit one that has proven provocative and been expressed before). Also, many of your comments are dead.

Big O/algorithmic complexity is interesting in that it tends to abstract away the architecture of the underlying processor. A copy is a copy is a copy. Arithmetic is arithmetic is arithmetic. Regardless of what instructions/processing units the underlying processor has. Regardless of data access. Regardless of parallelization. Regardless of memory complexity. All we use are brutish units of “work” — despite not all “workers” being equal.

It reminds me a bit of scientific management: treating your “workers” as interchangeable units that carry out whatever has been decided to be the most efficient movements and processes, completely disregarding the individual characteristics of the worker. For a humanist example, consider individual quirks of physiology: the size, length, and composition of the skeletal system, muscles, bones, tendons and so on mean that specific movements and workloads best suit each person. The same goes for psychology: the brain is its own work-producing part, uniquely suited to and most efficient at particular workloads. An all-encompassing abstract average “best workstyle” makes no note of these characteristics and simply decides by external metrics like “time to pick up a box using X, Y, or Z form.”

The same parallel can be drawn to computers: different processors and collections of hardware (what is essentially the “body and brain”) have different quirks and different workloads they perform best at. For example, GPUs are much more useful for vectorized/parallel workloads where the operations are relatively simple (e.g. matrix arithmetic). While you can run an iterative matrix multiplication algorithm on a CPU in n^3 time, your data access costs will be significant. Instead, running a parallel algorithm on a GPU, with RAM very close by, you can achieve log^2 n.

This is where CS research really shines: not running away from the realities of physical systems, but embracing them.


Other commenters have addressed the key point - it's not weird. CLRS doesn't look at generated assembly either.

One other thing I want to add, though, is that this book was published in 1996. The CPU landscape at the time was far more diverse, both in terms of ISA (which assembly?) and also in terms of capabilities. Even on the x86 line, plenty of people were still using 486s, which weren't even superscalar, and thus had pretty strikingly different performance characteristics than, say, a Pentium Pro (a "modern", OoO CPU). Tying your algorithmic analysis to a particular piece of hardware would be pretty useless; providing a satisfactory look at the performance on so many different (micro-)architectures could risk overwhelming the content.

Moreover, at the time, performance was, maybe, a little more straightforward, making your concerns perhaps not quite as important. Memory was slower than the CPU but not the way it is now. Caches were much smaller and less useful. Branch prediction was more primitive. CPUs had relatively weak reordering capabilities, if any at all, and simpler pipelines. There were few dedicated SIMD instructions. This was a world where linked lists still often made sense, because the hardware penalties you paid for them were smaller. In other words: in 1996 the on-paper performance wasn't at quite as big a risk of diverging quite so wildly from the real-world performance, so keeping things simple and focusing on the on-paper performance was less objectionable.


Yeah, Big-O notation is only part of the picture and yes, "galactic algorithms" that are theoretically efficient but only for astronomically big inputs do exist, but in many cases (especially if your algorithm isn't doing anything particularly deep or clever) linear is faster than quadratic which is faster than exponential on real world input, too.

Mathematics is the science of modeling and abstraction and that abstraction exists in order to make the process of knowledge gathering easier. It works very well for the sciences, so I would say the method has been proven. Of course, if the models turn out to be completely inappropriate, that would be different, but so far, the simple machine models used (implicitly or explicitly) in the analysis of algorithms seem to be rather reasonable.

The alternative that you suggest, trying to benchmark concrete implementations of algorithms, has so many confounders that it would be very hard to execute properly. I'm sure people are doing this too, but the formal Big O proof has a different quality to it because it does not depend on those particulars.


I'd even say that maths is what appears when you stop being willing to pay for concretion.


Also, while I haven't read any further yet, based on the table on page 3, their big O performance is pretty bad.


How helpful would it be to see 30 year old generated assembly and benchmarks for an i486 with a tiny cache, no prefetching, and relatively tiny memory vs instruction latency compared to today’s CPUs?


is this being upvoted onto the homepage based on upvoters actually understanding that this paper from 1996 is of contemporary relevance and interest or more due to keywords like "pure", "functional", "data" and "structure"?


I can’t speak for everyone, but I upvoted it from nostalgia, having read the book version over a decade ago. I happened to be thinking about ordering a copy for the office just yesterday.


I have a copy on my desk. The bit about designing data structures by analogy to number systems (and limiting carry propagation) is really fun.
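
Roughly, the analogy is that a container is laid out like a positional number, and inserting an element is like incrementing it, with carries corresponding to merging blocks. A tiny Scala sketch of the counter half of the analogy (incrementing a little-endian list of bits); consing onto a binary random-access list propagates carries of perfect trees in exactly this pattern.

    // A natural number as a little-endian list of bits.
    sealed trait Bit
    case object Zero extends Bit
    case object One  extends Bit

    // Increment: flip Ones to Zeros while carrying, stop at the first Zero.
    def inc(bits: List[Bit]): List[Bit] = bits match {
      case Nil          => List(One)
      case Zero :: rest => One :: rest
      case One  :: rest => Zero :: inc(rest)
    }

    // inc(List(One, One, Zero)) == List(Zero, Zero, One)   // 3 + 1 == 4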


It is still the go-to textbook for immutable data structures. Worth the read.


This comment reads as if there is a clear, contemporary successor for learning pure functional data structures.

Is there? If so, please do share a reference.


I've recently used a number of structures that I learned from this book. Though I don't know if the text represents the state of the art in purely functional data structures, it's a pretty seminal work in the area.


Nowhere near the state of the art. Lots of improvements since this was published.

It is a good book for learning. It is a decent book for reference. When you want to really fly you will want to reach for more recent work.


Examples of more recent work for us non-specialists?



Missed it - thanks


> is this being upvoted onto the homepage based on upvoters actually understanding that this paper from 1996 is of contemporary relevance and interest …?

In my case, yes.


The book is on Amazon, but this submission had those keywords, plus it's a PDF.

Of course it is of contemporary relevance; functional programming (FP) is all the rage.

The tricky bits are questions like:

How does this jibe with existing JS functional constructs like "fantasy land," for example.

How to "recruit" more folks to FP, or even a hybrid approach of objects interacting functionally

Game jams using more FP-like data structures? Or more HN submissions like that.

The harder things to evaluate are a lot of other topics, news-like but investigative and curious, or sites that are essentially selling a service (versus teaching the mechanism behind it).

For SaaS stuff, since HN is about startups, I have to let it slide. But the hacking piece is when one person accomplishes something with persistent decomposition of sequential problems, or does something clever using tools or ideas from a different context.


My understanding is that the book is based on this PhD dissertation.


Oh... then, yes: this was unabashedly a (positively) triggered reaction (fortunately or unfortunately).


Your comment is giving early 2010s hipster “you probably never heard of it” vibes.


Definitely read that headline as "Purely Fictional Data Structures". My disappointment is immense.


Purely Fictional Data Structures include:

  To Queue a Mockingbird
  The Call-stack of the Wild
  Tesselation of the d'Urbervilles
  One Flew Over the Cuckoo Hash
  The Data of the Woosters
  Brideshead Re-visitor
  The Catcher in the Trie
  Les Miser-tables
  The Nat-elim of Monte Cristo


> The Catcher in the Trie

https://en.wikipedia.org/wiki/Trie#History,_etymology,_and_p...

and the counterpart, Purely Fictional Languages:

   The Catcher in the *Try*


I would love to be educated. I've seen claims about the merit and value of functional programming throughout my (nowadays relatively long) programming career. In practice I've never once seen those values or merits come to fruition - just the same cycle all software goes through.

My very direct experience recently has been that Scala + cats resulted in the same buggy, nonperformant software it was meant to prevent. I understand that bad programmers produce bad programs, regardless of language, but I feel pretty strongly that good tools prevent some amount of the typical "bad" that makes bad programs bad (ignoring the obviously maliciously bad examples). So I don't really understand, and would like to understand, if, how, and when pure FP (and I suppose FP in general) actually improves quality of code/life outside of toy examples.


The bottom line is that pure FP means that the same input to a function gives you the same output.

When you debug, you just give the program the same input which was problematic and you get to reproduce the error.

Persistent data structures make it less wildly inefficient to do so.
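
A trivial Scala illustration of the point (the names are made up): the pure version can be replayed with the recorded failing input; the impure one may not reproduce the bug because part of its input is hidden.

    // Impure: depends on hidden mutable state, so replaying the same
    // argument later may not reproduce the failure.
    var taxRate = 0.2
    def priceWithTaxImpure(net: Double): Double = net * (1 + taxRate)

    // Pure: everything the function depends on is an argument, so the
    // failing call (say net = 19.99, rate = 0.25) replays exactly.
    def priceWithTax(net: Double, rate: Double): Double = net * (1 + rate)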


title typo: "Purely Functional Data Structure" -> "Purely Functional Data Structures" (pluralization)

reads a bit weird otherwise - sounds like it's discussing a particular purely functional data structure when it's actually a survey of many (pretty much the canonical survey of them, in fact).


Thanks for pointing it out. I totally missed it. Sorry.


What are software bugs that can be avoided by choosing data structures like these?

I'm making a broad, high-level presentation about immutability in technology. At my company we have folks who have heard of it in the context of ransomware-resilient backups, others who have heard of it in the context of infrastructure as code, and very few who have heard of it in terms of data structures (distributed and non-distributed). My goal is to showcase the concept in various contexts so that people can better understand its role as a key design choice in technology.

Personally I have no experience working on software that utilizes these, so if others here do, I would appreciate your input on how these make your software more reliable.

The emphasis on software reliability and bugs-avoided is because the audience works under the company's risk-management division.


Purely functional data structures are a means, not an end in themselves.

Programming with immutable values and the more declarative style they support does design out at least one infamous source of bugs: shared mutable state. That has become increasingly important in recent years, for example because modern chips support ever increasing numbers of concurrent threads in hardware and because a lot of the software we write is now part of distributed systems.

That programming style has other advantages in terms of readability and reasoning about code. If you can be confident that a particular name in the code always refers to the same value once it’s been defined, tracing the flow of data through a function or through an entire system often becomes much easier. Having more understandable code naturally tends to improve reliability, among other advantages.

If we adopt that programming style then we still need to represent collections of values and different relationships within our data, but we can’t always use “traditional” data structures to do that any more because many of them rely on mutability, which we have excluded. So we need other ways to represent structured data that are compatible with immutability and declarative style, and that is what purely functional data structures give us.
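
As a small Scala sketch of what "designing out shared mutable state" looks like in practice (the worker names are made up): the map below is immutable, so handing it to concurrently running tasks needs no locks, and each task that wants a modified view builds its own new map without disturbing anyone else's.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    val config: Map[String, Int] = Map("retries" -> 3, "timeoutMs" -> 500)

    // Each task reads the shared map and derives its own variant; `+` returns
    // a fresh map and never touches `config`, so no locking is needed.
    val tasks = (1 to 8).toList.map(i => Future(config + (s"worker-$i" -> i)))
    val results = Await.result(Future.sequence(tasks), 5.seconds)

    assert(config.size == 2)            // the shared map was never mutated
    assert(results.forall(_.size == 3)) // each task got its own derived copy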


_Immutable_ data structures (not 100% the same thing, but a lot of overlap) avoid all kinds of concurrency problems, because you can safely pass them around and you'll never get a data race. You don't even need any locking (just to make things complicated, _lock-free_ data structures are another closely related but not identical concept).

Once you're running a distributed system, this kind of stuff comes into its own.


Careful with this wording. They avoid shared-memory mutations, but they don't necessarily eliminate data races; rather, they just change them, since by definition every edit is now creating stale data.

At large, in distributed software, this is a distraction, since most passing of messages from one distributed piece to another was already doing a copy across mediums, such that the sent data was already immutable from the sender's perspective. (Granted, the preparation step can have you step on your own feet.)


As the peer comment points out, functional data structures force you to be deliberate about where/when state changes take place. In particular, they force you to be cognizant of which version of a data structure a particular piece of code is referencing.

Immutable structures can help significantly in reducing race conditions in parallel code, as it is impossible for one thread to modify part of a data structure while another thread is accessing it. Each thread sees a consistent 'version' of the data structure until the code explicitly replaces it with a new version.

Immutability can also help with memory optimizations: a node of one data structure can be safely re-used in another data structure. For example, if you need to retain older versions of a tree, you can share common nodes (this is how Git encodes version changes).
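
A small Scala sketch of that versioning point: inserting into a persistent binary search tree rebuilds only the nodes along one search path; everything off the path is shared by reference with the previous version, so keeping every version costs O(log n) extra nodes per update instead of a full copy.

    sealed trait Tree
    case object Leaf extends Tree
    final case class Node(left: Tree, key: Int, right: Tree) extends Tree

    // Insert returns a *new* tree; only nodes on the search path are rebuilt,
    // the untouched subtrees are shared with the old version.
    def insert(t: Tree, k: Int): Tree = t match {
      case Leaf => Node(Leaf, k, Leaf)
      case n @ Node(l, key, r) =>
        if (k < key)      Node(insert(l, k), key, r)   // right subtree r is shared
        else if (k > key) Node(l, key, insert(r, k))   // left subtree l is shared
        else n
    }

    val v1 = List(5, 2, 8, 1, 9).foldLeft(Leaf: Tree)(insert)
    val v2 = insert(v1, 7)   // both versions stay usable; most nodes are shared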


Purely functional data structures are very common in purely functional languages like Haskell but are also used in non functional languages via libraries like immutable.js.

At a high level, immutability forces you to be extremely deliberate about state changes in your application. This improves reasoning/understanding, reduces bugs, and eases debugging.

An example of immutability that you might be familiar with would be react props/state. You don’t modify your state. This makes reasoning about state much more simple.


Disclaimer: Not a programmer - I do not have a CS degree. One of my minors as an undergrad was MIS and while I took a database programming course as a senior, my knowledge is limited.

The high-level explanation I recall reading years ago somewhere (Joel on Software, I think??) was that a) Functional programs have no side effects, leading to the notion that b) They can, without much effort, be adapted for parallel tasks.

The example here was MapReduce, which was the original guts of the Google Search algorithm.

E.g., things are fast because we can send your query to many, many computers to find an answer, and chances are good that the machine answering is (relatively) close to you, which means low wait times to get your answer.


I think the point is that these data structures can be the foundation of a pure-functional style of programming.

You wouldn't drop these into an ecosystem where the programmers are used to dealing with mutable structures.

Now, whether using a purely functional language correlates with writing fewer bugs is a separate discussion, not sure what the modern consensus is.


> not sure what the modern consensus is.

Too good a phrase to pass up!

Modern consensus is done by constructing a replicated log - an append only data-structure shared across multiple workers. It's what you use Paxos for, and it's what Raft streamlines.


Okasaki got me interested in confluently persistent data-structures, way back in the 2000s.

They seem magical! To be able to combine data from the past with current data, efficiently!

They are almost always trees (with the exception of skip lists), with all operations O(log n).

After creating my own programming language Enchilada that is based on immutable data structures, I started considering what I deemed "next level":

Uniquely represented confluently persistent data structures

Combined with a Merkle tree encoding of such uniquely represented data structures (they are almost always trees), you can efficiently and incrementally authenticate them. Think 'block chain' on steroids, with incremental cryptographic hashes. Or torrents, if you are into that kind of thing.
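
For a flavour of the incremental-hashing idea (an illustrative sketch only, not Enchilada's actual encoding): cache a cryptographic hash in every node of a persistent tree, built from the children's cached hashes. An updated version then recomputes hashes only along the rebuilt path, while every shared subtree keeps the hash it already had.

    import java.security.MessageDigest

    def sha256(bytes: Array[Byte]): Array[Byte] =
      MessageDigest.getInstance("SHA-256").digest(bytes)

    sealed trait MTree { def hash: Array[Byte] }
    final case class MLeaf(value: String) extends MTree {
      val hash = sha256(("leaf:" + value).getBytes("UTF-8"))
    }
    final case class MBranch(left: MTree, right: MTree) extends MTree {
      // Built from the children's already-cached hashes; a shared,
      // unchanged subtree is never rehashed.
      val hash = sha256("branch:".getBytes("UTF-8") ++ left.hash ++ right.hash)
    }

    val t1 = MBranch(MLeaf("a"), MBranch(MLeaf("b"), MLeaf("c")))
    // "Update" c -> c2: rebuilds one spine, reuses MLeaf("a") and its hash.
    val t2 = MBranch(t1.left, MBranch(MLeaf("b"), MLeaf("c2")))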


Why would we want to use purely functional data structures? When do the pros of functional data structures outweigh the additional complexity? Are there scenarios when a project would want to pivot from a regular data structure to a purely functional one?


The most common point is that they're safe to share between threads, making parallel algorithms easier to invent, understand, and implement correctly.

You can also safely re-use sub-structures without performing a deep copy. For example, if you want to keep a sub-tree around for later you can do that in O(1) time because it's safe to keep a reference to it. If it is a mutable tree you don't know what's going to happen to it so you need to do a deep copy of the entire sub-tree you're holding on to. This can save a lot on memory allocation and copying depending on your use case.


Deep Learning applications is one area.

Traditionally, OO code is written all the time.

But after I learned JAX/Flax, a light turned on inside my head and I now write functional Deep Learning code as much as I can. All my side projects and new code are purely functional in JAX/Flax. PyTorch has a functional API known as functorch, and I have used it in one project.

Where lots and lots of data lives in 3-, 4-, and 5-dimensional tensors, and you need to run lots of transformations on it, and then multiply a huge number of those tensors thousands of times per second, functional code makes much more sense and gives immense sanity and peace of mind.

Those of you writing Deep Learning code, learn functional programming principles (immutable data, pure functions, leaving no side effect, etc.), and apply them to DL via functorch or JAX.

Your life will never be the same.


Do JAX and functorch have the same level of built-in functionality (operations) as the original PyTorch library?

Where to learn about it other than the documentation?


JAX is very barebones and will require you to write much more code for the same task than you write in PyTorch.

Functorch is still new, and honestly, there is little to learn if you already know JAX. There are some talks from Meta, and then there are always the docs.


There's a tradeoff between the complexity of implementing a data structure and the complexity of using it. While purely functional data structures are often (except maybe in the case of singly linked lists) more complex to implement than their mutable counterparts, actually using them in a program is simpler and less error-prone.

There are obviously other trade offs as well, like performance and memory usage.


They have some nice properties that you might want to benefit from. For example, they're persistent: every previous state of the data structure is still in there... aka snapshots!


There is no additional programming complexity in using Scala's Map vs Java's HashMap. They are both maps with keys and values. The Java one you update in-place

    var m = new HashMap<K, V>();
    m.put(k, v);
the Scala one you "update" by creating new maps,

    var m = Map.empty[K, V]
    m = m + (key -> value)
In practice it's mostly the same except you can share the immutable one around without being scared someone will mutate it, but it takes more memory per instance than the mutable version and creates more GC churn, which may or may not be an issue.


Do you use git? The git commit graph is a purely functional data structure.


As is explained in the abstract, purely functional data structures are most useful when you want to avoid assignment, either due to the restrictions of your programming environment (f.ex. you're programming in Haskell), or because that is a property you're interested in for other reasons (e.g. you want to ensure immutability due to concurrency concerns, or exploit persistence to improve performance).


This book is near and dear to my heart. Back in the mists of time (~05?) when I was learning Haskell I reimplemented several of these in order to get my head around things. Lots of fond memories.


I remember encountering an early draft when he was still doing his masters? Absolutely cracking paper, one for the ages. Thanks Chris :)


For me, the most mind-blowing part of Okasaki's book was the chapter on "numerical representations". Never looked at it like that before I read that chapter. While the other chapters certainly introduced material that was new to me at the time, this one took some things I knew and added a whole new dimension to them.


I have a personal pet peeve about the misuse of terminology when dealing with such names, for which the only solution is to go read the original reference to figure out what they meant by it.

E.g., in this case, to describe a data structure as "purely functional" makes zero sense to me intuitively at first. You need to go read the thesis and realise they're referring to data structures implemented as algebraic data types, which in the context of a purely functional language can themselves be thought of as functions, and can therefore be described as 'functional' ...

But unless you do that, the first thought is going to be "huh? can arrays be further split into imperative vs functional?" "Does he mean immutable?" "Can I use functional arrays in c?" "Are they faster/safer than normal arrays?".

By contrast, I think "Purely Functional Data-Structure Types" would have been a far more intuitive term ... but I suppose the author may have felt that clarifying further wasn't punchy enough, and could have made the thesis more wordy...


A data structure is a type, so "data-structure type" is a pleonasm. Please don't misuse terminology :)


Nominative determinism aside, was that second part really necessary?


I thought it said “Purely Fictional Data Structures” - which would have been fascinating.


Borges meets Knuth.


Thanks. I have the printed book, but a PDF is much more comfortable!


Gah, why did I just buy this as an ebook if it's free.


It would be cool if someone could translate this into Typescript or the like, I think it would make it a lot more readable.


TypeScript is much less readable than Haskell and OCaml, but you can easily find translations to TypeScript such as https://github.com/skeate/lambdata.


> TypeScript is much less readable than Haskell and OCaml

That's like saying that Norwegian is much less readable than Italian. It is in the eye of the beholder. They can both express the same concepts but which one is more readable depends on which one you already know.


> They can both express the same concepts

Only in the turing tarpit sense. Out of the box, they have very different capabilities. For example:

Higher-kinded types: easy in Haskell, hard in OCaml, mostly impossible in Typescript.

First-class modules: OCaml has them, Typescript can sort of simulate them with extraordinarily unsafe prototype mangling stuff that you should never ever use, impossible in Haskell

Open variants: Easy in OCaml and Typescript, hard in Haskell


Using this to push Brainfuck at work


This literature was my "serious" introduction to FP. I implemented some of the data structures in F#.


Related. Others?

Purely Functional Data Structures in Elm – course lecture notes (2015) - https://news.ycombinator.com/item?id=12145741 - July 2016 (15 comments)

What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=11056704 - Feb 2016 (42 comments)

Purely Functional Data Structures (1996) [pdf] - https://news.ycombinator.com/item?id=10486481 - Nov 2015 (13 comments)

Okasaki: Purely Functional Data Structures (1996) [pdf] - https://news.ycombinator.com/item?id=8327838 - Sept 2014 (1 comment)

What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=7081191 - Jan 2014 (17 comments)

Ten Years of Purely Functional Data Structures (2008) - https://news.ycombinator.com/item?id=5701396 - May 2013 (24 comments)

What's new in purely functional data structures since Okasaki? - https://news.ycombinator.com/item?id=1983461 - Dec 2010 (2 comments)

What's new in purely functional data structures since Okasaki - https://news.ycombinator.com/item?id=1713594 - Sept 2010 (1 comment)

"Purely Functional Data Structures" by Chris Okasaki [pdf] - https://news.ycombinator.com/item?id=1138979 - Feb 2010 (12 comments)

Teaching, Playing, and Programming: Ten Years of Purely Functional Data Structures - https://news.ycombinator.com/item?id=112270 - Feb 2008 (2 comments)

Chris Okasaki's PhD thesis on purely functional data structures (pdf) - https://news.ycombinator.com/item?id=8221 - April 2007 (1 comment)


my dream language is rebol with immutable data structures only



