
This article doesn't touch on what I consider to be the central feature of Haskell: it is lazily (more precisely non-strictly) evaluated. A lot of the other aspects of it, such as monadic IO, follow from that decision.


How does monadic IO follow from laziness?


If you start with an impure lazy programming language, you will very quickly discover that you may as well make it pure and handle side effects through an IO type. Specifically, it would be far too hard to perform effects in the correct order without an IO monad. I wrote up an explanation:

http://h2.jaguarpaw.co.uk/posts/impure-lazy-language/
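
Here's a quick sketch of the kind of thing that goes wrong, using unsafePerformIO to simulate an impure lazy language (a made-up illustration, not code from the post above):

    import System.IO.Unsafe (unsafePerformIO)

    -- Effects hidden inside ordinary lazy values: whether and when they run
    -- depends on which thunks happen to get forced, not on source order.
    greeting :: ()
    greeting = unsafePerformIO (putStrLn "What is your name?")

    name :: String
    name = unsafePerformIO getLine

    main :: IO ()
    main = print (length name)  -- forces `name`, but `greeting` is never
                                -- forced, so the prompt never appears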


If you start with a pure and strict programming language, how will you handle side effects?


Side effects like IO are very difficult to handle in a pure language. I believe Simon Peyton Jones' argument was that, if it's strict, there is too much of a temptation to say, "screw it", and abandon purity.

https://www.microsoft.com/en-us/research/publication/wearing...


Much respect for Simon Peyton Jones, but empirically, that's not true: see Idris, PureScript, Elm, and probably many others.


Those were all designed after monadic IO was introduced in Haskell. The ability to mark IO operations in types (and the do notation) was a game-changer.


The same way as a pure and lazy programming language. That is to say, the IO monad. This abstraction works for both lazy and eager evaluation because it's internally just passing a token from function to function.

When you have something like `doThis >> doThat` (two actions in sequence), the first action results in a token (RealWorld#) that must be passed into the second action. Even though the evaluation order of the two actions may be undetermined in a lazy language (but determined in a strict language), the token being passed around imposes a definite order.
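
For reference, this is roughly how GHC defines it, lightly simplified from GHC.Types/GHC.Base (needs MagicHash and UnboxedTuples, so read it as a sketch rather than standalone code):

    newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

    unIO :: IO a -> (State# RealWorld -> (# State# RealWorld, a #))
    unIO (IO m) = m

    -- (>>=) threads the token: the second action cannot start until it has
    -- been handed the token produced by the first
    bindIO :: IO a -> (a -> IO b) -> IO b
    bindIO (IO m) k = IO (\s -> case m s of (# s', a #) -> unIO (k a) s')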


That was kind of my point, but thanks for laying it out nicely, I guess :)


In a lazy language, you have a significant difficulty in ensuring the order of operations, specifically printing a prompt and reading the answer. Before someone (Moggi?) realized that a category-theoretic monad with specific rules made IO sequencing trivial, the state of the art (IIRC) was world-passing. (With world-passing, you have a value that represents the state of everything outside the program; any IO operation absorbs the current world value and returns a new one. Only the most current value actually works.)
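
A user-level sketch of that world-passing style (the World type and operation names here are made up for illustration, with stubbed effects):

    newtype World = World Int  -- stand-in; a real World would be opaque

    readLineW :: World -> (String, World)
    readLineW (World n) = ("input " ++ show n, World (n + 1))  -- stubbed effect

    printW :: String -> World -> World
    printW _ w = w                                             -- stubbed effect

    -- the prompt must be printed before the answer is read, because the
    -- World value returned by printW is needed as input to readLineW
    prompt :: World -> (String, World)
    prompt w0 =
      let w1 = printW "What is your name?" w0
      in readLineW w1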

I don't know if it is still the case, but the last time I poked around, in GHC the IO monad was implemented using the older world-passing code.


Take this bit of pseudocode:

    a = readLine()
    b = readLine()
    print(b)
    print(a)
What order are the lines executed in?

In a strict language, the answer is obvious. In a non-strict language, line 4 has a data dependency on line 1, so it always executes after it; ditto lines 3 and 2. But how the two groups get interleaved is completely unpredictable, so if you're really committed to non-strict evaluation, you need a way to introduce data dependencies between all four lines so that the order of evaluation is forced. Once you achieve that, you have a bunch of ugly-looking machinery that's peppered across your whole codebase.

Monadic IO (and do-notation in particular) gives us a way to write this sort of code without feeling the urge to gouge our eyes out.
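
For comparison, the same four lines as monadic IO in Haskell:

    -- each line desugars to (>>=), which threads the hidden dependency
    -- that forces exactly this ordering
    main :: IO ()
    main = do
      a <- getLine
      b <- getLine
      putStrLn b
      putStrLn a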


Laziness requires immutability to work well, and that means you need to represent mutations such as IO operations behind an immutable wrapper like the IO monad.


You've got it backwards. Laziness requires mutation to work well. Laziness is just a thunk (a closure) that can be overwritten by a value at a later time. It is impossible to implement laziness if you cannot mutate memory: if the address of the unevaluated closure and the evaluated value cannot be the same, you have to update references to the closure into references to the value, but you cannot do that if you can't mutate memory!
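
You can see the mechanism by hand-rolling a thunk with an IORef (just an illustration of the idea, not how GHC actually represents closures):

    import Data.IORef

    data Cell a = Unevaluated (() -> a) | Evaluated a

    newThunk :: (() -> a) -> IO (IORef (Cell a))
    newThunk f = newIORef (Unevaluated f)

    -- forcing overwrites the closure with its value in place, so every
    -- existing reference to the cell now sees the evaluated result
    force :: IORef (Cell a) -> IO a
    force ref = do
      cell <- readIORef ref
      case cell of
        Evaluated v   -> pure v
        Unevaluated f -> do
          let v = f ()
          writeIORef ref (Evaluated v)
          pure v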

People always assume garbage collectors in Haskell might somehow be different due to immutability. But due to laziness it works just like Java garbage collection because laziness requires mutability.


I don't quite understand. You don't update any references when thunks are evaluated; GHC's runtime does. The runtime shields you from any mutation, doesn't it? (Unless you explicitly ask it to let you mutate a value in memory, of course. But that goes beyond the normal evaluation of thunks.)


Oh yes. I just realized we are talking past each other. I'm talking about the implementation detail of laziness. The mutation is internal and usually invisible. The only knob that I know of to adjust it is -feager-blackholing to control whether evaluation results in two mutations or one (except when switching threads: always two in that case).


Mhm, I see now. Perhaps immutability without laziness would also lead to monadic IO?


Have you worked with Nix much?


Why not `git clone --bare`?


I've tried this and it seems to be working fine.

    git clone --bare https://path.to/repo.git
    mv repo.git .git

now we can create worktrees in the same folder.


fwiw you can actually do that in one line:

    git clone --bare https://path.to/repo.git .git


You can bare clone and it works fine. This is in part just habit from converting existing repos into bare + worktree repos.


I hope there's at least one chapter on compact data structures!


I've sort of made plans for the last four chapters, sort of :). I would have loved to write about Wavelet trees, but a lot has already been written about them. If you're into succinct data structures and are looking for something more practical, check out this blog on Succinct de Bruijn Graphs (https://www.alexbowe.com/succinct-debruijn-graphs/).


I wrote about these too: https://vaibhavsagar.com/blog/2018/07/29/hamts-from-scratch/! Unfortunately it's in Haskell which probably makes it less approachable than it would be otherwise.


I personally use Vim, but Emacs alone is IMO the strongest counterexample to your claim about Lisp interpreters (followed by all the programs scriptable with Guile).


I mean I'm aware of emacs but I am also a vim user so I proudly stand by my claim.


I also have my own implementation of packfiles [1], which I implemented as part of a toy Git implementation in Haskell [2]. I personally found the packfile format underdocumented, e.g. the official reference for the packfile generation strategy is an IRC transcript [3].

1: https://github.com/vaibhavsagar/duffer/blob/bad9c6c6cf09f717...

2: https://vaibhavsagar.com/blog/2017/08/13/i-haskell-a-git/

3: https://git-scm.com/docs/pack-heuristics/


What if Haskell is my favourite language?


Then it's a fixed point?


This is why I do all of mine in Haskell.


Doing a hashmap lookup is already very much like using a bloom filter: for any addressing scheme, if an element exists at a particular location, then either the key exists or it's a false positive. If no element exists, then there's no way the key exists. Adding an extra bloom filter would then come down to trading off space usage against a different probability of false positives, in addition to an extra lookup, and I think one would have to contrive an extremely unlikely scenario for it to make sense.


Sometimes "Does anything in this category exist?" is constantly needed and must be as fast as possible - especially if the answer is often "No", while "What is it exactly?" is more rarely needed and it's OK that it's slower.

The Bloom filter gives you quick "No" answers for much less size than a hash table, in exchange for not being able to answer the second question at all.
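
A toy sketch of the mechanism in Haskell, using a single 64-bit word and three salted hashes from the hashable package (a real filter would use a much larger bit array):

    import Data.Bits (setBit, testBit)
    import Data.Hashable (hashWithSalt)
    import Data.Word (Word64)

    newtype Bloom = Bloom Word64

    empty :: Bloom
    empty = Bloom 0

    insert :: String -> Bloom -> Bloom
    insert x (Bloom bits) =
      Bloom (foldl (\b salt -> setBit b (hashWithSalt salt x `mod` 64)) bits [1, 2, 3])

    -- False means "definitely not there", True means "maybe there":
    -- exactly the cheap "No" answer described above
    mightContain :: String -> Bloom -> Bool
    mightContain x (Bloom bits) =
      all (\salt -> testBit bits (hashWithSalt salt x `mod` 64)) [1, 2, 3]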

We built distributed database software, years ago when 40GB was a lot of data, and that used Bloom filters so that if you asked "Show me everything about vaibhavsagar, tialaramex, dang, and celeritascelery" it could do a Bloom filter check, identify that there isn't any data about celeritascelery and dang and immediately cut in half the work to be done, before going to the disk to pull in actual hash tables where records for tialaramex and vaibhavsagar would exist if the positive from the Bloom filter wasn't a false positive.

The UX is nice too because it can make typos instant. For every municipality in Texas, show me the CesusCount. Instant zero records. Oh, right, not CesusCount I wanted CensusCount with an N.

