One sentence in the article made me do a double take: "reasonably tightly optimized low-level Haskell". For a minute I thought that this defeats the whole purpose of using Haskell, because for me it's all about being able to code at a higher level of abstraction.
But when I saw how quickly he fixed the problem, I reconsidered. As wrong as low-level tweaking in a high-level language feels, if it's still less work to optimize than in C or Java, that's a win, right?
I think of programming in Haskell as being more about being able to choose a level of abstraction that makes sense for the situation.
The more abstract code is great when it works well, and when it falters I can easily drop down to something more concrete, or even to C, and still have most of my code benefit from that higher level of thinking.
Someone with more experience can correct me, but "low-level Haskell" is Haskell that takes into consideration things like evaluation order and thunks. It is stuff that many other languages would still consider way-high-level.
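To make "evaluation order and thunks" concrete, here is a minimal sketch (my own example, not code from the article) of the classic case. Both functions return the same sum, but the lazy fold may build a long chain of unevaluated `(+)` thunks before anything is added, while the strict fold forces the accumulator at every step. The names `sumLazy` and `sumStrict` are made up for illustration.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator is not forced between steps, so
-- (without optimization) it can grow into a chain of (+) thunks that
-- is only collapsed when the final result is demanded.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: foldl' evaluates the accumulator at each step,
-- so the running total stays a plain Int in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

With optimizations on, GHC's strictness analysis often rescues the lazy version, which is part of why these leaks can be hard to predict.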
It sounds like you might be trying to be snarky, as if someone is trying to sweep details under the rug. But "the file had its first bit flipped" is more descriptive than "the file is corrupt", and the latter is still a correct description. In the same way, "unevaluated thunks" is merely a more precise categorization of "space leak", which is itself a subset of "resource leaks", after all.
Yes, I had heard the term "space leak" used in functional language communities but wasn't sure what it meant. When I asked around, the definition I got was usually the one I gave above.
I thought readers here might find that information useful, given that someone might describe a problem similar to the OP's by using the term "space leak".
> The culprit turned out to be JSON strings containing zillions of backslashes.
My guess is that there was a feedback loop somewhere, with the JSON output of a process being fed into the same process as input, adding extra backslashes each time the data went through the loop.
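A rough sketch of how quickly that would blow up, assuming a simplified escaper that only handles backslashes and double quotes (the real encoder in the article is not shown, so `escape` and `backslashesAfter` here are hypothetical):

```haskell
-- Simplified JSON-style string escaping: only backslash and double
-- quote are handled. This is an assumption for illustration, not the
-- article's encoder.
escape :: String -> String
escape = concatMap esc
  where
    esc '\\' = "\\\\"
    esc '"'  = "\\\""
    esc c    = [c]

-- Count the backslashes in a string after it has been through the
-- escaper n times, as it would be in a feedback loop.
backslashesAfter :: Int -> String -> Int
backslashesAfter n = length . filter (== '\\') . (!! n) . iterate escape
```

Starting from a single double quote, each pass doubles the existing backslashes and adds one more, so `backslashesAfter 10 "\""` is 1023. A few dozen trips around the loop would be enough to produce "zillions".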
So, what happens if the size of the JSON chunk to unescape is greater than the size of available memory? One of the benefits of streaming APIs is trading off CPU for memory.
Of course, it sounds like if it hit the pathological edge case in that setup, it would be in just as bad a position.
In my experience, this is a huge problem with Haskell. Laziness, for all of its benefits, makes it difficult to reason correctly about performance in large code bases. I think non-strict evaluation by default probably was a mistake. Other than that, Haskell is great.
This is primitive C practice: allocate a "bigger buffer than you'll ever need"--a siren call for exploiters. There's a simpler way: when you see a backslash, start a (strict) counter.
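One way to read the counter idea, as a sketch only: scan the input once, count each run of consecutive backslashes with a strictly evaluated counter, and emit half that many literal backslashes when the run ends. This is my interpretation, not the article's fix. It is simplified so that an escaped character is just copied through (no `\n` or `\u` handling), and the name `unescapeRuns` is made up.

```haskell
-- Unescape by counting runs of backslashes instead of handling them
-- one pair at a time. Simplified: the character after an odd run is
-- copied literally, so escapes like \n are NOT interpreted.
unescapeRuns :: String -> String
unescapeRuns = go
  where
    go [] = []
    go ('\\' : rest) = run 1 rest
    go (c : rest) = c : go rest

    -- n is the length of the current backslash run. It is forced with
    -- seq at each step so no chain of (+ 1) thunks accumulates.
    run :: Int -> String -> String
    run n ('\\' : rest) = let n' = n + 1 in n' `seq` run n' rest
    run n rest = replicate (n `div` 2) '\\' ++ afterRun (odd n) rest

    -- An odd run means the last backslash escapes the next character.
    afterRun :: Bool -> String -> String
    afterRun True (c : rest) = c : go rest
    afterRun _ rest = go rest
```

The counter is a single `Int` no matter how long the run is, so a string of a million backslashes costs one integer of state rather than a million pending steps.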
Silly question: why does a string builder need or use escape characters? If you dump to JSON, why can't you escape at the very end, with a simple iteration through all the characters?