Sometimes, the old ways are the best

jasperry · on May 13, 2015

One sentence in the article made me do a double take: "reasonably tightly optimized low-level Haskell". For a minute I thought that this defeats the whole purpose of using Haskell, because for me it's all about being able to code at a higher level of abstraction.

But when I saw how quickly he fixed the problem, I reconsidered. As wrong as low-level tweaking in a high-level language feels, if it's still less work to optimize than in C or Java, that's a win, right?

bos · on May 13, 2015

I think of programming in Haskell as being more about being able to choose a level of abstraction that makes sense for the situation.

The more abstract code is great when it works well, and when it falters I can easily drop down to something more concrete, or even to C, and still have most of my code benefit from that higher level of thinking.

mcguire · on May 13, 2015

Someone with more experience can correct me, but "low-level Haskell" is Haskell that takes into consideration things like evaluation order and thunks. It is stuff that many other languages would still consider way-high-level.

jkarni · on May 13, 2015

Evaluation-order in a sense is more low-level in Haskell.

michaelsbradley · on May 13, 2015

A memory leak owing to the accumulation of unevaluated thunks is sometimes referred to as a "space leak".

jerf · on May 13, 2015

What are you trying to say?

It sounds like you might be trying to be snarky, as if someone is trying to sweep details under the rug, but just as "the file had its first bit flipped" is more descriptive than "the file is corrupt" but the latter is still a correct description, "unevaluated thunks" is merely a more precise categorization of "space leak", which is itself a subset of "resource leaks", after all.

nostrademons · on May 13, 2015

I interpreted his comment as simply providing more information on terminology.

michaelsbradley · on May 13, 2015

Yes, I had heard the term "space leak" used in functional language communities but wasn't sure what it meant. Asking around, the definition is commonly given as I did above.

I thought readers here might find that information useful, given that someone might describe a problem similar to the OP's by using the term "space leak".

jerf · on May 13, 2015

Ah, sorry. The hypersensitive snark detector has been taken out back and flogged.

noahlt · on May 13, 2015

> The culprit turned out to be JSON strings containing zillions of backslashes.

My guess is that there was a feedback loop somewhere, with the JSON output of a process being fed into the same process as input, adding extra backslashes each time the data went through the loop.
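A rough sketch of how such a loop could inflate the backslash count, assuming the data is simply re-encoded as a JSON string on every pass. The `reencode` helper and the sample value are made up for illustration; nothing in the article confirms this is what actually happened.

```python
import json


def reencode(value: str, rounds: int) -> str:
    """Encode `value` as a JSON string `rounds` times over (hypothetical loop)."""
    encoded = value
    for _ in range(rounds):
        # Each pass doubles every existing backslash and adds one per quote.
        encoded = json.dumps(encoded)
    return encoded


def backslash_counts(value: str, max_rounds: int) -> list[int]:
    """Number of backslashes after 0, 1, ..., max_rounds passes."""
    return [reencode(value, n).count("\\") for n in range(max_rounds + 1)]


if __name__ == "__main__":
    print(backslash_counts('say "hi"', 4))
```

The count more than doubles on each pass, so a few dozen trips through the loop would be enough to produce "zillions".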

digi_owl · on May 13, 2015

Well as long as it's not a JSON variant of a billion laughs...

allochthon · on May 14, 2015

> The culprit turned out to be JSON strings containing zillions of backslashes.

Sounds a little like doubly-encoded JSON. Not that I would know about or have ever been responsible for something like that.

userbinator · on May 14, 2015

You can do unescaping in constant time and space using a state machine. Coroutines would be the ideal way to express this.
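A minimal sketch of that idea, using a Python generator as the coroutine. It assumes the input is the body of a JSON string (between the quotes) arriving one character at a time; the name `unescape` and the state labels are mine, and surrogate pairs from `\u` escapes are not recombined. The only state kept is the current mode plus a partially read code point, so memory use does not depend on input length.

```python
from typing import Iterable, Iterator

# Single-character escapes defined for JSON strings.
SIMPLE_ESCAPES = {
    '"': '"', "\\": "\\", "/": "/",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}


def unescape(chars: Iterable[str]) -> Iterator[str]:
    """Yield unescaped characters one at a time from a stream of characters."""
    state = "plain"   # one of: "plain", "escape", "hex"
    code = 0          # code point being accumulated in the "hex" state
    digits = 0        # hex digits read so far
    for ch in chars:
        if state == "plain":
            if ch == "\\":
                state = "escape"
            else:
                yield ch
        elif state == "escape":
            if ch == "u":
                state, code, digits = "hex", 0, 0
            elif ch in SIMPLE_ESCAPES:
                yield SIMPLE_ESCAPES[ch]
                state = "plain"
            else:
                raise ValueError(f"invalid escape: \\{ch}")
        else:  # state == "hex"
            code = code * 16 + int(ch, 16)  # raises ValueError on a non-hex digit
            digits += 1
            if digits == 4:
                yield chr(code)
                state = "plain"
    if state != "plain":
        raise ValueError("input ended inside an escape sequence")
```

Because it consumes any iterable of characters and yields as it goes, it can sit between a chunked reader and whatever consumes the output without buffering the whole string.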

> JSON strings containing zillions of backslashes. (I have no idea why.)

I've seen source code where someone was fond of using forward and backslashes in comments as a banner. Perhaps that might be related.

falcolas · on May 14, 2015

So, what happens if the size of the JSON chunk to unescape is greater than the size of available memory? One of the benefits of streaming APIs is trading off CPU for memory.

Of course, it sounds like if it hit the pathological edge case in that setup, it would be in just as bad a position.

prog2 · on May 13, 2015

In my experience, this is a huge problem with Haskell. Laziness, for all of its benefits, makes it difficult to reason correctly about performance in large code bases. I think non-strict evaluation by default probably was a mistake. Other than that, Haskell is great.

speede · on May 15, 2015

This is primitive C practice: allocate a "bigger buffer than you'll ever need"--a siren call for exploiters. There's a simpler way: when you see a backslash, start a (strict) counter.
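One possible reading of the counter idea, sketched in Python: count each run of consecutive backslashes instead of buffering it, then emit half that many when the run ends. The function name and the restriction to single-character escapes (no `\u`) are my assumptions, not something the comment spells out.

```python
from typing import Iterable, Iterator

# Characters that may follow an odd backslash (the \u form is left out here).
ESCAPED = {'"': '"', "/": "/", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def unescape_counting(chars: Iterable[str]) -> Iterator[str]:
    """Unescape a character stream while keeping only a run-length counter."""
    run = 0  # length of the current run of consecutive backslashes
    for ch in chars:
        if ch == "\\":
            run += 1
            continue
        # The run has ended: every pair of backslashes is one literal backslash.
        for _ in range(run // 2):
            yield "\\"
        if run % 2:
            # An odd backslash is left over, so it escapes this character.
            if ch not in ESCAPED:
                raise ValueError(f"invalid escape: \\{ch}")
            yield ESCAPED[ch]
        else:
            yield ch
        run = 0
    if run % 2:
        raise ValueError("input ended with a dangling backslash")
    for _ in range(run // 2):
        yield "\\"
```

No buffer is sized up front, so there is nothing to overflow; the only per-run state is a single integer.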

frozenport · on May 14, 2015

Silly question: why does a string builder need or use escape characters? If you dump to JSON why can't you escape at the very end, with a simple iteration through all the characters?

Kluny · on May 13, 2015

Is "thunk" a technical term?

rifung · on May 13, 2015

Yes, a "thunk is a value that is yet to be evaluated."

https://wiki.haskell.org/Thunk
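For anyone who doesn't read Haskell, here is a loose Python analogy. It is only a sketch: `Thunk` and `lazy_sum` are made-up names, and the GHC runtime represents thunks quite differently. It does show the two properties that matter in this thread: nothing is computed until the value is forced, and a lazily built accumulator keeps a chain of pending work alive.

```python
from typing import Any, Callable, Iterable, Optional


class Thunk:
    """A deferred computation that runs at most once, when forced."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self._compute: Optional[Callable[[], Any]] = compute
        self._value: Any = None

    def force(self) -> Any:
        if self._compute is not None:
            self._value = self._compute()
            self._compute = None  # drop the closure so its captures can be freed
        return self._value


def lazy_sum(numbers: Iterable[int]) -> Thunk:
    """Build a sum without evaluating it: each step wraps the previous thunk."""
    total = Thunk(lambda: 0)
    for n in numbers:
        # Nothing is added yet; we only record "previous total plus n".
        total = Thunk(lambda acc=total, n=n: acc.force() + n)
    return total


if __name__ == "__main__":
    pending = lazy_sum(range(1, 101))  # 100 nested thunks, no addition done yet
    print(pending.force())             # the whole chain is evaluated only now
```

Until `force` is called, every intermediate thunk stays reachable, which is the accumulation described above as a "space leak". A strict accumulator would add each number as it arrives and keep just one integer.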

Kluny · on May 14, 2015

Cool, thank you.

pekk · on May 13, 2015

Yes. Why do you ask?

Kluny · on May 14, 2015

Because I want to know.

serve_yay · on May 13, 2015

Sometimes, yes. But which times? As ever, therein lies the rub.