Clickbait (in the context of Rust). In languages with managed memory there are no true memory leaks, so such waste gets called a "leak" anyway. In lower-level languages, we should be stricter about what we call things.
… Box::leak¹ is a function that exists. That seems like a memory leak, no?
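For anyone who hasn't run into it, this is roughly what that looks like (a minimal illustration, not from the article):

    fn main() {
        // Box::leak deliberately gives up ownership of the heap allocation and
        // hands back a reference that stays valid for the rest of the program.
        let msg: &'static mut String = Box::leak(Box::new(String::from("leaked on purpose")));
        println!("{msg}");
        // Nothing ever frees that String; the "leak" is the entire point of the API.
    }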
Less tongue-in-cheek: if a program allocates far more memory than expected of it, I'm going to colloquially call that a "memory leak". If I see a Java program whose RSS is doing nothing but "up and to the right" until the VM runs out of memory and dies a sweet sweet page thrashing death, I'm going to describe that as a "memory leak". Having someone tell me, "well, actually, it's not a leak per se, it's just that the JVM's GC didn't collect all the available garbage prior to running out of memory because …" … I don't care? You're just forcing me to wordsmith the problem description; the problem is still there. The program is still dead, and still exceeding the constraints of the environment it should have been operating in.
The author had some assumptions: that Vec doesn't over-allocate by more than 2x, and that collect allocates. One of those did turn out to be false, but I think if I polled Rust programmers, a fair number of them would make the wrong assumption. I would, and TIL from this article that it was wrong, and that collect can reuse the original allocation, despite it not being readily apparent how it knows it can do that with a generic Iterator. (And the article got me to understand that part, too!)
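To make the surprise concrete, here's a rough sketch of the kind of thing the article describes (whether the allocation actually gets reused, and how much excess capacity you see, depends on the std version and the exact element types):

    fn main() {
        // ~16 KiB allocation: 1,000 elements of 16 bytes each.
        let big: Vec<[u8; 16]> = vec![[0u8; 16]; 1000];

        // collect() on a consuming iterator chain may reuse that allocation in
        // place for the 1-byte output elements instead of allocating a new one.
        let small: Vec<u8> = big.into_iter().map(|bytes| bytes[0]).collect();

        // If reuse happened, the capacity (counted in u8s) can greatly exceed the length.
        println!("len = {}, capacity = {}", small.len(), small.capacity());
    }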
Unlike most clickbait, which lures you in only to let you down, I learned something here. Reading it was worthwhile.
> "If I see a Java program whose RSS is doing nothing but "up and to the right" until the VM runs out of memory and dies a sweet sweet page thrashing death, I'm going to describe that as a "memory leak"."
By this definition, if a program reads a file into memory and you point it at a small file, the program does not have a memory leak, but if you point it at a large enough file, then it does. Whether or not a program has a memory leak doesn't depend on the code of the program, but on how you use it. And then on a bigger computer, the program doesn't have a memory leak anymore.
That seems a less useful definition than the parent poster's / the common definition.
… that's really not the idea I'm trying to convey with the comment.
Clearly, if you feed a program a larger file that it is going to read into memory to process, it is expected to consume more resources, on account of it doing more work. But that is memory being expended on visible, useful work. All of the examples in the comment refer to memory being "allocated" (in the sense of being assigned to the program) but not fulfilling any visibly useful function insofar as the operator/programmer can see: Java's GC being unable to effectively reclaim unused memory before the machine is brought down, or the OP's example of a Vec allocating without (seemingly) having a purpose (…as it is in excess of what is required to allow for amortized appends).
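For contrast, the "expected" kind of excess looks something like this (illustrative; the exact growth factor is an implementation detail):

    fn main() {
        let mut v: Vec<u32> = Vec::new();
        for i in 0..1000 {
            // push grows the capacity geometrically so that repeated appends stay
            // amortized O(1); the headroom is a small constant factor of the
            // length, not orders of magnitude.
            v.push(i);
        }
        println!("len = {}, capacity = {}", v.len(), v.capacity());
    }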
There is an implied steady state in what the program is doing. If it goes right from "loading" to "exit", then you need a more complicated analysis.
When you have that steady state, a definition based on uncontrolled growth is more useful than trying to dissect whether the memory is truly unreachable or only practically unreachable.
>I don't care? You're just forcing me to wordsmith the problem description
Yes, because if you don't define the problem clearly, the problem won't be solved. Java being inefficient with memory use doesn't mean any memory was leaked.
Memory leaks can be tricky to track down, and if I spent 6 hours looking for a memory leak only to come back and find out you meant it uses more memory than is efficient, I'd be pissed that I wasted 6 hours because you wanted to save 5 minutes.
There is a hidden memory store using orders of magnitude more RAM than the live data. Why do we need to nitpick exactly how hidden it is? Are you going to be mad if I don't know whether it's literally inaccessible or not?
Because there are legitimate reasons why memory can be allocated. This is like calling your OS cache a memory leak when you open up Task Manager and see you only have 400MB free. A memory leak implies memory that is lost for good and no longer being kept track of.
Consider it this way: if I had a program that connected to a database and used a connection pool to improve performance, would it be a "connection leak" that 5 connections were held open even though the database was idle?
The framing here is similar: Rust, in an attempt to improve performance, reused large memory allocations. Some applications do this on purpose and call them buffer pools.
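A minimal sketch of that deliberate reuse pattern, for comparison (the names here are made up for illustration):

    // Reuse one buffer's allocation across work items instead of reallocating
    // each time; the capacity that sticks around is intentional.
    fn process_all(lines: &[&str]) {
        let mut buf: Vec<u8> = Vec::new();
        for line in lines {
            buf.clear(); // drops the contents but keeps the capacity
            buf.extend_from_slice(line.as_bytes());
            println!("processed {} bytes (capacity {})", buf.len(), buf.capacity());
        }
    }

    fn main() {
        process_all(&["hello", "world", "a much longer line than the others"]);
    }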
Except this memory is not being intentionally used as a cache/buffer.
Rust attempts to keep the vector's capacity within a bounded range of its length for exactly that purpose, and it failed to do so here.
No matter how you frame it, it's a bug, so none of those possible justifications fit.
For the database analogy, I would call it a connection leak if the number of idle connections greatly exceeded the number that had ever been simultaneously busy and they weren't actually getting reused.