
Memory safety? Garbage collection? Library ecosystem?



RAII is a kind of garbage collection, too.

Longer-lived objects still need (trickier) manual deallocation.


> RAII is a kind of garbage collection, too.

No it isn't. With RAII, you can look where an object gets constructed and know exactly where it will be destructed. With garbage collection, you can't, and in fact there's no guarantee that it ever will be. Also, with garbage collection, you can save references to whatever you like, wherever you like, for as long as you like. With RAII, you need to make sure you don't create any dangling references or use any dangling pointers.
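
A minimal C++ sketch of that difference (the file name is just illustrative):

    #include <cstdio>

    struct File {
        std::FILE* f;
        explicit File(const char* path) : f(std::fopen(path, "r")) {}
        ~File() { if (f) std::fclose(f); }   // runs at a statically known point
        File(const File&) = delete;
        File& operator=(const File&) = delete;
    };

    int main() {
        {
            File log("data.txt");            // constructed here
            // ... use log.f ...
        }                                    // destructor guaranteed to run here
        // Under a tracing GC, a finalizer might run much later, or never.
    }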


No, with RAII you still need to design your program around who owns each object, and thus who should clean it up. You end up with borrowing, move semantics, and the like. With (Tracing/Copying) Garbage Collection, none of this exists.
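
For instance, a small C++ sketch of that bookkeeping (the Widget type is hypothetical): ownership has to be explicitly moved, and the old handle must not be used afterwards.

    #include <memory>
    #include <utility>
    #include <vector>

    struct Widget { int id = 0; };

    int main() {
        auto w = std::make_unique<Widget>();              // this scope owns it
        std::vector<std::unique_ptr<Widget>> registry;
        registry.push_back(std::move(w));                 // ownership transferred:
                                                          // the registry now frees it
        // w is empty now; dereferencing it would be a bug the programmer must avoid,
        // whereas a tracing GC would simply keep the object alive for both references.
    }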

Not to mention, Copying GC also solves memory fragmentation, which C++ still suffers from unless you also design your allocations carefully around sizes of types.


> No, with RAII you still need to design your program around who owns each object, and thus who should clean it up

With or without RAII you should design your program around who owns each object, unless you want to end up with an unmaintainable mess leaking file descriptors, network sockets, native memory buffers, or trying to access resources after closing them. Which is why Cassandra and Netty implement their own reference counting.
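
To illustrate (a C++ sketch, not Netty's or Cassandra's actual mechanism): reference counting just makes the ownership question explicit, and the resource is released exactly when the last owner lets go.

    #include <cstdio>
    #include <memory>

    int main() {
        // shared_ptr with a custom deleter: the file is closed deterministically
        // when the reference count drops to zero.
        std::shared_ptr<std::FILE> f(std::fopen("data.txt", "r"),
                                     [](std::FILE* p) { if (p) std::fclose(p); });
        auto also_f = f;    // count = 2
        also_f.reset();     // count = 1
        f.reset();          // count = 0 -> fclose runs here
    }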

> Not to mention, Copying GC also solves memory fragmentation

Not really. It only moves the problem elsewhere so it doesn't look like fragmentation. Compacting GC needs additional memory to have room to allocate from, and that amount of memory is substantial unless you want to do more GC than any useful work. Also it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction. As soon as your program logically frees a memory region (by dropping a path to it), you have temporary fragmentation until the next GC cycle, because that region is not available for allocation immediately. And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.

> which C++ still suffers from unless you also design your allocations carefully around sizes of types

Modern allocators split allocations into size buckets automatically.
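
A toy sketch of the size-class idea (real allocators such as jemalloc or tcmalloc are far more sophisticated; this is just the shape of it):

    #include <array>
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // Toy segregated free lists: requests are rounded up to a few size classes, and
    // a freed block goes back on the list for its class, so blocks of different
    // sizes never interleave within a class.
    class BucketAllocator {
        static constexpr std::array<std::size_t, 4> kClasses{16, 32, 64, 128};
        std::array<std::vector<void*>, 4> free_lists_;

        static int class_for(std::size_t n) {
            for (std::size_t i = 0; i < kClasses.size(); ++i)
                if (n <= kClasses[i]) return static_cast<int>(i);
            return -1;                        // large allocations handled elsewhere
        }

    public:
        void* allocate(std::size_t n) {
            int c = class_for(n);
            if (c < 0) return std::malloc(n);
            if (!free_lists_[c].empty()) {    // reuse a block of the same class
                void* p = free_lists_[c].back();
                free_lists_[c].pop_back();
                return p;
            }
            return std::malloc(kClasses[c]);
        }
        void deallocate(void* p, std::size_t n) {
            int c = class_for(n);
            if (c < 0) { std::free(p); return; }
            free_lists_[c].push_back(p);      // immediately reusable, no compaction
        }
    };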


> Compacting GC needs additional memory to have room to allocate from, and that amount of memory is substantial unless you want to do more GC than any useful work.

Not in the case of a mark-compact collector, which works entirely in place, or a mark-region collector such as Immix [0], which only copies a small fraction of the heap.
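
A toy sketch of the sliding phase of mark-compact (not any real collector's code): live objects are slid toward the start of the same heap, so no separate to-space is needed; a real collector would also rewrite references using the forwarding table computed here.

    #include <cstddef>
    #include <vector>

    struct Obj { bool marked; int payload; };

    // Sliding compaction: live objects move toward the start of the heap, in place.
    std::vector<std::size_t> compact(std::vector<Obj>& heap) {
        std::vector<std::size_t> forwarding(heap.size());
        std::size_t free_slot = 0;
        for (std::size_t i = 0; i < heap.size(); ++i) {
            if (heap[i].marked) {
                forwarding[i] = free_slot;     // new address of the live object
                heap[free_slot++] = heap[i];   // slide it down, in place
            }
        }
        heap.resize(free_slot);                // everything past the live data is free
        return forwarding;
    }

    int main() {
        std::vector<Obj> heap{{true, 1}, {false, 2}, {true, 3}};
        auto fwd = compact(heap);              // heap now holds payloads 1 and 3
        (void)fwd;
    }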

> Also it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction.

An improvement would be to perform more frequent "partial" collections, such as in the Train algorithm [1]. But some collectors (such as Immix again) avoid compaction until fragmentation is considered bad enough, which seems like a fair compromise.

> And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.

The description of Doug Lea's allocator [2] suggests there are also "object headers" of a sort on allocated data in dlmalloc. You could probably steal mark bits from those headers, but it is common to use a separate marking bit/bytemap, stored apart from the space where objects are allocated, which thus has none of the fragmentation you describe.

[0] https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix...

[1] https://beta.cs.au.dk/Papers/Train/train.html

[2] http://gee.cs.oswego.edu/dl/html/malloc.html
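
On the side-bitmap point above, a rough sketch (the granule size and layout are made up, not Immix's or dlmalloc's): one mark bit per fixed-size granule, kept entirely outside the heap, so object payloads carry no marking header at all.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // One mark bit per 16-byte granule, kept in a side bitmap instead of in
    // per-object headers, so the marking metadata adds nothing to the heap itself.
    class MarkBitmap {
        static constexpr std::size_t kGranule = 16;
        std::uintptr_t heap_start_;
        std::vector<std::uint8_t> bits_;

    public:
        MarkBitmap(std::uintptr_t heap_start, std::size_t heap_bytes)
            : heap_start_(heap_start), bits_(heap_bytes / kGranule / 8 + 1, 0) {}

        void mark(std::uintptr_t addr) {
            std::size_t i = (addr - heap_start_) / kGranule;
            bits_[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
        bool is_marked(std::uintptr_t addr) const {
            std::size_t i = (addr - heap_start_) / kGranule;
            return (bits_[i / 8] >> (i % 8)) & 1u;
        }
        void clear_all() { std::fill(bits_.begin(), bits_.end(), 0); }
    };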


> Not in the case of a mark-compact collector, which works entirely in place, or a mark-region collector such as Immix [0], which only copies a small fraction of the heap.

The mutator always allocates from a contiguous memory region. It can't allocate from the memory that was logically released, but not yet collected. So it needs more total memory than the amount of live memory in use at any time, unless you have an infinitely fast GC (which you don't have). In order to avoid too frequent GC cycles, or to allow it to run in the background, you need to make that additional amount of memory substantial.

JVM GCs typically try to keep GC overhead low (within a few percent of run time), which often results in crazy high memory use, like 10x the size of the live memory set.
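
A back-of-the-envelope sketch of that relationship (all numbers below are made up): with live set L and heap size H, each cycle can hand out roughly H - L bytes before the next GC, so GC frequency is about allocation_rate / (H - L), and keeping GC overhead low means making the headroom large.

    #include <cstdio>
    #include <initializer_list>

    int main() {
        const double live_set_gb   = 1.0;   // assumed live data
        const double alloc_rate_gb = 2.0;   // assumed allocation rate, GB per second
        for (double heap_gb : {1.5, 2.0, 4.0, 10.0}) {
            double headroom = heap_gb - live_set_gb;
            std::printf("heap %4.1f GB -> headroom %4.1f GB -> ~%5.2f GCs/sec\n",
                        heap_gb, headroom, alloc_rate_gb / headroom);
        }
    }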

> but it is common to use a separate marking bit/bytemap

Sure, you can place it wherever you wish, but it still requires additional space.


Your comparison would only be fair if the alternative (malloc/object pools) didn't use more memory than strictly necessary either.

But malloc and friends usually do what a very basic GC would (make separate pools for differently sized “objects”).

And object pools also reserve much more memory than is actually in use, unless the pool is full.

So all in all, GCs do trade off more memory for more efficient allocation/deallocation, but that is a conscious (and sane) tradeoff to make for something like 99% of applications, as data sitting in RAM doesn’t consume much energy compared to running GC cycles like a madman. Also, it is quite configurable in the case of JVM GCs.


The only overhead memory used by a pool allocator is the rounding to the page size. The difference from a compacting GC is that a pool allocator can allocate from the freed memory immediately after the memory was freed. So the overhead does not depend on the allocation rate, it is just a tiny constant factor.
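
A minimal sketch of that property (a fixed-size pool for a hypothetical Node type; real pools add growth, alignment, thread safety):

    #include <cstddef>
    #include <vector>

    struct Node { double x, y; };              // hypothetical payload type

    // Fixed-size pool: a released slot goes straight onto the free list and the very
    // next allocate() can hand it out again - no collection cycle has to run first,
    // so the space overhead is a small constant, independent of the allocation rate.
    class NodePool {
        union Slot { Node value; Slot* next; };
        std::vector<Slot> slots_;
        Slot* free_head_ = nullptr;

    public:
        explicit NodePool(std::size_t n) : slots_(n) {
            for (auto& s : slots_) { s.next = free_head_; free_head_ = &s; }
        }
        Node* allocate() {
            if (!free_head_) return nullptr;   // pool exhausted
            Slot* s = free_head_;
            free_head_ = s->next;
            return &s->value;
        }
        void release(Node* p) {
            Slot* s = reinterpret_cast<Slot*>(p);
            s->next = free_head_;              // immediately reusable
            free_head_ = s;
        }
    };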

As for the energy efficiency, I seriously doubt that bringing all memory into cache once in a while, including memory that is not needed frequently by the application, only in order to find live vs dead memory, is all that energy efficient. The allocation itself is indeed typically slightly faster, but marking and compaction are additional costs you don't have to pay with manual memory management.


Hence why I'd suggest using partial GCs like the Train, as that would have better locality of reference almost all the time. A generational GC could have similar effects, but nurseries seem to be much larger than caches nowadays, with few exceptions.


Partial, generational or region-based GCs still need to scan the whole heap from time to time. By bringing stuff into cache once in a while they also push stuff that's actively used out of cache. Those effects are typically not visible in tiny benchmarks that allocate temporary garbage in a loop, but can get pretty nasty in real apps. LRU-cache-like memory use patterns are particularly terrible for generational GCs - because the generational hypothesis does not hold (objects die old).

Also using generational algorithms does not remove the dependency of the memory overhead on the allocation rate. Those techniques improve the constant factor, but it is still an O(N) relationship, vs O(1) for a manual allocator. If the allocation rate is too high there are basically two solutions: (1) waste more memory (use very big nurseries, oversize the heap) or (2) slow down / pause the mutator.

The industry seems to prefer (1) so that probably explains why I never see Java apps using <100 MB of RAM, which is pretty standard for many C, C++ or Rust apps; and 50x-100x memory use differences between apps doing a similar thing are not that uncommon.


> By bringing stuff into cache once in a while they also push stuff that's actively used out of cache.

I may very well be wrong, but I don’t think it is any worse than the occasional OS scheduling/syscall, etc. GCs happen very rarely (unless of course someone thrashes the GC by allocating in hot loops).

Also, while a destructor is indeed O(n), it is a cost that has to be paid on the given thread, while a GC can amortize it onto a separate thread.


> Sure, you can place it wherever you wish, but it still requires additional space.

I thought we were discussing fragmentation, i.e. where the marking bits are put, not how much total space they take.


Fortunately, with GC, you can avoid thinking about the many small objects you constantly allocate along the way. Most of them will get collected on the next GC run as young-generation garbage going out of function/block scope. Some of them will travel down the call graph and may end up long-lived, then eventually collected.

But I agree: for anything that you want to deallocate deterministically, or at least soon enough, you need to track ownership, and care about the lifetimes. Such objects are relatively few, though.


> Most of them will get collected on the next GC run as young-generation garbage going out of function/block scope.

Depends on the use case. Not if you're storing them in a long-lived collection. Also heap allocation is costly, even in languages with fast heap allocation. It is still an order of magnitude slower than stack allocation.

> But I agree: for anything that you want to deallocate deterministically, or at least soon enough, you need to track ownership, and care about the lifetimes

It is not only that. You need ownership not only to determine lifetimes.

You need to know it in order to be able to tell if, having a reference to an object, you're allowed to update it and in what way. Is it the only reference? If it is shared, who also has it and what can it do with it? If I call "foo" on it, will I cause a "problem at a distance" for another shareholder? Being able to answer such questions directly by looking at the code makes it way easier to navigate in a big project written by other people.

In C++ if I can see a simple value or a value wrapped in a unique_ptr, I know that I can update it safely and nothing else holds a reference. If I see a shared_ptr, I can expect it is shared, so I have been warned. The intent is clear. In Rust it is even safer, because the compiler enforces that what I see is really what I get (it is not just relying on conventions).
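
For example (the function names are hypothetical), the signatures alone tell the reader what they may do with the object:

    #include <memory>

    struct Config { int verbosity = 0; };

    // Sole ownership transferred in: the callee may freely mutate and will destroy it.
    void consume(std::unique_ptr<Config> cfg) { cfg->verbosity = 2; }

    // Shared ownership: other holders may exist; the const signals "look, don't touch".
    void observe(std::shared_ptr<const Config> cfg) { (void)cfg->verbosity; }

    // Plain borrow for the duration of the call; no ownership implied.
    void tweak(Config& cfg) { ++cfg.verbosity; }

    int main() {
        auto shared = std::make_shared<Config>();
        observe(shared);
        tweak(*shared);
        consume(std::make_unique<Config>());
    }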

On the flip side, GC-based languages tend to invite a style of coding where reference aliasing is everywhere and there is no clear ownership. I can see a reference to something and I have no idea what kind of reference it is and what I can safely do with it. It is just like a C pointer. I need to rely on code comments which could be wrong (or read a million lines of code).


That’s what OOP should handle though. You shouldn’t let internal objects escape if it is not intended.

Don’t get me wrong, I really like RAII or Rust’s compiler-enforced ownership model, but it doesn’t solve everything. E.g. it only disallows data races, not race conditions.

Also, immutability goes a long way toward solving all that.


I meant tracing garbage collection. I'd say that something like 95% of allocations in real-world code can be done straightforwardly with RAII, or could be if the language supported it (and indeed gain maintainability benefits from being forced into an RAII-centric paradigm). But the remaining 5% is a real pain, and distributed over a wide variety of problems in a wide variety of domains. So tracing GC really does make life a lot easier, if you can afford it.


The freedom to reference anything easily from any place is a double-edged sword. I agree it makes the 5% of hard issues go away, but on the flip side it makes the other 95% more complex. Tracing GC is the "goto" of memory management. You may argue goto is a good thing because it offers you the freedom to jump from anywhere to anywhere and you're not tied to constraints enforced by loops and functions. We all know this is not the case. Similarly, being able to make a reference from anywhere to anywhere leads to programs that are hard to reason about. We should optimize for readability, not the ease of writing.


There is no reason why you could not, in principle, have Rust-style compile-time borrow checking in a managed language.

As an extreme example (that I have occasionally thought about doing though probably won't), you could fork TypeScript and add ownership and lifetime and inherited-mutability annotations to it, and have the compiler enforce single-ownership and shared-xor-mutable except in code that has specifically opted out of this. As with existing features of TypeScript's type system, this wouldn't affect the emitted code at all—heap allocations would still be freed nondeterministically by the tracing GC at runtime, not necessarily at the particular point in the program where they stop being used—but you'd get the maintainability benefits of not allowing unrestricted aliasing.

(Since you wouldn't have destructors, you might need to use linear instead of affine types, to ensure that programmers can't forget to call a resource object's cleanup method when they're done with it. Alternatively, you could require https://github.com/tc39/proposal-explicit-resource-managemen... to be used, once that gets added to JavaScript.)

Of course, if you design a runtime specifically to be targeted by such a language, more becomes possible. See https://without.boats/blog/revisiting-a-smaller-rust/ for one sketch of what this might look like.



