Visualising the .NET Garbage Collector (mattwarren.org)
148 points by matthewwarren on June 20, 2016 | 24 comments



The linked part 1 is excellent, and includes a fascinating example of how to use the .NET GC outside of .NET itself. This seems like a really interesting capability; the only general-purpose open-source GC I was aware of was the Boehm one. I'd expect the .NET one to have had a lot more performance tuning.


Yeah, it was finding that stand-alone sample that inspired me to write that first part. It's so cool that you can easily run the full .NET GC, albeit in a simplified environment.


The linked blog post about the crazy hacks needed to lessen the impact of the GC on a server got me thinking. Is it possible to implement a non-GC language on top of the CLR? For example, could it represent RAII with an imaginary destructor in C# reasonably well?


There are a couple of issues on the CoreCLR GitHub page that touch on this idea, i.e. a replacement GC; see - https://github.com/dotnet/coreclr/issues/732 - https://github.com/dotnet/coreclr/issues/430

In particular, this comment https://github.com/dotnet/coreclr/issues/430#issuecomment-78...:

> The "alt-gc" direction is possible, but not as easy.

> The GC features that the CoreCLR runtime depends on (for full functionality) are pretty rich. It includes features that are not always found in experimental GC implementations out there such as interior pointers or pinning. It is non-trivial to build an alternative implementation of the rich advanced feature set.

> Integrating a new GC was actually done as a research project back in 2003 on top of Rotor. Some of the difficulties highlighted by the project report are not valid anymore - the interface between the CoreCLR runtime and GC is much cleaner than it used to be back in 2003. However, a lot of it still holds - the issues listed in 5.1 "Integration Issues and Solutions" in particular.


Kind of; C++/CLI, for example.

https://msdn.microsoft.com/en-us/library/68td296t.aspx

https://msdn.microsoft.com/en-us/library/ms379617%28v=vs.80%...

Contrary to the JVM, the CLR was designed to support any strongly typed language, regardless of its memory model. The DLR came afterwards to fill the holes so that dynamic languages could run well.

So the CLR bytecode does support more features than each MS language exposes.

Another fact is that the CLR was originally going to be based on COM, not bytecode, but with Java picking up steam, they eventually went with the bytecode-based CLR design instead.

The new WinRT model is a return to that original idea of using COM and everything native.

You don't really need imaginary destructors in C#; using, structs, and closure patterns already help a lot. C# 7 will make it even better.
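For example (file name made up, just to illustrate), the using statement already gives scope-based, deterministic cleanup for anything implementing IDisposable:

    using System;
    using System.IO;

    class Example
    {
        static void Main()
        {
            // Dispose() runs deterministically when the block exits,
            // even on exceptions - the closest thing C# has to RAII today.
            using (var reader = new StreamReader("input.txt"))
            {
                Console.WriteLine(reader.ReadLine());
            }
            // The StreamReader is already disposed here; the GC only reclaims the memory later.
        }
    }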


It might be, depending on exactly what you'd like to implement. There are two things at play here:

1. The memory that holds the resource.

2. The resource itself.

C++ RAII lets you control the lifetime of both. In a GC'd language, the GC takes control over (1) but not over (2). Instead it gives you the ability to handle (2) yourself (much like C++ does with custom destructors) using finalizers. Due to the way a GC works, though, it is hard to say when (or in the context of which thread) your finalizer will run.

Now, the GC does what it does so it can guarantee that you are never able to hold a pointer to something that some other part of your application has deemed dead. While it is impossible to take over (1), you could imagine a world where (2), in a GC'd environment, is still up to the programmer and is exposed (say) as a language destructor. In the presence of such a feature you run the risk of errors such as closing a socket twice, but that isn't much different from code calling a poorly implemented Close (IDisposable) on an object twice.
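To make that concrete, a rough sketch (the native handle and close call are hypothetical) of a C# finalizer playing the role of that non-deterministic destructor:

    using System;

    class SocketWrapper
    {
        private IntPtr handle; // hypothetical native socket handle

        // The runtime runs this at some point after the object becomes
        // unreachable, on the finalizer thread - you can't say when.
        ~SocketWrapper()
        {
            if (handle != IntPtr.Zero)
            {
                // NativeClose(handle);  // hypothetical native call
                handle = IntPtr.Zero;
            }
        }
    }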


You never rely on finalizers to clean up resources. At best, you check that the resource is properly released and signal an error otherwise, because you should have released it earlier (you have a bug). That may happen when you manually manage your resources, but most of the time, you only need to use a resource inside a delimited block (defer, finally, unwind-protect, with X). If I understand correctly, this is how RAII works too when you allocate objects on the stack.


You're missing the point. The original question (and hence my answer) isn't about whether or not it is a good idea to use finalizers. I'm merely pointing out that in both RAII and a GC, only the lifetime of the allocated memory is being managed, not the resource contained in it. They use different mechanisms to let the developer deal with the resource: with RAII the time at which clean-up code is executed is deterministic (destructors), whereas with a GC it is not (finalizers). That mechanism can be implemented, should a language choose to, regardless of whether or not the underlying memory allocation scheme is automatic (GC). (IDisposable is mostly a made-up thing that C# has syntactic sugar for, which lets developers eagerly release resources other than memory when they're done with them. My point is that neither the language nor the runtime makes any effort to enforce its usage, the way they do in the case of memory allocation.)


You are doing it wrong if you want to use a finalizer / destructor. The right thing to do is to have the container manage the lifecycle of disposable resources. If they are transient, then it's up to you to remember "using" or "Dispose". I can't see how you can dispose something twice by mistake if you implemented the disposable pattern correctly.
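For what it's worth, a minimal sketch of the guard that makes a second Dispose a no-op:

    using System;

    class Resource : IDisposable
    {
        private bool disposed;

        public void Dispose()
        {
            if (disposed) return;   // a second call is harmless
            disposed = true;
            // release the underlying resource here
        }
    }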


I doubt it. Visual C++ does RAII in the CLR (https://msdn.microsoft.com/en-us/library/ke3a209d.aspx#BKMK_...), but again, that's for reference types only (https://msdn.microsoft.com/en-us/library/ms235315.aspx)

Visual C++ also has an unmanaged heap, though. If you restrict your program to use that, you would think you avoided the garbage collector, but reading https://msdn.microsoft.com/en-us/library/yk97tc08.aspx, that isn't the case ("The system uses the CLR garbage collector mechanism to determine if the object is no longer being used and can be deleted")

So, Microsoft hasn't managed to avoid using the garbage collector.

You could still write a language that didn't rely on it, though. For example, you could prevent anything from being garbage collected by maintaining a global hash map containing every explicitly allocated object (possibly with optimizations for local objects that the compiler can prove do not escape).
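A rough sketch of that idea (names made up, thread safety ignored): a static table keeps every "explicitly allocated" object strongly reachable until a Free-style call removes it.

    using System.Collections.Generic;

    static class ManualHeap
    {
        // While an object sits in this set, it stays reachable,
        // so the GC will never collect it.
        private static readonly HashSet<object> live = new HashSet<object>();

        public static T Alloc<T>() where T : new()
        {
            var obj = new T();
            live.Add(obj);
            return obj;
        }

        public static void Free(object obj)
        {
            // Dropping the last strong reference lets the GC reclaim it later.
            live.Remove(obj);
        }
    }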

Alternatively, you could have your runtime do everything in one huge byte array. I doubt that would be efficient, though.

[and in case you don't know: C# has real destructors (https://msdn.microsoft.com/en-us/library/66x5fx1b.aspx). Unfortunately, they can only be used with classes.]


> avoid using the garbage collector.

The reason for that is likely that many libraries would simply stop working. Anyone who has written lock-free C++ and lock-free managed code understands the fundamental reasons why explicit memory management could never be added. Even if you exclude threading, things remain broken: the entire BCL would need to be rewritten and would most likely suffer drastic API changes. The problem of allocation lifetime impacts API design to a very large degree. As soon as you call code that assumes implicit allocation management (the BCL), you lose all guaranteed knowledge about lifetime.

What would be nice is to be able to tell the GC something along the lines of "this weak reference is pointing to something that is most likely garbage." It could then tune its heuristics for that object and collection in general.

> huge byte array

Huge struct arrays are commonplace in perf-critical code (e.g. particle systems). They work quite well because they are O(1) during marking and should quickly promote to Gen 2.
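Rough shape of what's meant here (the particle struct is made up): one contiguous allocation, no per-particle heap objects.

    struct Particle
    {
        public float X, Y, VelX, VelY;
    }

    class ParticleSystem
    {
        // A single array object: the GC marks it in O(1) regardless of how
        // many particles it holds, since the elements are plain structs with
        // no references to trace.
        private readonly Particle[] particles = new Particle[100000];

        public void Step(float dt)
        {
            for (int i = 0; i < particles.Length; i++)
            {
                particles[i].X += particles[i].VelX * dt;
                particles[i].Y += particles[i].VelY * dt;
            }
        }
    }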


> For example could it represent RAII with an imaginary destructor in C# reasonably well?

Not going to comment on the rest of your post, but just in case you didn't know, there's no need for an imaginary destructor in C#, as it already has one.[1]

The reason it's not really well known (or used) is its non-deterministic, GC-tied nature.

I think the only "valid" usage I've seen for it is as a fallback method for IDisposable objects which are not getting cleaned up as they should (and as a way of tracing that in debug-builds).

[1] https://msdn.microsoft.com/en-us/library/66x5fx1b.aspx


The "canonical" implementation of IDisposable includes a finalizer. As you said, that's the only time I've seen it used. Adding finalizers to classes has performance implications as well as the CLR will add any object with a finalizer to the finalizer queue (hence the GC.SuppressFinalize(this) you always see in implementations of IDisposable).


The .NET runtime already provides you the tools to implement RAII. The library exposes this functionality canonically with `IDisposable`. [1] When you're done with your object, you call `obj.Dispose()` on it.

Your language could automatically derive `IDisposable` (or any similar interface) in the same way for every type, and insert calls to `this.Dispose()` based on scope.
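That's essentially what C#'s using statement already compiles down to (file name made up for illustration):

    using System;
    using System.IO;

    class Lowering
    {
        static void Main()
        {
            // using (var file = File.OpenRead("data.bin")) { ... }
            // is roughly what the compiler turns it into:
            FileStream file = File.OpenRead("data.bin");
            try
            {
                // ... use file ...
            }
            finally
            {
                if (file != null) ((IDisposable)file).Dispose();
            }
        }
    }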

Of course the .NET library itself requires the GC. You can't change that.

[1] https://msdn.microsoft.com/en-us/library/ms244737.aspx


I would not consider IDisposable a true RAII solution, but it comes the closest.


The impact of the GC is negligible for most workloads. First try it; if you find problems, then profile and optimize as needed.

AFAIK the CLR is GC-only. You can emulate RAII, but the GC will still run and purge the objects from the heap.

In the CLR, allocation is blazing fast: just increment the heap head pointer. You don't have to deal with the problem traditional allocators face: finding a big enough free hole. This has a cost: when objects die, the free memory on the heap becomes fragmented. It needs to be compacted, and this is where the GC pause comes in, as the references to moved objects need to be updated. Basically, the cost of finding a free hole in memory at allocation time was moved here.
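A toy sketch of that bump-pointer idea (nothing like the real GC's code, just the shape of the trade-off):

    class BumpHeap
    {
        private readonly byte[] memory = new byte[1 << 20];
        private int next;   // the "heap head pointer"

        // Allocation is just an increment plus a bounds check; a real GC
        // would trigger a collection (and compaction) instead of failing.
        public int Alloc(int size)
        {
            if (next + size > memory.Length)
                throw new System.OutOfMemoryException();
            int offset = next;
            next += size;
            return offset;
        }
    }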

This was done intentionally, as studies found that most real-world applications produce a load where most objects are short-lived: cheap allocation is needed, and a multi-generational GC can take care of those objects cheaply once they die (most of the time you only scan the youngest generation for unreachable objects, limiting the GC pause time).

The bottom line is: premature optimization is the root of all evil :D If you need hacks, you either chose a bad tool for the problem, or, more often, there is some problem in your application, maybe even at the architecture level.

If you need real-time responses, use non-managed code. You can take advantage of the interop capabilities of the CLR (quite easy and handy, provided you are careful around the corners; Mono had a nice, thorough doc on the topic), moving that code to a non-managed part of the application, maybe even running it on a different OS thread (this is just a guess, I have never resorted to this kind of hack), but usually the GC will not pose a great problem.

EDIT: Also, you can create value types in the CLR (struct in C#), which are always stack allocated. They work like RAII, but only some framework types are structs, and there is a rule of thumb that you don't want to create them larger than twice the word size of the processor you are mainly targeting, as they are passed by value, which means a lot of copying. Boxing can also cause problems, so they are best used as immutable types, which limits their usefulness for certain scenarios; with mutable structs, autoboxing can lead to seriously unexpected behaviour. (If I remember correctly, assigning a struct to an interface-typed variable boxes it, so changes through that variable do not affect the original it was assigned from.)

On the other hand, they can be passed by reference (the ref, in, and out keywords in C#), which makes them first-class citizens, unlike in Java. Also, generics work as expected for them, with no unnecessary boxing taking place.
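A sketch of the boxing pitfall described above (the struct and interface are made up):

    using System;

    interface IMutable { void Increment(); }

    struct Counter : IMutable
    {
        public int Value;
        public void Increment() { Value++; }
    }

    class Program
    {
        static void Main()
        {
            var counter = new Counter();
            IMutable boxed = counter;    // assigning to the interface boxes a copy
            boxed.Increment();
            Console.WriteLine(counter.Value);            // 0 - the original is untouched
            Console.WriteLine(((Counter)boxed).Value);   // 1 - only the boxed copy changed
        }
    }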

EDIT 2: some typo cleanup

I suggest reading the great book CLR via C# by Eric Lippert if you are interested.


> you can create value types in the CLR (struct in C#), which are always stack allocated

"always stack allocated" for certain values of "always" [1].

[1] https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-...


Yeah, you are right! :D

just edited it to avoid being misleading, and pointed readers to The Authentic Source on the topic :)

Also, I read the book some time ago, but recently I've been working with Java, so I've forgotten some stuff. :(


Just a small correction: "CLR via C#" is written by Jeffrey Richter. Eric Lippert is the C# compiler/language guru.


Is the GC open source?


The CoreCLR version of the .NET Garbage Collector can be found at https://github.com/dotnet/coreclr/tree/master/src/gc


Not sure if this is the same exact GC because the family tree of .NET Core is complicated, but the CoreCLR project has a GC here:

https://github.com/dotnet/coreclr/tree/master/src/gc


The post is about the Desktop GC, but AFAIK everything applies to the CoreCLR you linked to. I don't think that much has changed (yet) with the GC on CoreCLR.


Great post! Would you mind if I added this to our Prefix profiler? Would be a fun little feature. In case you aren't familiar: http://Prefix.io



