Yeah I keep meaning to dig into PerfView a bit more, I've only scratched the surface of it.
BTW I really like the Perf stuff in Roslyn, I wrote 2 posts about it, in case you're interested? Although as you worked on it, it won't be anything you don't already know ;-)
I've worked on a product that used the classic tricks (large byte[] arrays, access through sun.misc.Unsafe, indexing instead of references, etc) and while it was quite fast, it was written by someone who had worked at Azul and deeply understood the JVM and GC. Personally, it makes me think D is the right solution: GC for 99.99% of your objects, but for the performance-critical bits, or where you're fighting the GC, opt out of the GC and manage memory by hand.
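Roughly the kind of thing I mean by "indexing instead of references", sketched in C# since the rest of this thread is .NET-centric (the names are made up; the real product was Java + sun.misc.Unsafe): keep records as structs in one big array and hand out int indices instead of object references, so the GC sees a single large object rather than millions of small ones.

    // Hypothetical sketch of the "one big array + indices" trick.
    struct Order
    {
        public long Id;
        public double Price;
        public int Quantity;
    }

    class OrderPool
    {
        private readonly Order[] _orders;   // one allocation for the GC to track
        private int _count;

        public OrderPool(int capacity) => _orders = new Order[capacity];

        // Hands back an index ("handle") rather than a reference.
        public int Add(long id, double price, int qty)
        {
            int i = _count++;
            _orders[i] = new Order { Id = id, Price = price, Quantity = qty };
            return i;
        }

        public ref Order Get(int index) => ref _orders[index];
    }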
Yeah there definitely comes a point with .NET/Java where, to get very high performance, you are fighting the language/runtime.
The general argument is that you are still more productive by writing 90% or 95% of your app in a managed language, in the idiomatic way. Then you use crazy tricks to tune the last 5-10%. Rather than doing the whole thing in C/C++, which will give better performance, but may not be quicker (more productive) to write.
I don't know much about D, it's interesting to find that you can opt out of GC like that.
You get some of that functionality in .NET with the new GC mode SustainedLowLatency [1]. But it doesn't guarantee no GC, it just tries to avoid it.
> Enables garbage collection that tries to minimize latency over an extended period. The collector tries to perform only generation 0, generation 1, and concurrent generation 2 collections. Full blocking collections may still occur if the system is under memory pressure.
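If it helps, switching modes is just a property on GCSettings; a minimal sketch (the surrounding code is obviously made up):

    using System;
    using System.Runtime;

    class LowLatencyRegion
    {
        static void Main()
        {
            GCLatencyMode old = GCSettings.LatencyMode;
            try
            {
                // Ask the runtime to avoid blocking gen 2 collections here.
                GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
                // ... latency-sensitive work ...
            }
            finally
            {
                GCSettings.LatencyMode = old;
            }
        }
    }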
It's also usually a bad idea. Memory allocated via PInvoke or similar APIs is basically opaque to the garbage collector. This can produce undesirable behavior, from polluting your code to interfering with GC due to memory fragmentation. Marshalling also isn't free.
Basically, if you're at the point where the GC is impacting you but you haven't tried Roslyn-level optimizations -- do that first.
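To illustrate the "opaque to the GC" point, a quick hypothetical sketch -- the native allocation never shows up in the managed heap numbers, so the collector can't reason about it:

    using System;
    using System.Runtime.InteropServices;

    class OpaqueAllocation
    {
        static void Main()
        {
            long before = GC.GetTotalMemory(forceFullCollection: true);

            // 256 MB the GC cannot see, move, or reclaim.
            IntPtr native = Marshal.AllocHGlobal(256 * 1024 * 1024);

            long after = GC.GetTotalMemory(forceFullCollection: true);
            Console.WriteLine($"Managed heap delta: {after - before} bytes"); // ~0

            Marshal.FreeHGlobal(native);
        }
    }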
Fragmentation is unlikely - they'll be completely different heaps, and on 64-bit probably far apart. Marshalling only costs you insofar as you work with safe code. C++/CLI is working at a slightly different level.
But I agree that working with the grain of the GC is usually more productive.
That seems like a lot of assumptions. I don't think I'd be OK with generally advocating based on all those assumptions. For example, you never touched on what would happen if you allocate enough in native and managed memory that you start to get significant memory pressure -- collecting with paging is almost impossible, so the CLR goes into panic mode in an attempt to prevent paging, and a large portion of the heap would be untouchable/immovable.
Are you confusing physical memory with address space?
There's no good reason for the managed heap to be anywhere near any of the native heaps in address space, on 64-bit platforms.
And the CLR's GC should actively allocate slabs well away from any native heap (trivial to do - reserve (not commit, reserve) a big contiguous chunk of address space), simply because it relies on third party code which will itself be allocating native memory; everything from GUI code to native DB drivers and their caches, quite independent of unsafe code doing manual allocation.
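To be clear about what "reserve, not commit" buys you, here's a rough Windows-only P/Invoke sketch (purely illustrative, not what the CLR literally does): reserving costs no physical memory, it just stakes out a contiguous range of address space so nothing else can land in the middle of it.

    using System;
    using System.Runtime.InteropServices;

    class ReserveOnly
    {
        const uint MEM_RESERVE = 0x2000;
        const uint PAGE_NOACCESS = 0x01;

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                          uint flAllocationType, uint flProtect);

        static void Main()
        {
            // Reserve 4 GB of address space (64-bit process); no physical
            // pages are committed until someone actually commits them.
            IntPtr region = VirtualAlloc(IntPtr.Zero, (UIntPtr)(4UL << 30),
                                         MEM_RESERVE, PAGE_NOACCESS);
            Console.WriteLine(region);
        }
    }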
In the absence of a GC-aware virtual memory manager, GC-immovable memory has little relevance to paging.
(Of course, GC.Add/RemoveMemoryPressure should be called if you're doing native allocation from .NET.)
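A minimal sketch of that pattern (the class is made up), wrapping a native allocation so the GC can factor the extra memory into its scheduling:

    using System;
    using System.Runtime.InteropServices;

    sealed class NativeBuffer : IDisposable
    {
        private readonly IntPtr _ptr;
        private readonly long _size;

        public NativeBuffer(long size)
        {
            _size = size;
            _ptr = Marshal.AllocHGlobal((IntPtr)size);
            GC.AddMemoryPressure(size);       // tell the GC about the native memory
        }

        public void Dispose()
        {
            Marshal.FreeHGlobal(_ptr);
            GC.RemoveMemoryPressure(_size);   // and tell it when it's gone
        }
    }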
What many people seem to miss in these sorts of discussions is that "pauseless" usually also means less throughput. I guess it's no different from real-time systems being slower in practice than non-deterministic ones.
If what Azul has built worked on conventional client/server JVMs, Sun/Oracle/IBM would've adopted it long ago.
The sad reality is that spending 99% of your time in GC, but never in a pause longer than "x units of time", qualifies as "pauseless", and if you want more throughput, you have to scale up the system so that the remaining 1% of throughput is sufficiently large for your workload.
I guess this works in the markets where Azul is active (things like finance) but is useless in more conventional use cases.
[0] http://www.microsoft.com/en-us/download/details.aspx?id=2856...