Yeah I keep meaning to dig into PerfView a bit more, I've only scratched the surface of it.
BTW I really like the Perf stuff in Roslyn, I wrote 2 posts about it, in case you're interested? Although as you worked on it, it won't be anything you don't already know ;-)
I've worked on a product that used the classic tricks (large byte[] arrays, access through sun.misc.Unsafe, indexing instead of references, etc) and while it was quite fast, it was written by someone who had worked at Azul and deeply understood the JVM and GC. Personally, it makes me think D is the right solution: GC for 99.99% of your objects, but for the performance-critical bits, or where you're fighting the GC, opt out of the GC and manage memory by hand.
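Roughly the kind of thing I mean by "indexing instead of references", sketched in C# since the rest of this thread is .NET-centric (the names are made up; the real product was Java + sun.misc.Unsafe): keep records as structs in one big array and hand out int indices instead of object references, so the GC sees a single large object rather than millions of small ones.

    // Hypothetical sketch of the "one big array + indices" trick.
    struct Order
    {
        public long Id;
        public double Price;
        public int Quantity;
    }

    class OrderPool
    {
        private readonly Order[] _orders;   // one allocation for the GC to track
        private int _count;

        public OrderPool(int capacity) => _orders = new Order[capacity];

        // Hands back an index ("handle") rather than a reference.
        public int Add(long id, double price, int qty)
        {
            int i = _count++;
            _orders[i] = new Order { Id = id, Price = price, Quantity = qty };
            return i;
        }

        public ref Order Get(int index) => ref _orders[index];
    }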
Yeah there definitely comes a point with .NET/Java where, to get very high performance, you are fighting the language/runtime.
The general argument is that you are still more productive by writing 90% or 95% of your app in a managed language, in the idiomatic way. Then you use crazy tricks to tune the last 5-10%. Rather than doing the whole thing in C/C++, which will give better performance, but may not be quicker (more productive) to write.
I don't know much about D, it's interesting to find that you can opt out of GC like that.
You get some of that functionality in .NET with the new GC mode SustainedLowLatency [1]. But it doesn't guarantee no GC, it just tries to avoid it.
> Enables garbage collection that tries to minimize latency over an extended period. The collector tries to perform only generation 0, generation 1, and concurrent generation 2 collections. Full blocking collections may still occur if the system is under memory pressure.
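If it helps, switching modes is just a property on GCSettings; a minimal sketch (the surrounding code is obviously made up):

    using System;
    using System.Runtime;

    class LowLatencyRegion
    {
        static void Main()
        {
            GCLatencyMode old = GCSettings.LatencyMode;
            try
            {
                // Ask the runtime to avoid blocking gen 2 collections here.
                GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
                // ... latency-sensitive work ...
            }
            finally
            {
                GCSettings.LatencyMode = old;
            }
        }
    }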
It's also usually a bad idea. Memory allocated via PInvoke or similar APIs is basically opaque to the garbage collector. This can produce undesirable behavior, from polluting your code to interfering with GC due to memory fragmentation. Marshalling also isn't free.
Basically, if you're at the point where the GC is impacting you but you haven't tried Roslyn-level optimizations -- do that first.
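To illustrate the "opaque to the GC" point, a quick hypothetical sketch -- the native allocation never shows up in the managed heap numbers, so the collector can't reason about it:

    using System;
    using System.Runtime.InteropServices;

    class OpaqueAllocation
    {
        static void Main()
        {
            long before = GC.GetTotalMemory(forceFullCollection: true);

            // 256 MB the GC cannot see, move, or reclaim.
            IntPtr native = Marshal.AllocHGlobal(256 * 1024 * 1024);

            long after = GC.GetTotalMemory(forceFullCollection: true);
            Console.WriteLine($"Managed heap delta: {after - before} bytes"); // ~0

            Marshal.FreeHGlobal(native);
        }
    }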
Fragmentation is unlikely - they'll be completely different heaps, and on 64-bit probably far apart. Marshalling only costs you insofar as you work with safe code. C++/CLI is working at a slightly different level.
But I agree that working with the grain of the GC is usually more productive.
That seems like a lot of assumptions. I don't think I'd be OK with generally advocating based on all those assumptions. For example, you never touched on what would happen if you allocate enough in native and managed memory that you start to get significant memory pressure -- collecting with paging is almost impossible, so the CLR goes into panic mode in an attempt to prevent paging, and a large portion of the heap would be untouchable/immovable.
Are you confusing physical memory with address space?
There's no good reason for the managed heap to be anywhere near any of the native heaps in address space, on 64-bit platforms.
And the CLR's GC should actively allocate slabs well away from any native heap (trivial to do - reserve (not commit, reserve) a big contiguous chunk of address space), simply because it relies on third party code which will itself be allocating native memory; everything from GUI code to native DB drivers and their caches, quite independent of unsafe code doing manual allocation.
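To be clear about what "reserve, not commit" buys you, here's a rough Windows-only P/Invoke sketch (purely illustrative, not what the CLR literally does): reserving costs no physical memory, it just stakes out a contiguous range of address space so nothing else can land in the middle of it.

    using System;
    using System.Runtime.InteropServices;

    class ReserveOnly
    {
        const uint MEM_RESERVE = 0x2000;
        const uint PAGE_NOACCESS = 0x01;

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                          uint flAllocationType, uint flProtect);

        static void Main()
        {
            // Reserve 4 GB of address space (64-bit process); no physical
            // pages are committed until someone actually commits them.
            IntPtr region = VirtualAlloc(IntPtr.Zero, (UIntPtr)(4UL << 30),
                                         MEM_RESERVE, PAGE_NOACCESS);
            Console.WriteLine(region);
        }
    }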
In the absence of a GC-aware virtual memory manager, GC-immovable memory has little relevance to paging.
(Of course, GC.Add/RemoveMemoryPressure should be called if you're doing native allocation from .NET.)
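A minimal sketch of that pattern (the class is made up), wrapping a native allocation so the GC can factor the extra memory into its scheduling:

    using System;
    using System.Runtime.InteropServices;

    sealed class NativeBuffer : IDisposable
    {
        private readonly IntPtr _ptr;
        private readonly long _size;

        public NativeBuffer(long size)
        {
            _size = size;
            _ptr = Marshal.AllocHGlobal((IntPtr)size);
            GC.AddMemoryPressure(size);       // tell the GC about the native memory
        }

        public void Dispose()
        {
            Marshal.FreeHGlobal(_ptr);
            GC.RemoveMemoryPressure(_size);   // and tell it when it's gone
        }
    }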
What many people seem to miss in these sorts of discussions is that "pauseless" usually also means less throughput. I guess it's no different from real-time systems being slower in practice than non-deterministic ones.
If what Azul has built worked on conventional client/server JVMs, Sun/Oracle/IBM would've adopted it long ago.
The sad reality is that spending 99% of your time in GC, but never in a pause longer than "x units of time", qualifies as "pauseless", and if you want more throughput, you have to scale up the system so that the remaining 1% of throughput is sufficiently large for your workload.
I guess this works in the markets where Azul is active (things like finance) but is useless in more conventional use cases.
[0] http://www.microsoft.com/en-us/download/details.aspx?id=2856...