"Managed memory is free. Not as in free beer, as in free puppy." Dev manager of ...

rjbwork · on Jan 25, 2018

>they'll totally forget that allocation is expensive no matter what the platform or runtime.

Well, it's easier to do in a managed language. When you literally don't have to agonize or obsess over every allocation because you aren't responsible for cleaning it up (unmanaged held resources withstanding), you tend not to do so.

P.S.: You're always free to drop down into C or C++ if you want to get some speed, but of course you need to clean up after yourself there. A friend of mine wrote a good guide on doing so, if anyone cares https://github.com/TheBlackCentipede/PlatformInvocationIndep...

Nuzzerino · on Jan 25, 2018

>You're always free to drop down into C or C++ if you want to get some speed

Wouldn't C# with structs and pointers do the job in many cases? I've been able to get 50-fold increases in speed through heavy optimizations, without switching to another language. Using C or C++ solely for a "speed boost" over C# is not only unnecessary, but it creates more problems than it solves. If you don't know how to optimize within C# (as a C# developer), how are you going to succeed in writing efficient C++ code?

Once you learn the nuances and limitations of making optimizations in C#, then you should start looking into how and when other languages such as C can wisely be used. To name an example, C makes it easier to micromanage assembly instructions (can be done in C# too, but not in a very practical way, and yes I mean assembly and not IL). C also contains more syntax and features which are suitable for bitwise micromanagement, whereas with C# it can be more awkward.

pjmlp · on Jan 25, 2018

Yes they would, and the C# 7 improvements taken from Midori experience make it much better.

I think in general it is a culture problem.

Those of us that embraced managed languages, including for systems programming (Oberon, D, ...), know that we can be productive 99% of the time and just have to care how to do speed boost tricks on that 1% using profiler and low level language tricks.

In C and C++ communities there is a sub-culture of thinking too much ahead of time how much each line of code costs, thus speeding too much time with design decisions that actually have zero value in the context of the application being delivered.

The problem is not taking those decisions, rather taking them without validating if they are right with a profiler, or regard to the goals that have to be met for the application.

Beyond which any low level fine tuning, while fun, is needless engineering.

sterlind · on Jan 25, 2018

Midori was so beautiful. I think it would have succeeded as a .Net runtime replacement with picoprocesses. it frustrates me that we didn't open-source it.

pjmlp · on Jan 25, 2018

As believer in GC enabled system programming languages, I do feel it was indeed a missed opportunity, specially to change the mind of those that think C and C++ are the only way to write OSes.

eni · on Jan 25, 2018

Can you please point to any resources that talk about heavy optimization options in c#. That 50-fold increase you talk about is very interesting. I would like to learn more.

pjmlp · on Jan 25, 2018

Here is a list, not easy to track all of them down, but maybe as keywords to easy googling.

- structs

- unsafe code

- stack allocation in unsafe code (think alloca())

- attribute annotations for packing and inline calls across assemblies

- ref parameters

- ref returns

- readonly ref parameters

- Span<> and Memory<>

- Native memory allocation via MarshalInterop, SafeHandles

- Buffer and ArraySegment

- SIMD (with RyuJIT)

- Profiled code cache for JIT code background generation between executions (System.Runtime.ProfileOptimization)

Nuzzerino · on Jan 25, 2018

Great list. It's important to understand when to use each one of these. Identify your bottleneck, through the use of profilers. Execution time is largely based on memory bus blocking I/O and not the CPU calculations, so if you start with writing SIMD, you're not going to get anywhere.

Accessing data on the stack instead of the heap is the #1 saver of execution time, in my experience. But your bottlenecks might be different. Locally scoped value-type variables are generally on the stack. Object-scoped and static fields and properties are on the heap.

Writes to local variables seem to be faster than reads, IIRC. The fastest operators seem to be the bitwise instructions, IIRC. If running in 32-bit mode, try to work with 32-bit integers. If running in 64-bit mode, try to work with 64-bit integers.

Here's an example of a major, major improvement in performance

for(int x = 0; x < this.Width; x++)

{

   for(int y = 0; y < this.Height; y++) { foo = bar; }

}

Much faster version (due to storing a copy of Width and Height on the stack instead of the heap):

int width = this.Width;

int height = this.Height;

for(int x = 0; x < width; x++)

{

   for(int y = 0; y < height; y++) { foo = bar; }

}

My comment here describes roughly the approach I used to take advantage of stack-allocated memory (before Span<T> was available). https://news.ycombinator.com/item?id=15136627

eni · on Jan 25, 2018

Thanks! Your example is pretty interesting. Any reason why this is the case? In both cases, it is just accessing a memory location to read the value. Are there compiler optimization heuristics at play here? E.g., for the local variable compiler knows that its value is not changing during the loop execution, so it can be pushed to register for faster access.

Nuzzerino · on Jan 25, 2018

Register access isn't the issue. In the first example, this.Width and this.Height are accessing the Width and Height property of the current object. This requires a heap fetch on each iteration of the loop. There may be OS-specific nuances with automatic caching that I can't remember clearly enough to reliably mention.

If you can get rid of all heap lookups in your iterative loop, then you'll see a large speed boost if that was the bottleneck. Local variables exist on the stack, which tends to exist in the CPU cache when the current thread is active. https://msdn.microsoft.com/en-us/library/windows/desktop/ms6...

Unfortunately, method calls in C# have a much higher overhead than in C and C++. If you must do a method call in your loop, be sure to read this to see if your method can be inlined. Only very small methods of 32 IL bytes or less can be inlined: https://stackoverflow.com/questions/473782/inline-functions-...

eni · on Jan 25, 2018

Thanks!

Nuzzerino · on Jan 25, 2018

Also be sure to check this out https://gist.github.com/jboner/2841832 Notably the L1/L2 cache vs the main memory reference