Hacker News new | past | comments | ask | show | jobs | submit login

Great list. It's important to understand when to use each one of these. Identify your bottleneck, through the use of profilers. Execution time is largely based on memory bus blocking I/O and not the CPU calculations, so if you start with writing SIMD, you're not going to get anywhere.

Accessing data on the stack instead of the heap is the #1 saver of execution time, in my experience. But your bottlenecks might be different. Locally scoped value-type variables are generally on the stack. Object-scoped and static fields and properties are on the heap.

Writes to local variables seem to be faster than reads, IIRC. The fastest operators seem to be the bitwise instructions, IIRC. If running in 32-bit mode, try to work with 32-bit integers. If running in 64-bit mode, try to work with 64-bit integers.

Here's an example of a major, major improvement in performance

for(int x = 0; x < this.Width; x++)

{

   for(int y = 0; y < this.Height; y++) { foo = bar; } 
}

Much faster version (due to storing a copy of Width and Height on the stack instead of the heap):

int width = this.Width;

int height = this.Height;

for(int x = 0; x < width; x++)

{

   for(int y = 0; y < height; y++) { foo = bar; }
}

My comment here describes roughly the approach I used to take advantage of stack-allocated memory (before Span<T> was available). https://news.ycombinator.com/item?id=15136627




Thanks! Your example is pretty interesting. Any reason why this is the case? In both cases, it is just accessing a memory location to read the value. Are there compiler optimization heuristics at play here? E.g., for the local variable compiler knows that its value is not changing during the loop execution, so it can be pushed to register for faster access.


Register access isn't the issue. In the first example, this.Width and this.Height are accessing the Width and Height property of the current object. This requires a heap fetch on each iteration of the loop. There may be OS-specific nuances with automatic caching that I can't remember clearly enough to reliably mention.

If you can get rid of all heap lookups in your iterative loop, then you'll see a large speed boost if that was the bottleneck. Local variables exist on the stack, which tends to exist in the CPU cache when the current thread is active. https://msdn.microsoft.com/en-us/library/windows/desktop/ms6...

Unfortunately, method calls in C# have a much higher overhead than in C and C++. If you must do a method call in your loop, be sure to read this to see if your method can be inlined. Only very small methods of 32 IL bytes or less can be inlined: https://stackoverflow.com/questions/473782/inline-functions-...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: