Hacker News new | past | comments | ask | show | jobs | submit login

The fill algorithm is far from being properly optimized:

1. Use pointIdx instead of a string for fill keys. That's going to be way faster and memory efficient

2. There's no need to have separate added and visited. Just keep added and continue checking it before adding to the queue, and don't delete added entries

3. Even better, don't even use added but instead just check the output image data

4. Compute pointIdx incrementally by adding or subtracting 1 or width, only do a multiplication for the first point

5. Don't keep y coordinates in the loop, instead store the minimum and maximum pointIdx and then divide at the end to find the y coordinates (you still need to keep x to compute the x minimum and maximum if you need it, unless you can make the image have a power of two stride, in which case you can just mask pointIdx to compute x)

6. Use a black border instead of doing boundary checks

7. The % 5000 check is horribly slow since division is horribly slow. Use a power-of-two size and a bitwise and to be safe

8. Doing an update every 5000 pixels seems absurdly low. CPUs do billions of operations per second and a pixel should not take more than 100 CPU cycles with proper optimization, so even at around 100 fps you should be able to do in the magnitude of 100K pixels per frame, which means you shouldn't even need to do it incrementally or pre-process. If you really need incrementality you should use both a pixel number check and a timer and adaptively double or halve the pixel number if the timer check gives a too low or high time elapsed.

9. Using "chunks of pixel" is probably faster than single pixels and using horizontal groups of chunks even better, although that requires significantly more code

10. WebAssembly, with SIMD if available, is going to be faster than JavaScript

In general, this code has clearly not been written by someone experienced in writing optimized code, so I assume the pre-processing can also be greatly optimized (if it's even necessary after optimizing the filling code).

Also, there is no need to limit to 256 areas since you can use multiple images and all channels to store larger integers.




Thanks for the great write up - pull requests are welcome.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: