actually on further examination i think each four iterations of the loop paint one pixel in the most common case, because each pixel in a large filled area gets visited from each of its neighbors
btw, i want to point out that shaneos's request for an 'algorithm [that] can run in under 16ms for arbitrarily large images' is obviously constructed to be impossible
no flood-fill algorithm can run in less than 16ms, or less than any finite time, for arbitrarily large images; they necessarily require at least linear work and, even with arbitrary parallelism (itself impossible in a finite universe), logarithmic time
you need to bound the image size if you are going to bound the execution time
so this is not a good faith request; it is, rather, a deliberately impossible task, like spinning straw into gold, finding an acre of land between the salt water and the beach, making a cambric shirt with no seams or needlework, or reaping a crop with a sickle of leather
unfortunately my previous comment pointing this out has been flagged and is therefore generally no longer visible; i consider this sort of response to a purely logical argument of impossibility to be astoundingly mean-spirited and contemptible
it also has 10 other method and function calls in it, one of which allocates a new 4-element array
also it's maintaining a stack of pixels to paint instead of looping over a scan line
each iteration of the loop paints one pixel
i don't think canvas memory access patterns are the root of the performance problem
i'm not saying it's bad code, just that it doesn't justify your conclusion