I think the main difference is that the CCL would compute 'enclosed areas' on th...

I think the main difference is that the CCL would compute 'enclosed areas' on the fly. To be more specific, it would first create a mask (black & white) of pixels that are similar to the clicked pixel, and then find connections between foreground/white pixels.

I don't work in web development but accelerating CCL algorithms do happen to be my area of expertise. While their execution time depends on the image content, you can expect them to be in the tens of milliseconds for a 4K image, at least for the 'modern' one you can find in OpenCV, and on current consumer-level CPUs, without multi-threading.

Of course, implementing these newer algorithms isn't straightforward. That's why you should instead use something like OpenCV if you have the option to.

Moreover, if CCL doesn't fit what you're looking for then you can also check out watershed algorithms ( https://en.wikipedia.org/wiki/Watershed_%28image_processing%... ). Because they don't work on binary image, they might be a better way to describe what you call 'enclosed space'.