There are 2D (image) versions of wavelet transforms that would work well in that context: quantize the coefficients with some tolerance so that fuzzy edges land on the same representation, and do likewise for nearby hues.
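To make the quantization idea concrete, here is a rough sketch, assuming the goal is to decide whether two small grayscale images are "the same" despite slightly soft edges. It uses a single-level Haar transform written by hand in plain Python; the function names and the step size of 16 are my own picks for illustration, not anything standard.

```python
def haar_step(values):
    """One level of the 1-D Haar transform: pairwise averages, then pairwise differences."""
    avgs = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    diffs = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return avgs + diffs


def haar_2d(image):
    """One-level 2-D Haar transform of a grayscale image.

    `image` is a list of equal-length rows; both dimensions are assumed even.
    """
    rows = [haar_step(row) for row in image]              # transform each row
    cols = [haar_step(list(col)) for col in zip(*rows)]   # then each column
    return [list(row) for row in zip(*cols)]              # transpose back


def signature(image, step=16):
    """Quantize the coefficients so small differences collapse to the same integers.

    `step` is the tolerance (an assumed value); larger means more forgiving.
    """
    return [[round(c / step) for c in row] for row in haar_2d(image)]


sharp = [[0, 0, 255, 255] for _ in range(4)]      # hard vertical edge
fuzzy = [[0, 10, 245, 255] for _ in range(4)]     # same edge, slightly blurred
flipped = [[255, 255, 0, 0] for _ in range(4)]    # a genuinely different image

print(signature(sharp) == signature(fuzzy))    # the blur is absorbed by quantization
print(signature(sharp) == signature(flipped))  # a real difference survives
```

One caveat: plain rounding has bin boundaries, so two coefficients that are very close can still fall into different bins. Comparing signatures with a small allowed difference, rather than strict equality, is one way around that.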
Perceptual color representation gets a bit harder, but if you're only looking at gamut differences between cameras, screens, and printed media, I think it's feasible.
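For the "nearby hues" part, a minimal sketch of a tolerance check might look like the following. It uses HSV hue from the standard library's `colorsys` module as a stand-in; HSV is not perceptually uniform, so treat this as an illustration of the tolerance idea rather than a real perceptual metric. The function names and the 0.02 tolerance (about 7 degrees) are assumptions of mine.

```python
import colorsys


def hue_of(rgb):
    """Hue in [0, 1) for an 8-bit (r, g, b) tuple. Grays come back as hue 0."""
    r, g, b = (channel / 255 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0]


def hues_close(rgb_a, rgb_b, tolerance=0.02):
    """True if the two hues differ by at most `tolerance` (assumed value).

    Hue is circular, so the distance wraps around at 1.0.
    """
    d = abs(hue_of(rgb_a) - hue_of(rgb_b))
    return min(d, 1 - d) <= tolerance


print(hues_close((255, 0, 0), (255, 10, 0)))   # red vs slightly orange red
print(hues_close((255, 0, 0), (255, 0, 10)))   # red vs slightly pink red (wraps past 1.0)
print(hues_close((255, 0, 0), (0, 255, 0)))    # red vs green
```

The wraparound matters: reds sit on both sides of the 0/1 boundary, so a naive absolute difference would call two nearly identical reds far apart.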
Alternatively, if you know a lot about the source images, you can train a neural network for the specific application.