A company in Berkeley is doing this: www.iqengines.com (see the Developer API), with a demonstration app at www.omoby.com. You send an image via HTTP POST and get a label back as JSON or XML (it also handles faces, barcodes, OCR, etc.).
So just colours and faces at the moment. You're right that arbitrary images are ridiculously complicated; I'm hoping to start off on a smaller domain and build up :)
(edit: obviously I can return the coords of the face too, as well as coords of empty parts of the image etc, but tagging is really what I'm focusing on at the moment)
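To make the tags-plus-coords idea concrete, here is a rough sketch of what a response could look like. The function name `build_tag_response` and every field name (`tags`, `type`, `label`, `box`) are made up for illustration; this is not the actual API, just one plausible shape under those assumptions.

```python
import json


def build_tag_response(colours, faces):
    """Assemble a JSON tagging response (hypothetical schema, not a real API)."""
    # One tag per detected colour name.
    tags = [{"type": "colour", "label": c} for c in colours]
    # Each face is assumed to be an (x, y, width, height) box in pixels.
    for (x, y, w, h) in faces:
        tags.append({
            "type": "face",
            "label": "face",
            "box": {"x": x, "y": y, "width": w, "height": h},
        })
    return json.dumps({"tags": tags}, sort_keys=True)


# Example: two colour tags and one face box.
print(build_tag_response(["red", "white"], [(40, 32, 120, 120)]))
```

Keeping the coords as an optional `box` on each tag means clients that only care about tagging can ignore it.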
Hoping to build a freemium model out of it for image libraries to use. Happy to speak to anyone with any kind of CV / object detection know-how.