If you dither properly, then with a small number of bits you'll have a sine wave plus noise, and with more bits you'll have a sine wave plus less noise.
If you dither improperly, then you'll end up with a sine wave plus some noise (less noise than if you had dithered properly) plus some harmonic distortion.
When talking about audio, one typically doesn't compare in the time domain, since that's not even remotely how ears work. One compares in the frequency domain, and until your quantization error is very large compared to the signal, there will still be some of the original content in the result.
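To make that concrete, here's a rough numpy sketch (my own toy example, not anything from a real converter or codec; the 997 Hz tone, the 4-bit quantizer, and the TPDF dither amount are all just illustrative choices): quantize a sine with and without dither, then look at the spectra.

    import numpy as np

    fs = 48_000                      # sample rate in Hz
    n = 1 << 16                      # number of samples
    t = np.arange(n) / fs
    x = 0.5 * np.sin(2 * np.pi * 997 * t)   # 997 Hz test tone at half scale

    bits = 4                         # deliberately coarse quantizer
    step = 2.0 / (2 ** bits)         # LSB size for a signal spanning [-1, 1)

    def quantize(sig, dither=False):
        if dither:
            # TPDF dither: two +/-0.5 LSB uniforms summed, triangular PDF over +/-1 LSB
            rng = np.random.default_rng(0)
            sig = sig + (rng.uniform(-0.5, 0.5, sig.shape)
                         + rng.uniform(-0.5, 0.5, sig.shape)) * step
        return np.round(sig / step) * step

    def spectrum_db(sig):
        # windowed magnitude spectrum, normalized to the tallest peak
        mag = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
        return 20 * np.log10(mag / mag.max() + 1e-12)

    freqs = np.fft.rfftfreq(n, 1 / fs)
    plain = spectrum_db(quantize(x))               # tone + harmonic spikes
    dith = spectrum_db(quantize(x, dither=True))   # tone + flat noise floor

    bin3 = np.argmin(np.abs(freqs - 3 * 997))      # bin nearest the 3rd harmonic
    print("3rd harmonic, no dither: %5.1f dB" % plain[bin3])
    print("3rd harmonic, dithered:  %5.1f dB" % dith[bin3])

The undithered spectrum has spikes at multiples of 997 Hz (distortion correlated with the signal), while the dithered one is just the tone sitting over a roughly flat noise floor, which is exactly the "sine wave plus noise" case above.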
This isn't a great analogy, as eyes and ears operate quite differently, but it does sort of get to the issue. Look at this image: http://en.wikipedia.org/wiki/Dither#mediaviewer/File:1_bit.p...
If you consider the original to have been an unquantized, but already sampled grayscale image, then essentially all of the pixels have large quantization error (since they were originally some shade of gray, but now they are all either black or white), so in one sense the original image is gone. But in another sense what you see is the original image, plus some noise.
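You can reproduce the same effect with a few lines of numpy on a synthetic grayscale ramp instead of that Wikipedia photo (just a toy illustration, not how the wiki image was actually made): threshold to 1 bit with and without dither and compare the local averages.

    import numpy as np

    rng = np.random.default_rng(0)
    h, w = 64, 256
    gray = np.tile(np.linspace(0.0, 1.0, w), (h, 1))  # horizontal ramp, black to white

    hard = (gray >= 0.5).astype(float)                 # plain threshold: step edge at mid-gray
    noisy = gray + rng.uniform(-0.5, 0.5, gray.shape)  # add dither before thresholding
    dithered = (noisy >= 0.5).astype(float)

    # Column means: the hard threshold collapses to 0s then 1s, while the
    # dithered means still track the original ramp, even though every pixel is 0 or 1.
    cols = [0, 64, 128, 192, 255]
    print("original:  ", np.round(gray[0, cols], 2))
    print("threshold: ", np.round(hard.mean(axis=0)[cols], 2))
    print("dithered:  ", np.round(dithered.mean(axis=0)[cols], 2))

With 64 rows averaged per column, the dithered means land close to the original grays, which is the "original image plus noise" way of looking at that 1-bit picture.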