> I didn't really grok why we use sines and cosines.
As has been pointed out elsewhere in this thread, they have nice mathematical properties. But another important thing is that they typically work "well enough" for applications. Consider audio. Tones clearly have frequency, but they also have a position in time. Doing a sine-cosine decomposition of a whole song doesn't really make sense, since it has no way of saying that a tone on the piano is played at a given time.
So you would think that it would make sense to break the signal down into stuff with frequency and time. Some kind of wavelet probably. Maybe something that very accurately models what a human hears.
The thing is that chopping the audio stream up into windows, and decomposing those windows into sines and cosines, while a bit ad hoc, just works well enough.
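A minimal sketch of that windowing idea, essentially a short-time Fourier transform; the frame length, hop size, and Hann window below are illustrative choices, not anything canonical:

```python
import numpy as np

def stft_frames(signal, frame_len=1024, hop=512):
    """Chop a 1-D signal into overlapping windows and take the DFT
    of each one -- a sine/cosine decomposition per time slot."""
    window = np.hanning(frame_len)  # taper to reduce edge artifacts
    n_frames = 1 + (len(signal) - frame_len) // hop
    spectra = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        spectra[i] = np.fft.rfft(frame * window)  # one spectrum per window
    return spectra

# A 440 Hz tone that starts halfway through shows up only in the
# later frames -- frequency *and* time, unlike a whole-song DFT.
sr = 16000
t = np.arange(sr) / sr
x = np.where(t < 0.5, 0.0, np.sin(2 * np.pi * 440 * t))
S = stft_frames(x)
print(np.abs(S).argmax(axis=1)[:5])   # early frames: silence
print(np.abs(S).argmax(axis=1)[-5:])  # late frames: peak near bin 28
```

Each row of the result is a spectrum for one time slot, which is exactly the "tone at a given time" information the whole-song decomposition can't express.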
> So you would think that it would make sense to break the signal down into stuff with frequency and time. Some kind of wavelet probably. Maybe something that very accurately models what a human hears.
Interestingly, wavelet-based compression went nowhere: although wavelets have nice mathematical properties, when applied in a lossy compression scheme they did not fit well with how humans perceive detail and quality, both psychoacoustically and psychovisually, i.e. PSNR and subjective quality diverged more than with other systems. Not surprisingly, none of the state-of-the-art lossy compression algorithms use wavelets.
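For context, PSNR is a purely numerical distortion score computed from mean squared error, which is exactly why it can diverge from what people actually perceive. A minimal sketch of the standard definition, assuming 8-bit samples (peak value 255):

```python
import numpy as np

def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(float) - degraded.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical signals
    return 10 * np.log10(peak ** 2 / mse)
```

Two distortions with the same MSE get the same PSNR regardless of whether the error lands somewhere the eye or ear is sensitive to, so a codec can win on PSNR and still look or sound worse.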
Yes, that didn't help, of course. JPEG 2000 was a bit more efficient than JPEG by virtue of being a newer, more computationally intensive format, not because of wavelets. A modern format like the H.265-derived HEIF still uses DCT/DST-based transforms. For video the situation is worse, as AFAIK no one has been able to come up with a decent wavelet-based motion estimation algorithm.
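As a rough illustration of the block-transform approach those formats are built on (real codecs add integer approximations, multiple block sizes, prediction, and entropy coding, none of which is shown here), a toy 8x8 DCT round trip with SciPy:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Stand-in 8x8 pixel block: a smooth ramp, the kind of content
# where the DCT compacts energy into a few low-frequency coefficients.
block = np.arange(64, dtype=float).reshape(8, 8)

coeffs = dctn(block, norm="ortho")        # forward 2-D DCT
coeffs[np.abs(coeffs) < 1.0] = 0.0        # crude stand-in for quantization
approx = idctn(coeffs, norm="ortho")      # inverse transform

print(np.count_nonzero(coeffs))           # few coefficients survive
print(np.max(np.abs(approx - block)))     # yet the block is nearly intact
```

Dropping most of the coefficients while staying close to the original block is the whole point of the transform step; the lossy part is choosing which coefficients to keep.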