Thanks for this comment. I mostly agree with you. A music theory professor used to lazily say that "nature says so" when asked why a major 3rd sounds better than an augmented 4th (i.e. a tritone). Lacking a better explanation, I assumed this had to do with consonant chords having notes whose wavelengths have a smaller least common multiple, but I am not entirely sure. Can you provide some citations or resources describing the mathematical justification for a 12-tone basis for tonal music in more detail? I know there was a group theory element to some of Schoenberg's compositions, but aside from that I have not been exposed much to this idea.
- Sound is vibration produced by some resonating object and transmitted through the air. In the real world, things that resonate tend to have multiple resonant frequencies, and these tend to be integer multiples of a single lowest (fundamental) frequency, due to physics. This sequence of integer multiples is called the harmonic series.
- Therefore, our ears are built to recognize the harmonic series, and to try to fit incoming sounds to the harmonic series in order to quickly and accurately identify the source of a sound (and isolate different sound sources).
- When the frequencies of a set of pitches are in simple integer ratios (equivalently, when their least common multiple is small relative to the frequencies themselves), the pitches share many harmonics. If we hear these pitches together, our ears tend to recognize them as one blended tone, rather than as separate notes.
- Sounds that blend together are called “consonant”, and sounds that don’t are called “dissonant”.
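To make the shared-harmonics point concrete, here is a rough Python sketch. It assumes just-intonation ratios for each interval, including 45:32 for the tritone, which is only one of several ratios used for it. The function name and the interval table are my own for illustration. It reports the lowest harmonic the two tones have in common, counted in multiples of their common fundamental.

```python
from fractions import Fraction
from math import gcd

# Assumed just-intonation ratios; 45:32 is one common choice for the tritone.
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "major third": Fraction(5, 4),
    "tritone": Fraction(45, 32),
}

def first_shared_harmonic(ratio: Fraction) -> int:
    """Lowest frequency present in both harmonic series.

    With ratio p/q in lowest terms, the lower tone sits at q units and the
    upper tone at p units of a common fundamental. Their harmonics first
    coincide at lcm(p, q) of those units.
    """
    p, q = ratio.numerator, ratio.denominator
    return p * q // gcd(p, q)

for name, ratio in INTERVALS.items():
    print(f"{name:14s} {ratio}  first shared harmonic at {first_shared_harmonic(ratio)} units")
```

Under these assumptions the major third's harmonics first line up at 20 units and the tritone's at 1440, which is one way to put a number on "blends" versus "doesn't blend".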
As far as tuning systems go: frequencies are real numbers, but for practical reasons we would like to choose a discrete set of frequencies to work with when making music (unless you can build me a piano with an uncountably infinite number of keys). So, we choose notes that harmonize reasonably well together. But if you sit down and try to construct a tuning system by starting with one frequency and multiplying it by simple integer ratios, you'll find that you never obtain a finite, closed set of notes: new ratios keep landing near existing notes without quite matching them, which means you'll have to make approximations somewhere. For instance, start with a frequency of 100 Hz and go up three perfect fifths. Each is a 3:2 ratio, so you end up at a 27:8 ratio, or 337.5 Hz. Then, start at the same base frequency and go up a major sixth (5:3) and then an octave (2:1), making a 10:3 ratio, or about 333.3 Hz. These sound so close to each other that they are functionally the same note (you can listen to them on a website like https://onlinetonegenerator.com/), but no matter which of the two frequencies we choose to put on our keyboard, some intervals will always be just a little off, so we have to make a compromise.
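If you want to check that arithmetic yourself, here is a small Python sketch using exact fractions. The variable names are mine, and the conversion to cents (1200 times the base-2 log of the ratio) is the standard way to measure small pitch differences rather than anything specific to this example.

```python
from fractions import Fraction
from math import log2

base_hz = 100
fifth = Fraction(3, 2)
major_sixth = Fraction(5, 3)
octave = Fraction(2, 1)

# Route 1: three perfect fifths up from the base.
three_fifths = fifth ** 3                 # 27/8
# Route 2: a major sixth, then an octave.
sixth_plus_octave = major_sixth * octave  # 10/3

# Ratio between the two "same" notes, and its size in cents
# (100 cents = one equal-tempered semitone).
gap = three_fifths / sixth_plus_octave
gap_cents = 1200 * log2(float(gap))

print(float(base_hz * three_fifths))       # 337.5
print(float(base_hz * sixth_plus_octave))  # about 333.33
print(gap, round(gap_cents, 1))            # 81/80, about 21.5 cents
```

The mismatch works out to 81:80, roughly a fifth of a semitone.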
12-TET is the most common such compromise; it divides the octave into 12 notes spaced equally on a logarithmic scale, so each note is a factor of 2^(1/12) above the one below it. Why 12? Because it gives us close approximations of the intervals most used in Western music (fifths and fourths very close, thirds and sixths somewhat less so), without having so many notes that it would become unwieldy. Equal temperament has the advantage that the same interval sounds exactly the same everywhere: the 5th from F to C has the exact same ratio as the 5th from G to D, so you can play the same piece of music in different keys and all the intervals will sound exactly the same. It also guarantees that octaves (the most consonant interval) are tuned exactly and never approximated.
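For a rough sense of how close those approximations are, the sketch below compares a few just-intonation ratios against the nearest step of an equal division of the octave. The interval list and the function name are my own choices for illustration, and the ratios are the usual just-intonation ones.

```python
from math import log2

# Assumed just-intonation targets (numerator, denominator).
JUST_INTERVALS = {
    "minor third": (6, 5),
    "major third": (5, 4),
    "perfect fourth": (4, 3),
    "perfect fifth": (3, 2),
    "major sixth": (5, 3),
}

def tet_error_cents(num: int, den: int, divisions: int = 12):
    """Return (nearest step count, tempered minus just, in cents)."""
    just_cents = 1200 * log2(num / den)
    step_cents = 1200 / divisions
    steps = round(just_cents / step_cents)
    return steps, steps * step_cents - just_cents

for name, (num, den) in JUST_INTERVALS.items():
    steps, err = tet_error_cents(num, den)
    print(f"{name:14s} {num}:{den}  {steps:2d} steps  error {err:+.1f} cents")
```

In 12-TET the fifth comes out about 2 cents flat and the major third about 14 cents sharp. Passing a different `divisions` value lets you compare other equal divisions of the octave.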
> Unfretted string ensembles, which can adjust the tuning of all notes except for open strings, and vocal groups, who have no mechanical tuning limitations, sometimes use a tuning much closer to just intonation for acoustic reasons. Other instruments, such as some wind, keyboard, and fretted instruments, often only approximate equal temperament, where technical limitations prevent exact tunings. Some wind instruments that can easily and spontaneously bend their tone, most notably trombones, use tuning similar to string ensembles and vocal groups.