Firing rates for neurons are on the order of 10 Hz, which is below the minimum audible frequency of about 20 Hz.
Even peak neuron firing rates only touch the lower end of the frequency range that needs to be processed.
Without the "hardware acceleration" of the cochlea, which preprocesses the time-domain signal into the frequency domain first, basically our whole audio sensorium would be out of processing range.
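To make the "time domain to frequency domain" point concrete, here is a minimal sketch. It is not a model of the cochlea; it just correlates a sampled signal against a few candidate frequencies. The function names, the 8 kHz sample rate and the 440 Hz / 1000 Hz test tones are all assumptions picked for illustration.

```python
import cmath
import math


def band_amplitude(samples, sample_rate_hz, freq_hz):
    """Estimate the amplitude of the freq_hz component of a sampled signal
    by correlating it with a complex exponential (a single DFT-style bin)."""
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return 2 * abs(acc) / len(samples)


def make_tone_mix(sample_rate_hz=8000, duration_ms=100):
    """Hypothetical test input: a 440 Hz tone plus a quieter 1000 Hz tone."""
    n = sample_rate_hz * duration_ms // 1000
    return [
        math.sin(2 * math.pi * 440 * k / sample_rate_hz)
        + 0.5 * math.sin(2 * math.pi * 1000 * k / sample_rate_hz)
        for k in range(n)
    ]


signal = make_tone_mix()
for freq in (440, 700, 1000):
    # Each "channel" reports one slowly varying number instead of the raw waveform.
    print(freq, round(band_amplitude(signal, 8000, freq), 3))
```

The raw waveform changes thousands of times per second, but the per-frequency amplitudes only change as fast as the sound itself changes, which is a rate slow neurons can plausibly keep up with.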
Normal nervous signal transmission speeds are on the order of the speed of sound, and switching rates are in the Hz range. Voltages are in the tens of millivolts.
Neuron spiking isn't rate encoded, it's temporally encoded. I.e. the offset from the next expected regular interval is the signal, not the spiking interval itself.
Additionally, there are large variations in neuron spiking rates.
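A toy sketch of the difference between the two readings, assuming a hypothetical regular 100 ms reference interval and spike times given in whole milliseconds. Both function names are made up for illustration.

```python
def rate_decode(spike_times_ms, duration_ms):
    """Rate reading: only the number of spikes per second matters."""
    return len(spike_times_ms) * 1000 / duration_ms


def temporal_decode(spike_times_ms, period_ms):
    """Temporal reading: the signal is each spike's offset from the
    nearest tick of an assumed regular reference interval."""
    offsets = []
    for t in spike_times_ms:
        nearest_tick = round(t / period_ms) * period_ms
        offsets.append(t - nearest_tick)
    return offsets


regular = [100, 200, 300]   # spikes exactly on the expected ticks
shifted = [105, 190, 320]   # same spike count, but shifted around the ticks

print(rate_decode(regular, 400), rate_decode(shifted, 400))
print(temporal_decode(regular, 100), temporal_decode(shifted, 100))
```

Both trains have the same rate over the 400 ms window, so a pure rate decoder cannot tell them apart, while the offsets differ.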
Yes, it is rate encoded. Essentially, the firing rate of a neuron encodes an exponential value to be represented. The resulting sequences of spikes are called "spike trains".
You can see this clearly if you do an extreme slowdown of a human movement. Then, suddenly, what looks like a smooth movement, like raising an arm, isn't really smooth (it only looks that way because inertia smooths it out, of course). A pulse arrives in the muscle, and there are 20 ms where the muscle is tensed, and then it's back to neutral for 100 ms. Then another spike arrives, there are another 20 ms where a lot of tension is put on the muscle, the movement accelerates, and the muscle goes back to neutral. It's not a continuous movement at all.
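A crude way to picture the pulse/inertia interplay is a point mass pushed by a force that is on for 20 ms and off for 100 ms. This is only a sketch under loud assumptions: unit mass, unit force, no friction or gravity, 1 ms Euler steps, and a made-up function name.

```python
def simulate_pulsed_movement(total_ms, on_ms=20, off_ms=100, force=1.0, mass=1.0):
    """Integrate a point mass driven by a pulsed force, in 1 ms steps.
    Returns the velocity and position after every millisecond."""
    dt = 0.001  # one millisecond, in seconds
    velocity = 0.0
    position = 0.0
    velocities = []
    positions = []
    for ms in range(total_ms):
        # Force is applied only during the first on_ms of each cycle.
        active = (ms % (on_ms + off_ms)) < on_ms
        acceleration = force / mass if active else 0.0
        velocity += acceleration * dt
        position += velocity * dt
        velocities.append(velocity)
        positions.append(position)
    return velocities, positions


velocities, positions = simulate_pulsed_movement(360)
# Velocity changes only during the pulses; position keeps increasing throughout.
print(velocities[19], velocities[119], velocities[139])
```

In this toy model the velocity rises in steps, one per pulse, while the position never stops increasing, which is roughly why the motion looks continuous at normal speed.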
But odds are good that it's not just the value that's encoded. Many experiments have shown that it matters a lot whether the signals are in phase: they may encode the same value or some multiple of it, but the fact that the signals started at the exact same time matters, maybe more than the value itself. Or, in terms of the encoding: for the value, only the distance between 2 spikes matters, but if 2 spikes on 2 different neurons occur at the exact same time, this will be interpreted as very relevant, even though the two neurons may have very different firing rates.
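The "same time on two neurons" idea can be sketched as a simple coincidence detector. The spike trains here are perfectly regular and in whole milliseconds, and the 1 ms window and function names are assumptions for illustration, not measured values.

```python
def regular_train(period_ms, duration_ms, start_ms=0):
    """Spike times of an idealized neuron firing at a fixed interval."""
    return list(range(start_ms, duration_ms, period_ms))


def coincidences(train_a, train_b, window_ms=1):
    """Return the spike times in train_a that have a spike in train_b
    within window_ms. Both trains must be sorted ascending."""
    hits = []
    j = 0
    for t in train_a:
        # Skip spikes in train_b that are too early to match t.
        while j < len(train_b) and train_b[j] < t - window_ms:
            j += 1
        if j < len(train_b) and abs(train_b[j] - t) <= window_ms:
            hits.append(t)
    return hits


fast = regular_train(50, 400)   # 20 Hz neuron
slow = regular_train(75, 400)   # roughly 13 Hz neuron
print(coincidences(fast, slow))
print(coincidences(fast, regular_train(75, 400, start_ms=10)))
```

The two trains carry different rates, yet they line up at regular moments when they start in phase; shifting one of them by 10 ms removes every coincidence without changing either rate.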