A little more advanced: the AMR codec specifically can also send a noise colouring packet. This packet requires far less bandwidth than the actual voice codec packets.

This packet describes the background noise that is present at the sender, so that the fake comfort noise sounds the same as the natural background noise. When speech resumes, there is then no discontinuity between the two.

To give an idea of the bandwidth requirements, the data portion for every 20 milliseconds of speech audio is ~14 bytes, a noise colouring packet is ~6 bytes and a silence packet is 1 byte. These sizes might be implementation specific though.
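
To put those sizes in perspective, here is a rough sketch that converts them into payload bit rates. It assumes one packet per 20 ms frame and uses the approximate sizes quoted above; the names `PAYLOAD_BYTES` and `payload_bitrate` are made up for illustration, and packet headers are ignored.

```python
FRAME_MS = 20  # one packet per 20 ms frame (assumption taken from the figures above)

# Approximate payload sizes from the answer; may be implementation specific.
PAYLOAD_BYTES = {"voice": 14, "noise_colour": 6, "silence": 1}


def payload_bitrate(payload_bytes, frame_ms=FRAME_MS):
    """Bits per second if a packet of this size were sent every frame."""
    frames_per_second = 1000 / frame_ms
    return payload_bytes * 8 * frames_per_second


for name, size in PAYLOAD_BYTES.items():
    print(f"{name}: {payload_bitrate(size):.0f} bit/s")
```

So continuous voice works out to about 5600 bit/s of payload, while a stream of silence packets alone would be about 400 bit/s.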

Edit: sorry, it was not clear, but during silent periods the silence packets and noise colouring packets are intermixed in a ratio of something like 6 to 1.

So a typical transmission might look as follows:

V V V V V C S S S S S C S S S S S C etc.

V - Voice, C - Noise colour, S - Silence
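
As a rough illustration, the pattern above can be generated and its payload totalled with a few lines of Python. This is only a sketch: the byte sizes are the approximate ones quoted earlier, the spacing of five silence packets between noise colour packets is read off the example sequence, and `transmission` and `total_payload` are hypothetical helper names, not part of any AMR library.

```python
# Approximate payload sizes in bytes per 20 ms frame (may be implementation specific).
PAYLOAD_BYTES = {"V": 14, "C": 6, "S": 1}


def transmission(voice_frames, quiet_frames, silence_per_colour=5):
    """Build a frame sequence: voice first, then a quiet period in which
    every noise colour packet is followed by `silence_per_colour` silence packets."""
    frames = ["V"] * voice_frames
    for i in range(quiet_frames):
        frames.append("C" if i % (silence_per_colour + 1) == 0 else "S")
    return frames


def total_payload(frames):
    """Sum of payload bytes over the whole sequence."""
    return sum(PAYLOAD_BYTES[f] for f in frames)


frames = transmission(voice_frames=5, quiet_frames=13)
print(" ".join(frames))
print(total_payload(frames), "bytes, versus", len(frames) * PAYLOAD_BYTES["V"], "if every frame were voice")
```

With these numbers the 18 frames carry 98 bytes of payload instead of 252, which is where the bandwidth saving during quiet periods comes from.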