Most of that is fairly straightforward. It uses Newton's method to calculate the inverse square root. But to get there with one or two iterations, you need a good initial estimate. The exponent of a square root is half the exponent of the original value. Knowing how the floating point number is packed, we know a right shift is equal to divide by two and negate it. What remains is how the shift affects the mantissa and whether some correction factor is needed. That factor could have been found by least squares optimization to minimize the error.
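A rough Python sketch of the idea, for illustration only. It assumes the widely circulated constant 0x5F3759DF from the Quake III source as the correction factor (that value is not given above), and the helper names `float_to_bits`, `bits_to_float`, and `fast_inv_sqrt` are made up for this example:

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a value as the 32-bit pattern of an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(i: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]

def fast_inv_sqrt(x: float, iterations: int = 1) -> float:
    """Approximate 1/sqrt(x) for positive, normal x."""
    # Initial estimate: shift the bit pattern right by one and subtract it
    # from the constant (assumed here; see the note above).
    y = bits_to_float(0x5F3759DF - (float_to_bits(x) >> 1))
    # Newton's method on f(y) = 1/y^2 - x gives y <- y * (1.5 - 0.5 * x * y^2).
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

print(fast_inv_sqrt(4.0))      # close to 0.5 after one iteration
print(fast_inv_sqrt(4.0, 2))   # closer still after two
```

Python floats are doubles, so the Newton step here runs at higher precision than the original single-precision C code would; only the initial estimate goes through the 32-bit representation.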
> Knowing how the floating point number is packed, we know a right shift is equal to divide by two and negate it.
Shifting an IEEE 754 floating point number does not have that effect.[0] The fact that it doesn't is the source of the "mystery" of fast inverse square root.
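To see what the shift actually does, here is a small Python sketch that splits a single-precision bit pattern into its fields before and after a right shift. The helper names (`float_to_bits`, `bits_to_float`, `fields`) are invented for this example, and 16.0 is just a convenient input:

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a value as the 32-bit pattern of an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(i: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]

def fields(i: int):
    """Return (sign, biased exponent, mantissa) of a 32-bit pattern."""
    return (i >> 31) & 0x1, (i >> 23) & 0xFF, i & 0x7FFFFF

bits = float_to_bits(16.0)   # 16.0 = 1.0 * 2^4, biased exponent 127 + 4 = 131
shifted = bits >> 1

print(fields(bits))            # sign, biased exponent, mantissa of 16.0
print(fields(shifted))         # the same fields after the shift
print(bits_to_float(shifted))  # the shifted pattern read back as a float
```

The shift halves the biased exponent field (131 becomes 65), not the true exponent, and the exponent's low bit spills into the top of the mantissa. Read back as a float, the result is about 3.25e-19, which is neither 16/2 nor anything negated. The negation and the bias correction only come from subtracting the shifted pattern from a constant afterwards.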