What's interesting (to me at least) is the "oh right, the spec for this only guarantees a relative error of at most 1.5 x 2^-12" part, and that AMD and Intel parts may differ by a lot (in the example post and value, the AMD part is more accurate).
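To make that concrete, here's a quick harness (my own sketch, not from the post) that measures the worst relative error of RSQRTSS over [0.5, 2) on whatever part you run it on:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <xmmintrin.h>

    // Empirical check of the documented bound: Intel specifies
    // |relative error| <= 1.5 * 2^-12 for RSQRTPS/RSQRTSS.
    int main() {
        float worst = 0.0f;
        for (float x = 0.5f; x < 2.0f; x += 1e-6f) {
            float approx = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
            float exact  = 1.0f / std::sqrt(x);
            worst = std::max(worst, std::fabs(approx - exact) / exact);
        }
        std::printf("worst relative error: %g (spec bound %g)\n",
                    worst, 1.5 / 4096.0);
    }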
You're in good company though with the WASM folks. It does feel like you've strayed outside of your non-goal of Speed. Also, as a nit, there are double-precision forms (e.g., VSQRTPD), so:
> When it upgraded SSE to SSE2 (in 2000), most of its single-precision floating-point instructions got upgraded to double-precision — but not these two.
isn't correct.
It's still not likely that you needed to care about these for mu.
A lot of code uses _mm_rsqrt_ps, sometimes followed by a Newton-Raphson update, to compute a "precise" 1/sqrt(x). Here's a good example of NEON's rsqrt estimate being different enough from Intel's that more iterations were necessary for Embree on ARM [1].
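The usual pattern looks roughly like this (a sketch; real code such as Embree's differs in details like handling zero):

    #include <xmmintrin.h>

    // ~12-bit hardware estimate plus one Newton-Raphson step,
    // y' = y * (1.5 - 0.5 * x * y * y), for nearly full single precision.
    static inline __m128 rsqrt_nr(__m128 x) {
        __m128 y = _mm_rsqrt_ps(x);                        // initial estimate
        __m128 t = _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), x),
                              _mm_mul_ps(y, y));           // 0.5 * x * y * y
        return _mm_mul_ps(y, _mm_sub_ps(_mm_set1_ps(1.5f), t));
    }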
Because I only cared about vectorization a long time ago, back when AMD was so uncompetitive, I'd bet a lot of code assumes that the SSE rsqrtps values match Intel's exactly.
Looks like Eigen also defaults to EIGEN_FAST_MATH which makes Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of _mm256_sqrt_ps [1].
Interestingly, the thing they're trying to avoid (the long latency of sqrt relative to rsqrt) hasn't been true on Intel processors for a long time, but apparently is still true for AMD parts according to Agner Fog's tables [2] (though maybe I'm reading them wrong; there is no vsqrtps entry for Zen2/3).
Hopefully, Eigen will split the single global "fast math" config into finer-grained options [3].
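In the meantime, EIGEN_FAST_MATH is a documented macro, so you can opt out per translation unit before including any Eigen header:

    // Disable Eigen's fast-math path (and thus the rsqrt-based psqrt).
    #define EIGEN_FAST_MATH 0
    #include <Eigen/Dense>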
Normalizing a vector needs this, and it's super common: whenever you want a direction, you're likely going to need it. Your GPU also has a fast instruction for this, because in graphics it's a very common operation.
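For example (my sketch, assuming a 3D vector stored in a register as (x, y, z, 0)):

    #include <xmmintrin.h>

    // Normalize by multiplying with the fast reciprocal square root
    // of the squared length.
    static inline __m128 normalize3(__m128 v) {
        __m128 sq  = _mm_mul_ps(v, v);
        // Two shuffle+add steps leave x^2 + y^2 + z^2 in every lane.
        __m128 sum = _mm_add_ps(sq, _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 3, 0, 1)));
        sum        = _mm_add_ps(sum, _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 0, 3, 2)));
        return _mm_mul_ps(v, _mm_rsqrt_ps(sum));  // v * (1 / |v|)
    }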
To add to what others have already pointed out, inverse square root is an important operation in finance (for example in pricing options: calculating implied volatility in Black-Scholes Model).
However, if speed is that important to you (e.g. if you are an HFT firm), you don't even want to be calculating the inverse square root in your hot path. Basically, the implied volatility is "seeded" once and then you update it using greeks (the finance term for a derivative, don't ask me why).
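Roughly what the seeded update looks like (my sketch, scalar double precision; real systems are far more careful): vega is the derivative of the Black-Scholes price with respect to volatility, so Newton's method refines the seed directly. Note the sigma * sqrt(T) divisor in d1, which is where a reciprocal square root can show up.

    #include <cmath>

    static double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

    // Black-Scholes call price; also returns vega (dPrice/dSigma).
    static double bs_call(double S, double K, double r, double T,
                          double sigma, double* vega) {
        double srt = sigma * std::sqrt(T);
        double d1  = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / srt;
        double d2  = d1 - srt;
        *vega = S * std::sqrt(T) * std::exp(-0.5 * d1 * d1) * 0.3989422804014327;
        return S * norm_cdf(d1) - K * std::exp(-r * T) * norm_cdf(d2);
    }

    // Newton's method on volatility, starting from a seeded guess.
    double implied_vol(double market_price, double S, double K, double r, double T) {
        double sigma = 0.2;  // the "seed"
        for (int i = 0; i < 50; ++i) {
            double vega, price = bs_call(S, K, r, T, sigma, &vega);
            double step = (price - market_price) / vega;
            sigma -= step;
            if (std::fabs(step) < 1e-12) break;
        }
        return sigma;
    }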
You know you're in a forum of programmers when you implicitly switch a noun from plural to singular in a parenthetical remark and instead of inferring the meaning based on context, someone feels the need to correct you.
Indeed I wasn't trying to correct the parent - I actually found the explanation pretty cool. Rather, I wanted to provide some color for the folks here without a financial background: I figured an explanation of what the common greeks are would help them understand how it's possible to work backwards from them to an updated price.
Sorry if it didn't come across that way @alex_smart!
Huh. I'm amused at the Wikipedia section on this [1].
> The use of Greek letter names is presumably by extension from the common finance terms alpha and beta, and the use of sigma (the standard deviation of logarithmic returns) and tau (time to expiry) in the Black–Scholes option pricing model. Several names such as 'vega' and 'zomma' are invented, but sound similar to Greek letters. The names 'color' and 'charm' presumably derive from the use of these terms for exotic properties of quarks in particle physics.
Who would know? Most of this stuff started off in industry rather than in academia; probably somebody started writing a nu on the whiteboard sometime to mean sensitivity to implied vol, and someone else gave it a Greek-sounding name that stuck.
It's before my time but I bet it's the finance equivalent of Philosophers (i.e. fashionable nonsense) science-washing their work - even if unknowingly.
The Newton iteration is nicer for the inverse square root than for the square root. You can refine an initial approximation for `1 / sqrt(x)`, and multiply the result by `x` to compute an approximation of `sqrt(x)`. This less direct approach only needs FP multiplications and additions (and the initial approximation).
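Concretely (my sketch):

    // Refining y ~ 1/sqrt(x):  y' = y * (3 - x*y*y) / 2   -- mul/add only
    // Refining s ~ sqrt(x):    s' = (s + x/s) / 2         -- needs a divide
    float sqrt_via_rsqrt(float x, float y /* initial 1/sqrt(x) estimate */) {
        y = y * (1.5f - 0.5f * x * y * y);  // one multiply/add-only step
        return x * y;                       // sqrt(x) = x * (1/sqrt(x))
    }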
Not a real answer because I've never deployed such a thing, but I could imagine using these instructions to bootstrap an approximation procedure (e.g. Newton-Raphson, or something of the sort). Since you're going to do a refinement anyway, one that buys you more accuracy and guarantees convergence to within 1 ulp in a subsequent step, you might as well use a fast instruction rather than an accurate one for the first step.
They compute reciprocal, or inverse, square roots. You may be familiar with the “fast inverse square root” code snippet: this is a hardware instruction for that operation, and is thus useful for games.
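For anyone who hasn't seen it, a cleaned-up version of that snippet (the Quake III magic-constant trick; RSQRTPS does the same job in one instruction, and more accurately):

    #include <cstdint>
    #include <cstring>

    // The classic bit trick, modernized to use memcpy instead of
    // pointer-punning (which is undefined behavior).
    float q_rsqrt(float x) {
        std::uint32_t i;
        float y = x;
        std::memcpy(&i, &y, sizeof i);         // reinterpret the float's bits
        i = 0x5f3759dfu - (i >> 1);            // magic initial estimate
        std::memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson step
    }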