I have a similar example in my book Geometry for Programmers (https://www.manning.com/books/geometry-for-programmers). There I make a polynomial approximation for sine that is:
1) odd (meaning antisymmetric);
2) precise at 0 and Pi/2 (and -Pi/2 automatically);
3) precise in its derivatives at 0 and Pi/2 (and -Pi/2);
4) better in overall precision than the equivalent Taylor-series approximation;
and it only consists of 4 terms! Adding more terms would make it impractical in the modern world, since we have hardware implementations of sine on CPUs and GPUs.
Spoiler alert, the secret sauce is the integral equation ;-)
CPUs do have a 'hardware' (microcode, I guess) implementation, dating back to the 80387-era FPU. But it isn't used. One reason is that the software implementation is faster in modern floating-point code. I don't know about GPUs, but I would guess compilers transparently take care of that.
It depends on the precision, accuracy, and data types of the arguments and result. The 80387 was absolutely faster than software emulation (done by hooking real-mode interrupt handlers, the way it had been since the 8086/8087) for F32/64/80. For I8/16, lookup tables were faster. For I32, a quality result would be very expensive. There were no GPUs in the 387 era, and there was no SIMD (MMX) either. In the x86 era, compilers weren't that smart. If they were, then they were slow.
Which is just a support function for cos (which doesn't do argument reduction). See [1] for the actual function, and [2] & [3] for the argument reduction code.
You really only need to compute one eighth of a circle segment (so, zero to pi/4), everything else you can do by symmetry by switching signs and flipping x, y. (And if you supported a larger range, you’d waste precision).
The function supports double-double precision (which is a trick, different from long double or quadruple precision, in which you express a number as an unevaluated sum of two doubles: the head (large) and the tail (much smaller)).
So, the “real” cosine function first takes its argument and reduces it modulo pi/2 into the desired range -pi/4 to +pi/4 (with minimal loss of precision: by utilising the sign bit you do better than using the range 0 to +pi/2), in double-double precision (rem_pio2 returns an int denoting which quadrant the input was in, plus two doubles), and those two doubles are then passed to this function (or its cousin __sin, with the sign flipped as appropriate, depending on the quadrant).
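Roughly, the dispatch looks like this. It's only a sketch: remquo() and plain cos()/sin() stand in for rem_pio2 and the __cos/__sin kernels, so it throws away exactly the double-double precision described above, but the quadrant bookkeeping is the same.

    #include <math.h>

    /* Sketch only: remquo() reduces modulo the double nearest pi/2 and returns
       the low bits of the quotient; the real rem_pio2 also returns a second
       "tail" double so no precision is lost. cos()/sin() stand in for the
       __cos/__sin kernels. */
    double my_cos(double a)
    {
        int q;
        double r = remquo(a, M_PI_2, &q);   /* r ends up in [-pi/4, pi/4] */
        switch (q & 3) {                    /* which quadrant was the input in? */
            case 0:  return  cos(r);
            case 1:  return -sin(r);
            case 2:  return -cos(r);
            default: return  sin(r);
        }
    }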
Many people really have put in quite some care and thought to come up with this neat and precise solution.
> You really only need to compute one eighth of a circle segment (so, zero to pi/4), everything else you can do by symmetry by switching signs and flipping x, y. (And if you supported a larger range, you’d waste precision).
In high school I was writing a paint program in BASIC on the Apple ][. Drawing circles was mind-numbingly slow until I read an article somewhere that pointed this out. It was like magic.
This is a faster implementation of cosine that utilises six static const double variables (C1, C2, C3, C4, C5, C6) to speed things up. These variables represent the coefficients of a polynomial approximation for the cosine function.
It’s the core implementation for a specific and very narrow range of arguments. The “genuine article” moves the argument into that range (via the rem_pio2 function performing range reduction, which is worth an article in itself), and then calls this core implementation.
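For a rough picture of what that core looks like, here is a sketch for |x| <= pi/4. The coefficients below are the plain Taylor ones (1/4!, -1/6!, ...), used purely for illustration; the real C1..C6 are minimax-fitted values that are close to, but not exactly, these, which is where the extra accuracy over a Taylor expansion comes from.

    /* Sketch of the kernel's shape for |x| <= pi/4; coefficients are Taylor
       values for illustration, not fdlibm's actual minimax-fitted C1..C6. */
    static double kernel_cos_sketch(double x)
    {
        double z = x * x;
        double r = z * z * (  1.0/24
                   + z * ( -1.0/720
                   + z * (  1.0/40320
                   + z * ( -1.0/3628800
                   + z * (  1.0/479001600
                   + z * ( -1.0/87178291200 ))))));
        return 1.0 - 0.5 * z + r;
    }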
It's an easy way to return NaN and set FE_INVALID. If you're not worried about floating-point exceptions you can just return NaN directly. I don't know how much C# even guarantees about floating-point operations, or whether it's applicable there.
I remember an approximation function used in a vacuum-tube-based surface-to-air missile system of the '60s: cos(x) = 0.7. As explained to us rookies, that precision was enough to hit the target.
Sin/cos are tightly connected to the complex exponential and exhibit a lot of such behavior.
In particular, sin(x+iy) = sin(x)cosh(y) + i cos(x)sinh(y). Set x to zero out the cosine, and rely on the exponential nature of cosh to find a target y value.
Most highschool students mainly use sin to calculate the lengths of sides of triangles. So it feels very counterintuitive that sin can be greater than 1.
If you define sin as the solution to a particular differential equation, then its relationship to the hyperbolic functions is much more obvious.
It's probably a pretty good 1st order guess that a surface to air missile launcher is aimed at 45 degrees. It's not aimed straight up and it's not flat on the ground.
My first experience with "Use the source Luke" was wondering as a young lad how computers calculated trig functions. This led me to the Unix V7 math lib source code[1], and thence to a book by Hart & Cheney which by luck my university library had on the shelf. Good times.
Implementing low precision transcendental functions is much more interesting IMO, as it allows for a wider variety of algorithms and approaches. When you are limited to the IEEE definitions, there really isn't much room for creativity and you have to handle all the obscure edge cases and chase least significant bits, bleh. That's not something you want to do if you need to calculate a trillion atan's. If one is curious how these algorithms work, check out the book "Elementary Functions".
In reality though, a typical stock libm is not that accurate; their error generally ranges from 1 to 5 ulps with no actual guarantees. I'm more interested in a hybrid libm with configurable errors, from the most accurate possible (0.5 ulps) to something like 2^20 ulps, and all in between.
You sure can, but if you can beat a single IEEE multiplication on an arbitrary processor my god that’s a program I hope to never ever in my life have to reverse engineer.
Never quite got this argument. You can frequently do 1000 fold improvements or do even better by optimizing within the same algorithmic complexity. That frequently flips the bit of "is this software usable".
Sometimes you can even make real-world performance significantly better by ignoring the technically better algorithmic complexity, especially when you're almost always working with low n's.
The IEEE standard doesn’t really require any particular precision for transcendental functions IIUC? The original standard doesn’t even define them. The 2008 version includes optional correctly-rounded transcendentals, but there seems to be a total of one implementation, CRlibm, which is an academic project of more than a decade (and a dead home page because of the INRIA Forge shutdown). Your run-of-the-mill libm has little to do with this.
Right, it requires correct rounding for the optional correctly-rounded functions, which ISO C, for example, does not require to correspond to your usual libm ones, or even to be provided at all (even with __STDC_IEC_559__ or __STDC_IEC_60559_BFP__). I think I saw a spec for correctly rounded functions for C somewhere, but even the latest draft[1] of C23 is content with just reserving the cr_* namespace and not mandating much for the usual functions outside it (F.10/16):
> IEC 60559 specifies correct rounding for the operations in the F.3 table of operations recommended by IEC 60559, and thereby preserves useful mathematical properties such as symmetry, monotonicity, and periodicity. The corresponding functions with (potentially) reserved `cr_`-prefixed names (7.33.8) do the same. The C functions in the table, however, are not required to be correctly rounded, but implementations should still preserve as many of these useful mathematical properties as possible.
Maybe correct rounding (and hence consistency) will be made into the standard requirements (at least for single and double precisions) in the near future?
Does anyone know of a good reference on signals and systems for people with a pure math background rather than an engineering or physics background?
So, the implementation linked above uses the polynomial of best approximation to cosine on an interval. That polynomial was presumably obtained using the Remez algorithm, which, given a function f and a set of n+2 points in your desired interval, yields a polynomial of degree n best approximating f in the interval (in the uniform norm sense)[1] by performing a sequence of linear solves.
I have a math and specifically approximation theory background, and the wiki article on the Remez algorithm makes a lot of sense to me. Like, this is the language I speak: https://en.wikipedia.org/wiki/Remez_algorithm
I'm less comfortable though when I try to understand the documentation for the remez implementation in SciPy because it is using the language of signals and systems, which is how engineers think about this stuff: https://docs.scipy.org/doc/scipy/reference/generated/scipy.s...
Signals seems to use a lot of analogies to sound waves, and it does map in some way or another to the mathematical language I'm comfortable with. I've spent a little bit of time reading papers and introductory texts on signals, but the definitions feel a little loosey goosey to me.
Anyway, just opportunistically using this thread to ask if anyone knows of some book or papers or blog posts that can help me translate this stuff into math language :)
I've spent most of my career working at the intersection of multiple highly technical domains, both theoretical and applied, and a) the "jargon switch", b) the methodology differences, and, worst of all, c) the switch of conceptual framework required to work on a problem that spans multiple domains are initially very irritating.
I'm pointing that out because if you are looking for a rosetta stone only to tackle a), then even if you find one, it'll leave you wanting, because you'll still need maps to tackle b) and c).
Example: analog electronics. From a mathematician's POV, an analog circuit is just a system of differential equations.
But very few analog electronics specialists ever think about a circuit that way.
The words they use are the least important part of the "translation" problem. The way they approach and think about the system will be utterly alien to a pure or even applied math person.
Eventually, after learning the trade, you'll start to see familiar patterns emerge that can be mapped back to stuff you're familiar with, but a simple "this EE jargon means that in math language" won't be enough: you're going to have to get your hands dirty and learn how they think of the problem.
I don't know much about the scipy.signal package, but there are other Python packages with good implementations for generating polynomial approximations. Among them are pythonsollya (a Python wrapper for Sollya) and SageMath.
Other than that, some overview references on the Remez algorithm that I like (and that should be a bit closer to your taste) are:
- J.-M. Muller et al., "Handbook of Floating-Point Arithmetic", section 10.3.2 (and follow the references in that section).
The scipy documentation you linked is specifically for generating a FIR filter using the Remez algorithm, which is why it is written in that style. It is a very specific use case of the algorithm and probably not the one I would recommend learning from. Another top level comment has a link to a decent overview of the Remez algorithm written without the "Signals and Systems" language.
Thanks, I think my request was unclear, but I'm specifically trying to understand the documentation linked, not the Remez algorithm itself. I'm trying to reconcile my knowledge of it with the language used in the documentation.
I have a pretty comfortable understanding of the Remez algorithm and its generalizations, and have written my own implementations for a research project.
Take a look at the 2 listed references by Parks and McClellan. They are considered seminal papers in the field of signal processing and are the basis of the function you linked. And they will probably appeal to your math background more.
The name of the algorithm is Parks-McClellan filter design, or sometimes you will see it named firpm, as popularized by MATLAB.
Some of the more recent implementations of cosine for single precision that are correctly rounded to all rounding modes according to IEEE 754 standard:
This takes me back. I did the transcendental functions for a CP/M C compiler in the early 80s using Chebyshev polynomials.
My tutor failed my implementation because my code formatting was utterly crap. :) I am still laughing as there was no sin, cos in the maths library at all at the time.
Unless like me you made a typo when copying the table from a book and had a vector routine that mostly looked alright but would jerk slightly when that one value that was incorrect was chosen! <facepalm />
I'm pretty sure that indent was around even in the early 80's. Even if you couldn't be bothered to run it, your tutor could have. Must not have been a very good tutor to fail you for style.
I was using CP/M since about 1978, but you're right -- AFAIK there never was a port of indent to CP/M. I did make some assumptions, but developing for CP/M on CP/M was somewhat painful. There just wasn't enough memory space to work efficiently, or (often) even fit the compiled image into memory (<~60k after BIOS/BDOS/CCP on CP/M 2.2). Chebyshev polynomials certainly imply floating point, which could be a BIG memory hog with so little room.
If I had the resources at the time, I would have undertaken such a development task on a more capable (Unix) platform. Such an environment would improve the ability for developing/debugging the optimizations required. As you say, indent was available in BSD since '82, and I know it was present on the Sun2 systems I used in '83.
I had an Osborne 1. I had to swap disks multiple times to compile anything. I remember the C compiler by some guy (Walt Bilofsky?) had the arguments passed in reverse on the stack (last on top), and printf and friends ended up as crazy hacks.
I didn't have access to a Unix at the time, and open source was very limited, so indent was not a thing.
The CORDIC algorithm is great! Just a few months ago, I implemented a sine generator with it and also used fixed point arithmetic. It is a real eye opener to see that you can substitute multiplications by bit shifting when you store the numbers in the “right“ format.
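For anyone curious, the core of it is tiny. Here is a rough rotation-mode sketch in Q2.29 fixed point (the atan table and gain are filled in with libm at first use, purely for the demo; it converges for |angle| up to about 1.74 rad, so reduce the argument first):

    #include <stdint.h>
    #include <math.h>

    #define CORDIC_ITERS 24
    #define Q29 (1 << 29)                       /* 1.0 in Q2.29 */

    /* Rotation-mode CORDIC: only shifts and adds inside the loop. */
    void cordic_sincos(int32_t angle_q29, int32_t *sin_out, int32_t *cos_out)
    {
        static int32_t atan_q29[CORDIC_ITERS];
        static int32_t gain_q29 = 0;
        if (gain_q29 == 0) {                    /* one-time setup; libm used only for the demo */
            double k = 1.0;
            for (int i = 0; i < CORDIC_ITERS; i++) {
                atan_q29[i] = (int32_t)(atan(ldexp(1.0, -i)) * Q29);
                k /= sqrt(1.0 + ldexp(1.0, -2 * i));
            }
            gain_q29 = (int32_t)(k * Q29);      /* ~0.60725, pre-compensates the CORDIC gain */
        }
        int32_t x = gain_q29, y = 0, z = angle_q29;
        for (int i = 0; i < CORDIC_ITERS; i++) {
            int32_t dx = x >> i, dy = y >> i;   /* the "multiplication by bit shifting" part */
            if (z >= 0) { x -= dy; y += dx; z -= atan_q29[i]; }
            else        { x += dy; y -= dx; z += atan_q29[i]; }
        }
        *cos_out = x;                           /* cos(angle) in Q2.29 */
        *sin_out = y;                           /* sin(angle) in Q2.29 */
    }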
1. I suspect decimal floating point is quite expensive to implement computationally.
2. Not an expert, but I think the floating-point algorithms and proofs for bases other than 2 are less well researched and less well known.
3. A calculator is a resource-constrained device.
4. Are there even any real benefits? Yes, some human-friendly numbers have an exact finite representation that is missing in base-2, but I can't see why that would matter, compared to just using 32-bit or 64-bit IEEE-754.
In practice, the cost is not that much different because four-function calculators do not do many calculations at once. Even in a graphing calculator, drawing a plot is never slow enough to make the device useless.
Decimal, and particularly BCD floating point, is much simpler than binary for a calculator since the numbers don't need any additional conversion steps to display.
And the fact that 0.1 + 0.1 works as expected is indeed a huge benefit.
I expect that's because the cost versus effort for having two kinds of floating point isn't very good. Not because decimal floating point is notably more expensive than binary floating point.
Keep squaring, doubling and subtracting one on the left. Keep raising by powers of 2 in the argument on the right. Eventually, if you're working with rational numbers, multiplication by 2^k moves all k fractional bits over to the integer side of the decimal point, and then integer times pi is the base case of the recursion. If you rescale the units so that pi = 2^(n-1) for some n, you don't even have to think about the non-fractional part. The register value circling back to 0 is supposed to happen anyway.
You could just as easily recursively define cos to "spiral downwards in x rather than outwards", each recursive layer reducing the argument x by a factor of 2 until it reduces down to the precomputed value cos(machine precision).
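Something like this sketch, for intuition only (each unwinding step amplifies the error, so it's not how you'd do it in production):

    #include <math.h>

    /* Halve the argument until it's tiny, then unwind with cos(2t) = 2*cos(t)^2 - 1. */
    double cos_by_halving(double x)
    {
        if (fabs(x) < 1e-8)
            return 1.0 - x * x / 2.0;          /* small-angle "base case" */
        double c = cos_by_halving(x / 2.0);
        return 2.0 * c * c - 1.0;              /* double-angle identity */
    }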
Here is a naive question. A lot of effort goes into dealing with floating point imprecision and instabilities, right? So why aren't these algorithms implemented with integers? A 64-bit integer could represent the region from 0 to 1 with 2^64 evenly spaced slices, for example.
Would that make things more accurate across operations? Is this something that is done?
That is called 'fixed point', and it is useful for some applications. What makes floating point interesting is that it can represent values with very different magnitudes; your 64-bit fixed point number can't represent any number greater than 1, for instance. Whereas a 32-bit float can represent numbers with magnitudes between ~2^-127 and ~2^127.
Just an option if magnitude is an issue. You only need 0-1 plus a sign bit.
Fixed to float is easy enough. It'd let you use a fixed point implementation for 0 - pi/2 benefiting from smaller look up table and simpler implementation.
> It'd let you use a fixed point implementation for 0 - pi/2 benefiting from smaller look up table
Why would that table be smaller if you use 2^64 values in that range instead of the fewer than 2^55 doubles in that range?
> and simpler implementation.
Why would it be simpler? With doubles, the hardware will do scaling for you. With fixed point, you need a helper function or macro to do multiplications, and those have to use an intermediate result with more than 64 bits (in this special case, some shortcuts are possible, but those won’t make the implementation simpler).
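Concretely, the helper being referred to is something like this sketch for a "Q0.64" format (a uint64_t read as a pure fraction in [0, 1)); note the 128-bit intermediate, which here relies on the GCC/Clang __int128 extension:

    #include <stdint.h>

    /* Q0.64 multiply: value = n / 2^64. Needs a 128-bit intermediate product;
       unsigned __int128 is a compiler extension, not standard C. */
    static inline uint64_t q64_mul(uint64_t a, uint64_t b)
    {
        return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }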
There was an exchange about using fixed point (which apparently is being used according to other comments for easier correctness verification, speed and simplicity). Someone pointed out that float gives you a wider range of values. But those values are not needed for cosine which has a range of 0-1 in the interval 0 to pi/2.
> But those values are not needed for cosine which has a range of 0-1 in the interval 0 to pi/2
Ah not quite true though; the “range” is also referring to the exponent range of representable numbers, so don’t forget the small numbers near 0. You can get much smaller numbers with much higher precision out of a single-precision float than a fixed-point 64 bit [0,1] number.
There should be plenty of C trigonometric algorithms made for fixed point. They usually use lookup tables and interpolation. Used a lot in embedded microcontrollers that don't have hardware floating point or hardware trig functions, and in automotive where floats were (still are?) a no-go due to the dangerous float math pitfalls and the safety critical nature of ECUs.
I remember writing a fast, small footprint, but very accurate fixed point trig function for the microcontroller used in my batcher project, many lives ago. I'll try to find it and put it on github.
The neat thing is that you can use the native overflow wrap-around property of C unsigned integers and processor registers to rotate back and forth around the trigonometric circle without needing to account for the edge cases of the 0/360 degree point. You do need to suppress a MISRA warning here though, as it will go nuts seeing that.
I don't have the code at hand to show you, nor do I have links to similar trig stuff since I wrote everything from scratch, first in Matlab to prove the concept, then in C, as there wasn't anything similar online at the time. But I used the principles of fixed-point fractional math, representing floating-point numbers as scaled integers in Q notation, which is well documented for the microcontrollers and DSPs of the era (early to late '00s).
Here's the wiki gist with some C code snippet examples [1]. Here's a more detailed math paper that goes into the weeds [2]. Here's a nice presentation from Motorola-Freescale-NXP I just found, which I wish I had back then: it's awesome, it explains the basics for everyone to understand, and seeing those Motorola Windows 95 GUI tools brings a nostalgic smile to my face.
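I don't have that original code, but the wrap-around principle is easy to sketch (hypothetical example; the table is filled via libm purely for the demo, on a real MCU it would be a precomputed constant array):

    #include <stdint.h>
    #include <math.h>

    /* 16-bit phase accumulator: 0..65535 maps onto 0..360 degrees, so unsigned
       overflow *is* the modulo-360 step; no branch needed at the wrap point. */
    static int16_t sine_lut[256];

    static void lut_init(void)
    {
        for (int i = 0; i < 256; i++)
            sine_lut[i] = (int16_t)(32767.0 * sin(2.0 * M_PI * i / 256.0));
    }

    static int16_t sine_step(uint16_t *phase, uint16_t increment)
    {
        *phase += increment;            /* wraps modulo 2^16 automatically */
        return sine_lut[*phase >> 8];   /* top 8 bits index the 256-entry table */
    }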
What you're describing is fixed point. Fixed point absolutely exists in practice, but it has an important limitation: its accuracy is scale-dependent, unlike floating-point.
Suppose for example you were doing a simulation of the solar system. What units do you use for coordinates--AUs, light-seconds, kilometers, meters, even miles? With floating-point, it doesn't matter: you get the same relative accuracy at the end. With fixed point, you get vastly different relative accuracy if your numbers are ~10⁻⁶ versus ~10⁶.
Re: "imprecision and instabilities" on standard hardware there really are neither. Floating point numbers have an exact meaning (a particular fraction with an integer numerator and a denominator that is a power of 2). There are exactly specified rules for basic operations so that adding, subtracting, multiplying, or dividing these fractions will give you the nearest representable fraction to the exact result (assuming the most common round-to-nearest mode).
The bit that appears as "imprecision and instabilities" is a mismatch between three things: (1) what the hardware provides (2) what controls or compiler optimisation settings the programming language exposes (3) what the user expects.
There are several sticky points with this. One mentioned in the parent article is higher precision. Starting from the original 8087 floating point coprocessor, x86 has supported both 80-bit and 64-bit floating point in hardware, but programming language support has mostly ignored this and only shows the user "double" precision with 64 bits of storage allocated. But with some compiler settings, parts of the computation would be done at 80-bit precision with no user visibility or control over which parts had that extra precision.
Newer hardware modes stick to 64 bit computation so you're unlikely to run into that particular stickiness on a modern programming stack. There is another that comes up more now: fused multiply and add. Modern hardware can calculate
round(a*x+b)
in one step, where a, x, b are floating point numbers (i.e. fractions) and the "round" operation produces the nearest representable fraction to the correct result. This is faster and more accurate than doing the multiplication and addition separately, which gives
round(round(a*x)+b)
But again programming language control is weak over whether and when fused-multiply-and-add is used, and this is a detail that almost all programmers almost always just want to ignore anyway.
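C99 exposes the fused operation directly as fma(), so the difference is easy to spell out (whether the compiler contracts a*x + b into an FMA on its own is governed by settings like -ffp-contract):

    #include <math.h>

    double separate(double a, double x, double b) { return a * x + b;    }  /* two roundings (unless contracted) */
    double fused(double a, double x, double b)    { return fma(a, x, b); }  /* one rounding of the exact a*x + b */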
Speaking from no experience, but it should be doable. But since in most applications you're going to use floating point anyway (or, if you're using fixed point, maybe right bit shift or maybe even do integer division until it fits the range of your value), the extra precision would be discarded in the end.
The idea is to be able to compute 1.0 - h * z + (z * r - x * y) in high precision.
If h * z is slightly less than 1.0, then w = 1.0 - h * z can lose a lot of bits of precision. To recover the bits that were dropped in subtracting h * z from 1.0, we compute which bits of h * z were used (the high bits) and then subtract those bits from h * z to get only the low bits. We find the bits that were used by simply undoing the operation: 1.0 - w is mathematically the same as h * z, but is not the same as implemented due to limited precision.
So (1.0 - w) - h * z is the extra bits that were dropped from the computation of w. Add that to the final term z * r - x * y before that result is added to w, and we have a more precise computation of w + z * r - x * y.
Mathematically, we have (1 - (1 - hz)) - hz = hz - hz = 0, so the correction seems redundant. But due to floating-point rounding, it is not, and it still recovers information. As the comment says:
> cos(x+y) = 1 - (x*x/2 - (r - x*y))
> For better accuracy, rearrange to
> cos(x+y) ~ w + (tmp + (r-x*y))
> where w = 1 - x*x/2 and tmp is a tiny correction term
> (1 - x*x/2 == w + tmp exactly in infinite precision)
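Spelled out as code, with made-up names (r stands for the higher-order polynomial tail, y for the low half of the reduced argument, z for x*x), the rearrangement is roughly:

    /* Sketch of the trick: recover the low bits that w = 1 - hz dropped. */
    double cos_tail_sketch(double x, double y, double z, double r)   /* z = x*x */
    {
        double hz  = 0.5 * z;
        double w   = 1.0 - hz;            /* rounded: loses low bits of hz */
        double tmp = (1.0 - w) - hz;      /* exactly those dropped bits (negated) */
        return w + (tmp + (z * r - x * y));
    }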
You're probably more of a "real programmer" than whomever wrote this code--it's numerics code, after all.
There's no real tricks or anything to this code, it's just a polynomial evaluation of a Taylor series (which you probably learned, and subsequently forgot, from calculus), with some steps done using techniques for getting higher precision out of a floating-point value, and (in some other functions in libm, not here) occasional use of the bit representation of floating-point values. It's standard techniques, but in a field that is incredibly niche and where there is very little value in delving into if you don't have a need for it.
It's not a Taylor series. It's a series, but a minimax approximation, not Taylor.
Taylor series are essentially “Give me a polynomial p(x). At x=a, I want the value to be b, the first derivative to be c, the second derivative to be d, etc. etc.”. (Usually, a=0, also known as a MacLaurin series.) This gives you a _great_ approximation in the immediate vicinity of a, but usually a pretty horrible one the further away you get; in a sense, you spend all your approximation energy on that single point.
Minimax approximations are “Give me a polynomial p(x). Over the area x=[a,b], I want it to never stray far from f(x); give me the one that minimizes the worst-case distance”. (The distance function is usually either the absolute error |p(x)-f(x)|, or the relative error |p(x)-f(x)|/|f(x)|, but you can have an arbitrary weight function in some implementations.) So you consider everything in that area to be important, not just one point.
Taylor series are simple to calculate (which can help if you need to calculate the coefficients on the fly, or by hand), good for analysis, and easy to understand, so most engineers will have heard of them in calculus. The Remez algorithm, on the other hand, essentially requires a computer, and is significantly more complicated to understand and implement. (Remez is not the only way to generate minimax polynomials; Chebyshev polynomials come close, and I believe Sollya uses some other algorithm.) Both will converge to the actual function given infinitely many terms, but minimax polynomials will be _much_ closer in most real-life situations. Also, if you are really pressed for it, you can have rational minimax approximations (p(x)/q(x)) that approximate tricky functions _really_ well even at low degree, but of course at the cost of a division.
Agreed with this. Also outside of very specific applications like graphics libraries, robotics, game engines you would never want to write something like this from scratch and instead rely on an existing library or game engine. So unless you’re working in that area it’s not imposter syndrome to find it foreign, it’s probably good!
I’ve never worked directly on floating point libraries like this, but having dealt with similar stuff, I suspect if this were in production it would end up peppered with IFDEF to support various implementation differences across hardware types, even if it is compliant with eg IEEE standards.
This is a fascinating topic indeed. And the math behind polynomial approximation is not that deep or unapproachable. You just haven't found an explanation that works for you. Some time ago I tried to make one with a sine as an example: https://wordsandbuttons.online/sympy_makes_math_fun_again.ht...
It doesn't mention Taylor series anywhere but if you only make your system out of N differential equations, you'll get N members of the power series.
I don’t think this would help anyone’s imposter syndrome. Taking hardcore math line by line will yield confusion.
A better approach might be to learn how cosine can be approximated with a Taylor expansion, then realize this is just an optimized version of that. (Even if it’s a different algorithm, it’s still an approximation algorithm, which is the point.)
Staring at math constants won’t get you that insight.
The thing that makes me wonder if I'll ever be a real programmer when I grow up is that someone wrote this, probably on a Thursday after lunch (at least, that's how my brain imagines it).
They were able to conceptualize the math solution to a major optimization problem, deal with all sorts of precision issues with bit manipulation, and come up with a blazing fast bedrock algorithm that's dazzling both in its simplicity and complexity. And then likely popped out to a pub for a pint and didn't think much about it again.
Intellectually, I understand that this person was specialized, and may struggle mightily navigating the complexity of modern web frameworks, something that I find straightforward because I do it every day.
But for some reason, this stuff always strikes me as being "real" programming, and what I'm doing as just plumbing, piping together the bits that the "real" programmers write.
Your mind is inventing the part about them doing it on a Thursday afternoon before popping out to the pub.
This could have taken weeks of research and refinement. You can't tell from just looking at the final result.
Give yourself some credit. If you were tasked with producing a cosine implementation, you'd pull out some textbooks, read specs, research algorithms, try a few ideas, test, iterate. You could come up with this. The only reason you didn't is that you didn't need to make a cos function yet; you had other stuff to do.
Do you ever feel a burning desire to just solve the hell out of a particular problem? Usually this happens in the context of performance optimization, but it can be anything. Maybe you're supposed to be working on something else, but all you can think about is your pet problem. You throw a bunch of stuff at it, and mostly it doesn't work, but the failures don't discourage you at all; making any sort of progress is addicting. You look at other people's code. You read blog posts and research papers. You fill up notepads. None of this feels remotely like work. It's fun, and you simply cannot stop until you finally grind the problem into the dust.
My guess is that the author of this code was possessed by a spirit like this. Maybe I'm wrong; maybe he was just assigned by his manager to write a cosine function that met a certain performance threshold, and he immediately had a general idea of how to do it, and banged it out in an afternoon. But I doubt it.
My point is that, if you have felt this way about programming, you are doing "real" programming. It's not about whether you're writing a web framework or a cosine function. It's about the thrill of the chase.
If it's any consolation, I worked with one of these teams for a few years. Brilliant at the math stuff, and many could not handle git. Or unit tests.
You've spent your time on X, they've spent their time on Y. That's it. Most of these people are math people. They may have CS degrees, but they've spent the bulk of their time in math.
Actually, now that I recall, one of them was a nuclear physicist and the other was an industrial engineer - neither had CS degrees. I don't know about the others.
IMO the part of programming that interfaces with humans and requirements is just as tricky (in a different way) and just as important.
Also, have you ever formally studied maths? People don't just come up with this kind of thing. They read about algorithms in textbooks. They might not have read this specific algorithm (which may be original), but they'll have read lots of similar ones.
Here's someone else's fast-sine code snippet I found and used about a decade ago when I was working on an additive synthesizer I never finished. It basically uses the same method, Taylor series approximation.
Driving it as an oscillator relies on just a single integer addition operation per sample to increment the phase, and relies on integer overflow to reset the phase.
I like the comment from someone asking if it could really be so fast since it has 7 whole multiplications in it. Ah I love the realtime dsp world, so different from my python/JS day job.
You might be interested to know about arithmetic mod 2^m - 1. Look at these numbers:
32640, 15420, 39321, 43690
In binary they look like:
1111111100000000, 0111100001111000, 0011001100110011, 0101010101010101
How cute. They are like little one bit precision sine waves.
I assure you that these numbers multiplied by each other mod (2^16 - 1) are all equal to zero. When multiplied by themselves they shift position. Test it for yourself. Long story short, in the world of mod 2^16 - 1, 2^x works just like e^(pi i), these numbers function just like sine and cosine, and you can even make linear superpositions out of them and use them as a basis to decompose other ints into "frequency space."
Even better, the number i is represented here. There's a number in this modulo system which squares to 1111111111111110 (aka -1). Once you see it, you can implement fast sine and cosine directly through 2^{ix} = cos(x) + i sin(x).
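The products-are-zero claim is easy to sanity-check with a throwaway snippet:

    #include <stdint.h>
    #include <stdio.h>

    /* Pairwise products of the four values above, reduced mod 2^16 - 1. */
    int main(void)
    {
        const uint32_t v[4] = { 32640, 15420, 39321, 43690 };
        for (int i = 0; i < 4; i++)
            for (int j = i + 1; j < 4; j++)
                printf("%u * %u mod 65535 = %u\n", v[i], v[j], (v[i] * v[j]) % 65535);
        return 0;
    }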
If you're doing a synth and want a sine wave, you usually won't need to compute cos(x + at) from scratch every time. There's a cheaper and more accurate trick.
First, figure out how much the phase (in radians) is going to increase every sample. E.g., for 440 Hz at 44100 Hz sample rate, you want 2pi * 440 / 44100 = 0.06268937721 (or so). Precompute sin(a) and cos(a), once.
Now your oscillator is represented as a point in 2D space. Start it at [1,0]. Every sample, you rotate this point around the origin by a radians (this is a 2x2 matrix multiplication, using the aforementioned sin(a) and cos(a) constants, so four muladds instead of seven). You can now directly read off your desired cos value; it's the x coordinate. You get sin for free if you need it (in the y coordinate).
If you run this for millions or billions of samples (most likely not; few notes are longer than a few seconds), the accumulated numerical errors will cause your [x,y] vector to be of length different from 1; if so, just renormalize every now and then.
Of course, if you need to change a during the course of a note, e.g. for frequency or phase modulation (FM/PM), this won't work. But if so, you probably already have aliasing problems and usually can't rely on a simple cos() call anyway.
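As a sketch (names made up), the whole oscillator is something like the following; if you run it for a long time, renormalize (x, y) back to unit length every few thousand samples, as mentioned above:

    #include <math.h>

    typedef struct { double x, y, c, s; } SinOsc;

    void osc_init(SinOsc *o, double freq, double sample_rate)
    {
        double a = 2.0 * M_PI * freq / sample_rate;   /* phase increment per sample */
        o->c = cos(a); o->s = sin(a);                 /* the only trig calls, at note-on */
        o->x = 1.0;    o->y = 0.0;                    /* start at [1, 0] */
    }

    double osc_next(SinOsc *o)        /* returns cos of the running phase; o->y holds sin */
    {
        double nx = o->x * o->c - o->y * o->s;        /* rotate by a radians: */
        double ny = o->x * o->s + o->y * o->c;        /* four multiplies, two adds */
        o->x = nx; o->y = ny;
        return o->x;
    }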
That is a really cool trick, I think I'll give this a try over the weekend.
I'm a bit confused about pre-computing the table though, wouldn't it have to be massive since the sample rate and note frequency don't evenly line up? On my phone at the minute but with a 440hz note under 44.1k sr, doesn't one cycle equal 100.227~ samples? I'm guessing the idea is to make a smaller table and interpolate? And I'm guessing you compute these tables for each musical note (or a limited but realistic range) at compile or load time, and not with each note press.
Not being able to change the frequency while the note is running seems like a big deal for a synth though, no? Even without high frequency FM/PM which is often okay to not support, you would still be giving up on things like legato, lfo pitch mod (vibrato) and pitch wheel bending. Or is this only a problem in the higher frequencies of modulation? You know, it sounds like a fun thing to find out myself lol.
The accumulating errors could be an issue with free-running oscillators, especially in a studio system left on for a few days. Free-running oscillators aren't that common in additive synths I believe, but it could be neat to see if it makes a difference like it does in a subtractive synth. Either way, periodic renormalization should be pretty straightforward.
Also I do somewhat remember when I was doing my original research for an additive synth, the word on KVR was a lookup table based sin was fastest as long as you could fit it into L1 cache. That was over a decade ago and caches are a lot bigger now.
I think I can explain the trick even more simply, and hopefully that clears up your confusion. Simply store a complex number s which represents the current output of the synth (cos on the real part, sin on the imaginary part). At each time update (once per sample), multiply s by a pre-computed constant c (which is also complex valued). You just need c = e^(i*2*pi*freq/sample_rate), so that after sample_rate updates s has gone around the circle freq times. It does not matter that the frequency doesn't line up evenly with the sample rate. The trick depends only on keeping track of the state s and reliably multiplying it at a known update rate.
You don't need a table for a simple sine oscillator. You just compute sin(a) and cos(a) on note-on, and that's it. You don't store the output in a table, you just apply that 2x2 matrix every time.
Tables can definitely have their place, depending on your synthesis method. If you want something like pitch bend, going for a sine table may very well be the fastest (it depends a bit on how much cache you want to burn).
I recently had to implement some Bessel functions numerically. It's fantastic to see that someone at Sun Microsystems implemented all these procedures in the early 90's (I'm sure even long before that). You will find these implementations in R, Julia, musl, you name it!
I especially like the use of "GET_HIGH_WORD" and friends.
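Roughly what it does (paraphrased, not the actual fdlibm macro, which works on the two raw 32-bit halves and has an endianness switch) is pull out the upper 32 bits of a double so the sign and exponent can be examined with cheap integer operations:

    #include <stdint.h>
    #include <string.h>

    static inline uint32_t get_high_word(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);   /* well-defined way to view the bits */
        return (uint32_t)(bits >> 32);    /* sign, exponent, top of the mantissa */
    }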
Yeah, Fredrik's papers are great. A lot of the reason Bessels.jl is so good is because Arb makes it easy for us to test against a known source of truth (which makes things like minimax polynomials and picking optimal tradeoffs a ton easier).
It's an alternative implementation which uses interpolated table lookup and requires only 66 bytes of constant memory, with an error of less than 0.05% compared to the floating-point sine.
The (presumably relative) error of 0.05% implies only ~11 significant bits, while 32-bit and 64-bit floating point formats have 24 and 53 significant bits respectively. If your sine has only 11 correct bits and you are okay with that, why do you have much more than 11 bits of input precision after all?
EDIT: Someone (sorry, the comment was removed before I could cite one) mentioned that it is actually for Q1.15 fixed point. So it was not even an "alternative" implementation...
For the first, maybe it's an approximate identity I'd never heard of?
For the second, it seems like it relies on X being small, but in this code... Y might be small enough for sin==identity approximation, but X isn't, is it?
The input is a double-double where y << x by definition. So the first line is just using the standard cos identity cos(x+y) = cos(x)cos(y) - sin(x)sin(y). Since y is very small, we can approximate cos(y) ~= 1, and sin(y) ~= y. This yields the first line.
The second line, I believe, is not relying on x being small, but rather relying on sin(x)*y being small. The whole result of the sin(x)*y term is approximately on the order of the LSB of the result, and the relative error of approximating sin(x) using 1 is better than using 0, so this term will help produce more accurate rounding of the bottom bits.
I think the function inputs are in double-double format, so the assumption is that the magnitude of y is significantly smaller than the magnitude of x, ideally |y| < 2^(-52) |x|. So that 1 is a very good approximation for cos(y), since |cos(y) - 1| < |y|^2 < 2^(-104) |x|^2. Similarly, if we assume |x| is also small, |sin(x) - x| < |x|^3 is also a very good approximation (same with sin(y) ~ y).
It is possible to draw a curve through n points with a degree n-1 polynomial, and since the sine/cosine curve is so simple you don't need to take that many samples to get a good approximation.
Would it not be faster to just store a lookup table and interpolate is my question. I recall hearing that's how it's done.
It's been a while since I've dealt with this; is this something similar to the CORDIC algorithm? A couple of years ago I implemented it in VHDL, very fun exercise.
Programming accurate floating point calculations is where you need a maths college/uni degree.
99% of the "other programming" (including "AI") does not require skills higher than those from high school.
As far as I can remember, most probability functions they use in some neural net models were taught in high-school.
When I coded a neural net, the most important thing was the size of its connectome, not really the mathematical fanciness of the neuron and synapse operations.
Memory model experts: if you define the constants in the way this code has, will they be contiguous in memory? Or would it be better to allocate these in an aligned(64) array?
C provides no guarantees about the memory layout of distinct globals. If you want a specific contiguous layout, you need to define it as a buffer. In practice though, most linkers will place them contiguously since they are simply appended to the constants section in the order they are parsed.
You pass an argument, let's call it a, to cos. Because the polynomial approximation is only valid for a relatively small argument, the first step is to compute the remainder of a when divided by pi/2. The result, let's call it x, is as small as possible, and cos(a) can be recovered from it as ±cos(x) or ±sin(x), with the quadrant number telling you which.
For example, let's say you're calculating cos(4). To reduce the argument you want to compute 4 - n*pi/2 for some n. In this case n=2, so you want 4 - pi, the smallest possible number from which cos(4) can be recovered (here cos(4) = -cos(4 - pi)). In double-precision floating point, pi is not irrational; it's exactly 7074237752028440/2^51. Thus, x = 4 - pi will be 7731846010850208/2^53.
However, that's not enough to calculate cos(4) precisely, because pi is actually 1.224... * 10^-16 more than the fraction I gave before, so x is not exactly 4 - pi and the result would be off in its last bits. To fix this, the remainder is computed in multiple precision, and that's where y comes in.
y is the result of computing (a - n*pi/2) - x with a value of pi that has higher precision than just 7074237752028440/2^51. It is very small: so small that you don't need a polynomial to compute the remaining correction to cos(4), you can just use the linear approximation -x*y. But it's still very important in order to get the best possible approximation of cos(a), even when a is outside the interval -pi/2 to pi/2.
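To make this concrete, here is a toy version of exactly this example (illustrative only, not the real rem_pio2; the constants are the double nearest pi and, approximately, the left-over tail):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double a = 4.0;
        double pi_head = 3.14159265358979311600;     /* nearest double to pi */
        double pi_tail = 1.22464679914735318e-16;    /* approximately pi - pi_head */
        double x = a - pi_head;                      /* exact: 4 and pi_head are close enough (Sterbenz) */
        double y = -pi_tail;                         /* the part of the reduction x cannot hold */
        /* quadrant 2: cos(4) = -cos(x + y) ~ -(cos(x) - sin(x)*y) */
        printf("reduced:  %.17g\n", -(cos(x) - sin(x) * y));
        printf("library:  %.17g\n", cos(4.0));
        return 0;
    }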
    float Q_rsqrt( float number )
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = * ( long * ) &y;                        // evil floating point bit level hacking
        i  = 0x5f3759df - ( i >> 1 );                // what the fuck?
        y  = * ( float * ) &i;
        y  = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
    //  y  = y * ( threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed

        return y;
    }