I have a similar example in my book Geometry for Programmers (https://www.manning.com/books/geometry-for-programmers). There I make a polynomial approximation for sine that is:
1) odd (meaning antisymmetric);
2) precise at 0 and Pi/2 (and -Pi/2 automatically);
3) precise in its derivatives at 0 and Pi/2 (and -Pi/2);
4) better in overall precision than the equivalent Taylor-series approximation;
and it only consists of 4 terms! Adding more terms would make it impractical in the modern world, since we have hardware implementations of sine on CPUs and GPUs.
Spoiler alert, the secret sauce is the integral equation ;-)
CPUs do have a 'hardware' (microcode, I guess) implementation, dating back to the 80387-era FPU. But it isn't used. One reason is that the software implementation is faster in modern floating-point code. I don't know about GPUs, but I would guess compilers transparently take care of that.
It depends on the precision, accuracy, and data types of the arguments and result. The 80387 was absolutely faster than software emulation (done by hooking real-mode interrupt handlers, the way it had been since the 8086/8087) for F32/64/80. For I8/16, lookup tables were faster. For I32, a quality result would be very expensive. There were no GPUs in the 387 era, and there was no SIMD (MMX) either. In the x86 era, compilers weren't that smart. If they were, then they were slow.
Which is just a support function for cos (which doesn't do argument reduction). See [1] for the actual function, and [2] & [3] for the argument reduction code.
You really only need to compute one eighth of a circle segment (so, zero to pi/4), everything else you can do by symmetry by switching signs and flipping x, y. (And if you supported a larger range, you’d waste precision).
The function supports double-double precision (which is a trick, different from long double or quadruple precision, in which you express a number as an unevaluated sum of two doubles: the head (large) and the tail (much smaller)).
So, the “real” cosine function first takes its argument and reduces it modulo pi/2 into the desired range -pi/4 to +pi/4 (with minimal loss of precision: by utilising the sign bit you do better than using the range 0 to +pi/2), in double-double precision (rem_pio2 returns an int denoting which quadrant the input was in, plus two doubles), and those two doubles are then passed to this function (or its cousin __sin, with the sign flipped as appropriate, depending on the quadrant).
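Roughly, the dispatch looks like this. It's only a sketch: remquo() and plain cos()/sin() stand in for rem_pio2 and the __cos/__sin kernels, so it throws away exactly the double-double precision described above, but the quadrant bookkeeping is the same.

    #include <math.h>

    /* Sketch only: remquo() reduces modulo the double nearest pi/2 and returns
       the low bits of the quotient; the real rem_pio2 also returns a second
       "tail" double so no precision is lost. cos()/sin() stand in for the
       __cos/__sin kernels. */
    double my_cos(double a)
    {
        int q;
        double r = remquo(a, M_PI_2, &q);   /* r ends up in [-pi/4, pi/4] */
        switch (q & 3) {                    /* which quadrant was the input in? */
            case 0:  return  cos(r);
            case 1:  return -sin(r);
            case 2:  return -cos(r);
            default: return  sin(r);
        }
    }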
Many people really have put in quite some care and thought to come up with this neat and precise solution.
> You really only need to compute one eighth of a circle segment (so, zero to pi/4), everything else you can do by symmetry by switching signs and flipping x, y. (And if you supported a larger range, you’d waste precision).
In high school I was writing a paint program in BASIC on the Apple ][. Drawing circles was mind-numbingly slow until I read an article somewhere that pointed this out. It was like magic.
This is a faster implementation of cosine that utilises six static const double variables (C1, C2, C3, C4, C5, C6) to speed things up. These variables represent the coefficients of a polynomial approximation for the cosine function.
It’s the core implementation for a specific and very narrow range of arguments. The “genuine article” moves the argument into that range (via the rem_pio2 function performing range reduction, which is worth an article in itself), and then calls this core implementation.
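For a rough picture of what that core looks like, here is a sketch for |x| <= pi/4. The coefficients below are the plain Taylor ones (1/4!, -1/6!, ...), used purely for illustration; the real C1..C6 are minimax-fitted values that are close to, but not exactly, these, which is where the extra accuracy over a Taylor expansion comes from.

    /* Sketch of the kernel's shape for |x| <= pi/4; coefficients are Taylor
       values for illustration, not fdlibm's actual minimax-fitted C1..C6. */
    static double kernel_cos_sketch(double x)
    {
        double z = x * x;
        double r = z * z * (  1.0/24
                   + z * ( -1.0/720
                   + z * (  1.0/40320
                   + z * ( -1.0/3628800
                   + z * (  1.0/479001600
                   + z * ( -1.0/87178291200 ))))));
        return 1.0 - 0.5 * z + r;
    }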
It's an easy way to return NaN and set FE_INVALID. If you're not worried about floating-point exceptions you can just return NaN directly. I don't know how much C# even guarantees about floating-point operations, or whether it's applicable there.
I remember an approximation function used in a vacuum-tube-based surface-to-air missile system of the '60s: cos(x) = 0.7. As explained to us rookies, that precision was enough to hit the target.
Sin/cos are tightly connected to the complex exponential and exhibit a lot of such behavior.
In particular, sin(x+iy) = sin(x)cosh(y) + i cos(x)sinh(y). Set x to zero out the cosine, and rely on the exponential nature of cosh to find a target y value.
Most highschool students mainly use sin to calculate the lengths of sides of triangles. So it feels very counterintuitive that sin can be greater than 1.
If you define sin as the solution to a particular differential equation, then its relationship to the hyperbolic functions is much more obvious.
It's probably a pretty good 1st order guess that a surface to air missile launcher is aimed at 45 degrees. It's not aimed straight up and it's not flat on the ground.
My first experience with "Use the source Luke" was wondering as a young lad how computers calculated trig functions. This led me to the Unix V7 math lib source code[1], and thence to a book by Hart & Cheney which by luck my university library had on the shelf. Good times.
Implementing low precision transcendental functions is much more interesting IMO, as it allows for a wider variety of algorithms and approaches. When you are limited to the IEEE definitions, there really isn't much room for creativity and you have to handle all the obscure edge cases and chase least significant bits, bleh. That's not something you want to do if you need to calculate a trillion atan's. If one is curious how these algorithms work, check out the book "Elementary Functions".
In reality though, a typical stock libm is not that accurate; their error generally ranges from 1 to 5 ulps with no actual guarantees. I'm more interested in a hybrid libm with configurable errors, from the most accurate possible (0.5 ulps) to something like 2^20 ulps, and all in between.
You sure can, but if you can beat a single IEEE multiplication on an arbitrary processor my god that’s a program I hope to never ever in my life have to reverse engineer.
Never quite got this argument. You can frequently do 1000 fold improvements or do even better by optimizing within the same algorithmic complexity. That frequently flips the bit of "is this software usable".
Sometimes you can even make real-world performance significantly better by ignoring the technically better algorithmic complexity, especially when you're almost always working with low n's.
The IEEE standard doesn’t really require any particular precision for transcendental functions IIUC? The original standard doesn’t even define them. The 2008 version includes optional correctly-rounded transcendentals, but there seems to be a total of one implementation, CRlibm, which is an academic project of more than a decade (and a dead home page because of the INRIA Forge shutdown). Your run-of-the-mill libm has little to do with this.
Right, it requires correct rounding for the optional correctly-rounded functions, which ISO C, for example, does not require to correspond to your usual libm ones, or even to be provided at all (even with __STDC_IEC_559__ or __STDC_IEC_60559_BFP__). I think I saw a spec for correctly rounded functions for C somewhere, but even the latest draft[1] of C23 is content with just reserving the cr_* namespace and not mandating much for the usual functions outside it (F.10/16):
> IEC 60559 specifies correct rounding for the operations in the F.3 table of operations recommended by IEC 60559, and thereby preserves useful mathematical properties such as symmetry, monotonicity, and periodicity. The corresponding functions with (potentially) reserved `cr_`-prefixed names (7.33.8) do the same. The C functions in the table, however, are not required to be correctly rounded, but implementations should still preserve as many of these useful mathematical properties as possible.
Maybe correct rounding (and hence consistency) will be made into the standard requirements (at least for single and double precisions) in the near future?
Does anyone know of a good reference on signals and systems for people with a pure math background rather than an engineering or physics background?
So, the implementation linked above uses the polynomial of best approximation to cosine on an interval. That polynomial was presumably obtained using the Remez algorithm, which, given a function f and a set of n+2 points in your desired interval, yields a polynomial of degree n best approximating f in the interval (in the uniform norm sense)[1] by performing a sequence of linear solves.
I have a math and specifically approximation theory background, and the wiki article on the Remez algorithm makes a lot of sense to me. Like, this is the language I speak: https://en.wikipedia.org/wiki/Remez_algorithm
I'm less comfortable though when I try to understand the documentation for the remez implementation in SciPy because it is using the language of signals and systems, which is how engineers think about this stuff: https://docs.scipy.org/doc/scipy/reference/generated/scipy.s...
Signals seems to use a lot of analogies to sound waves, and it does map in some way or another to the mathematical language I'm comfortable with. I've spent a little bit of time reading papers and introductory texts on signals, but the definitions feel a little loosey goosey to me.
Anyway, just opportunistically using this thread to ask if anyone knows of some book or papers or blog posts that can help me translate this stuff into math language :)
I've spent most of my career working at the intersection of multiple highly technical domains, both theoretical and applied, and a) the "jargon switch", b) the methodology differences, and, worst of all, c) the switch of conceptual framework required to work on a problem that spans multiple domains are initially very irritating.
I'm pointing that out because if you are looking for a rosetta stone only to tackle a), then even if you find one, it'll leave you wanting, because you'll still need maps to tackle b) and c).
Example: analog electronics. From a mathematician's POV, an analog circuit is just a system of differential equations.
But very few analog electronics specialists ever think about a circuit that way.
The words they use are the least important part of the "translation" problem. The way they approach and think about the system will be utterly alien to a pure or even applied math person.
Eventually, after learning the trade, you'll start to see familiar patterns emerge that can be mapped back to stuff you're familiar with, but a simple "this EE jargon means that in math language" won't be enough: you're going to have to get your hands dirty and learn how they think of the problem.
I don't know much about the scipy.signal package, but there are other Python packages with good implementations for generating polynomial approximations. Among them are pythonsollya (a Python wrapper for Sollya) and SageMath.
Other than that, some overview references on the Remez algorithm that I like (and that should be a bit closer to your taste) are:
- J.-M. Muller et al., "Handbook of Floating-Point Arithmetic", section 10.3.2 (and follow the references in that section).
The scipy documentation you linked is specifically for generating a FIR filter using the Remez algorithm, which is why it is written in that style. It is a very specific use case of the algorithm and probably not the one I would recommend learning from. Another top level comment has a link to a decent overview of the Remez algorithm written without the "Signals and Systems" language.
Thanks, I think my request was unclear, but I'm specifically trying to understand the documentation linked, not the Remez algorithm itself. I'm trying to reconcile my knowledge of it with the language used in the documentation.
I have a pretty comfortable understanding of the Remez algorithm and its generalizations, and have written my own implementations for a research project.
Take a look at the 2 listed references by Parks and McClellan. They are considered seminal papers in the field of signal processing and are the basis of the function you linked. And they will probably appeal to your math background more.
The name of the algorithm is Parks-McClellan filter design, or sometimes you will see it named firpm, as popularized by MATLAB.
Some of the more recent implementations of cosine for single precision that are correctly rounded to all rounding modes according to IEEE 754 standard:
This takes me back. I did the transcendental functions for a CP/M C compiler in the early 80s using Chebyshev polynomials.
My tutor failed my implementation because my code formatting was utterly crap. :) I am still laughing as there was no sin, cos in the maths library at all at the time.
Unless like me you made a typo when copying the table from a book and had a vector routine that mostly looked alright but would jerk slightly when that one value that was incorrect was chosen! <facepalm />
I'm pretty sure that indent was around even in the early 80's. Even if you couldn't be bothered to run it, your tutor could have. Must not have been a very good tutor to fail you for style.
I was using CP/M since about 1978, but you're right -- AFAIK there never was a port of indent to CP/M. I did make some assumptions, but developing for CP/M on CP/M was somewhat painful. There just wasn't enough memory space to work efficiently, or (often) even fit the compiled image into memory (<~60k after BIOS/BDOS/CCP on CP/M 2.2). Chebyshev polynomials certainly imply floating point, which could be a BIG memory hog with so little room.
If I had the resources at the time, I would have undertaken such a development task on a more capable (Unix) platform. Such an environment would improve the ability for developing/debugging the optimizations required. As you say, indent was available in BSD since '82, and I know it was present on the Sun2 systems I used in '83.
I had an Osborne 1. I had to swap disks multiple times to compile anything. I remember the C compiler by some guy (Walt Bilofsky?) had the arguments passed in reverse on the stack (last on top), and printf and friends ended up as crazy hacks.
I didn't have access to a Unix at the time, and open source was very limited, so indent was not a thing.
The CORDIC algorithm is great! Just a few months ago, I implemented a sine generator with it and also used fixed point arithmetic. It is a real eye opener to see that you can substitute multiplications by bit shifting when you store the numbers in the “right“ format.
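For anyone curious, the core of it is tiny. Here is a rough rotation-mode sketch in Q2.29 fixed point (the atan table and gain are filled in with libm at first use, purely for the demo; it converges for |angle| up to about 1.74 rad, so reduce the argument first):

    #include <stdint.h>
    #include <math.h>

    #define CORDIC_ITERS 24
    #define Q29 (1 << 29)                       /* 1.0 in Q2.29 */

    /* Rotation-mode CORDIC: only shifts and adds inside the loop. */
    void cordic_sincos(int32_t angle_q29, int32_t *sin_out, int32_t *cos_out)
    {
        static int32_t atan_q29[CORDIC_ITERS];
        static int32_t gain_q29 = 0;
        if (gain_q29 == 0) {                    /* one-time setup; libm used only for the demo */
            double k = 1.0;
            for (int i = 0; i < CORDIC_ITERS; i++) {
                atan_q29[i] = (int32_t)(atan(ldexp(1.0, -i)) * Q29);
                k /= sqrt(1.0 + ldexp(1.0, -2 * i));
            }
            gain_q29 = (int32_t)(k * Q29);      /* ~0.60725, pre-compensates the CORDIC gain */
        }
        int32_t x = gain_q29, y = 0, z = angle_q29;
        for (int i = 0; i < CORDIC_ITERS; i++) {
            int32_t dx = x >> i, dy = y >> i;   /* the "multiplication by bit shifting" part */
            if (z >= 0) { x -= dy; y += dx; z -= atan_q29[i]; }
            else        { x += dy; y -= dx; z += atan_q29[i]; }
        }
        *cos_out = x;                           /* cos(angle) in Q2.29 */
        *sin_out = y;                           /* sin(angle) in Q2.29 */
    }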
1. I suspect decimal floating point is quite expensive to implement computationally.
2. Not an expert, but I think the floating-point algorithms and proofs for bases other than 2 are less well researched and less well known.
3. A calculator is a resource-constrained device.
4. Are there even any real benefits? Yes, some human-friendly numbers have an exact finite representation that is missing in base-2, but I can't see why that would matter, compared to just using 32-bit or 64-bit IEEE-754.
In practice, the cost is not that much different because four-function calculators do not do many calculations at once. Even in a graphing calculator, drawing a plot is never slow enough to make the device useless.
Decimal, and particularly BCD floating point, is much simpler than binary for a calculator since the numbers don't need any additional conversion steps to display.
And the fact that 0.1 + 0.1 works as expected is indeed a huge benefit.
I expect that's because the cost versus effort for having two kinds of floating point isn't very good. Not because decimal floating point is notably more expensive than binary floating point.
Keep squaring, doubling and subtracting one on the left. Keep raising by powers of 2 in the argument on the right. Eventually, if you're working with rational numbers, multiplication by 2^k moves all k fractional bits over to the integer side of the decimal point, and then integer times pi is the base case of the recursion. If you rescale the units so that pi = 2^(n-1) for some n, you don't even have to think about the non-fractional part. The register value circling back to 0 is supposed to happen anyway.
You could just as easily recursively define cos to "spiral downwards in x rather than outwards", each recursive layer reducing the argument x by a factor of 2 until it reduces down to the precomputed value cos(machine precision).
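Something like this sketch, for intuition only (each unwinding step amplifies the error, so it's not how you'd do it in production):

    #include <math.h>

    /* Halve the argument until it's tiny, then unwind with cos(2t) = 2*cos(t)^2 - 1. */
    double cos_by_halving(double x)
    {
        if (fabs(x) < 1e-8)
            return 1.0 - x * x / 2.0;          /* small-angle "base case" */
        double c = cos_by_halving(x / 2.0);
        return 2.0 * c * c - 1.0;              /* double-angle identity */
    }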
Here is a naive question. A lot of effort goes into dealing with floating point imprecision and instabilities, right? So why aren't these algorithms implemented with integers? A 64-bit integer could represent the region from 0 to 1 with 2^64 evenly spaced slices, for example.
Would that make things more accurate across operations? Is this something that is done?
That is called 'fixed point', and it is useful for some applications. What makes floating point interesting is that it can represent values with very different magnitudes; your 64-bit fixed point number can't represent any number greater than 1, for instance. Whereas a 32-bit float can represent numbers with magnitudes between ~2^-127 and ~2^127.
Just an option if magnitude is an issue. You only need 0-1 plus a sign bit.
Fixed to float is easy enough. It'd let you use a fixed point implementation for 0 - pi/2 benefiting from smaller look up table and simpler implementation.
> It'd let you use a fixed point implementation for 0 - pi/2 benefiting from smaller look up table
Why would that table be smaller if you use 2^64 values in that range instead of the fewer than 2^55 doubles in that range?
> and simpler implementation.
Why would it be simpler? With doubles, the hardware will do scaling for you. With fixed point, you need a helper function or macro to do multiplications, and those have to use an intermediate result with more than 64 bits (in this special case, some shortcuts are possible, but those won’t make the implementation simpler).
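Concretely, the helper being referred to is something like this sketch for a "Q0.64" format (a uint64_t read as a pure fraction in [0, 1)); note the 128-bit intermediate, which here relies on the GCC/Clang __int128 extension:

    #include <stdint.h>

    /* Q0.64 multiply: value = n / 2^64. Needs a 128-bit intermediate product;
       unsigned __int128 is a compiler extension, not standard C. */
    static inline uint64_t q64_mul(uint64_t a, uint64_t b)
    {
        return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }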
There was an exchange about using fixed point (which apparently is being used according to other comments for easier correctness verification, speed and simplicity). Someone pointed out that float gives you a wider range of values. But those values are not needed for cosine which has a range of 0-1 in the interval 0 to pi/2.
> But those values are not needed for cosine which has a range of 0-1 in the interval 0 to pi/2
Ah not quite true though; the “range” is also referring to the exponent range of representable numbers, so don’t forget the small numbers near 0. You can get much smaller numbers with much higher precision out of a single-precision float than a fixed-point 64 bit [0,1] number.
There should be plenty of C trigonometric algorithms made for fixed point. They usually use lookup tables and interpolation. Used a lot in embedded microcontrollers that don't have hardware floating point or hardware trig functions, and in automotive where floats were (still are?) a no-go due to the dangerous float math pitfalls and the safety critical nature of ECUs.
I remember writing a fast, small footprint, but very accurate fixed point trig function for the microcontroller used in my batcher project, many lives ago. I'll try to find it and put it on github.
The neat thing is that you can use the native overflow wrap-around property of C unsigned integers and processor registers to rotate back and forth around the trigonometric circle without needing to account for the edge cases of the 0/360 degree point. You do need to suppress a MISRA warning here though, as it will go nuts seeing that.
I don't have the code at hand to show you, nor do I have links to similar trig stuff since I wrote everything from scratch, first in Matlab to prove the concept, then in C, as there wasn't anything similar online at the time. But I used the principles of fixed-point fractional math, representing floating-point numbers as scaled integers in Q notation, which is well documented for the microcontrollers and DSPs of the era (early to late '00s).
Here's the wiki gist with some C code snippet examples [1]. Here's a more detailed math paper that goes into the weeds [2]. Here's a nice presentation from Motorola-Freescale-NXP I just found, which I wish I had back then: it's awesome, it explains the basics for everyone to understand, and seeing those Motorola Windows 95 GUI tools brings a nostalgic smile to my face.
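I don't have that original code, but the wrap-around principle is easy to sketch (hypothetical example; the table is filled via libm purely for the demo, on a real MCU it would be a precomputed constant array):

    #include <stdint.h>
    #include <math.h>

    /* 16-bit phase accumulator: 0..65535 maps onto 0..360 degrees, so unsigned
       overflow *is* the modulo-360 step; no branch needed at the wrap point. */
    static int16_t sine_lut[256];

    static void lut_init(void)
    {
        for (int i = 0; i < 256; i++)
            sine_lut[i] = (int16_t)(32767.0 * sin(2.0 * M_PI * i / 256.0));
    }

    static int16_t sine_step(uint16_t *phase, uint16_t increment)
    {
        *phase += increment;            /* wraps modulo 2^16 automatically */
        return sine_lut[*phase >> 8];   /* top 8 bits index the 256-entry table */
    }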
What you're describing is fixed point. Fixed point absolutely exists in practice, but it has an important limitation: its accuracy is scale-dependent, unlike floating-point.
Suppose for example you were doing a simulation of the solar system. What units do you use for coordinates--AUs, light-seconds, kilometers, meters, even miles? With floating-point, it doesn't matter: you get the same relative accuracy at the end. With fixed point, you get vastly different relative accuracy if your numbers are ~10⁻⁶ versus ~10⁶.
Re: "imprecision and instabilities" on standard hardware there really are neither. Floating point numbers have an exact meaning (a particular fraction with an integer numerator and a denominator that is a power of 2). There are exactly specified rules for basic operations so that adding, subtracting, multiplying, or dividing these fractions will give you the nearest representable fraction to the exact result (assuming the most common round-to-nearest mode).
The bit that appears as "imprecision and instabilities" is a mismatch between three things: (1) what the hardware provides (2) what controls or compiler optimisation settings the programming language exposes (3) what the user expects.
There are several sticky points with this. One mentioned in the parent article is higher precision. Starting from the original 8087 floating point coprocessor, x86 has supported both 80-bit and 64-bit floating point in hardware, but programming language support has mostly ignored this and only shows the user "double" precision with 64 bits of storage allocated. But with some compiler settings, parts of the computation would be done at 80-bit precision with no user visibility or control over which parts had that extra precision.
Newer hardware modes stick to 64 bit computation so you're unlikely to run into that particular stickiness on a modern programming stack. There is another that comes up more now: fused multiply and add. Modern hardware can calculate
round(a*x+b)
in one step, where a, x, b are floating point numbers (i.e. fractions) and the "round" operation produces the nearest representable fraction to the correct result. This is faster and more accurate than doing the multiplication and addition separately, which gives
round(round(a*x)+b)
But again programming language control is weak over whether and when fused-multiply-and-add is used, and this is a detail that almost all programmers almost always just want to ignore anyway.
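C99 exposes the fused operation directly as fma(), so the difference is easy to spell out (whether the compiler contracts a*x + b into an FMA on its own is governed by settings like -ffp-contract):

    #include <math.h>

    double separate(double a, double x, double b) { return a * x + b;    }  /* two roundings (unless contracted) */
    double fused(double a, double x, double b)    { return fma(a, x, b); }  /* one rounding of the exact a*x + b */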
Speaking from no experience, but it should be doable. But since in most applications you're going to use floating point anyway (or, if you're using fixed point, maybe right bit shift or maybe even do integer division until it fits the range of your value), the extra precision would be discarded in the end.
The idea is to be able to compute 1.0 - h * z + (z * r - x * y) in high precision.
If h * z is slightly less than 1.0, then w = 1.0 - h * z can lose a lot of bits of precision. To recover the bits that were dropped in subtracting h * z from 1.0, we compute which bits of h * z were used (the high bits) and then subtract those bits from h * z to get only the low bits. We find the bits that were used by simply undoing the operation: 1.0 - w is mathematically the same as h * z, but is not the same as implemented due to limited precision.
So (1.0 - w) - h * z is the extra bits that were dropped from the computation of w. Add that to the final term z * r - x * y before that result is added to w, and we have a more precise computation of w + z * r - x * y.
Mathematically, we have (1 - (1 - hz)) - hz = hz - hz = 0, so the correction seems redundant. But due to floating-point rounding, it is not, and it still recovers information. As the comment says:
> cos(x+y) = 1 - (x*x/2 - (r - x*y))
> For better accuracy, rearrange to
> cos(x+y) ~ w + (tmp + (r-x*y))
> where w = 1 - x*x/2 and tmp is a tiny correction term
> (1 - x*x/2 == w + tmp exactly in infinite precision)
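Spelled out as code, with made-up names (r stands for the higher-order polynomial tail, y for the low half of the reduced argument, z for x*x), the rearrangement is roughly:

    /* Sketch of the trick: recover the low bits that w = 1 - hz dropped. */
    double cos_tail_sketch(double x, double y, double z, double r)   /* z = x*x */
    {
        double hz  = 0.5 * z;
        double w   = 1.0 - hz;            /* rounded: loses low bits of hz */
        double tmp = (1.0 - w) - hz;      /* exactly those dropped bits (negated) */
        return w + (tmp + (z * r - x * y));
    }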
You're probably more of a "real programmer" than whomever wrote this code--it's numerics code, after all.
There's no real tricks or anything to this code, it's just a polynomial evaluation of a Taylor series (which you probably learned, and subsequently forgot, from calculus), with some steps done using techniques for getting higher precision out of a floating-point value, and (in some other functions in libm, not here) occasional use of the bit representation of floating-point values. It's standard techniques, but in a field that is incredibly niche and where there is very little value in delving into if you don't have a need for it.
It's not a Taylor series. It's a series, but a minimax approximation, not Taylor.
Taylor series are essentially “Give me a polynomial p(x). At x=a, I want the value to be b, the first derivative to be c, the second derivative to be d, etc. etc.”. (Usually, a=0, also known as a MacLaurin series.) This gives you a _great_ approximation in the immediate vicinity of a, but usually a pretty horrible one the further away you get; in a sense, you spend all your approximation energy on that single point.
Minimax approximations are “Give me a polynomial p(x). Over the area x=[a,b], I want it to never stray far from f(x); give me the one that minimizes the worst-case distance”. (The distance function is usually either the absolute error |p(x)-f(x)|, or the relative error |p(x)-f(x)|/|f(x)|, but you can have an arbitrary weight function in some implementations.) So you consider everything in that area to be important, not just one point.
Taylor series are simple to calculate (which can help if you need to calculate the coefficients on the fly, or by hand), good for analysis, and easy to understand, so most engineers will have heard of them in calculus. The Remez algorithm, on the other hand, essentially requires a computer, and is significantly more complicated to understand and implement. (Remez is not the only way to generate minimax polynomials; Chebyshev polynomials come close, and I believe Sollya uses some other algorithm.) Both will converge to the actual function given infinitely many terms, but minimax polynomials will be _much_ closer in most real-life situations. Also, if you are really pressed for it, you can have rational minimax approximations (p(x)/q(x)) that approximate tricky functions _really_ well even at low degree, but of course at the cost of a division.
Agreed with this. Also outside of very specific applications like graphics libraries, robotics, game engines you would never want to write something like this from scratch and instead rely on an existing library or game engine. So unless you’re working in that area it’s not imposter syndrome to find it foreign, it’s probably good!
I’ve never worked directly on floating point libraries like this, but having dealt with similar stuff, I suspect if this were in production it would end up peppered with IFDEF to support various implementation differences across hardware types, even if it is compliant with eg IEEE standards.
This is a fascinating topic indeed. And the math behind polynomial approximation is not that deep or unapproachable. You just haven't found an explanation that works for you. Some time ago I tried to make one with a sine as an example: https://wordsandbuttons.online/sympy_makes_math_fun_again.ht...
It doesn't mention Taylor series anywhere but if you only make your system out of N differential equations, you'll get N members of the power series.
I don’t think this would help anyone’s imposter syndrome. Taking hardcore math line by line will yield confusion.
A better approach might be to learn how cosine can be approximated with a Taylor expansion, then realize this is just an optimized version of that. (Even if it’s a different algorithm, it’s still an approximation algorithm, which is the point.)
Staring at math constants won’t get you that insight.
The thing that makes me wonder if I'll ever be a real programmer when I grow up is that someone wrote this, probably on a Thursday after lunch (at least, that's how my brain imagines it).
They were able to conceptualize the math solution to a major optimization problem, deal with all sorts of precision issues with bit manipulation, and come up with a blazing fast bedrock algorithm that's dazzling both in its simplicity and complexity. And then likely popped out to a pub for a pint and didn't think much about it again.
Intellectually, I understand that this person was specialized, and may struggle mightily navigating the complexity of modern web frameworks, something that I find straightforward because I do it every day.
But for some reason, this stuff always strikes me as being "real" programming, and what I'm doing as just plumbing, piping together the bits that the "real" programmers write.
Your mind is inventing the part about them doing it on a Thursday afternoon before popping out to the pub.
This could have taken weeks of research and refinement. You can't tell from just looking at the final result.
Give yourself some credit. If you were tasked with producing a cosine implementation, you'd pull out some textbooks, read specs, research algorithms, try a few ideas, test, iterate. You could come up with this. The only reason you didn't is that you didn't need to make a cos function yet; you had other stuff to do.
Do you ever feel a burning desire to just solve the hell out of a particular problem? Usually this happens in the context of performance optimization, but it can be anything. Maybe you're supposed to be working on something else, but all you can think about is your pet problem. You throw a bunch of stuff at it, and mostly it doesn't work, but the failures don't discourage you at all; making any sort of progress is addicting. You look at other people's code. You read blog posts and research papers. You fill up notepads. None of this feels remotely like work. It's fun, and you simply cannot stop until you finally grind the problem into the dust.
My guess is that the author of this code was possessed by a spirit like this. Maybe I'm wrong; maybe he was just assigned by his manager to write a cosine function that met a certain performance threshold, and he immediately had a general idea of how to do it, and banged it out in an afternoon. But I doubt it.
My point is that, if you have felt this way about programming, you are doing "real" programming. It's not about whether you're writing a web framework or a cosine function. It's about the thrill of the chase.
If it's any consolation, I worked with one of these teams for a few years. Brilliant at the math stuff, and many could not handle git. Or unit tests.
You've spent your time on X, they've spent their time on Y. That's it. Most of these people are math people. They may have CS degrees, but they've spent the bulk of their time in math.
Actually, now that I recall, one of them was a nuclear physicist and the other was an industrial engineer - neither had CS degrees. I don't know about the others.
IMO the part of programming that interfaces with humans and requirements is just as tricky (in a different way) and just as important.
Also, have you ever formally studied maths? People don't just come up with this kind of thing. They read about algorithms in textbooks. They might not have read this specific algorithm (which may be original), but they'll have read lots of similar ones.
Here's someone else's fast-sine code snippet I found and used about a decade ago when I was working on an additive synthesizer I never finished. It basically uses the same method, Taylor series approximation.
Driving it as an oscillator relies on just a single integer addition operation per sample to increment the phase, and relies on integer overflow to reset the phase.
I like the comment from someone asking if it could really be so fast since it has 7 whole multiplications in it. Ah I love the realtime dsp world, so different from my python/JS day job.
You might be interested to know about arithmetic mod 2^m - 1. Look at these numbers:
32640, 15420, 39321, 43690
In binary they look like:
1111111100000000, 0111100001111000, 0011001100110011, 0101010101010101
How cute. They are like little one bit precision sine waves.
I assure you that these numbers multiplied by each other mod (2^16 - 1) are all equal to zero. When multiplied by themselves they shift position. Test it for yourself. Long story short, in the world of mod 2^16 - 1, 2^x works just like e^(pi i), these numbers function just like sine and cosine, and you can even make linear superpositions out of them and use them as a basis to decompose other ints into "frequency space."
Even better, the number i is represented here. There's a number in this modulo system which squares to 1111111111111110 (aka -1). Once you see it, you can implement fast sine and cosine directly through 2^{ix} = cos(x) + i sin(x).
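The products-are-zero claim is easy to sanity-check with a throwaway snippet:

    #include <stdint.h>
    #include <stdio.h>

    /* Pairwise products of the four values above, reduced mod 2^16 - 1. */
    int main(void)
    {
        const uint32_t v[4] = { 32640, 15420, 39321, 43690 };
        for (int i = 0; i < 4; i++)
            for (int j = i + 1; j < 4; j++)
                printf("%u * %u mod 65535 = %u\n", v[i], v[j], (v[i] * v[j]) % 65535);
        return 0;
    }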
If you're doing a synth and want a sine wave, you usually won't need to compute cos(x + at) from scratch every time. There's a cheaper and more accurate trick.
First, figure out how much the phase (in radians) is going to increase every sample. E.g., for 440 Hz at 44100 Hz sample rate, you want 2pi * 440 / 44100 = 0.06268937721 (or so). Precompute sin(a) and cos(a), once.
Now your oscillator is represented as a point in 2D space. Start it at [1,0]. Every sample, you rotate this point around the origin by a radians (this is a 2x2 matrix multiplication, using the aforementioned sin(a) and cos(a) constants, so four muladds instead of seven). You can now directly read off your desired cos value; it's the x coordinate. You get sin for free if you need it (in the y coordinate).
If you run this for millions or billions of samples (most likely not; few notes are longer than a few seconds), the accumulated numerical errors will cause your [x,y] vector to be of length different from 1; if so, just renormalize every now and then.
Of course, if you need to change a during the course of a note, e.g. for frequency or phase modulation (FM/PM), this won't work. But if so, you probably already have aliasing problems and usually can't rely on a simple cos() call anyway.
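As a sketch (names made up), the whole oscillator is something like the following; if you run it for a long time, renormalize (x, y) back to unit length every few thousand samples, as mentioned above:

    #include <math.h>

    typedef struct { double x, y, c, s; } SinOsc;

    void osc_init(SinOsc *o, double freq, double sample_rate)
    {
        double a = 2.0 * M_PI * freq / sample_rate;   /* phase increment per sample */
        o->c = cos(a); o->s = sin(a);                 /* the only trig calls, at note-on */
        o->x = 1.0;    o->y = 0.0;                    /* start at [1, 0] */
    }

    double osc_next(SinOsc *o)        /* returns cos of the running phase; o->y holds sin */
    {
        double nx = o->x * o->c - o->y * o->s;        /* rotate by a radians: */
        double ny = o->x * o->s + o->y * o->c;        /* four multiplies, two adds */
        o->x = nx; o->y = ny;
        return o->x;
    }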
That is a really cool trick, I think I'll give this a try over the weekend.
I'm a bit confused about pre-computing the table though, wouldn't it have to be massive since the sample rate and note frequency don't evenly line up? On my phone at the minute but with a 440hz note under 44.1k sr, doesn't one cycle equal 100.227~ samples? I'm guessing the idea is to make a smaller table and interpolate? And I'm guessing you compute these tables for each musical note (or a limited but realistic range) at compile or load time, and not with each note press.
Not being able to change the frequency while the note is running seems like a big deal for a synth though, no? Even without high frequency FM/PM which is often okay to not support, you would still be giving up on things like legato, lfo pitch mod (vibrato) and pitch wheel bending. Or is this only a problem in the higher frequencies of modulation? You know, it sounds like a fun thing to find out myself lol.
The accumulating errors could be an issue with free-running oscillators, especially in a studio system left on for a few days. Free-running oscillators aren't that common in additive synths I believe, but it could be neat to see if it makes a difference like it does in a subtractive synth. Either way, periodic renormalization should be pretty straightforward.
Also I do somewhat remember when I was doing my original research for an additive synth, the word on KVR was a lookup table based sin was fastest as long as you could fit it into L1 cache. That was over a decade ago and caches are a lot bigger now.
I think I can explain the trick even more simply, and hopefully that clears up your confusion. Simply store a complex number s which represents the current output of the synth (cos on the real part, sin on the imaginary part). At each time update (once per sample), multiply s by a pre-computed constant c (which is also complex valued). You just need c = e^(i*2*pi*freq/sample_rate), so that after sample_rate updates s has gone around the circle freq times. It does not matter that the frequency doesn't line up evenly with the sample rate. The trick depends only on keeping track of the state s and reliably multiplying it at a known update rate.
You don't need a table for a simple sine oscillator. You just compute sin(a) and cos(a) on note-on, and that's it. You don't store the output in a table, you just apply that 2x2 matrix every time.
Tables can definitely have their place, depending on your synthesis method. If you want something like pitch bend, going for a sine table may very well be the fastest (it depends a bit on how much cache you want to burn).
I recently had to implement some Bessel functions numerically. It's fantastic to see that someone at Sun Microsystems implemented all these procedures in the early 90's (I'm sure even long before that). You will find these implementations in R, Julia, musl, you name it!
I especially like the use of "GET_HIGH_WORD" and friends.
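Roughly what it does (paraphrased, not the actual fdlibm macro, which works on the two raw 32-bit halves and has an endianness switch) is pull out the upper 32 bits of a double so the sign and exponent can be examined with cheap integer operations:

    #include <stdint.h>
    #include <string.h>

    static inline uint32_t get_high_word(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);   /* well-defined way to view the bits */
        return (uint32_t)(bits >> 32);    /* sign, exponent, top of the mantissa */
    }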
Yeah, Fredrik's papers are great. A lot of the reason Bessels.jl is so good is because Arb makes it easy for us to test against a known source of truth (which makes things like minimax polynomials and picking optimal tradeoffs a ton easier).
It's an alternative implementation which uses interpolated table lookup and requires only 66 bytes of constant memory, with an error of less than 0.05% compared to the floating-point sine.
The (presumably relative) error of 0.05% implies only ~11 significant bits, while 32-bit and 64-bit floating point formats have 24 and 53 significant bits respectively. If your sine has only 11 correct bits and you are okay with that, why do you have much more than 11 bits of input precision after all?
EDIT: Someone (sorry, the comment was removed before I could cite one) mentioned that it is actually for Q1.15 fixed point. So it was not even an "alternative" implementation...
For the first, maybe it's an approximate identity I'd never heard of?
For the second, it seems like it relies on X being small, but in this code... Y might be small enough for sin==identity approximation, but X isn't, is it?
The input is a double-double where y << x by definition. So the first line is just using the standard cos identity cos(x+y) = cos(x)cos(y) - sin(x)sin(y). Since y is very small, we can approximate cos(y) ~= 1, and sin(y) ~= y. This yields the first line.
The second line, I believe, is not relying on x being small, but rather relying on sin(x)*y being small. The whole result of the sin(x)*y term is approximately on the order of the LSB of the result, and the relative error of approximating sin(x) using 1 is better than using 0, so this term will help produce more accurate rounding of the bottom bits.
I think the function inputs are in double-double format, so the assumption is that the magnitude of y is significantly smaller than the magnitude of x, ideally |y| < 2^(-52) |x|. So that 1 is a very good approximation for cos(y), since |cos(y) - 1| < |y|^2 < 2^(-104) |x|^2. Similarly, if we assume |x| is also small, |sin(x) - x| < |x|^3 is also a very good approximation (same with sin(y) ~ y).
It is possible to draw a curve through n points with a degree n-1 polynomial, and since the sine/cosine curve is so simple you don't need to take that many samples to get a good approximation.
Would it not be faster to just store a lookup table and interpolate is my question. I recall hearing that's how it's done.
It's been a while since I've dealt with this; is this something similar to the CORDIC algorithm? A couple of years ago I implemented it in VHDL, very fun exercise.
Programming accurate floating point calculations is where you need a maths college/uni degree.
99% of the "other programming" (including "AI") does not require skills higher than those from high school.
As far as I can remember, most probability functions they use in some neural net models were taught in high-school.
When I coded a neural net, the most important thing was the size of its connectome, not really the mathematical fanciness of the neuron and synapse operations.
Memory model experts: if you define the constants in the way this code has, will they be contiguous in memory? Or would it be better to allocate these in an aligned(64) array?
C provides no guarantees about the memory layout of distinct globals. If you want a specific contiguous layout, you need to define it as a buffer. In practice though, most linkers will place them contiguously since they are simply appended to the constants section in the order they are parsed.
You pass an argument, let's call it a, to cos. Because the polynomial approximation is only valid for a relatively small argument, the first step is to compute the remainder of a when divided by pi/2. The result, let's call it x, is as small as possible, and cos(a) can be recovered from it as ±cos(x) or ±sin(x), with the quadrant number telling you which.
For example, let's say you're calculating cos(4). To reduce the argument you want to compute 4 - n*pi/2 for some n. In this case n=2, so you want 4 - pi, the smallest possible number from which cos(4) can be recovered (here cos(4) = -cos(4 - pi)). In double-precision floating point, pi is not irrational; it's exactly 7074237752028440/2^51. Thus, x = 4 - pi will be 7731846010850208/2^53.
However, that's not enough to calculate cos(4) precisely, because pi is actually 1.224... * 10^-16 more than the fraction I gave before, so x is not exactly 4 - pi and the result would be off in its last bits. To fix this, the remainder is computed in multiple precision, and that's where y comes in.
y is the result of computing (a - n*pi/2) - x with a value of pi that has higher precision than just 7074237752028440/2^51. It is very small: so small that you don't need a polynomial to compute the remaining correction to cos(4), you can just use the linear approximation -x*y. But it's still very important in order to get the best possible approximation of cos(a), even when a is outside the interval -pi/2 to pi/2.
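To make this concrete, here is a toy version of exactly this example (illustrative only, not the real rem_pio2; the constants are the double nearest pi and, approximately, the left-over tail):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double a = 4.0;
        double pi_head = 3.14159265358979311600;     /* nearest double to pi */
        double pi_tail = 1.22464679914735318e-16;    /* approximately pi - pi_head */
        double x = a - pi_head;                      /* exact: 4 and pi_head are close enough (Sterbenz) */
        double y = -pi_tail;                         /* the part of the reduction x cannot hold */
        /* quadrant 2: cos(4) = -cos(x + y) ~ -(cos(x) - sin(x)*y) */
        printf("reduced:  %.17g\n", -(cos(x) - sin(x) * y));
        printf("library:  %.17g\n", cos(4.0));
        return 0;
    }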
    float Q_rsqrt( float number )
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = * ( long * ) &y;                        // evil floating point bit level hacking
        i  = 0x5f3759df - ( i >> 1 );                // what the fuck?
        y  = * ( float * ) &i;
        y  = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
    //  y  = y * ( threehalfs - ( x2 * y * y ) );    // 2nd iteration, this can be removed

        return y;
    }