> That’s exactly how V8 multiplies BigInts: one digit at a time, adding up the intermediate results. The algorithm works just as well for 0 to 9 as it does for a BigInt’s much bigger digits.
Why not use a faster algorithm, like the Karatsuba algorithm[1] or the Toom-Cook[2] algorithm?
Speaking of Karatsuba, in 1993 I wrote a C++ big integer package and used Karatsuba. I had some questions about how to use it when the two numbers being multiplied are not the same length. I did some experiments, and noticed hints at some patterns in the data I was seeing. I asked about this on sci.math, but as far as I know no one ever gave an explanation.
If anyone is curious and wants to figure this out, here's my old sci.math post [1].
You can do that - it's the "textbook" solution - but you can probably do better by not making your inputs larger: in the worst case you're going from multiplying N digits by 1 digit to N digits by N digits.
Sometimes we forget the lower-order terms and constant factors when considering algorithmic complexity. Not every use case is likely to see the asymptotic behavior.
Reminds me of a demo by my algorithms professor. A certain sorting method (quicksort) requires picking a good pivot to keep complexity down. Picking a random pivot generally gives good results, but leaves a worst case of O(n^2) when examined asymptotically. An algorithm for "perfect" pivot picking (median of medians, I believe) was demonstrated; it selects the pivot in O(n) (linear) time. However, the constant on that linear term was about 22. Therefore, in nearly every case, picking randomly would outperform it.
I did not realize Karatsuba required such large numbers to outperform the schoolbook method. When I first learned about it, I was under the impression that it would be more effective even for barely "large" numbers.
> For now, we are shipping a baseline implementation of BigInts. [...] it is not particularly optimized. [...] we first want to see how you will use BigInts
It's almost as if evolution of software is an iterative process where all parties move a bit at each step, not one from 0 to "fully planned ahead in advance (by expert committee)".
There's little reason to use a poor algorithm when good ones already exist. If the problem is already solved, you don't need any expert committee, just a few days of reading and writing.
So you already know how people are going to use BigInt, so you can tell them what optimizations are useful? Send them a message, JS runtime implementers are eager to hear from you! Keep in mind each optimization comes at a cost, so you don't want to willy-nilly implement any optimization that comes to mind (in time, code size, and runtime memory).
> It follows that, for sufficiently large n, Karatsuba's algorithm will perform fewer shifts and single-digit additions than longhand multiplication, even though its basic step uses more additions and shifts than the straightforward formula. For small values of n, however, the extra shift and add operations may make it run slower than the longhand method. The point of positive return depends on the computer platform and context. As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320–640 bits.
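To make the crossover concrete, here is a minimal sketch of the recursion in JavaScript (illustrative only: it uses BigInts themselves and falls back to the built-in `*` below an assumed cutoff, whereas a real engine works on the raw digit arrays; THRESHOLD is an assumption, not a measured crossover point):

    // Karatsuba on BigInts, purely to illustrate the recursion.
    const THRESHOLD = 2048n; // bits -- assumed, not benchmarked

    const bitLength = (x) => BigInt(x.toString(2).length);

    function karatsuba(a, b) {
      if (a < 0n || b < 0n) {           // handle signs by recursing on magnitudes
        const sign = (a < 0n) !== (b < 0n) ? -1n : 1n;
        return sign * karatsuba(a < 0n ? -a : a, b < 0n ? -b : b);
      }
      const n = bitLength(a) > bitLength(b) ? bitLength(a) : bitLength(b);
      if (n <= THRESHOLD) return a * b; // "longhand" base case

      // Split both operands at the halfway point: x = xHi * 2^half + xLo.
      const half = n / 2n;
      const mask = (1n << half) - 1n;
      const aHi = a >> half, aLo = a & mask;
      const bHi = b >> half, bLo = b & mask;

      // Three recursive multiplications instead of four.
      const hi  = karatsuba(aHi, bHi);
      const lo  = karatsuba(aLo, bLo);
      const mid = karatsuba(aHi + aLo, bHi + bLo) - hi - lo;

      return (hi << (2n * half)) + (mid << half) + lo;
    }

The split/recombine steps are exactly the extra shifts and additions the quote is talking about, which is why the base case still uses the plain multiply for small n.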
> we allocate an object in memory. We make it large enough to hold all the BigInt’s bits, in a series of chunks, which we call “digits”
In the Lisp world, the halfwords or words or whatever that a bignum's bits are divided into are called "bigits". At least, this is the term I heard around MIT in the late 1970s. I always thought it was cute.
Anyway, I'm glad to see this. I think all garbage-collected languages should provide arbitrary-precision integers as the default integer type, as Lisp has for decades. This certainly goes for interpreted languages like JavaScript, Python (which did adopt this policy at some point), Ruby, and even, I would argue, Java and C#. This, for example, is to me not a bug in the algorithm, but in the language, as it wouldn't happen in Common Lisp: https://research.googleblog.com/2006/06/extra-extra-read-all...
(Bounded integer types are available in Common Lisp, but it would be strange to use them for the bounds in a binary search algorithm, whose runtime is almost certain to be dominated by the comparison function. In any case their use would require a conscious choice on the part of the programmer.)
The sad thing about the way that Java and now JavaScript implement BigInts is that they're a separate type from ordinary integers, rather than having a single type that automatically changes between an immediate representation ("fixnums"), for integers that fit into a machine word less a few tag bits, and an allocated representation ("bignums") for larger integers. The latter provides the semantic benefits of arbitrary-precision integers at a much smaller runtime cost than if they're all allocated.
I agree that the approach "everything is a bigint" plus optimizations where a fixed width int suffices would be nice from the semantic perspective.
However, Javascript has basically the "everything is a floating point number" approach, and bigints would be hard to fit into that (do you want bigfloat as well? do you want to reproduce float arithmetic imprecision, or be precise? Etc.)
However, bigint could internally use just one fixed width number when possible as an optimization - although the current implementation doesn't seem to do this.
I've implemented support in the Scala.js compiler to emit Longs (64-bit signed integers) as BigInts about 6 weeks ago. [1] Unfortunately, without specific optimizations that are not part of the baseline currently implemented in V8, this implementation is very slow. It's up to 60x slower than our production implementation. [2] I'm looking forward to seeing small BigInts being optimized as native 64-bit values!
No need for a custom parser. Right now, you can use a custom format with replacer and reviver functions (the 2nd argument to JSON.stringify / JSON.parse). Say, have them serialize BigInts as strings matching `^bigint[\d]+$` and revive them appropriately.
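A rough sketch of that approach (the `bigint` prefix follows the format suggested above; the function names are placeholders, and the optional minus sign is an extra to cover negatives):

    // Replacer: turn BigInts into tagged strings so JSON.stringify doesn't throw.
    function bigintReplacer(key, value) {
      return typeof value === 'bigint' ? 'bigint' + value.toString() : value;
    }

    // Reviver: turn the tagged strings back into BigInts on parse.
    function bigintReviver(key, value) {
      return typeof value === 'string' && /^bigint-?\d+$/.test(value)
        ? BigInt(value.slice('bigint'.length))
        : value;
    }

    JSON.stringify({ x: 12345678901234567890n }, bigintReplacer);
    // '{"x":"bigint12345678901234567890"}'
    JSON.parse('{"x":"bigint12345678901234567890"}', bigintReviver);
    // { x: 12345678901234567890n }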
Given that `int + bigint` throws in JS, this exact thing will never happen imo; otherwise you couldn't differentiate between the types.
Maybe something like `{"x": 10n}`, but at that point you might as well use a reviver, especially if you've already agreed on a format and use one to parse out class instances vs. plain objects.
I think this is for the most part useful for WASM, but also for reading/writing binary buffers, as in handling file formats that store (u)int64s (such as CAF on the Mac), for binary protocol responses from environments such as Java/.NET that already support big integers (and of course, eventually, Node), for finance calculations... nice.
Just curious: how will you deal with things like triple-equals comparison (`0x1n === 0x1`?) internally, from the perspective of performance?
> BigInts [..] are always signed (in ref. to the >>> operator)
Disclaimer: I haven't taken a deep dive into the BigInt spec yet, but things like this come to mind (say you want to shift and mask using unsigned 64-bit values):
When mixing Number and BigInt, abstract equality (==) works fine, but strict equality (===) is always false, since they are two different primitive types.
Most bitwise operators are allowed, so long as all operands are the same type (Number or BigInt). The only exception is zero-fill right shift (>>>) since that doesn't make any sense with respect to BigInts.
From my console: (node 10 with --harmony-bigint)
> 1n == 1
true
> 1n === 1
false
> 12345n >> 1
TypeError: Cannot mix BigInt and other types, use explicit conversions
> 12345n >> 1n
6172n
> 12345n >>> 1n
TypeError: BigInts have no unsigned right shift, use >> instead
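If you do want unsigned 64-bit shift semantics anyway, the same proposal's BigInt.asUintN can get you there (a sketch; the helper name is my own):

    // Reinterpret the BigInt as an unsigned 64-bit value, then use plain >>.
    const ushr64 = (x, n) => BigInt.asUintN(64, x) >> n;

    ushr64(-1n, 60n); // 15n -- i.e. 0xFFFFFFFFFFFFFFFFn >> 60n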
Yes, I'm aware of this, but I was asking more in the realm of internal performance/optimization (as I sort of point out - and not necessarily because of the comparison itself). For example, they are for all practical purposes the same integer, and in other cases, IIRC (please correct me if I'm wrong), V8 optimizes a native Number - which could easily be stored internally as IEEE 754 - into an actual internal integer when you use bit-ops on it (to indicate to the optimizer that you're handling integers), and so forth.
In this case I was curious whether this would be optimized internally so these comparisons could be moved directly into CPU registers as integers (assuming a 64-bit arch) and compared there, for example. But the future will tell - I'm equally aware that optimization is not the first priority at this stage (it's just me being excited to see this manifest). It's in any case a welcome and useful feature, optimized or not.
apaprocki linked (thanks!) to the spec's unsigned right-shift operation (1.1.11 BigInt::unsignedRightShift).
> I think this for the most part is useful for WASM
Also, for compile-to-JS versions of languages where the hosted language has BigInts, which now [0] can be more fully equivalent to the non-JS implementations of the same language. I think it's a while before GC-dependent languages will be WASM hosted, but there are a lot that compile to JS.
[0] where “now” is “sometime in the future where this has broad support among browsers”
This looks to have nothing whatsoever to do with WASM. It is very much not just (u)int64 support, which is all WASM needs/wants. This is a proper big-number type. You could use it for (u)int64 support, sure, but it'd be incredibly slow at it compared to the much simpler approach of just using an actual hardware 64-bit number.
I've been a JavaScript developer for years and it's a surprise to me that there's a `Number.MAX_SAFE_INTEGER` and `Number.MIN_SAFE_INTEGER`.
It seems that I've never had a use case where I exceeded those limits. I figure a good number of JavaScript developers have had the same experience.
Anyone who's used the Twitter API from Javascript has probably encountered the integer limit, because Tweet IDs are 64-bit values that often exceed MAX_SAFE_INTEGER.
Their JSON API works around that limitation by returning IDs as both "id" (a number that may not be representable in Javascript) and "id_str" (the same number in JS-friendly string form).
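With BigInt available, the string field becomes the safe way in (the payload below is made up, but the precision loss on the plain `id` field is exactly the problem described):

    // "id" silently loses precision during JSON.parse; "id_str" round-trips exactly.
    const tweet = JSON.parse('{"id": 1100000000000000001, "id_str": "1100000000000000001"}');
    tweet.id;             // 1100000000000000000 -- already off by one
    BigInt(tweet.id_str); // 1100000000000000001n -- exact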
The JSON spec doesn't restrict the range of numbers [0], though it does note that, because different platforms using JSON have different limitations and IEEE 754 doubles are widely supported, values that aren't representable as IEEE 754 doubles may cause interoperability issues.
Sure, but why would you ever return an ID as a number? IDs are explicitly a data-type where arithmetic operations or other numeric transformations will not yield meaningful results.
String is the best choice in JSON but ultimately only somewhat less wrong (since string transformations won’t yield meaningful results for an ID either). That’s why GraphQL defines a discrete ID scalar.
I blame RDBMS-centric thinking. Things that look like numbers will perform well in an index, and we get free ID generation, so... why think about hard things, just throw it in there.
In terms of identity an ID is never a number, always a unique value. It should be its own type.
> though it does note that, because different platforms using JSON have different limitations and IEEE 754 doubles are widely supported, values that aren't representable as IEEE 754 doubles may cause interoperability issues
Yes, in Section 6: “Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.”
Just in general: your employee ID is likely a 'number', but it's not a number, it's a unique series of digits.... You don't add employee IDs. You don't multiply them by the current month. You don't group employees in ID based sets, find the ID modulus, and do things. They're not numbers, they are an alphanumeric series that should be unique.
And after seeing a hundred systems break when IDs started to overflow the 32-bit max, or struggle to handle a change to GUIDs, or to alphanumeric security codes, or a harmonization with another system's IDs, or freak out when their numbers get exposed through a URL and it turns out leading-zero formatting matters: ... use a damned string.
First Ajax app I worked on had the same problem and for similar reasons. We configured the JSON emitter to turn them into Strings, and back on inbound requests.
I have run into the issue a few times, most frequently when doing binary operations on 64-bit numbers. When I first ran into going over `Number.MAX_SAFE_INTEGER`, it took me a long time to figure out that was the issue.
I can imagine this is an especially important advancement for server side JS. Not something I personally care about tho.
You may also be interested in learning about Number.EPSILON (the floating point epsilon) which is required when dealing with number equality...
For example, this is an incorrect way to compare numbers:
> 0.1 + 0.2 === 0.3
false
This is the proper way to compare numbers:
> a = 0.1 + 0.2
> b = 0.3
> Math.abs(a - b) < Number.EPSILON
true
This means the difference is smaller than the smallest quantity that can be represented in a floating-point number, and every JS number is a floating-point number.
Not knowing about this can cause many issues if you deal with values representing currency.
Epsilon is not the smallest quantity that can be represented. It's the smallest amount that can be added to a number between 1 and 2. If it's added to 2, the result is 2. We need to add 2 * Number.EPSILON to 2 to add anything; 4 * E to 4, 8 * E to 8, 16 * E to 16, etc.
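A sketch of a scale-aware comparison along those lines (the helper name and the choice of tolerance are my own, not a standard API):

    // Number.EPSILON is only the gap between 1 and the next representable double,
    // so a useful tolerance has to be scaled by the magnitude of the operands.
    function nearlyEqual(a, b, relTol = Number.EPSILON) {
      return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    nearlyEqual(0.1 + 0.2, 0.3);                       // true
    Math.abs(1000.1 + 0.2 - 1000.3) < Number.EPSILON;  // false -- the unscaled test breaks here
    nearlyEqual(1000.1 + 0.2, 1000.3);                 // true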
As someone who likes to play around with math, it's "fun" to be able to implement functions that output large numbers, such as tetration [0], which can be written as:
While I expected this to take a long time to run for sufficiently large inputs (on my machine tetrate(7n, 3n) takes about 40 seconds to produce ~3.8 * 10^695974), some inputs immediately throw a "Maximum BigInt size exceeded" error. This isn't surprising, however, as there has to be _some_ practical limit, but I wonder if anyone knows how V8 determines it? Is it based on available RAM? Do (or will) other browsers implement something similar?
(defun tetrate (a n)
  (if (= n 0) 1
      (expt a (tetrate a (1- n)))))
(compile 'tetrate)
(time (integer-length (tetrate 7 3)))
I used 'integer-length' to check that a number of the correct magnitude was produced without actually having 700kB of digits dumped into my REPL buffer. On a new MacBook Pro running SBCL, this took, ah, 624 milliseconds. So there's some room for optimization of the JavaScript implementation :-)
Oops! I shouldn't try out new features right before going to bed... Here's the proper formula:
const tetrate = (a, n) => n === 0n ? 1n : a * * (tetrate(a, n - 1n));
And I definitely am not getting the same results described. However, it's still possible to get a "Maximum BigInt size exceeded" error, and I wonder if anyone knows how this is determined?
Edit: it looks like I posted the "proper" formula originally, but HN's formatting strips away JavaScript's exponentiation operator. So, I'm using "* *" instead.
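For reference, the same definition written with the actual exponentiation operator (no new content, just the formula above without the workaround for HN's formatter):

    // Tetration: a ** a ** ... ** a, n times, using BigInt throughout.
    const tetrate = (a, n) => n === 0n ? 1n : a ** tetrate(a, n - 1n);

    tetrate(2n, 4n); // 65536n, i.e. 2 ** (2 ** (2 ** 2))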
Interesting approach. If I understand your library correctly, you need to choose a bitsize at compile-time? I thought you often wanted unlimited precision (aka. possibly heap-allocated) when using bigints?
Indeed, in my case I didn't need arbitrary precision, just extending uint64 to uint128, uint256, uint512, ... Fixed sizes are perfectly fine for a variety of use cases; they can be stack-allocated and are much faster due to the specialization.
Part of me feels like a hidden motivation for doing this was so Google could continue to ask their “implement a BigInt class” interview question :). Cool write up though - I enjoyed it.
A friend and I were wondering how the BigInt is represented in memory. It looks like it may be a bit for the sign, a byte for the type, a word for the count, and an array of words for the number? My question was whether the whole struct along with the array is stored in stack memory (meaning it would all have to be moved when resized) or if the array of words is behind a pointer. Am I way off base? Does anyone know the answer?
JS requires somewhat different intuition. A JS value may be a BigInt, String, Function, etc. so you need to have something that can reference any of these.
The usual technique is NaN-boxing, where doubles are stored directly and non-doubles are represented with a NaN. A NaN gives you 51 bits in which to store your data. Usually you have a union type, with some bits for the type tag and the rest for data.
A BigInt may require more than 51 bits. So the obvious representation is to reserve a new tag value, and then in the data bits store a pointer to the heap-allocated structure that you describe. This could be a single array or a chunked linked list.
A further change would be to store small BigInts inline in those 51 bits, e.g. by reserving another tag. This would save a heap allocation at the cost of more branching.
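For intuition, you can peek at a NaN's bit pattern from JavaScript itself; the layout below assumes a little-endian machine:

    // The canonical quiet NaN that JS produces; a NaN-boxing engine reuses the
    // mantissa bits (all zero here) to carry a type tag plus a payload or pointer.
    const buf = new ArrayBuffer(8);
    const f64 = new Float64Array(buf);
    const u32 = new Uint32Array(buf);

    f64[0] = NaN;
    u32[1].toString(16); // '7ff80000' -- sign 0, exponent all ones, quiet bit set
    u32[0];              // 0 -- low half of the free payload bits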
This particular engine, V8, doesn't use NaN-boxing, because of the memory costs on 32-bit systems of using 64-bit values for boxed pointers. Instead, we use pointer tagging, relying on the fact that, with word alignment, we have 2 bits free at the bottom of pointers.
If the lsb is unset, then the remaining 31 bits represent a "small integer", a.k.a. a Smi (on 64-bit, the top 32 bits make the Smi, so that they can be read out of memory with a normal 32-bit access).
If the lsb is set, then the remaining bits are the pointer, where the bottom bits have to be masked off to actually dereference it (practically, most accesses are offset-based field accesses anyway, so we just subtract 1 from the field offset). The other bit, the 2nd lsb, has recently started being used as a weak pointer marker.
So, the type tag isn't stored in bits of the pointer, but rather in the object being pointed to, not dissimilar to a vtable. This has the side-advantage that we can swap the type of an object in-place, without having to update pointers to it. BigInts are immutable, so they are stored as a heap object which holds the object tag (actually another pointer to a "map", that's a story for another time), a bitfield that stores sign and length, and then data inline (very similar to how we store strings).
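A toy model of that tagging scheme, acted out with plain bit operations (not real V8 code; real tagged values are machine words, and this shows only the 32-bit Smi layout described above):

    const isSmi        = (v) => (v & 1) === 0;  // lsb 0 -> small integer
    const intToSmi     = (n) => n << 1;         // payload lives in the upper 31 bits
    const smiToInt     = (v) => v >> 1;
    const isHeapObject = (v) => (v & 1) === 1;  // lsb 1 -> tagged pointer
    const untagPointer = (v) => v - 1;          // field loads just fold the -1 into the offset

    const tagged = intToSmi(42); // 84: lsb is 0, so the runtime treats it as a Smi
    smiToInt(tagged);            // 42
    isHeapObject(0x1000 + 1);    // true: a (fake) word-aligned address with the tag bit set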
Thanks for the explanation! If it functions similarly to strings, which are also immutable, does any operation on a BigInt cause a new allocation?
I believe so, in this baseline implementation. In fact, that's also true of operations on doubles (non-Smi) in unoptimized code, which are stored as "heap numbers". Avoiding these allocations is one of the things an optimizing compiler can do (and does do, for doubles).
That said, for short-lived BigInts, the allocations may not be as expensive as you might think, since the garbage collector is generational and short-lived objects won't leave the nursery.
Basically, serialization will throw if there is a BigInt present. But the .toJSON() callout is supported if a particular environment wants to opt in to supporting it in some manner.
On a 64-bit machine, the max value of num_digits is 18,446,744,073,709,551,616. Since each digit is its own 64-bit value, if num_digits reached its maximum value, you'd need to allocate 147,573,952,589,676,412,928 bytes of memory for the digits themselves, or a little over 147,573 petabytes.
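Just the arithmetic behind those numbers, written out with BigInts themselves (treating num_digits as a full 64-bit count, as above):

    const maxDigits = 2n ** 64n;  // 18,446,744,073,709,551,616 digits
    const bytes = maxDigits * 8n; // 8 bytes per 64-bit digit
    bytes;                        // 147,573,952,589,676,412,928n
    bytes / 10n ** 15n;           // 147,573n -- a little over 147,573 petabytes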
[1] https://en.wikipedia.org/wiki/Karatsuba_algorithm
[2] https://en.wikipedia.org/wiki/Toom–Cook_multiplication