Adding BigInts to V8 (v8project.blogspot.com)
214 points by ingve on May 2, 2018 | 84 comments



> That’s exactly how V8 multiplies BigInts: one digit at a time, adding up the intermediate results. The algorithm works just as well for 0 to 9 as it does for a BigInt’s much bigger digits.

Why not use a faster algorithm, like the Karatsuba algorithm[1] or the Toom-Cook[2] algorithm?

[1] https://en.wikipedia.org/wiki/Karatsuba_algorithm

[2] https://en.wikipedia.org/wiki/Toom–Cook_multiplication
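
For reference, here's the core trick as a toy sketch, using JS BigInts themselves (non-negative inputs only, and purely illustrative, since the built-in `*` is of course already native): one level of recursion replaces four half-size multiplications with three.

  const karatsuba = (x, y) => {
    // schoolbook below an arbitrary cutoff (here: either operand under 64 bits)
    if (x < (1n << 64n) || y < (1n << 64n)) return x * y;
    const bits = BigInt(Math.max(x.toString(2).length, y.toString(2).length));
    const half = bits / 2n;
    const mask = (1n << half) - 1n;
    const [xh, xl] = [x >> half, x & mask];
    const [yh, yl] = [y >> half, y & mask];
    const a = karatsuba(xh, yh);             // high halves
    const b = karatsuba(xl, yl);             // low halves
    const c = karatsuba(xh + xl, yh + yl);   // one multiply covers both cross terms
    return (a << (2n * half)) + ((c - a - b) << half) + b;
  };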


Speaking of Karatsuba, in 1993 I wrote a C++ big integer package and used Karatsuba. I had some questions about how to use it when the two numbers being multiplied are not the same length. I did some experiments, and noticed hints at some patterns in the data I was seeing. I asked about this on sci.math, but as far as I know no one ever gave an explanation.

If anyone is curious and wants to figure this out, here's my old sci.math post [1].

[1] https://groups.google.com/d/msg/sci.math/MX3MLCQ0zzA/XTqUoZk...


Wow! I find it remarkable that you have had this question open for 25 years without a single answer.


"...the two numbers being multiplied are not the same length."

Do you mean as counted in bits? If so, prepend zeroes until they are the same length. Does this negatively affect the algorithm's results?

EDIT: When I say 'prepend' I mean on the MSB end. In typical arithmetic on paper, that'd be adding leading zeroes (prepending) onto the numbers.


You can do that - it's the "textbook" solution - but you can probably do better by not making your inputs larger: in the worst case you're going from multiplying N digits by 1 digit to N digits by N digits.


Might be something to do with caching or fitting operands in registers?


Did you eventually find an answer?


They may not be faster, depending on what size integers you are dealing with.

From your [1]: "As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320–640 bits."

Toom-3 doesn't get faster than Karatsuba until the operands are larger still (I couldn't find exact crossover numbers quickly). See e.g. https://fossies.org/linux/gmp/tune/README for a discussion.


Sometimes we forget the lower-order terms when considering algorithmic complexity. Not every use case is likely to see the asymptotic behavior.

Reminds me of a demo by my algorithms professor. A certain pivot-based algorithm (quickselect, I think) requires picking a good pivot to keep complexity down. Picking a random pivot generally gives good results, but is O(n^2) in the worst case. An algorithm for "perfect" pivot picking (median of medians, I believe) was demonstrated. It guarantees O(n) (linear) time. However, the constant factor on the linear term was 22. Therefore, in nearly every case, picking randomly would outperform it.

I did not realize Karatsuba required such large numbers to outperform. When I first learned about it, I was under the impression that it would be more effective even for barely "large" numbers.


> For now, we are shipping a baseline implementation of BigInts. [...] it is not particularly optimized. [...] we first want to see how you will use BigInts


To be fair, how it's used probably depends in part on how well optimized it is.


It's almost as if evolution of software is an iterative process where all parties move a bit at each step, not one from 0 to "fully planned ahead in advance (by expert committee)".


There's little reason to use a poor algorithm when good ones already exist. If the problem is already solved, you don't need any expert committee, just a few days of reading and writing.


So you already know how people are going to use BigInt, so you can tell them which optimizations are useful? Send them a message, JS runtime implementers are eager to hear from you! Keep in mind each optimization comes at a cost (time, code size, and runtime memory), so you don't want to willy-nilly implement any optimization that comes to mind.


"It's Wagile!"


Those have better asymptotics but worse constants.


Specifically:

    It follows that, for sufficiently large n, Karatsuba's
    algorithm will perform fewer shifts and single-digit 
    additions than longhand multiplication, even though its 
    basic step uses more additions and shifts than the 
    straightforward formula. For small values of n, however, 
    the extra shift and add operations may make it run 
    slower than the longhand method. The point of positive 
    return depends on the computer platform and context. 
    As a rule of thumb, Karatsuba is usually faster when 
    the multiplicands are longer than 320–640 bits.
https://en.wikipedia.org/wiki/Karatsuba_algorithm#Efficiency...


Isn't their solution Toom-Cook where the base is 2^32?


> we allocate an object in memory. We make it large enough to hold all the BigInt’s bits, in a series of chunks, which we call “digits”

In the Lisp world, the halfwords or words or whatever that a bignum's bits are divided into are called "bigits". At least, this is the term I heard around MIT in the late 1970s. I always thought it was cute.

Anyway, I'm glad to see this. I think all garbage-collected languages should provide arbitrary-precision integers as the default integer type, as Lisp has for decades. This certainly goes for interpreted languages like JavaScript, Python (which did adopt this policy at some point), Ruby, and even, I would argue, Java and C#. This, for example, is to me not a bug in the algorithm, but in the language, as it wouldn't happen in Common Lisp: https://research.googleblog.com/2006/06/extra-extra-read-all...

(Bounded integer types are available in Common Lisp, but it would be strange to use them for the bounds in a binary search algorithm, whose runtime is almost certain to be dominated by the comparison function. In any case their use would require a conscious choice on the part of the programmer.)

The sad thing about the way that Java and now JavaScript implement BigInts is that they're a separate type from ordinary integers, rather than having a single type that automatically changes between an immediate representation ("fixnums"), for integers that fit into a machine word less a few tag bits, and an allocated representation ("bignums") for larger integers. The latter provides the semantic benefits of arbitrary-precision integers at a much smaller runtime cost than if they're all allocated.


I agree that the approach "everything is a bigint" plus optimizations where a fixed width int suffices would be nice from the semantic perspective.

However, Javascript has basically the "everything is a floating point number" approach, and bigints would be hard to fit into that (do you want bigfloat as well? do you want to reproduce float arithmetic imprecision, or be precise? Etc.)

That said, BigInt could internally use just one fixed-width number when possible as an optimization, although the current implementation doesn't seem to do this.


About 6 weeks ago I implemented support in the Scala.js compiler to emit Longs (64-bit signed integers) as BigInts. [1] Unfortunately, without specific optimizations that are not part of the baseline currently implemented in V8, this implementation is very slow: up to 60x slower than our production implementation. [2] I'm looking forward to seeing small BigInts being optimized as native 64-bit values!

[1] https://github.com/scala-js/scala-js/pull/3286/commits/deec9...

[2] https://github.com/tc39/proposal-bigint/issues/117


I hope that after they collect feedback Int64 shows as a significant enough use case to prioritize optimization.


This is also available in Node 10 behind the `--harmony-bigint` flag. I've already made two modules with https://github.com/emilbayes/secure-random-uniform#bigint-su... and https://github.com/emilbayes/biguintle. Lots of fun!


(Keep in mind that Node 10 will update to V8 6.8 before it becomes LTS, so it will get BigInt without any flags.)


Not as useful as it could be, since it has special syntax that isn't supported in JSON, where the lack of 64-bit ints is one of the bigger issues.

https://webapplog.com/decreasing-64-bit-tweet-id-in-javascri...

The JSON use case was even one of the motivations behind BigInt in the first place:

https://twitter.com/sampullara/status/981634718964707328


While there are a few gotchas, rolling your own JSON parser isn't necessarily a ton of work.

Or perhaps a proposal could be written to give numbers emitted by JSON.parse a "raw" property, so they could be decoded using BigInt?


No need for a custom parser. Right now, you can use a custom format with replacer and reviver functions (the 2nd argument to JSON.stringify / JSON.parse respectively). Say, have them serialize BigInts as strings matching `^bigint[\d]+$` and revive appropriately.
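
A sketch of that convention (the "bigint" prefix and the helper names are just examples):

  const replacer = (key, value) =>
    typeof value === 'bigint' ? 'bigint' + value.toString() : value;
  const reviver = (key, value) =>
    typeof value === 'string' && /^bigint\d+$/.test(value)
      ? BigInt(value.slice('bigint'.length))
      : value;

  JSON.stringify({ x: 2n ** 64n }, replacer)
  // '{"x":"bigint18446744073709551616"}'
  JSON.parse('{"x":"bigint18446744073709551616"}', reviver).x
  // 18446744073709551616n

(Negative values would need the pattern extended to allow a leading minus sign.)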


But the point is to support large numbers with the traditional syntax.

e.g. so that {"foo": 1234567890123456789012345678901234567890} doesn't lose precision.


Given that `int + bigint` throws in JS, this exact thing will never happen IMO; otherwise you couldn't differentiate between the types.

Maybe something like `{"x": 10n}`, but at that point you might as well use a reviver, especially if you've already agreed on a format and use one to parse out class instances vs. plain objects.


Probably best to represent long numbers as strings in json; json doesn't support a lot of JS's features.


This is a welcome feature.

I think this is mostly useful for WASM, but also for storing/reading/writing binary buffers, e.g. handling file formats that store (u)int64s (such as CAF on the Mac), binary protocol responses from environments such as Java/.NET that already support big integers (and of course, eventually Node), finance calculations... nice.
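
(For the binary-buffer case, the proposal also adds 64-bit DataView accessors; a small sketch:)

  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, 0x0123456789abcdefn);   // big-endian unless told otherwise
  view.getBigUint64(0)                         // 81985529216486895n, i.e. 0x0123456789abcdefn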

Just curious: how will you deal with things like strict (triple-equals) comparison (`0x1n === 0x1`?) internally, from the perspective of performance?

> BigInts [..] are always signed (in ref. to the >>> operator)

Disclaimer: I haven't taken a deep-dive into the BigInt spec yet, but things like this comes to mind (say you want to shift and mask using unsigned 64-bit values):

    0xffffffffffffffffn >>  24 => 0xffffffffffffffffn (hmm)

    0xffffffffffffffffn >>> 24 => 0x000000ffffffffffn (but won't be supported)
Will this be solvable with methods or will I need to instead manually mask off the signed remainder?


There's a related article linked in OP that answers your questions: https://developers.google.com/web/updates/2018/05/bigint

When mixing Number and BigInt, abstract equality (==) works fine, but strict equality (===) is always false, since they are two different primitive types.

Most bitwise operators are allowed, so long as all operands are the same type (Number or BigInt). The only exception is zero-fill right shift (>>>) since that doesn't make any sense with respect to BigInts.

From my console: (node 10 with --harmony-bigint)

  > 1n == 1
  true
  > 1n === 1
  false
  > 12345n >> 1
  TypeError: Cannot mix BigInt and other types, use explicit conversions
  > 12345n >> 1n
  6172n
  > 12345n >>> 1n
  TypeError: BigInts have no unsigned right shift, use >> instead
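
If you want the unsigned 64-bit shift behavior from the example upthread, the proposal's BigInt.asUintN can reinterpret the value as an unsigned 64-bit integer first. A sketch (shru64 is just an illustrative name):

  const shru64 = (v, k) => BigInt.asUintN(64, v) >> k;   // k must be a BigInt

  shru64(0xffffffffffffffffn, 24n)   // 1099511627775n, i.e. 0xffffffffffn
  shru64(-1n, 24n)                   // same result: identical 64-bit pattern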


> since they are two different primitive types.

Yes, I'm aware of this, but I was asking more in the realm of internal performance/optimization (as I sort of point out - and not necessarily because of the comparer). F.ex. they are for all practical purposes the same integer, and in other cases, IIRC (please correct me if I'm wrong), V8 optimizes a native Number that would otherwise be stored internally as IEEE-754 into an actual internal integer when you use bit-ops on it (to indicate to the optimizer that you're handling integers), and so forth.

In this case I was curious whether this would be optimized internally so these comparisons could be moved directly into CPU registers as integers (assuming a 64-bit arch) and compared there, for example. But the future will tell - I'm equally aware that optimization is not the first priority at this stage (it's just me excited to see this manifest). It's in any case a welcome and useful feature, optimized or not.

apaprocki linked (thanks!) to a method that can do unsigned right-shift via a method (1.1.11 BigInt::unsignedRightShift).


There is no implicit conversion. 0x1n === 0x1 returns false. See here: https://github.com/tc39/proposal-bigint#no-implicit-conversi...

Shifting is specified here: https://tc39.github.io/proposal-bigint/#sec-numeric-types-bi...


> I think this for the most part is useful for WASM

Also, for compile-to-JS versions of languages where the hosted language has BigInts, which now [0] can be more fully equivalent to the non-JS implementations of the same language. I think it's a while before GC-dependent languages will be WASM hosted, but there are a lot that compile to JS.

[0] where “now” is “sometime in the future where this has broad support among browsers”


This looks to have nothing whatsoever to do with WASM. It is very much not just (u)int64 support, which is all WASM needs/wants. This is proper BigNumber. Which you could use for (u)int64 support, sure, but it'd be incredibly slow at it vs. the much simpler to do "just use an actual hardware 64-bit number"


I wish we also had regular word-sized ints as well instead of everything being double precision with all kinds of weird edge cases.


I've been a JavaScript developer for years and it's a surprise to me that there's a `Number.MAX_SAFE_INTEGER` and `Number.MIN_SAFE_INTEGER`.

It seems I've never had a use case where I exceeded the maximum or minimum safe integer. I figure a good number of JavaScript developers have had the same experience.

But good article nonetheless.


Anyone who's used the Twitter API from Javascript has probably encountered the integer limit, because Tweet IDs are 64-bit values that often exceed MAX_SAFE_INTEGER.

Their JSON API works around that limitation by returning IDs as both "id" (a number that may not be representable in Javascript) and "id_str" (the same number in JS-friendly string form).
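
A quick console illustration of why id_str matters (the IDs here are made up):

  > Number.MAX_SAFE_INTEGER
  9007199254740991
  > 900719925474099267 === 900719925474099268   // two distinct 64-bit IDs...
  true                                          // ...collapse to the same double
  > BigInt('900719925474099267') === BigInt('900719925474099268')
  false                                         // parsed from strings, they stay distinct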


Wow, why would they ever send "id" as a number? Seems like a terrible idea when it doesn't fit in the medium's spec (JSON).


The JSON spec doesn't restrict the range of numbers [0], though it does note that, because different platforms that may use JSON have different limitations and IEEE 754 doubles are widely supported, values that aren't representable as IEEE 754 doubles may have interoperability issues.

[0] https://tools.ietf.org/html/rfc8259#section-6


Sure, but why would you ever return an ID as a number? IDs are explicitly a data-type where arithmetic operations or other numeric transformations will not yield meaningful results.

String is the best choice in JSON but ultimately only somewhat less wrong (since string transformations won’t yield meaningful results for an ID either). That’s why GraphQL defines a discrete ID scalar.


I blame RDBMS-centric thinking. Things that look like numbers will perform well in an index, and we get free ID generation, so... why think about hard things, just throw it in there.

In terms of identity an ID is never a number, always a unique value. It should be its own type.


> though it does note that, because different platforms that may use JSON have different limitations and IEEE 754 doubles are widely supported, values that aren't representable as IEEE 754 doubles may have interoperability issues

Does it note that in the spec? I can't find it.


Yes, in Section 6: “Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.”


Just in general: your employee ID is likely a 'number', but it's not a number, it's a unique series of digits. You don't add employee IDs. You don't multiply them by the current month. You don't group employees into ID-based sets, take the ID modulus, and do things. They're not numbers, they are an alphanumeric series that should be unique.

And after seeing a hundred systems break when IDs started to overflow the 32-bit max, or struggle to handle a change to GUIDs, or alphanumeric security codes, or a harmonization with another system's IDs, or freak out when their numbers get exposed through a URL and it turns out leading-zero formatting matters: ... use a damned string.

"00123" and 123 are different. Just use a string.


Maybe for consumers in other languages whose JSON libraries (Python comes to mind) parse integers with greater than double-precision range.


First Ajax app I worked on had the same problem and for similar reasons. We configured the JSON emitter to turn them into Strings, and back on inbound requests.


Send 64 bit numbers from the server and you’ll run into trouble sooner or later.

If the numbers are random (like IDs) you’ll hit it about every 1000th ID. Solution: IDs are strings not numbers.


I have run into the issue a few times. I've run into it most frequently when doing binary operations on 64bit numbers. When I first ran into the issue with going over `Number.MAX_SAFE_INTEGER`, it took me a long time to figure out that was the issue.

I can imagine this is an especially important advancement for server side JS. Not something I personally care about tho.


Badly needed for server side development


You may also be interested in learning about Number.EPSILON (the floating point epsilon) which is required when dealing with number equality...

For example, this is an incorrect way to compare numbers:

    > 0.1 + 0.2 === 0.3
    false
This is the proper way to compare numbers:

    a = 0.1 + 0.2
    b = 0.3
    > Math.abs(a - b) < Number.EPSILON
    true
This means the difference is smaller than the smallest quantity that can be represented as a floating point number, and every JS number is a floating point number.

Not knowing about this can cause many issues if you deal with values representing currency.


Epsilon is not the smallest quantity that can be represented. It's the smallest amount that can be added to a number between 1 and 2. If it's added to 2 the result is 2. We need to add 2 * Number.EPSILON to 2 to add anything, 4 * E to 4, 8 * E to 8, 16 * E to 16, etc.

That kind of equality test should be:

  abs( a-b ) < Number.EPSILON * max( abs(a), abs(b) )
In practice I judge the scale of the quantization noise that may accrue and compare the difference to it.

It may also be possible to kind of cast precision away by adding a trick value, e.g.

  a + t == b + t
If a and b are positive, that is somewhat similar to:

  abs( a-b ) < Number.EPSILON * t
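
In JavaScript, the scaled test above might look like this (nearlyEqual is just an illustrative name):

  const nearlyEqual = (a, b) =>
    Math.abs(a - b) < Number.EPSILON * Math.max(Math.abs(a), Math.abs(b));

  nearlyEqual(0.1 + 0.2, 0.3)                    // true
  nearlyEqual(1e16, 1e16 + 2)                    // true: they differ by one ulp at that scale
  Math.abs((1e16 + 2) - 1e16) < Number.EPSILON   // false: the absolute test breaks down here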


Thanks for pointing it out.

A more popular definition is "the smallest number that yields a result different to 1 when added to 1".


Look out - I just now tested those two extra ideas I added for equality testing and found they don't work.

This version of adding a rounding value seems to though:

  ( a-b + rounder ) == rounder
My apologies, be careful out there :)


As someone who likes to play around with math, it's "fun" to be able to implement functions that output large numbers, such as tetration [0], which can be written as:

const tetrate = (a, n)=> n=== 0n ? 1n : a(tetrate(a, n - 1n));

While I expected this to take a long time to run for sufficiently large inputs (on my machine tetrate(7n, 3n) takes about 40 seconds to produce ~3.8 * 10 ^ 695974), some inputs immediately throw a "Maximum BigInt size exceeded" error. This isn't surprising, however, as there has to be _some_ practical limit, but I wonder if anyone knows how V8 determines this? Is it based on available RAM? Do (or will) other browsers implement something similar?

[0] https://en.wikipedia.org/wiki/Tetration


I tried this in Common Lisp:

  (defun tetrate (a n)
    (if (= n 0) 1
      (expt a (tetrate a (1- n)))))
  (compile 'tetrate)
  (time (integer-length (tetrate 7 3)))
I used 'integer-length' to check that a number of the correct magnitude was produced without actually having 700kB of digits dumped into my REPL buffer. On a new MacBook Pro running SBCL, this took, ah, 624 milliseconds. So there's some room for optimization of the JavaScript implementation :-)


Oops! I shouldn't try out new features right before going to bed... Here's the proper formula:

const tetrate = (a, n) => n === 0n ? 1n : a ** tetrate(a, n - 1n);

And I definitely am not getting the same results described. However, it's still possible to get a "Maximum BigInt size exceeded" error, and I wonder if anyone knows how this is determined?

Edit: it looks like I posted the "proper" formula originally, but HN's formatting stripped away JavaScript's exponentiation operator (**), which is why my earlier comment appears without it.


IME it's easy to get an error in bigint libraries if you do something like

  2 ** (2 ** n)
for some n greater than a few dozen. Because then you will have

  2**n
digits, and the internal value holding the number of digits is limited to

  2**32
or

  2**64


I would be very interested in the test suite you used.

I've finished my own bigint implementation in the Nim language:

- stack-allocated and power-of-2 sizes only (uint256, uint512, uint1024, etc.),

- using recursive types (uint128 = 2x uint64, uint256 = 2x uint128).

It is already quite optimized (division using Burnikel and Ziegler recursive division for example), but I want to make sure it works fine.

https://github.com/status-im/nim-stint


Interesting approach. If I understand your library correctly, you need to choose a bitsize at compile-time? I thought you often wanted unlimited precision (aka. possibly heap-allocated) when using bigints?


Indeed, in my case I didn't need arbitrary precision, just extending uint64 to uint128, uint256, uint512, ... Fixed sizes are perfectly fine for a variety of use cases, can be stack-allocated, and are much faster due to the specialization.


It would be cool to post some performance results from loops of common operations, comparing the various types and various bigint sizes.


    big_array[0] = 456;
    TypeError: Cannot convert 456 to a BigInt
That's very un-javascript-like.
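
(Presumably big_array here is a BigInt64Array, one of the typed arrays added alongside BigInt; a minimal reproduction of the above, assuming Node 10 with the flag:)

  const big_array = new BigInt64Array(4);
  big_array[0] = 456n;   // fine: a BigInt value
  big_array[0] = 456;    // TypeError: a Number is not silently converted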


Part of me feels like a hidden motivation for doing this was so Google could continue to ask their “implement a BigInt class” interview question :). Cool write up though - I enjoyed it.


A friend and I were wondering how the BigInt is represented in memory. It looks like it may be a bit for the sign, a byte for the type, a word for the count, and an array of words for the number? My question was whether the whole struct along with the array is stored in stack memory (meaning it will all have to be moved when resized) or if the array of words is behind a ptr. Am I way off base? Does anyone know the answer?


JS requires somewhat different intuition. A JS value may be a BigInt, String, Function, etc. so you need to have something that can reference any of these.

The usual technique is NaN-boxing, where doubles are stored directly and non-doubles are represented with a NaN. A NaN gives you 51 bits to store your data. Usually you have a union type, with some bits for the type tag and the rest for data.

A BigInt may require more than 51 bits. So the obvious representation is to reserve a new tag value, and then in the data bits store a pointer to the heap-allocated structure that you describe. This could be a single array or a chunked linked list.

A further change would be to store small BigInts inline in those 51 bits, e.g. by reserving another tag. This would save a heap allocation at the cost of more branching.


This particular engine, V8, doesn't use NaN-boxing, because of the memory costs on 32-bit systems of using 64-bit values for boxed pointers. Instead, we use pointer tagging, relying on the fact that, with word alignment, we have 2 bits free at the bottom of pointers.

If the lsb is unset, then the remaining 31 bits represent a "small integer", a.k.a. a Smi (on 64-bit, the top 32 bits make the Smi, so that they can be read out of memory with a normal 32-bit access).

If the lsb is set, then the remaining bits are the pointer, where the bottom bits have to be masked off to actually dereference it (practically, most accesses are offset-based field accesses anyway, so we just subtract 1 from the field offset). The other bit, the 2nd lsb, has recently started being used as a weak pointer marker.

So, the type tag isn't stored in bits of the pointer, but rather in the object being pointed to, not dissimilar to a vtable. This has the side-advantage that we can swap the type of an object in-place, without having to update pointers to it. BigInts are immutable, so they are stored as a heap object which holds the object tag (actually another pointer to a "map", that's a story for another time), a bitfield that stores sign and length, and then data inline (very similar to how we store strings).

(pointer|0x1) -> [BigIntMap, bitfield (length+sign), digit0, digit1, ...]
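
A JS-flavored paraphrase of that tagging scheme, purely as an illustration of the description above (obviously not how the C++ actually reads a word):

  const isSmi      = word => (word & 0b01) === 0;     // lsb clear -> small integer
  const smiValue   = word => word >> 1;               // value lives in the remaining 31 bits (on 32-bit)
  const isWeakRef  = word => (word & 0b11) === 0b11;  // pointer with the weak marker set
  const rawPointer = word => word & ~0b11;            // mask the tag bits off before dereferencing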


Thanks for the explanation! If it functions similarly to strings, which are also immutable, when you do any type of operation it causes a reallocation?


I believe so, in this baseline implementation. In fact, that's also true of operations on doubles (non-Smi) in unoptimized code, which are stored as "heap numbers". Avoiding these allocations is one of the things an optimizing compiler can do (and does do, for doubles).

That said, for short-lived BigInts, the allocations may not be as expensive as you might think, since the garbage collector is generational and short-lived objects won't leave the nursery.


Is this expected to arrive in other browsers' JS engines?



Yes, in all engines. You can read the current status update that will be presented at the May TC39 meeting in a few weeks: https://docs.google.com/presentation/d/1Bx13eCgM9Rujs4zNdfO9...


It looks like the link within the presentation to WebKit's issue tracking their implementation is incorrect. It links to https://bugs.webkit.org/show_bug.cgi?id=180731, which is about support for BigInt in the Web Inspector, when it should link to the issue tracking the entire feature at https://bugs.webkit.org/show_bug.cgi?id=179001.


Couldn't they just have taken the BoringSSL bignum code and written some wrapper code for it?


How do they do single digit multiplication?

Most(?) 64-bit CPUs have a 64bit × 64bit → 128bit multiplication instruction; but AFAIK the C++ language doesn't have anything like this.


Both clang and gcc expose it via the __[u]int128_t extension.


That's how they did it indeed. They also have fallback code for when the double-width integer type is not available.

https://github.com/v8/v8/blob/6.8.137/src/objects/bigint.cc#...


MSVC can also do this with the _mul128 intrinsic.


Does anyone know if this will affect how V8 parses JSON?


This is the specification as it relates to JSON: https://tc39.github.io/proposal-bigint/#sec-serializejsonpro...

Basically, serialization will throw if there is a BigInt present. But, the .toJSON() callout is supported if a particular environment wants to opt-in to supporting it in some manner.
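
A quick sketch of both behaviors:

  JSON.stringify({ big: 1n })
  // TypeError: serializing a BigInt throws by default

  // the opt-in mentioned above (loses the type information on the way back):
  BigInt.prototype.toJSON = function () { return this.toString(); };
  JSON.stringify({ big: 1n })   // '{"big":"1"}'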


What happens when the num_digits field overflows?


On a 64-bit machine, the max value of num_digits would be 2^64 - 1, or about 1.8 × 10^19. Since each digit is its own 64-bit value, a BigInt with that many digits would need roughly 2^67 bytes of memory for the digits themselves, or a little over 147,573 petabytes.


In such a case you'd have problems representing the number in memory/storage on a computer anyway.



