Really cool solution! One question: maybe I missed it, but there's no technical reason the tag bits could not use the entire range of exponent bits, no? Other than the fact that having up to 2048 tags would be ridiculously branchy, I guess.

Here's a variation I just thought of, which probably has a few footguns I'm overlooking right now: use the seven highest exponent bits instead of three. Then we can directly read the most significant byte of a double, skipping the need for a rotation altogether.

After reading the most significant byte, subtract 16 (or 0b00010000) from it, then mask out the sign. To test for unboxed doubles, test if the resulting value is bigger than 15. If so, it is an unboxed double, otherwise the lower four bits of the byte are available as alternative tags (so 16 tags).

Effectively, we adjusted the exponent range 2⁻⁷⁶⁷..2⁻⁵¹¹ into the 0b(0)0000000 - 0b(0)0001111 range and made them boxed doubles. Every other double, which is 15/16th of all possible doubles, is now unboxed. This includes subnormals, zero, all NaN encodings (so it even can exist in superposition with a NaN or NuN tagging system I guess?) and both infinities.
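For what it's worth, here is a quick sketch in C of what that test could look like (illustrative only; using a 64-bit shift instead of a byte read sidesteps the endianness question):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch of the scheme above: take the most significant byte (sign bit +
     * 7 high exponent bits), subtract 0b00010000, mask out the sign, and
     * treat anything above 15 as an unboxed double. Results 0..15 are then
     * free to serve as the 16 alternative tags. */
    static bool is_unboxed_double(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);          /* reinterpret the float's bits */

        unsigned msb = (unsigned)(bits >> 56);   /* sign + 7 high exponent bits */
        unsigned tag = (msb - 0x10u) & 0x7fu;    /* shift the range, drop the sign */

        return tag > 15;                         /* <= 15: boxed double / alt. tag */
    }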

To be clear, this is off the top of my head so maybe I made a few crucial mistakes here.




Hi, I'm one of the authors of the paper. Thanks for your questions and comments!

There are many reasons why 3-bit tags work well in practice. Importantly, they match the alignment of heap objects on 64-bit machine words. Dereferencing a tagged pointer can then be done in a single machine instruction: a MOV with an offset that compensates for the tag.
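For illustration (a rough sketch, not the actual code; the tag value and struct layout are made up), the untagging constant folds right into the load's addressing mode:

    #include <stdint.h>

    /* With 3-bit tags and 8-byte-aligned heap objects, untagging is a constant
     * subtraction that the compiler folds into the load itself,
     * e.g. "mov rax, [rdi - 3]" on x86-64. */
    #define PAIR_TAG 0x3                            /* hypothetical tag for pairs */

    typedef struct { uint64_t car, cdr; } pair;

    static inline uint64_t pair_car(uintptr_t tagged)
    {
        return ((const pair *)(tagged - PAIR_TAG))->car;  /* single MOV with offset */
    }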

One of our goals is to make self-tagging as straightforward as possible to implement in existing systems, so requiring a move away from the ubiquitous 3-bit tag scheme is definitely a no-go.

Another goal is performance, obviously. It turns out that if the transformation from self-tagged float to IEEE754 float requires more than a few (2-3) machine instructions, it is no longer as advantageous. Hence the choice of tags 000, 011 and 100, whose encoding/decoding is a single bitwise rotation.
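Roughly, the whole conversion looks like this (a simplified sketch, not our exact implementation; the rotation amount of 4 is just one way to land the three high exponent bits in the low tag position):

    #include <stdint.h>
    #include <string.h>

    /* Rotate the IEEE754 bits so the three high exponent bits become the low
     * 3 tag bits (reading 000, 011 or 100 for the captured floats); the
     * inverse rotation recovers the original double. */
    static inline uint64_t rotl4(uint64_t v) { return (v << 4) | (v >> 60); }
    static inline uint64_t rotr4(uint64_t v) { return (v >> 4) | (v << 60); }

    static inline uint64_t float_to_tagged(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        return rotl4(bits);            /* tag now sits in the low 3 bits */
    }

    static inline double tagged_to_float(uint64_t word)
    {
        uint64_t bits = rotr4(word);
        double d;
        memcpy(&d, &bits, sizeof d);
        return d;
    }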

Also keep in mind that assigning more tags to self-tagging to capture floats that are never used in practice just adds a strain on other objects. That's why we include a usage analysis of floats to guide tag selection in our paper.

In fact, we are currently working on an improved version that uses a single tag to capture all "useful" values. Capturing 3/8 of floats seems unnecessary, especially since one of the tags is only meant to capture +-0.0. The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.

In the end, it feels like we are barely scratching the surface and that a lot of fine-tuning can still be done by toying with tag placement and transformation. But I think the next step is a quality-over-quantity improvement: capture fewer floats, but capture the right ones.


Thank you for the explanation! I was not aware of the context behind the choice of three-bit tags.

> The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.

Maybe I misunderstand, but isn't that a similar idea to what I just described? Adding an offset to "rotate" the ranges of the exponent by a segment, putting the one with zero on the high side? The main difference being that you stick to the upper four bits of the exponent, whereas I suggested using one of the upper three-bit sequences to mark bits 4-5-6-7 as tag bits? (I mistakenly included the sign bit before and claimed this unboxes 15/16ths of all doubles; it's actually 7/8ths.) Which probably has consequences for minimizing instructions, like you mentioned.

> But I think the next step is a quality-over-quantity improvement: capture fewer floats, but capture the right ones.

I suspect that ensuring NaN and Infinity are in there will be crucial to avoid performance cliffs in certain types of code; I have seen production code where it is known that initialized values are always finite, so one of those two is then used as a way to "tag" a float value as "missing" or something to that effect.
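For example (illustrative only, not from any specific codebase):

    #include <math.h>
    #include <stdbool.h>

    /* All real measurements are finite, so NaN doubles as a "missing" marker
     * without needing a separate flag or a boxed sentinel object. */
    static double reading = NAN;                     /* not measured yet */

    static bool has_reading(void) { return !isnan(reading); }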

Anyway, looking forward to future results of your research!


> Maybe I misunderstand, but isn't that a similar idea to what I just described? Adding an offset to "rotate" the ranges of the exponent by a segment...

Yes, it is similar. It seems to me that there really aren't that many useful operations that can be applied to the exponent besides adding an offset. But that's only a suspicion; do not take my word for it.

> I suspect that ensuring NaN and Infinity are in there will be crucial to avoid performance cliffs...

This is a reasonable assumption. There are in fact ways to rotate and add an offset such that the exponent can overflow/underflow to capture exponents 0 and 0x7ff (for inf and nan) with a single well-positioned tag. Making it work in practice is not as simple, but we are working on it.


Yeah, this is one of those situations where speculating about behavior is one thing, but well-designed benchmarks and tests may very well give some surprising results. I am very much looking forward to the follow-up paper where you'll share your results :). Good luck with the research to you and your colleagues!


Just to add some JS engine info: every engine stores numbers as 32-bit integers when possible, which is usually done by tagging a pointer. Also, JSC needs pointers to be represented exactly as-is, because it will scan the stack for anything that looks like a pointer and retain it when running the GC.
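Loosely, the small-integer part looks something like this (a rough sketch in C, not any particular engine's actual layout; the tag placement is made up):

    #include <stdbool.h>
    #include <stdint.h>

    /* A 32-bit integer is packed into the upper half of the word with a tag
     * bit set, so it never needs a heap allocation; real pointers keep the
     * bit clear and can be used as-is. */
    #define INT_TAG 0x1u

    static inline uint64_t box_int32(int32_t i)    { return ((uint64_t)(uint32_t)i << 32) | INT_TAG; }
    static inline bool     is_int32(uint64_t w)    { return (w & INT_TAG) != 0; }
    static inline int32_t  unbox_int32(uint64_t w) { return (int32_t)(w >> 32); }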


Just a correction: CRuby has used “Float Self-Tagging” for years.


This is your 4th comment claiming this: it's not true.

You're right that Ruby uses tags; Objective-C, for example, does also and has for a while.

The innovation here is that it's a tag without the tag bits. That's why it's self-tagging, not tagging.


In another comment, this user links a CRuby commit that they claim adds it. It seems legit.

The linked commit contains code for rotating tagged floats so that bits 60..62 go to the least significant positions, a comment about a range of unboxed floats between 1.7...e-77 and 1.7...e77, and special-casing for 0.0.

e.g. this excerpt:

    #if USE_FLONUM
    #define RUBY_BIT_ROTL(v, n) (((v) << (n)) | ((v) >> ((sizeof(v) * 8) - n)))
    #define RUBY_BIT_ROTR(v, n) (((v) >> (n)) | ((v) << ((sizeof(v) * 8) - n)))

    static inline double
    rb_float_value(VALUE v)
    {
        if (FLONUM_P(v)) {
            if (v == (VALUE)0x8000000000000002) {
                return 0.0;
            }
            else {
                union {
                    double d;
                    VALUE v;
                } t;

                VALUE b63 = (v >> 63);
                /* e: xx1... -> 011... */
                /*    xx0... -> 100... */
                /*      ^b63           */
                t.v = RUBY_BIT_ROTR(((b63 ^ 1) << 1) | b63 | (v & ~0x03), 3);
                return t.d;
            }
        }
        else {
            return ((struct RFloat *)v)->float_value;
        }
    }


Also, funny enough, this idea and variations of it look surprisingly easy to implement in JS itself using typed arrays. I won't make any claims about the impact on performance, though...

You need a Float64Array for the values (plus a Uint32Array/Uint8Array over the same buffer to be able to manipulate the integer bits, which also requires knowledge of endianness, since it's technically not fixed by the JS spec and some funny mobile hardware still exists out there). Mocking a heap is easy enough: the "heap" can be a plain array, and the "pointers" indices into that array. Using a Map<value, index> would let you "intern" duplicate values (i.e. ensure doubles outside the unboxed range are only "heap"-allocated once).



