My int is too big (tedunangst.com)
120 points by ingve on July 15, 2016 | 32 comments



Some of this comes about from the choice of LP64 (32 bit ints on 64 bit platforms[0]). There seems no sensible reason why ints are 32 bit on a 64 bit platform; it only introduces inefficiency[1] and these kinds of problems.

The one argument I've seen is that there would be no 32 bit type (since the next smallest integer type, i.e. short, would be 16 bit), but a compiler could easily have a __builtin_int32 mapped to the C99 standard int32_t etc.

In C code that we write, we have a rule that every use of plain 'int' must be justified - it's a "code smell". Most places where 'int' is used should be replaced with 'size_t' (if used as an index to an array), or a C99 int32_t etc.

[0] https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...

[1] https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...
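
To make the rule concrete, here's a minimal sketch (function names are made up, not from any real codebase): a count or index wants size_t, and anything that must be a particular width gets an explicit type instead of plain int.

    #include <stddef.h>
    #include <stdint.h>

    /* Plain 'int' for a count: the code smell in question. */
    int32_t sum_smelly(const int32_t *a, int n)
    {
        int32_t s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* size_t for the count, and a deliberately wider accumulator. */
    int64_t sum_justified(const int32_t *a, size_t n)
    {
        int64_t s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }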


> There seems no sensible reason why ints are 32 bit on a 64 bit platform; it only introduces inefficiency[1]

I don't think that is a fair summary of your link at all.

First of all, the conclusion of that link's analysis is that there is no performance penalty in this case, because the compiler exploits undefined behavior to eliminate any penalty it might otherwise have caused.

Secondly, 32-bit instructions on x64 are smaller than 64-bit instructions. For example "inc eax" is a 2-byte instruction, but "inc rax" is 3 bytes. This is because 64-bit instructions need a REX prefix.
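
A small illustration of that point (my own example, not from the thread; exact codegen varies by compiler, so treat it as typical rather than guaranteed):

    #include <stdint.h>

    /* Two versions of the same operation. Whatever instruction the compiler
     * picks for the 64-bit one (inc, add, lea, ...), it carries a REX.W prefix,
     * making it one byte longer than the 32-bit form, e.g.
     *   inc eax -> FF C0 (2 bytes)   vs.   inc rax -> 48 FF C0 (3 bytes). */
    uint32_t bump32(uint32_t x) { return x + 1; }
    uint64_t bump64(uint64_t x) { return x + 1; }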

Also, as others have mentioned, there is the issue of RAM usage.

I'm not arguing that 32-bit integers are superior overall. I'm just saying that 64-bit integers aren't strictly better in terms of efficiency. For cases that will never need more than 32 bits, a 32-bit integer is probably better.


>There seems no sensible reason why ints are 32 bit on a 64 bit platform

The reasoning I have always heard is to keep RAM usage constant. If you have a standards-compliant C program that primarily uses int as its data type, then with 64 bit ints the 64 bit version of the program uses twice as much memory as the 32 bit version of the same program.

I'm not sure I would call it a great reason, but 32 bit ints may have been the pragmatic thing to do.


> In C code that we write, we have a rule that every use of plain 'int' must be justified - it's a "code smell". Most places where 'int' is used should be replaced with 'size_t' (if used as an index to an array), or a C99 int32_t etc.

I completely disagree (except with the size_t bit - I agree with that). In my view, it's usages of int32_t and similar that should be justified: why do you need exactly 32 bits and 2s complement representation - why does at least 32 bits and unspecific representation not suffice?


> why does at least 32 bits and unspecific representation not suffice?

Because that doesn't help you. Static reasoning becomes harder. Dynamic checks (pretending the exact size is unknown) are possible, but the exact size is statically known (just not to the programmer). It is a waste.

The sentiment is right, but the solution is metaprogramming, not unhelpfully underspecified types.


I didn't say that every use of 'int' should be replaced by 'int32_t'. In fact most uses are replaced with 'size_t' because you're counting something and should be worrying about overflow/wraparound.

Also I didn't say that you should never use 'int'. Yesterday I had to use 'int' because the sscanf %n pattern requires it, ironically an example of exactly what I'm talking about -- it won't be able to handle strings longer than ~2^31 bytes. But in this case the string comes from a place where we know it cannot exceed this limit, so we're OK.
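
For reference, a hedged sketch of that situation (not the actual code being discussed; names are invented): the plain %n conversion stores the number of characters consumed so far into an int, so it cannot describe offsets past INT_MAX.

    #include <stdio.h>

    /* Illustrative only. %n writes the count of characters consumed so far,
     * and the plain form takes an int *, which caps usable offsets at INT_MAX. */
    int parse_prefix(const char *s)
    {
        int value = 0;
        int consumed = 0;            /* must be int for plain %n */

        if (sscanf(s, "%d%n", &value, &consumed) != 1)
            return -1;               /* no integer at the start of s */
        return consumed;             /* offset of the first unparsed character */
    }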


Depends on your application, I would say. If you build a database, you want it to be able to store more automatically when hardware gets better. If you store physical measurements from sensors, you know for sure that the sensor can only give values within a certain range; there you want to ensure you have the range required, and it's also redundant to use a larger variable. I would still prefer int_least32_t over plain int, though, to be more explicit.
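
Something like this for the sensor case (a made-up sketch): int_least32_t documents "at least this range" without pinning down an exact width and representation the way int32_t does.

    #include <stdint.h>

    /* Hypothetical sensor that reports millidegrees in a known range. */
    typedef int_least32_t sensor_mdeg;

    #define SENSOR_MIN (-40000)   /* -40.000 C  */
    #define SENSOR_MAX  125000    /* 125.000 C  */

    static inline int sensor_in_range(sensor_mdeg v)
    {
        return v >= SENSOR_MIN && v <= SENSOR_MAX;
    }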


> There seems no sensible reason why ints are 32 bit on a 64 bit platform

Try storing a few billion of them; they're half the size.

> In C code that we write, we have a rule that every use of plain 'int' must be justified - it's a "code smell"

Current MISRA C requires this.


You could also pack three 20-bit numbers into a 64-bit word, if you needed it. Then again, the C family languages are kind of weird about that.
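
For what it's worth, the bit-field version looks simple enough, but the weirdness is real: field layout and padding are implementation-defined, and bit-fields on a 64-bit base type are only an extension in standard C. A sketch, not portable code:

    #include <stdint.h>

    /* Three 20-bit fields in (at most) one 64-bit word. Layout, padding and
     * the use of uint64_t as a bit-field type are all implementation-defined. */
    struct three20 {
        uint64_t a : 20;
        uint64_t b : 20;
        uint64_t c : 20;
        /* 4 bits unused */
    };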


Unicode code points are 21 bits long. I wonder if something out there stores strings by packing 3 code points per uint64_t.


Maybe, but almost everything outside the basic multilingual plane is so infrequently used that it's a waste of space to use a fixed-width format.

But I could see it being a useful replacement for UTF-32... if you need to be able to calculate code points in constant time, and you don't mind wasting one bit out of every 64. Self-synchronizing, constant-time code point count from buffer size, and close to as space-efficient as UTF-16, but with none of the surrogate mess.
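
As a sketch of what that hypothetical format might look like (entirely made up, shift positions chosen arbitrarily):

    #include <stdint.h>

    #define CP_MASK 0x1FFFFFu   /* 21 bits per code point */

    /* Pack three code points into one word; the top bit stays unused. */
    static uint64_t pack3(uint32_t a, uint32_t b, uint32_t c)
    {
        return  (uint64_t)(a & CP_MASK)
             | ((uint64_t)(b & CP_MASK) << 21)
             | ((uint64_t)(c & CP_MASK) << 42);
    }

    /* Fetch code point i (0, 1 or 2) from a packed word. */
    static uint32_t unpack3(uint64_t w, unsigned i)
    {
        return (uint32_t)((w >> (21u * i)) & CP_MASK);
    }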


Well, 32 bit ints dramatically simplified the porting of some 32-bit software to 64-bit since it didn't cause large arrays of integers to suddenly double in size.


Not to mention better wire protocol and driver compatibility without a rewrite.


Whenever I get back into Clojure and use type hinting, I usually forget that Clojure doesn't allow ints, only longs. Kind of a strange choice given the JVM supports signed variants of the common word lengths. But then I read articles like this.

I'm a pretty high-level language user (read: I don't do C, et al), so when I read stuff like this it always makes me cringe and be happy Hickey decided to keep it simple. 64 bits, everywhere! (God save us when things go to 128)


Integer overflow on the JVM has defined behaviour, so it's not the security issue that it is in C.


It wasn't the undefined-ness of the overflow that was the problem here - the bugs would have been exactly the same if 2s complement wrapping overflow had been mandated.

So these would still have been bugs under the JVM too - crashing from an out-of-bound array access or failing to allocate a stupendously large number of objects. The JVM would stop them from turning into potential arbitrary code execution, though (as some, but not all, of these bugs did).


> God save us when things go to 128

That will take a while. And even then, people will likely keep using 64 bits for most things.


I doubt we'll ever have more than 2^64 bytes of RAM, so that will probably never happen.

(Please don't make the obvious reply.)


IBM System/38 and descendants have had 128-bit pointers since the mid-'70s.


Larger address spaces are useful, even if you don't have 2^64 bytes of physical RAM. For example, most 32-bit Firefox crashes are OOM crashes due to virtual address space exhaustion.


Even if we have more than 2^64 bytes of RAM (that's 16 EB, not really an astronomical amount), it's not obvious that the optimum way to use it will be to place everything in the same address space.


Yeah, I'd fully expect something like PAE to be more useful and common. Even if something is using more than 16 EB of RAM, I can't imagine it'll be a single process; it'll be a larger distributed system instead, leaving each process smaller than that. Just filling up 16 EB of RAM right now would take:

    < simcop2387> farnsworth: eta[16 EiB, 19200 MiB/s] - #now# -> "years"
    < farnsworth> simcop2387:  2177.63532452879 years
based on marketing info on DDR4. So at the very least it's going to take a while before we even have to worry about a max of 16EiB in address space.


Someone said that about 64 bits.


And 32 bits! Probably 16 bits too....

But, it takes longer each time. With exponential growth (e.g. Moore's Law), you use up bits linearly. If you double every two years, then you use one bit every two years. That means we have about 64 years from when 32-bit became inadequate to when 64-bit becomes inadequate.
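
Spelled out, under the same generous doubling-every-two-years assumption:

    #include <stdio.h>

    int main(void)
    {
        int extra_bits = 64 - 32;        /* address bits still to be "used up"   */
        double years_per_bit = 2.0;      /* one doubling every two years = 1 bit */
        printf("%.0f years\n", extra_bits * years_per_bit);   /* prints: 64 years */
        return 0;
    }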


And with Moore's Law running out of steam (isn't physics a bitch!), this might never happen. At least not in the sense that drove us up to 64 bits. There are of course plenty of reasons why we would like to do math with 128 or more bits, but for memory addresses, 64 bits are plenty.


64 bits are fine until you want to share address space over a whole bunch of nodes.

We currently have supercomputers with tens of gigabytes per processor and tens of thousands of processors. Already up to 51 bits. We're also working out fast persistent storage like NVMe and crosspoint that allow us to attach a board with terabytes where we used to attach gigabytes of DRAM. Combine those two and you can max out a 64 bit address space with current technology, let alone tomorrow's technology.


Remember that each bit doubles the space. You would need 50,000 processors with 350 terabytes of memory each to hit 64 bits.

Then again, supercomputers aren't really running the same architecture as the rest of us.


So let's crunch the numbers a bit. There are already supercomputers with 40k CPUs, let's go all out and plan to use 80k CPUs. 8TB NVMe cards have been out for a couple years, we should be able to get 16TB cards by the time we're assembling. We should be able to fit four per CPU with a compact design. That gets us to 5EB. That's dangerously close to the 16EB limit, and that's with 2016/2017 technology. And the point where you start having trouble isn't 16EB, it's under 8EB, when you stop being able to use the normal user/kernel split or have all the memory mapped into kernel space.
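
The back-of-the-envelope math, for anyone who wants to poke at the assumptions (80k CPUs and four 16 TB cards per CPU are the figures from the paragraph above):

    #include <stdio.h>

    int main(void)
    {
        double cpus      = 80e3;
        double bytes_per = 4 * 16e12;              /* four 16 TB cards per CPU */
        double total     = cpus * bytes_per;       /* ~5.1e18 bytes            */
        double space     = 18446744073709551616.0; /* 2^64 bytes (~16 EiB)     */
        printf("%.2f EB, %.0f%% of the 64-bit space\n",
               total / 1e18, 100 * total / space);
        return 0;
    }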

> Then again, supercomputers aren't really running the same architecture as the rest of us.

Sure, but there are pushes to make datacenters act more like a giant computer, so there are niceties to a shared address space that aren't restricted to traditional supercomputers.


You say that, but SSI (single system image) is bound to become fashionable again at some point.


> 64 bits, everywhere!

What if you need to represent 2^64? Bignums everywhere!


I actually realized that the original fix to the int overflow was UB, not tim.

As usual, tim gets all the credit ;)

(I mean, he did send the email)


I have to say this article covers topics leagues ahead of the average computer user, but in a way they (well... most of them) could understand.



