
Was actually a pretty good read. How far are we from a 128-bit architecture?



IBM's AS/400 (and all of its renames) is a 128-bit architecture. The huge address space is beneficial for implementing capability security on memory itself, plus using single-level store for the whole system (addresses span RAM and secondary storage like disks, NVMe, etc.).


One of my mentors is one of the IBM engineers who developed the original AS/400’s capability-based security architecture way back in the early eighties. I can confirm that (according to her) the 128-bit addressing was indeed a very convenient manner of implementing the system. However, nobody ever expected (nor expects, I suspect) that those addresses will ever be used to actually access that amount of memory. It’s a truly astronomical amount of memory, on the order of grains-of-sand-on-countless-planets...


To put it another way, it's not just enough to count the grains of sand on a beach; it's enough to count all the atoms in all the grains of sand on planet Earth. Give or take a few orders of magnitude [1].

[1]https://www.explainxkcd.com/wiki/index.php/2205:_Types_of_Ap...


$ echo '2^128' | bc | rev | sed 's/.../&,/g;s/,$//' | rev

340,282,366,920,938,463,463,374,607,431,768,211,456

...rrright.

Actually, let me line it up against https://en.wikipedia.org/wiki/Orders_of_magnitude_(data)

  340,282,366,920,938,463,463,374,607,431,768,211,456
          ..  ??  YB  ZB  EB  PB  TB  GB  MB  KB
(There's no meaningful unit of size here - the labels are just there to show how much data a 128-bit address space could cover.)

Blinks a few times

Ultimately fails to mentally grasp and make useful sense of the number due to its sheer size

As an aside, apparently DNA can store a few TB/PB (I don't remember which). The age of optimizing for individual bytes as a routine part of "good programming" is definitely over, I guess. (I realize this discussion is about address space and not capacity, but still)


> plus using single level store for the whole system (addresses span RAM and secondary storage like disks, NVMe, etc)

Wouldn't 64-bit (16 exabytes theoretical max) already allow for this? Are there any projects in that direction?


We're still VERY far away from utilizing the 16 exabytes of addressable space 64-bit offers us. If we talk about RAM, current CPU implementations limit physical addresses to about 4 PB (2^52 bytes on x86-64).
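
For a quick sanity check on those two numbers, here's a tiny sketch (the 52-bit physical-address figure is an assumption about current x86-64 parts, as noted above):

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      /* 2^64 bytes of virtual address space = 2^(64-60) EiB */
      uint64_t virt_eib = UINT64_C(1) << (64 - 60);
      /* Assumed 52-bit physical addressing = 2^(52-50) PiB */
      uint64_t phys_pib = UINT64_C(1) << (52 - 50);
      printf("virtual address space:  %llu EiB\n", (unsigned long long)virt_eib);  /* 16 */
      printf("physical address space: %llu PiB\n", (unsigned long long)phys_pib);  /* 4 */
      return 0;
  }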


What for? AVX-512 instructions can work on, well, 512 bits of data at a time, while 64 bits' worth of address lines offer way more RAM than is available today. What's your use case?


AVX-512 is not comparable to a native 128-bit architecture. AVX/SSE registers are split into lanes of at most 64 bits. You cannot compute a 128-bit result, only multiple 64-bit ones in parallel.
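
To make the lane point concrete, here's a minimal sketch (assuming an x86-64 compiler with SSE2 intrinsics; the values are purely illustrative): a packed 64-bit add treats the two halves of the 128-bit register independently, so a carry out of the low lane is dropped instead of propagating into the high lane.

  #include <emmintrin.h>  /* SSE2 intrinsics */
  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      /* First argument of _mm_set_epi64x is the high lane, second the low lane */
      __m128i a = _mm_set_epi64x(1, UINT64_MAX);  /* hi = 1, lo = 2^64 - 1 */
      __m128i b = _mm_set_epi64x(1, 1);           /* hi = 1, lo = 1 */

      /* paddq: two independent 64-bit adds, no carry across the lane boundary */
      __m128i sum = _mm_add_epi64(a, b);

      uint64_t lanes[2];
      _mm_storeu_si128((__m128i *)lanes, sum);

      /* Prints lo = 0, hi = 2; a true 128-bit add would have carried into hi = 3 */
      printf("lo = %llu, hi = %llu\n",
             (unsigned long long)lanes[0], (unsigned long long)lanes[1]);
      return 0;
  }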


That's a microarchitectural detail; the point is that you can view modern CPUs as more than 64 bits wide, at least data-bus-wise.


By that logic the Pentium 3 was a 128-bit CPU. Vector width isn't how these things are measured, because it's much harder to make a 128-bit ALU than to add two 64-bit ones.


It's not just a detail if you are looking for hardware acceleration of floating-point operations on quads (the float128 type). AFAIK, nobody has a hardware quad FPU, but there are certainly applications for one. I know that things like big-integer arithmetic could be greatly accelerated by a 128-bit computer.
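
For reference, on x86-64 today quad precision is typically emulated in software; here's a minimal sketch assuming GCC/Clang with the __float128 extension and libquadmath (link with -lquadmath). A hardware quad FPU would do the same add natively.

  #include <quadmath.h>
  #include <stdio.h>

  int main(void) {
      /* 1e-30 is far below double's ~2e-16 epsilon, so double rounds it away
         while binary128 (with ~34 significant decimal digits) keeps it. */
      __float128 a = 1.0Q;
      __float128 b = 1e-30Q;
      __float128 sum = a + b;      /* emulated in software on x86-64 */

      char buf[64];
      quadmath_snprintf(buf, sizeof buf, "%.35Qg", sum);
      printf("quad:   %s\n", buf);
      printf("double: %.20g\n", (double)a + (double)b);  /* prints 1 */
      return 0;
  }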


I don't think big-integer arithmetic could be greatly accelerated by a 128-bit computer. If a 128-bit add takes two cycles of latency, or it causes the frequency to drop (since you need to drive a longer critical path in the ALU in terms of gate delay, which means you need longer cycle times), then you're going to lose a lot in any code that isn't directly involved in the computation, such as loading memory values.

Furthermore, the upside is only at best 2x. It's likely to be worse, because you're still going to be throttled by waiting for memory loads and stores. Knowing that we have 2 64-bit adders available to us to use each cycle, we can still do 128-bit additions at full throughput, although it requires slightly more latency for the carry propagation.
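
A rough scalar sketch of that composition (the u128 struct and add128 helper are made up for illustration): the carry out of the low 64-bit add feeds the high 64-bit add, which is what add/adc instruction pairs do on x86-64.

  #include <stdint.h>
  #include <stdio.h>

  /* Illustrative 128-bit unsigned type built from two 64-bit halves */
  typedef struct { uint64_t lo, hi; } u128;

  static u128 add128(u128 a, u128 b) {
      u128 r;
      r.lo = a.lo + b.lo;               /* first 64-bit add */
      uint64_t carry = (r.lo < a.lo);   /* did the low half wrap around? */
      r.hi = a.hi + b.hi + carry;       /* second 64-bit add, plus the carry */
      return r;
  }

  int main(void) {
      /* (2^64 - 1) + 1 = 2^64: lo wraps to 0, hi picks up the carry */
      u128 x = { UINT64_MAX, 0 };
      u128 y = { 1, 0 };
      u128 z = add128(x, y);
      printf("hi = %llu, lo = %llu\n",
             (unsigned long long)z.hi, (unsigned long long)z.lo);  /* hi = 1, lo = 0 */
      return 0;
  }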

Hardware quad-precision floating point is a more useful scalar 128-bit value to support.


I agree hardware quad is more useful. The big-int problem I was talking about would benefit from hardware quad but not from a hardware 128-bit ALU. The big-int problem I have a bit of knowledge of is squaring for the Lucas-Lehmer algorithm to find huge primes (Mersenne primes). The best algorithm in this space is the IBDWT (irrational base discrete weighted transform). You perform an autoconvolution (compute the FFT, square each term in the frequency domain, and then take the IFFT). You want the FFT lengths to be as short as possible, since the FFT is an O(N log N) algorithm. Quads would let you use shorter FFTs since you have more bits available for each element.
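
A stripped-down sketch of that squaring-by-autoconvolution step, not the IBDWT itself: this assumes FFTW3 (compile with -lfftw3 -lm), uses toy base-10 digits, and squares 123 just to show the shape of the computation.

  /* Squaring a big integer via FFT autoconvolution: the product's digits
     are the convolution of the digit vector with itself. */
  #include <complex.h>   /* before fftw3.h so fftw_complex == double complex */
  #include <fftw3.h>
  #include <math.h>
  #include <stdio.h>

  #define N 8            /* transform length >= 2 * digit count */

  int main(void) {
      double in[N] = {3, 2, 1};            /* 123, little-endian base-10 digits */
      fftw_complex freq[N/2 + 1];
      double out[N];

      fftw_plan fwd = fftw_plan_dft_r2c_1d(N, in, freq, FFTW_ESTIMATE);
      fftw_plan inv = fftw_plan_dft_c2r_1d(N, freq, out, FFTW_ESTIMATE);

      fftw_execute(fwd);
      for (int i = 0; i < N/2 + 1; i++)
          freq[i] *= freq[i];              /* square each term in the frequency domain */
      fftw_execute(inv);

      /* Normalise (FFTW's c2r is unscaled), round, and propagate carries:
         prints 9 2 1 5 1 0 0 0, i.e. 123^2 = 15129 in little-endian digits */
      long long carry = 0;
      for (int i = 0; i < N; i++) {
          long long d = llround(out[i] / N) + carry;
          printf("%lld ", d % 10);
          carry = d / 10;
      }
      printf("\n");
      return 0;
  }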

Even though it is a big-int problem, floating point is used. There are integer FFT algorithms that are usable (NTTs), but they are much slower than floating-point FFTs on modern CPUs.


> AFAIK, nobody has a hardware quad FPU

Power9 has had one for a few years.


Today I learned... That's awesome, I'd love to try one out in the cloud sometime and get benchmarks.


No particular use case. I think it's one of those things where they first build the machine, and then all of a sudden a bunch of different use cases are found.


Why do I need to build a 128-bit machine to imagine a use case for it? What you refer to is not applicable to use cases, only to business cases. We first had to build broadband internet before video streaming websites became a viable business model, but surely someone thought of video streaming during the dial-up era?


If I recall correctly, Apple was shipping Mac Pros with 128-bit PowerPC processors around two decades ago.


I think they had AltiVec instructions operating on 128 bits, but that's somewhat like Intel SSE - it's not referring to the overall address bus.



