Lots of bugs in 32-bit x86 Linux entry code (lwn.net)
153 points by swills on Dec 12, 2019 | 107 comments



Relatedly, Ubuntu no longer builds for i386 as a complete architecture: https://lists.ubuntu.com/archives/ubuntu-devel-announce/2019...

They are still building i386 packages for use on 64-bit kernels, for the sake of precompiled 32-bit software (games etc). That's scenario B from Simon McVittie's reply in this thread, and it sounds like that's not affected.


I was able to upgrade from Lubuntu 16.04 to 18.04 for my 32-bit IBM ThinkPad T42 using do-release-upgrade. It’s not downloadable on the website, but I think for 18.04 the build is still there.


I think they've stopped building them in 19.10. 18.04 is an LTS supported for 5 years, so users of i386-only CPUs are fine until 2023.


I have a load of 32bit laptops, "I'll put ChromiumOS on them", nope, that's not a thing any more. OK, err Linux? Apparently not. Could put Windows 10 on them I guess.


You could try Debian, which is essentially Ubuntu minus some themes. I doubt Debian will drop i386 support soon (I just searched and couldn't find any hints of i386 support ending)


But then you'll have all these kernel bugs.


What do you mean?


Ubuntu is a Linux, but Linux is not only Ubuntu. I have loads of x86 boxes - Debian, Puppy Linux, whatever. (Win10 has a 32-bit option? Whoa. #TIL)


Lubuntu 18.04 (and most other Ubuntu derivatives like Xubuntu etc.) is only supported for 3 years.


You can download Ubuntu going back to 4.10 here: http://old-releases.ubuntu.com/releases/


Not sure how well that would work for installing things that aren't in the release ISOs...


I remember being an early adopter of 64-bit Linux with my Athlon64 back in the day. Lots of stuff was broken and there was a lot of debate about whether it was any faster or worthwhile at all. It's really cool to see the technology curve come full circle.


My dad bought me an Athlon64 machine for Christmas and we put it together. Not knowing much about anything, I remember being pissed off/confused that I didn’t have Windows XP 64-bit in spite of having that Athlon64 sticker on the front.

A few years later my brother was starting a web server to host a forum for the Digital TV switch-over focused on the Madison, WI market, shout-out and RIP madcityhd.com

Watching him set up Fedora on the box on the floor of his bedroom and using Compiz wobbly windows was enough to hook me. I stole his install CD and nuked my drive (much to the irritation of my dad, who knew that I would be stealing one of his nights to reinstall Windows when I eventually realized I could no longer play Counter-Strike). It was fun to see it come full circle a year or two after I was ticked off about my Athlon64 not running a 64-bit operating system, when I started trying other distros and realized, “Hey, I can actually use the 64-bit one now!”

Fun times. I wonder how many kids got hooked on Linux by wobbly windows. I know that’s what brought me in, haha.


Wobbly windows were the main reason I switched to Linux. Been using it for everything ever since... Kind of amazing that could be a "killer app".


There was actually a 64bit version of Windows XP which I inflicted on my dad for a while - sounds like you did well to avoid it ;)

https://en.wikipedia.org/wiki/Windows_XP_Professional_x64_Ed...


Worthwhile? We had PAE for a long time, sure, but I think it was fairly obvious when the Athlon 64 appeared that 4GB was not going to cut it, given some people already had 1GB at home...


PAE was always a terrible hack though. It caused so many problems that I'm super glad native 64 bit became the norm shortly before it would have had to become widespread.


It's not a terrible hack, it's just 32-bit virtual addresses with a larger physical address space.

It only caused problems because some kernels used to expect that all physical memory is mapped at all times (and some hardware could only DMA to 32-bit physical addresses, but that's a problem with 64-bit CPUs as well).
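(Aside, assuming a Linux with the usual /proc: you can check both whether the CPU advertises PAE and what its actual physical/virtual address widths are from userspace:)

    grep -wo -m1 pae /proc/cpuinfo          # prints "pae" if the CPU supports it
    grep -m1 "address sizes" /proc/cpuinfo  # e.g. "36 bits physical, 32 bits virtual"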


At the time 1GB was a ton of memory and few applications would use anywhere near that. A 64-bit kernel caught on fairly quickly, but actually compiling 64-bit applications took a lot longer. Consensus at the time was that the memory overhead of larger pointers outweighed the benefits of the extra registers.


> Consensus at the time

I wouldn't call it consensus, just a persistent claim by a large fraction of developers. They finally got to prove their claims with x32 ABI, which was ILP32 in amd64 mode, including the additional registers. Few if any people used it, nor did it show better performance in practice.

Regarding 64-bit support: by the time amd64 chips shipped people had been writing software for the 64-bit Alpha for 10 years, and 7 years for sparc64. IME most open source software was already 64-bit clean and worked well out-of-the-box. Back then the open source community, and especially GNU projects, heavily emphasized platform and hardware portability.

Perhaps the situation on Windows was different. It also didn't help that Windows kept long at 32 bits, which had the effect of breaking code that cast between pointers and long (intptr_t didn't come until C99). The 64-bit ABI for all Unix platforms (AFAIU) carried forward the relationship between long and pointer. I don't recall ever seeing Unix or open source code cast a pointer type to int, only long; it was Windows software that presumed pointer->int conversions worked.
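A minimal sketch of that data-model split (the file name is made up), assuming an LP64 Linux toolchain:

    cat > sizes.c <<'EOF'
    #include <stdio.h>
    #include <stdint.h>
    int main(void) {
        void *p = (void *)&p;
        /* LP64 (64-bit Unix): long=8 void*=8, so old (long)ptr casts kept working.
           LLP64 (64-bit Windows): long=4 void*=8, so (long)ptr silently truncates;
           C99's intptr_t is the portable round-trip type. */
        printf("long=%zu void*=%zu intptr_t=%zu\n",
               sizeof(long), sizeof p, sizeof(intptr_t));
        return 0;
    }
    EOF
    cc sizes.c -o sizes && ./sizes   # on LP64 Linux: long=8 void*=8 intptr_t=8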


> Few if any people used it, nor did it show better performance in practice.

We use it for our specialized analysis framework. The real-world performance gain is just shy of 30%, which is impressive since all it took was a bunch of compiler flags.

Personally I think x32 is a vastly underused ABI. Every application that is unlikely to use more than 2GB should use it. It is literally free performance.


There were a few benchmarks IIRC where it showed a big improvement; the problem was that the x32 ABI was horribly botched.


1GB is still a lot of memory. If anything, we've only learned to squander all available memory.


> If anything, we've only learned to squander all available memory.

If I've learned anything in my computer career, it is that this is an evergreen comment. I remember reading Q-Link posts from Commodore 64 users with this complaint when the wasteful C128 came out.


Go ahead and buffer 4k video streams to disk, I dare you :) There's a reason that even modern appliance platforms like Apple/Android TV need multiple GBs of memory, and it ain't bloat.


If we aim high and say we need 50 Mbps of bitrate for 4k HDR, we still only need 6-7 MB/s. So buffering 30 seconds is still less than 200 MB.

It is entirely possible to make a modern system that uses far less than systems do today. If you're gaming it's a different beast, but for a normal OS and programs it's certainly bloat.
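(Sanity-checking that arithmetic, assuming bc is at hand:)

    echo '50 * 30 / 8' | bc   # 187 (MB) buffered in 30 s at 50 Mbit/s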


What are you calling "disk" here? It's not every day that I see a machine with a disk slower than its network connection.


An aside: that's often true in Google's data centres. So much so that someone wrote a system to sort-of swap out memory to other machines' RAM over the network (instead of to disk, which Google doesn't do in general).


That might describe every rpi running off SD.


I dunno, not sure about the latest Pi models but the network performance was horrid on the earlier ones, like 20Mbits/sec tops.


If you're talking RPi1, perhaps. That's 2012, however... and one of the first things to get better in every subsequent model.


The irony is, on the other end the inverse is true - filesystem cache hit ratio on Netflix servers isn’t particularly high.


I wouldn’t be so sure.


This. The basic stuff they did back then didn't magically inflate. From afar it may seem like that's exactly what happened, but it didn't, and it doesn't have to look like that.


Consensus? I would call it dissensus actually.


Completely off topic but...

Athlon64 was a swift kick in the nuts for Intel, but AMD really didn't follow up until Ryzen. I wonder if they're going to stick around and fight this time.


Sometimes when the swift kick in the nuts is delivered to an 800-pound gorilla, it doesn't matter if you stay around to fight, you're outclassed.

This time, AMD may have stepped up with a steel pipe. We'll see how well they can use it...


> This time, AMD may have stepped up with a steel pipe. We'll see how well they can use it...

:-))))


Well Intel wasn’t exactly sitting idle during the Athlon 64 era. It was just unfortunate that none of their plans worked out.

NetBurst was a bust. IA-64 flopped too. Both were attempts to progress from the P6 architecture - which they felt had reached its limits.

In the end, the market spoke: they wanted the P6. No one was willing to rewrite software (or even recompile). So back to the P6 in the form of the Pentium M, followed by the Core/Core Duo series and finally the i3/5/7/9 that we use today - which, you would notice, hasn’t improved much per core; as mentioned, the P6-style design is pretty much tapped out and all they can do is “squeeze blood from stone” for a few percentage improvements here and there.

AMD recently caught up but frankly, they aren’t doing much better. Performance per core is just on par with Intel. Their main selling point is that they made it cheaper by splitting the L3 cache in 2 - the Ryzen is basically 2 quad-cores glued together for better yields.


> which you would notice hasn’t improved much per core

Not sure which data you refer to? I just made a comparison on single-threaded integer-heavy code between my old Core2 Duo and a Skylake, and got a 16x normalized perf improvement (from 2.5 cycles/byte to 0.15). Same code, same compiler.

So sure, progress has slowed down and they are adding more and more specialized stuff (AVX-512, AES-NI...) but still.

Edited for clarity.


They've got momentum on their side now, but Intel won't bow down without a fight. I expect we'll see some real innovation over the next five years.


The real question isn't if Intel will fight back but how. They've been known to innovate shady marketing tactics just as much as their CPU architecture.


I've still got my AMD Athlon 64 3200+ in a bin somewhere. I remember being in sheer awe that I could just drop a dual-core CPU into my nForce4-based mobo. It felt revolutionary at the time.


Back then, AMD kind of had to license the x64 stuff to Intel for cross license reasons. They aren't under any pressure this time to give away anything.


Really? When did that change? I thought they had some sort of in-perpetuity cross licensing agreement going back to when AMD was a second source for x86 for Intel with Government contracts.


Their advantage this time isn't the ISA. There's no "cross licensing" for the silicon design, fab choice, etc.


For several years I ran a 64bit kernel with 32bit userland.

That way I was compatible with games and most other things without having to deal with increased memory usage and compatibility hacks.
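(A small illustration of that split, assuming util-linux's linux32/setarch is installed - the 32-bit personality is visible from the shell:)

    uname -m           # x86_64 - the 64-bit kernel
    linux32 uname -m   # i686   - what the 32-bit userland sees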


There’s an even better target, x32, but sadly it never got much traction and is deprecated: https://en.wikipedia.org/wiki/X32_ABI

The idea is to expose the 64-bit instruction set and registers, but keep memory and pointers per-process 32-bit as most apps use less than 4 GB of memory.
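A hedged sketch of what that looks like in practice, assuming a gcc built with x32 multilib support and CONFIG_X86_X32=y in the running kernel (file names made up):

    echo 'int main(void) { return sizeof(void *); }' > ptr.c
    gcc -m64  ptr.c -o ptr64  && ./ptr64;  echo "64-bit ABI: $?"   # 8
    gcc -mx32 ptr.c -o ptrx32 && ./ptrx32; echo "x32 ABI: $?"      # 4
    file ptrx32   # "ELF 32-bit LSB executable, x86-64": 64-bit ISA, 32-bit pointers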


x32 failed due to the horribly botched ABI that was created.


Let's just hope we don't go through the same thing when we switch to 128-bit. We ought to learn from our mistakes when we do this whole thing again.

And yes, before people flame me for this, don't pretend it isn't possible, recall that once upon a time 640K ought to be enough for anybody.


Why would we want to switch to 128 bit addresses?

Do you know how much memory we can address with 64 bit?

(64 bit for addresses was arguably a mistake. 48 bit word size would probably have been better, but doesn't sound as cool.)

Having said that, 128 bit can make sense for certain calculations. Floating point units and GPUs, for example, support long registers for some of what they are doing.

(And having said that, Google figured out that they don't actually need all that precision when using GPUs for machine learning, and made TPUs with much smaller words.)
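(Concretely, assuming bc is at hand:)

    echo '2^64 / 2^60' | bc   # 16 - full 64-bit addresses cover 16 EiB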


Your modern Intel chip only has 48 addressable virtual address bits. The recent Ice Lake processors support 57 bits. I deal daily with Python processes that have "6.1t" in the "resident" column in top. Hitting the 48-bit 256 TiB limit isn't science fiction to me. I can already hit that with a moderate amount of effort (by memory mapping ~100 large kv-stores). It isn't that I need to do it, but I could imagine that someone does.

It is well and proper that Intel rounded up to 64 bits. It should serve us well for the next 40ish years, which is good enough for my professional lifetime at least.
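(Those ceilings are easy to compute, e.g. with bc:)

    echo '2^48 / 2^40' | bc   # 256 (TiB) - the 4-level, 48-bit limit
    echo '2^57 / 2^50' | bc   # 128 (PiB) - the 5-level, 57-bit limit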


Yes, that's partially why I mentioned the 48 bits.

It's interesting that you mention hitting that with memory mapping in practice. That is a valid concern.


> Why would we want to switch to 128 bit addresses

Predicting the future is difficult. 30 years ago (I am old enough to remember) it was hard to imagine that every household would have 10s of devices connected to the internet. My university VAX for 16 concurrent student users had less memory than required to show the splash screen when today's phone boots.

So if in 30 years the computing paradigm has changed and we directly address memory over the internet? I must admit that in my imagination we have reached a point where growth will slow down. But I have learned that my imagination is not always good enough.


Direct addressing that way seems unfeasible; it would require extending the IP stack to support direct addressing for it to work, and the IP stack already supports 128-bit addresses. A far more feasible model is direct addressing through a unique IP, and we do not need a bigger address space for that.

That said, considering the current amounts of data Google holds, I could see the theoretical point about uniquely addressing every single byte they have. 64-bit addressing only allows for a single order of magnitude of growth in that scenario.


The fact that you can put everything, ever, into 128 bits means you can easily shove ten thousand 20-terabyte storage devices into a single 128-bit system's memory-mapped I/O space and still have 3 exabytes left over.

No more packet-switched serial storage I/O. You now have first-class ability to ask for any byte anywhere, really really really fast.

Because I/O request speed is now only limited by the memory controller (which already goes at TB/sec in x86 hardware), a fast storage controller now has the opportunity to optimize and batch requests downstream to storage devices much much more efficiently. Because if the storage devices go at a certain speed but suddenly the addressing infrastructure is A LOT LOT faster, your optimization window just went through the roof and you can coordinate much more effectively.

I forget the exact architecture, but one of IBM's 128-bit boxen already does this. Various random bits of the hardware use MMIO as a first-class addressing strategy. The OS does the rough equivalent of `disk = mmap2(/dev/sda)` at bootup. Maybe this is a z series thing.


What about DMA over IPv6? Using a single 256-bit address, you would be able to address any byte in any IPv6-enabled computer directly.


The calculations are off. 128-bit addressing allows for more than 3.40 x 10^38 addresses, and your storage example is 1.76 x 10^18 bits. That is, even if we addressed individual bits (we do not), we don't even have a name for the unit denoting the magnitude of address space left over when using 128-bit addresses.
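(For anyone who wants to check, assuming bc:)

    echo '2^128' | bc                  # 340282366920938463463374607431768211456 ~ 3.40*10^38
    echo '10^4 * 20 * 2^40 * 8' | bc   # 1759218604441600000 ~ 1.76*10^18 bits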



Thing is, an “address” could be much more than an address. One example is the CHERI hardware capability stuff and its Arm counterpart, Morello: sizeof(void *) is 16 there.

And yeah, getting real world software, such as Postgres or Nginx, to work requires some fixes, but it’s really not that bad.


> Why would we want to switch to 128 bit addresses?

How else are you going to have more than 16 exabytes of RAM?


Exactly, you probably won't.

And I don't mean that in an absolute sense. People might very well get up to those orders of magnitude of RAM; but you are unlikely to have that amount of RAM available to a single processor.

The extra margin of the 64 bit might help with address space randomization, though.


Memory banking is the obvious and ancient solution. PAE is another.


PAE is a kludge because the architects needed to add another tiny page table level to keep 32 bits of virtual space, but didn’t really want to add another level.

The concept of having more physical address bits than virtual bits is reasonable, although it falls apart a bit with virtualization. The idea of having magic registers that fill themselves in for you and can’t be read at all (such as the PAE PDPTR registers) is a bad idea that unfortunately repeats itself in x86 design. Architects: don’t do this.


"Why would we want to switch to 128 bit addresses?" - eru, 2019

I am gonna hold onto your quote, for enjoyment and giggles in 2032 :^)


This is only 13 years from now. I really doubt that people will need 16 EiB by then in a single processor for a single process. Plus by that point you will probably have moved to message passing between distributed systems or something.


My prediction is that we'll mostly see more of the same, with existing trends continuing.

I.e. more mobile, and more parallelism on the server side.

But yeah, no 16 EiB on a single processor.

(Though we might see people memory map crazy amounts, without ever actually accessing all of them, of course.)


Nobody ever thought that, and mindlessly repeating it just means we don't learn anything.


Hah!

> And the developers in question should have an appropriate degree of nostalgic adoration of segments, gates, and other delights from the i386 era.


> We need real CI resources

It would be easy to boot the 32-bit kernel in a VM as part of a CI pipeline and then run some automated tests to catch issues like this.

(Unless by "real" he means non-virtualised? Is there anything about these bugs that would only reproduce on bare metal?)
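A minimal sketch of such a pipeline step (the paths are hypothetical), assuming a freshly built 32-bit bzImage and a small initramfs:

    qemu-system-i386 \
        -kernel arch/x86/boot/bzImage \
        -initrd initramfs.cpio.gz \
        -append "console=ttyS0 panic=-1" \
        -nographic -no-reboot | tee boot.log
    # panic=-1 plus -no-reboot makes QEMU exit on a kernel panic;
    # fail the CI job if one was logged:
    grep -q "Kernel panic" boot.log && exit 1

From there it's a matter of running a test suite inside the guest instead of just watching for a clean boot.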


I think the issue is not the hardware needed but simply the cost of the extra compute. Linux doesn't have an "official" CI server but relies on third parties running it, and none of them want to spend $ on compute for an architecture they will never use.


The hardware cost should be negligible. The relevant cost is labor.


Not always. Cache speculation bugs, for example, I don't think are covered by emulation. (At least not explicitly. If the MMU model is the same and transparent enough, I suppose those could 'pass through', but only if they're implemented on the same platform.)


You can certainly exploit Meltdown from a VM. The hypervisor tries hard to stay out of the way, and on a properly configured VM the vmexits should be few enough while running compute-bound code that microarchitectural side channels are very well exploitable.


For clarity, in this context I think @vetrom was talking about emulation, not a virtual machine. A virtual machine makes use of the native instruction set (relying on the hardware) to create a machine within a machine, and naturally can only create virtual machines of identical hardware capabilities (or a subset thereof). Emulation does all of that in software and could emulate even a completely different architecture.


Though to get fast emulation, you can re-use techniques from virtual machines (and vice versa).

Btw, emulation is how you can, in principle, run Linux on an 80286. (Normally you'd need an 80386 at least to get memory protection.)

Of course, Linux on the 80286 is gonna be even slower than Linux in your browser. (https://bellard.org/jslinux/)

Using the emulation idea you can go almost arbitrarily primitive. But the 286 is a nice target, because it was a sometimes requested feature in Linux's past.


Okay, but the original comment said VM and almost every desktop and server has a processor that can natively run x86 in a VM.


"I strongly suspect that there is at least one bug left."

:)


That probably would be a valid addition to any bug fix in a large project, to be honest.


That was a cheeky statement by the author if I've ever seen one.


Maybe because nobody uses 32-bit x86 anymore?


No, that's the i386 architecture, which is still well supported AFAICT. x86_32 is a funny architecture that requires 64 bit chips but uses 32 bit pointers.

http://www.h-online.com/open/features/Kernel-Log-x32-ABI-get...


This post is about i386 / x86_32. What you're referring to is "x32," a different thing entirely, which uses the x86-64 instruction set. You can tell this post is not about "x32" because it talks about "segments, gates, and other delights from the i386 era," which don't exist on the x86-64 architecture regardless of how long your pointers are, and it says things are "fine on a 64-bit kernel" (x32 uses a 64-bit kernel, it's a userspace ABI only).


No, see the source from Linux kernel:

https://github.com/torvalds/linux/blob/master/arch/x86/Kconf...

For all x86 processors, now there is a name "X86_32" which "depends on !64BIT" (i.e. 64BIT flag has to be off).

and there is a name "X86_64" which "depends on 64BIT" (i.e. 64BIT flag being on).

Under the new convention, X86 is a common prefix for both 32- and 64-bit kernels for x86 processors; the suffix _32 means the kernel uses only 32-bit instructions, and _64 that it uses 64-bit instructions.

And also, there's handling of the old arch name "i386", which is recognized to mean 64BIT is off; if only "x86" is seen as the arch name, there is a prompt that asks whether 64BIT should be on. It also notes that the old name for the kernel with 64BIT off was "i386" and the old name for the kernel with 64BIT on was "x86_64".

Finally, support for the X32 ABI (allowing executables which use 64-bit instructions but only "short" 32-bit pointers to run on a 64-bit kernel: https://en.wikipedia.org/wiki/X32_ABI) is enabled in my 64-bit distro:

    ~$ sudo grep -P G_X86_X?\\d{2}= /boot/config-$(uname -r)
    CONFIG_X86_64=y
    CONFIG_X86_X32=y
The first line means this is a 64bit kernel for x86 processors, the second that X32 support is enabled on that kernel.

In short, there are two kinds of X86 kernels:

X86_32 -- formerly known as i386

X86_64 -- formerly known as x86_64

and an option for existence of X32 ABI in X86_64 called CONFIG_X86_X32.


Beautiful, high-value reply. Thank you for making the connection to upstream and unpacking things with good, concise exposition.


No, that’s “x32”. “x86_32” is not an official name for anything, but based on the contents of the post, I believe Andy Lutomirski is using it to refer to 32-bit x86, aka i386.


And I believe he meant i686, the latest Intel CPUs with 32-bit pointers.


x86-64 processors can generally be run in IA-32 compatibility mode. And Intel still makes some IA-32-only chips for embedded.


Your comment should be the top comment. I had no clue i386 and x86_32 were different. I thought the article was saying 32-bit Linux is broken... but no, some strange "alternative" mode on Linux is broken... who cares.


The comment is wrong. It is about i386.


There were 32-bit netbooks being sold the last time I looked in 2015.


Unfortunately, upstream doesn't really notice if 32-bit x86 breaks. This isn't just a kernel problem; glibc managed to push out a release back in 2017 that broke memchr on 32-bit Atom in a way that tended to cause programs calling it to segfault. It turns out that amongst a few other things, including Python, the glibc build process itself relies on memchr working... I don't think they actually tested the code path in question after modifying it.


Of course it was tested [on the developers' x86_64 laptop(s)]!


Sure, but the ones that pay the bills are servers and companies which don't care one bit about i386.


Agreed, however open source has always relied on enthusiasts for consumer-grade hardware support.


The really fun part was when they launched the first 64 bit Atom CPUs (Bay Trail) and then a bunch of them got shipped with only 32 bit UEFI and cannot boot in 64 bit mode.


They have to _boot_ with a 32-bit UEFI bootloader, but can run a 64-bit kernel and userland. Fedora Linux even allows for this while keeping secure boot on; Debian supports it as well but the interaction with secure boot is buggy, so you need to turn it off. Not sure about other distros, however.
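(On such a machine the split is visible from Linux, assuming an EFI-booted system exposing the usual sysfs file:)

    cat /sys/firmware/efi/fw_platform_size   # 32 - the UEFI firmware word size
    uname -m                                 # x86_64 - the running kernel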


Early 64-bit Intel Macs also shipped with a 32-bit UEFI limitation, but they were able to boot into a 64-bit system.


Jesus was closer to dinosaur times than we are to 2015.


Casual computer users don't buy new computers each year. Pretty sure set top boxes and routers with 32-bit x86 processors have been sold since then, too.


Are you Him?


My old Atom laptops still run perfectly fine. Thanks, Debian.


If only... I have been hoping for a native x64 version of Visual Studio for almost a decade, and the signs do not look good for that ever coming to fruition. As it is, loading up solutions that consume nearly 2GB of RAM still brings the IDE to its knees and makes it thrash and page.


Edit: I am possibly wrong here, and confusing this with an issue that was reported earlier


Url changed from https://lwn.net/Articles/805539/, which points to this.

Submitted title was "Essentially no upstream development resources dedicated to x86_32 Linux"


Thank you!


Oh, the freeloading. How much more CI do we need...



