Overview of cross-architecture portability problems (gentoo.org)
110 points by todsacerdoti 4 months ago | 30 comments



Missed my favourite one: differences in the memory consistency model.

If you’re using stdatomic and the like correctly, then the library and compiler writers have your back, but if you aren’t (e.g. using relaxed ordering when acquire/release is required), or if you’re rolling your own synchronisation primitives (don’t do this unless you know what you’re doing!), then you’re going to have a bad day.
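A minimal C11 sketch of the trap (the flag/payload names here are made up, and it uses <threads.h>, which not every libc ships): on x86 the relaxed version would usually appear to work because the hardware ordering is strong, but on Arm/POWER the reader can see the flag flip while the payload is still stale.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    static int data;          /* plain payload             */
    static atomic_int ready;  /* flag guarding the payload */

    static int producer(void *arg) {
        (void)arg;
        data = 42;
        /* Release: all prior writes become visible before the flag flips. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return 0;
    }

    static int consumer(void *arg) {
        (void)arg;
        /* Acquire: pairs with the release above. Using memory_order_relaxed
           here is exactly the bug described above. */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            thrd_yield();
        printf("%d\n", data);  /* guaranteed to print 42 */
        return 0;
    }

    int main(void) {
        thrd_t p, c;
        thrd_create(&c, consumer, NULL);
        thrd_create(&p, producer, NULL);
        thrd_join(p, NULL);
        thrd_join(c, NULL);
        return 0;
    }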


When Alpha was a vaguely viable platform, it was fantastically helpful to throw it into a build farm to find these kinds of issues.


One of the larger problems is purely social. Some people unnecessarily resist the idea that code can run on something other than x86 (and perhaps now Arm).

Interestingly, some apologists say that maintaining portability in code is a hindrance that costs time and money, as though the profit of a company or the productivity of a programmer will be dramatically affected if they need to program without making assumptions about the underlying architecture. In reality, writing without those assumptions usually makes code better, with fewer edge cases.

It'd be nice if people wouldn't go out of their way to fight against portability, but some people do :(


I argue the opposite: It's important to know the purpose of the program, including the computer that the program will run on.

I can understand polite disagreements about target architectures: You might write a Windows desktop program and disagree with someone about whether you should support 32-bit or be 64-bit only.

BUT: Outside of x64 and ARM, a Windows desktop program is highly unlikely to run on any other instruction set.

The reality is that it's very time-consuming and labor-intensive to ship correctly-working software. Choosing and limiting your target architecture is an important part of that decision. Either you increase development cost/time (and risk not getting a return on your investment), or you ship something buggy and risk your reputation / support costs.


> windows

Everything you write makes sense in the context of Windows, but only for Windows. That's certainly not true for open source non-Windows specific software.


> That's certainly not true for open source non-Windows specific software.

Well, there's a lot of nuance in such a statement.

If you're developing an open-source library, then it's up to the vendor who uses your software to make sure that they test according to the intended purpose of the program.

If you're developing an open-source program for end users, then it's also "important to know the purpose of the program, including the computer that the program will run on." This means being very clear, when you offer the program to end users, about your level of testing / support.

For example, in one of my programs: https://github.com/GWBasic/soft_matrix?tab=readme-ov-file#su...

> I currently develop on Mac. Soft Matrix successfully runs on both Intel and Apple silicon. I have not tested on Windows or Linux yet; but I am optimistic that Soft Matrix will build and run on those platforms.

Now, for Soft Matrix I developed a Rust library for reading / writing .wav files. (https://github.com/GWBasic/wave_stream) I've never made any effort to test on anything other than Mac (Apple Silicon / Intel). If someone wants to ship software using 32-bit Linux on PowerPC, it's their responsibility to test their product. Wave_stream is just a hobby project of mine.


I actually think that's the point, though: it isn't about testing, it is about not going out of your way to make assumptions. Clearly, your code isn't riddled with #ifdef macOS, or you wouldn't be saying it probably works on Windows. And, as you coded it in Rust using Cargo, as annoying as I may sometimes find its cross-compilation behavior, it by and large works: if I take your code, I bet I can just ask it to compile for something random, and I'll see what happens... it might even work, and you don't actively know of any obvious reason why it can't.

In contrast, a lot of libraries I use--hell: despite caring about this myself, I have become part of the problem (though I'd argue in my defense it is largely because my dependencies make it hard to even build otherwise)--come with a bespoke build system that is only able to compile for a handful of targets, and the code is riddled with "ifdef intel elif arm else error" / "ifdef windows elif linux elif apple else error". I can pretty much tell you: my code isn't going to work on anything I didn't develop it for, and would require massive intervention to support something even as trivial-sounding as FreeBSD, because none of the parties involved in its code really cared about making it work.

So like, while you might be arguing against the premise, in practice you could be the poster child for the movement ;P. You developed your software using an ecosystem that values portability, you didn't go out of your way to undermine it, and you aren't actively pooh-poohing people who might want to even try it on other systems (as, say, every major project from Google does, where a platform is either fully tested and supported or something no one should discuss or even try to fix); you're merely warning them it wasn't tested, and I frankly bet that if I showed up with a patch that fixed some unknown issue compiling it for RISC-V, you'd even merge the change ;P.


I spent over a decade developing cross-platform software.

Even now (professionally) I'm getting back into it. My current job is in C#, and we're a little late to the .NET 8 game. It's much cheaper to host .NET 8 on Linux than on Windows, so now we're getting into it.

Turns out timezone names are different between Windows and Linux, and our code uses timezone names.


Also, develop on your target architecture. I see too many people trying to use their Arm Macs to develop locally. Just ssh into an x86 thing.


I am for portability but if you decide to go that route, you need to do a ton of testing on different platforms to make sure it really works. That definitely costs time and effort. There are a lot of ways to do it wrong.


Even setting up your automated tests on multiple platforms can cost a ton of money in CI infra alone. Besides increased server costs, some architectures cost a lot more to rent on cloud-CI systems (cough, macOS), and setting up the local infra to run tests on-prem can be a huge endeavor; each new platform added has fixed costs.

Never mind your cloud CI randomly dropping support for one of your archs.


> In reality, writing without those assumptions usually makes code better, with fewer edge cases.

Writing code for portability is usually the easy part. Testing your software on multiple architectures is a lot more work though.


And more to the point, just because you write for portability doesn't guarantee that it will work. If you're not able to test on multiple architectures, then you're not really able to assert that your code is actually portable.

There's just no business case for portability. Writing for portability makes your code more difficult to read (because you have to include all these edge cases) and maintain, and the business needs to pay for hardware for the additional architectures to test on. For what? Some nebulous argument that it'll help catch exploits? Most InfoSec departments have far better bang-for-the-buck projects in their backlog to work on.


Who said anything about asserting that your code is actually portable?

There are two extremes here, and not being one doesn't mean the other:

1) Writing code with bad coding practices where alignment, endianness, word size, platform assumptions, et cetera, are all baked in unapologetically, saying that doing anything else would incur financial cost

2) Writing code that's as portable as possible, setting up CI and testing on multiple platforms and architectures, and asserting that said code is actually portable

One can simply write code that's as portable as possible and let others test and/or say when there's an issue. That's quite common.

As someone who compiles software on architectures that a good number of people don't even know exist, I think it's absolutely ridiculous to suggest that portability awareness without multi-platform and multi-architecture CI and testing isn't worthwhile.
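To make (1) concrete, a small sketch (hypothetical helper name): parsing a 32-bit little-endian length field byte by byte works on every architecture, whereas the common shortcut of casting the buffer to uint32_t* bakes in both an endianness and an alignment assumption.

    #include <stdint.h>
    #include <stdio.h>

    /* Portable: spell out the byte order the *format* uses (little-endian
       here), independent of the host CPU. The non-portable shortcut would
       be: return *(const uint32_t *)buf;  -- wrong on big-endian machines
       and possibly a misaligned access on strict-alignment CPUs. */
    static uint32_t read_u32_le(const unsigned char *buf) {
        return (uint32_t)buf[0]
             | (uint32_t)buf[1] << 8
             | (uint32_t)buf[2] << 16
             | (uint32_t)buf[3] << 24;
    }

    int main(void) {
        const unsigned char buf[4] = {0x78, 0x56, 0x34, 0x12};
        printf("%#x\n", read_u32_le(buf));  /* 0x12345678 on every arch */
        return 0;
    }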


Looks like non-4K page sizes are missing, which tripped up some software running on Asahi Linux.
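The usual fix is to ask for the page size at runtime instead of hardcoding 4096; a minimal POSIX sketch:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* 4096 on most x86 systems, 16384 on Apple Silicon / Asahi Linux */
        size_t page = (size_t)sysconf(_SC_PAGESIZE);

        /* Round an mmap length up to a whole number of pages. */
        size_t want = 10000;
        size_t len  = (want + page - 1) / page * page;

        printf("page size: %zu, rounded length: %zu\n", page, len);
        return 0;
    }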


Most software doesn’t really care. This list only has common issues (though the issues it goes over seem more like problems from 2010…)


One I recall, working on a C++ program that we distributed to Windows, Linux and PowerPC OSX at the time, was how some platforms had memory zero-initialized by the OS memory manager, and some didn't.

Our code didn't mean to take advantage of this, but it sometimes meant that buggy code would appear fine on one platform, where pointers in structures happened to be zeroed out, but crash on others where they weren't.

As I recall, it was mostly that and the endianness that caused the most grief. Not that there were many issues at all, since we used Qt and Boost...


On many platforms this can depend on the size and timing of the allocation. As soon as a new page has to be allocated (either because the allocation is large, or because the rest of the heap became full), it is zero-initialized (pretty sure all OSes do this, to prevent cross-process spying), but if it's just a malloc inside an existing page, the allocator does not clear previous cruft.
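A minimal sketch of the resulting trap (hypothetical struct): malloc() hands back indeterminate memory that only happens to be zero when the allocator has just pulled a fresh page from the kernel, so code that quietly relies on that works on one platform and crashes on another. Zero explicitly (calloc/memset) if you need zeros:

    #include <stdlib.h>
    #include <string.h>

    struct node {
        char name[16];
        struct node *next;
    };

    static struct node *make_node(const char *name) {
        /* Non-portable luck: malloc(sizeof *n) leaves `next` indeterminate,
           which may *happen* to be NULL on a freshly mapped page.
           Portable: zero it explicitly. */
        struct node *n = calloc(1, sizeof *n);
        if (n) {
            strncpy(n->name, name, sizeof n->name - 1);
            n->next = NULL;  /* be explicit even so */
        }
        return n;
    }

    int main(void) {
        struct node *n = make_node("head");
        free(n);
        return 0;
    }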


The fun of discovering size_t is defined differently on Apple Silicon.


Differently from what?


Not sure. This was upstream code written by a person without a Mac.


64-bit size_t is the same on Darwin as GNU/Linux (unsigned long), but uint64_t is not (Darwin defines it as unsigned long long, not unsigned long). Perhaps this is what you remember?
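That difference bites mostly through printf formats and C++ overload resolution. A small sketch: with -Wformat, the first commented-out line warns on Darwin-style uint64_t and the second on glibc-style, while the PRIu64/%zu versions are fine everywhere:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        size_t   sz  = sizeof(void *);
        uint64_t big = 1ULL << 40;

        printf("size: %zu\n", sz);           /* portable for size_t   */
        printf("big:  %" PRIu64 "\n", big);  /* portable for uint64_t */

        /* Only match one libc's definition of uint64_t:
           printf("%lu\n",  big);   // fine on glibc x86-64, warns on Darwin
           printf("%llu\n", big);   // fine on Darwin, warns on glibc x86-64 */
        return 0;
    }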


In my misspent college years, I was coding game servers in C. A new iteration came about, and the project lead had coded it on a VAX/BSD system, where I had no access.

Under whatever C implementation on VAX/BSD, a NULL pointer dereference returned "0". Yep, you read that right. There was no segfault, no error condition, just a simple nul-for-null!

This was all fine and dandy until he handed it over for me to port to SunOS, Ultrix, NeXT Mach BSD, etc. (Interesting times!)

I honestly didn't find a second implementation or architecture where NULL-dereference was OK. Whatever the standards at the time, we were on the cusp of ANSI-compliance and everyone was adopting gcc. Segmentation faults were "handled" by immediately kicking off 1-255 players, and crashing the server. Not a good user experience!

So my first debugging task, and it went on a long time, was to find every pointer dereference (and it was all nul-terminated strings) and wrap each in a conditional "if (ptr != NULL) { ... }"

At the time, I had considered it crazy/lazy to omit those in the first place and code on such an assumption. But C was so cozy with the hardware, and we were just kids. And there was the genesis for the cynical expression "All The World's A VAX!"


At least some versions of HP-UX running on the HPPA architecture did the same thing. Writing to a NULL pointer would SIGSEGV, but reading got you a zero. There was a linker flag to turn off this "feature", but when I tried that, I started getting SIGSEGVs all over various libraries I had no control over...


If I recall correctly, we were always careful to allocate memory and assign it to the pointer when creating a new string. But the test for whether a string-typed property existed was just to dereference that pointer.

Interestingly, the maintainer had written this as an expansion to an existing codebase, which started on a NeXT and was already handling those NULL pointers properly, so it was a mismatch of coding style; a regression, I suppose.

Our next challenge was the database being read entirely into RAM, and as the players expanded the virtual world, the Resident Set Size went out of control and ate all resources on the workstation.

Then a fellow came along and wrote a "port concentrator" add-on, because at the time, process limits permitted only 64 open file descriptors, which capped the number of inbound TCP connections for active players. The port concentrator spawned 4 workers and muxed a total of 256 connections, for a pioneering MMORPG environment!


couldn't you just have mapped page zero? :)


I am sorry, I searched for "page zero" in the ANSI C specifications and couldn't find it. My source code printouts begin with page 1...


Something not mentioned is that on Windows `long` is always 32 bits (same size as `int`), even on 64-bit architectures.
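(Windows is LLP64 while 64-bit Linux/macOS are LP64.) A quick sketch of the classic mistake: stuffing a pointer or file offset into a long truncates it on Win64; pointer-sized or fixed-width types avoid the assumption:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        printf("sizeof(long)   = %zu\n", sizeof(long));    /* 4 on Win64, 8 on LP64 */
        printf("sizeof(void *) = %zu\n", sizeof(void *));  /* 8 on both             */

        void *p = &p;
        /* Non-portable: long q = (long)p;   -- truncates on Win64. */
        intptr_t q = (intptr_t)p;            /* pointer-sized on every platform */
        (void)q;
        return 0;
    }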


Is it still the case that wasm is 32-bit?


Memory64 is supported by a lot of runtimes now, although it isn't fully standardized yet (see https://github.com/WebAssembly/proposals). I have no idea how reliable the implementations actually are, since I haven't had a need for that much memory yet.



