The point of splice(2) is to be a fast-path; it's not in POSIX, so software that...

bonzini · on May 26, 2022

You cannot in general use autoconf to detect runtime behavior.

derefr · on May 26, 2022

Many of autoconf's more fraught checks work by compiling and running small C programs to see what happens. These checks enable not just static analysis (seeing what the compiler does) but also "edge-case analysis" à la a fuzzer: e.g. seeing whether the compilation environment's runtime chokes when printf(3) is passed certain format specifiers, etc.

So if splice(2) no longer works when the source is e.g. /dev/random, then autoconf could compile and run a program that splice(2)s from /dev/random to a known-working sink, check whether the call fails, and use that to decide whether to accept an --enable-splice configure flag or bail out when one is passed.
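A minimal sketch of such a probe, written in Python rather than as an autoconf C test program to keep it short. It assumes Linux and Python 3.10+ (for `os.splice`); the function names and the `--enable-splice` gate are hypothetical. Because splice(2) requires one end to be a pipe, the "known-working sink" here is a fresh pipe, and any error from the call is treated as "unsupported".

```python
import os
import sys


def splice_from_works(path: str, nbytes: int = 16) -> bool:
    """Try to splice(2) a few bytes from `path` into a fresh pipe.

    Returns True if the kernel moved at least one byte, False if splice is
    unavailable or the call failed (e.g. EINVAL for an unsupported source).
    """
    if not hasattr(os, "splice"):  # os.splice needs Linux and Python 3.10+
        return False
    try:
        src = os.open(path, os.O_RDONLY)
    except OSError:
        return False
    r, w = os.pipe()  # the "known-working sink": splice needs a pipe on one end
    try:
        return os.splice(src, w, nbytes) > 0
    except OSError:
        return False
    finally:
        for fd in (src, r, w):
            os.close(fd)


def check_enable_splice(enable_splice: bool, source: str = "/dev/urandom") -> None:
    """Configure-style gate for a hypothetical --enable-splice flag."""
    if enable_splice and not splice_from_works(source):
        sys.exit(f"configure: error: --enable-splice given, but splice(2) from {source} fails")
```

Note that the answer describes only the kernel the probe ran on at configure time.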

Of course, there's the implicit assumption that if you're passing such a flag, the build environment is going to be the deploy environment; or at least that the build environment's feature set is a subset of the deploy environment's, so that the build environment's runtime features can serve as a conservative underestimate of the deploy environment's.

This "build is a subset of deploy" assumption is usually sensible. Build environments are controllable, while deploy environments are arbitrary; so anyone who wants a build that works on many different deploy environments can just set up their build environment with the "lowest common denominator" of the features of the systems they want to target.

(Compare and contrast: microarchitectural optimizations. Same story.)
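The "lowest common denominator" reasoning boils down to a set intersection plus a subset check. A toy sketch in Python; the target names and feature labels are made up for illustration and are not real autoconf cache variables.

```python
def lowest_common_denominator(targets: dict[str, set[str]]) -> set[str]:
    """Features present on every target deploy environment."""
    if not targets:
        raise ValueError("need at least one target environment")
    return set.intersection(*targets.values())


def build_env_is_safe(build: set[str], targets: dict[str, set[str]]) -> bool:
    """True if every feature detected at build time exists on every target,
    i.e. the build environment's features are a subset of each target's."""
    return all(build <= features for features in targets.values())


# Hypothetical targets and feature labels, for illustration only.
targets = {
    "old-kernel": {"splice_from_chardev", "copy_file_range"},
    "new-kernel": {"copy_file_range", "io_uring"},
}
print(lowest_common_denominator(targets))  # {'copy_file_range'}
print(build_env_is_safe({"copy_file_range", "splice_from_chardev"}, targets))  # False
```

The catch raised in the replies is that a kernel update on a deploy machine can take a feature away, so this subset relation can silently stop holding.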

bonzini · on May 26, 2022

> then autoconf could attempt to compile+run a program that splice(2)s from /dev/random to a known-working sink

And then the next day you, or your user, update the kernel, the result is out of date, and your program crashes. That's just not how autoconf (or CMake or Meson, for that matter) is used.

> running autoconf checks inside a target-machine emulator

Run tests are quite rare. Older autoconf used to run a program that printf'd the size or alignment of a type, but for 15-20 years it has instead done a binary search (basically "guess the number") so that only compilation tests are needed.

(I am a former autoconf developer and GCC build system maintainer).
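A sketch of the compile-only "guess the number" idea: each probe is a program that compiles only if `(expr) <= n`, using a negative array size to force a compile error otherwise, in the style of autoconf's AC_COMPUTE_INT. Here the compiler is replaced by a Python oracle `compiles_le(n)` so the search itself can be run; the helper names are hypothetical, and the real macro also handles negative values and may search in different steps.

```python
from typing import Callable


def conftest_source(expr: str, n: int) -> str:
    """C test program that compiles only if (expr) <= n.

    When the condition is false the array size is 1 - 2 * 1 = -1, which is a
    compile-time error, so "did it compile?" answers the yes/no question.
    """
    return (
        "int main(void) {\n"
        f"  static int test_array[1 - 2 * !(({expr}) <= {n})];\n"
        "  test_array[0] = 0;\n"
        "  return test_array[0];\n"
        "}\n"
    )


def compute_nonneg_int(compiles_le: Callable[[int], bool]) -> int:
    """Find a value v >= 0 using only answers to "does (v <= n) compile?"."""
    lo, hi = 0, 1
    while not compiles_le(hi):  # grow an upper bound: 1, 2, 4, 8, ...
        lo, hi = hi + 1, hi * 2
    while lo < hi:  # bisect, keeping lo <= v <= hi
        mid = (lo + hi) // 2
        if compiles_le(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Stand-in for the compiler: pretend sizeof(unsigned long) is 8 on the target.
probes = []


def compiles_le(n: int) -> bool:
    probes.append(n)
    return 8 <= n


print(compute_nonneg_int(compiles_le))  # 8
print(probes)  # [1, 2, 4, 8, 6, 7]
```

In a real configure run, `compiles_le` would write `conftest_source(expr, n)` to a file and report whether the (possibly cross-) compiler accepted it, which is why nothing has to run on the target machine.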

rootw0rm · on May 26, 2022

I don't think they're suggesting runtime use, but rather building software after the splice(2) change. That's how I read it at least.

vlovich123 · on May 26, 2022

Aside from the “where you build isn’t where you run” problem, there’s the “this syscall doesn’t work on filesystem X or when accessing /dev/urandom” problem.

Not only does the autoconf solution not fix the problem, it places a massive undue burden on developers. Linus has been inconsistent here. Telemetry would have helped with this work (i.e. keep supporting the case via the slow path, but report the event so that distro maintainers could provide feedback on broken paths). Once you think you’ve eliminated the long tail of issues, remove that support and see whether anything telemetry didn’t catch is still broken.
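A minimal sketch of "support it with the slow path but report the event": try splice(2) first and, if the kernel refuses with EINVAL, fall back to read()+write() and count the event. It assumes Linux and Python 3.10+ for `os.splice` (on older Pythons it always takes the slow path); `slow_path_events` is a hypothetical stand-in for whatever telemetry a distro would actually collect.

```python
import errno
import os
from collections import Counter

# Hypothetical telemetry: how often each reason forced the slow path.
slow_path_events: Counter = Counter()


def copy_to_pipe(src_fd: int, pipe_w: int, count: int) -> int:
    """Move up to `count` bytes from src_fd into the pipe write end pipe_w."""
    if hasattr(os, "splice"):
        try:
            return os.splice(src_fd, pipe_w, count)  # fast path
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise  # some other error: propagate it instead of hiding it
            slow_path_events["splice_einval"] += 1
    else:
        slow_path_events["no_os_splice"] += 1
    # Slow path: bounce the data through user space.
    data = os.read(src_fd, count)
    return os.write(pipe_w, data)
```

Once the counters stop showing new sources, the fallback can be dropped, which is the removal step the comment describes.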

bonzini · on May 26, 2022

It would break if compiled on an old kernel and run on a new kernel. When CI is containerized, the kernel might not even have anything to do with the distro you're building on.

baisq · on May 26, 2022

Doesn't that happen often anyway?

bonzini · on May 26, 2022

Nope. You can run programs to detect some behavior, for example printing sizeof(unsigned long) or checking how certain rounding is performed. Those tests may of course affect the program's behavior at runtime, but what they test is still the compilation environment.

tinus_hn · on May 27, 2022

That, however, is not how the Linux kernel's guarantee of userspace API stability works.