For loaders definitely yes! On my local Fedora laptop, qemu takes about 60ms to simply run 'qemu-system-x86_64 -version'. Almost all of the time is taken up in glibc's loader, resolving the ~170 shared libraries.
While 60ms may not sound like a lot, I've been trying to get qemu boot times down to the hundreds of milliseconds (for lightweight VMs and sandboxing). It's been quite successful, but 60ms is now a significant chunk of the total boot time.
Edit: Sure I'm aware I could compile my own custom qemu, or statically link, but that's really not the point. I want qemu to boot very fast for everyone, and each of those shared libraries is a feature that someone, somewhere wants.
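For anyone who wants to check a figure like the 60ms above on their own machine, here is a rough sketch of a timing harness. The helper name `time_command` is made up for this example. It measures the whole process lifetime (spawn, load, run, exit), not just the loader, so treat the result as an upper bound. On glibc systems, setting `LD_DEBUG=statistics` in the environment should give a breakdown of the loader's own time.

```python
import statistics
import subprocess
import sys
import time

def time_command(argv, runs=10):
    """Return the median wall-clock time, in milliseconds, to run argv to completion."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Uses the current Python interpreter as a stand-in so the sketch runs anywhere;
# swap in ["qemu-system-x86_64", "-version"] to measure the case described above.
print(f"{time_command([sys.executable, '-c', 'pass'], runs=5):.1f} ms")
```

The median is used rather than the mean so that a single cold-cache run does not skew the number.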
The problem is that the glibc loader has to look through each of those 170 libraries for every single symbol.
On Windows, OS X, Solaris, and probably lots of other OSes, the executable format supports saying "This function needs g_string_append from libglib-2.0.so.0" and "That function needs SDL_Init from libsdl-1.2.so.0". The GNU dynamic linker (and the BSD dynamic linker, incidentally) only supports saying "Hi, I'd like libglib-2.0.so.0 and libsdl-1.2.so.0 at some point; okay, now I'd like g_string_append and SDL_Init". This means the linker has to go looking for SDL_Init in GLib, despite everyone knowing that SDL_Init is going to be in SDL and nowhere else.
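To make the difference concrete, here is a toy model, not how glibc is actually implemented. The library names, symbol names, and the functions `global_lookup` and `direct_lookup` are all invented for illustration. A real loader probes each library through a hash table rather than a Python set, so each probe is cheap, but the number of libraries probed per symbol is the point being illustrated.

```python
# Toy model of ELF-style global lookup versus direct binding.
# Library names and symbol sets are made up for illustration.

def make_libraries(count):
    """Build `count` fake libraries, each exporting 10 unique symbols."""
    return [
        (f"lib{i}.so", {f"lib{i}_func{j}" for j in range(10)})
        for i in range(count)
    ]

def global_lookup(libraries, symbol):
    """Search libraries in load order; return (library name, probes made)."""
    probes = 0
    for name, exports in libraries:
        probes += 1
        if symbol in exports:
            return name, probes
    return None, probes

def direct_lookup(libraries_by_name, library, symbol):
    """The reference already names its library, so only one is probed."""
    exports = libraries_by_name[library]
    return (library if symbol in exports else None), 1

libs = make_libraries(170)
by_name = dict(libs)

# A symbol that lives in the last library loaded is the worst case.
print(global_lookup(libs, "lib169_func3"))                   # ('lib169.so', 170)
print(direct_lookup(by_name, "lib169.so", "lib169_func3"))   # ('lib169.so', 1)
```

With 170 libraries, the worst case for the global search is 170 probes per symbol, against exactly one when the reference names its library.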
Over a decade ago, the OpenOffice.org developers found that it started faster on Windows than on GNU/Linux, and also started faster via WINE than natively on GNU/Linux, and that implementing direct binding would cut startup time on GNU/Linux from about four seconds to two: https://sourceware.org/ml/binutils/2005-10/msg00436.html
It was rejected out-of-hand by Ulrich Drepper because prelink exists, even though prelink didn't actually solve the problems.
Now that Ulrich hasn't been involved with glibc for a few years, and prelink seems to be dead, it's probably time for someone to revive that patchset.
Even without him I still consider a lot of the Linux development attitude to be "har, har, the noob doesn't know that he has to hexagonize the fongebangler in the WERTR_FREW file before he can use the Backspace key, no we aren't going to change the defaults to match the 99.9999% of the keyboards of the world."
I never understood how they enjoy hexagonizing all the fongebanglers every time they install all their systems. Because they surely have to. And don't get me started on those who develop Linux GUIs.
It's a poisonous culture, where it's more important to look smart than to be smart. Note that Ulrich's answer was not just completely self-assured but also technically incorrect, that he didn't respond to the polite questions about his answer, and that prelink eventually turned out to be a big misfeature.
I suspect they're quietly pentagonizing all the fongebanglers every time they install their systems, confused why there are only five points instead of six, and too worried about their reputation to suggest that maybe someone should make fongebanglers automatically figure out the right polygon, because then they'd have to admit they don't actually know how to fix it themselves.
Yes, thanks. His answer to the post you linked to is, on its own, more than enough to demonstrate how wrong it was to let him be responsible for anything. For how many years did he manage to block the improvements?
To get back to the topic under discussion: the patch sent there (and rejected by UD) speeds up loading everywhere glibc is used! And that's a pure CPU speedup, no matter how fast your SSD is!
Reading Michael Meeks's question, it was just a proof of concept, and maybe the whole mechanism can be improved even more. He comments on his own work:
"Is there a better way to achieve what I do in dl-deps.c? the umpteen-string compares are clearly highly evil"
Does anybody know the status of uclibc? How do they do it? If nobody knows, is anybody willing to measure how long the same version of OO takes to start on the same hardware with glibc and with uclibc?
Isn't it a feature that you can replace an implementation at link time? E.g. implement your own g_string_append or whatever. If all symbols were bound to libraries, that would be impossible.
Yes, that's a feature, with LD_PRELOAD or similar. But you don't need to support SDL being able to arbitrarily replace GLib symbols, without opting in to such a replacement. Two options:
1. Distinguish preloaded libraries from normal libraries (IIRC, glibc doesn't really do this). Conduct searches in the list of preloaded libraries, followed by the specific library where the program thinks it is. In the normal case, where LD_PRELOAD is empty, this adds no overhead and you still get to look through 1 library instead of 170. In the most common abnormal case, you still only look at 2 libraries instead of 171.
2. Have libraries explicitly say "I am replacing this symbol from this library", instead of just happening to have a name collision and be loaded first. OS X does this via the interpose section: the format is an array of structures of two pointers, the new function and the old one. The old "pointer" is usually a dynamic relocation, which means all the information in a dynamic relocation is present, including info about which library you're interposing on. The linker looks at all interpose sections of libraries at startup (again, in the common case, this is empty) and applies those overrides when it's about to resolve something to the old function.
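The two options above can be sketched as a toy model. This is only an illustration under made-up names (`resolve_with_preload`, `resolve_with_interpose`, `libmyhooks.so`, `my_g_string_append`), not the glibc or dyld implementation. The interpose table here is a dictionary keyed by the old (library, symbol) pair, loosely standing in for the array of (new function, old function) pointer pairs described for OS X.

```python
# Toy model of the two options above. All names are hypothetical.

def resolve_with_preload(preloaded, libraries_by_name, library, symbol):
    """Option 1: check preloaded libraries first, then the named library.

    Returns (providing library, number of libraries probed).
    """
    probes = 0
    for name, exports in preloaded:
        probes += 1
        if symbol in exports:
            return name, probes
    probes += 1
    if symbol in libraries_by_name[library]:
        return library, probes
    return None, probes

def resolve_with_interpose(interpose, libraries_by_name, library, symbol):
    """Option 2: bind directly, then apply an explicit override if one exists.

    `interpose` maps (old library, old symbol) -> (new library, new symbol).
    """
    if symbol not in libraries_by_name[library]:
        return None
    return interpose.get((library, symbol), (library, symbol))

libraries = {
    "libglib-2.0.so.0": {"g_string_append"},
    "libsdl-1.2.so.0": {"SDL_Init"},
}

# Option 1, normal case: nothing preloaded, so exactly one library is probed.
print(resolve_with_preload([], libraries, "libsdl-1.2.so.0", "SDL_Init"))

# Option 1, one preloaded library: at most two libraries are probed.
preload = [("libmyhooks.so", {"g_string_append"})]
print(resolve_with_preload(preload, libraries, "libsdl-1.2.so.0", "SDL_Init"))

# Option 2: only the explicitly declared pair is redirected.
interpose = {
    ("libglib-2.0.so.0", "g_string_append"): ("libmyhooks.so", "my_g_string_append"),
}
print(resolve_with_interpose(interpose, libraries, "libglib-2.0.so.0", "g_string_append"))
```

In both variants an accidental name collision between two ordinary libraries changes nothing, because a symbol is only replaced when the replacing library was preloaded or declared the override explicitly.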
It is a feature that is useful for many standard C library functions, a few functions in other commonly used libraries, and next to no functions in application libraries in cases like the parent's OOo example.
A reasonable approach to designing shared libraries would be to allow interposition as a non-default feature for explicitly marked symbols, and use fast direct binding by default.
For the case you describe, I can imagine implementing a system-level cache of the necessary module info for any application that is started. The cache would stay valid at least until something new is installed on the system, and somebody developing an application would have to disable it for the application he's testing.
That's what "prelink" did, but IMHO the cure was worse than the disease. prelink would modify all your executables in /usr/bin, breaking them in some cases (for example if they contained any non-standard ELF section). It also required invasive changes in RPM and SELinux. Prelink is dead upstream and was dropped from Fedora a few releases back.
For reference, macOS has something like prelink in the form of the dyld shared cache, where a daemon just links together every shared library in the standard system directories and stores the result in a separate cache file. Then that file is essentially mmapped into every process on the system, so the dynamic linker has very little work to do at process startup. (For whatever reason, executables aren't included, so it still has to link those, but that's just an implementation quirk.) This works quite well, though it wastes a ton of disk space (essentially duplicating each library on disk).
Too bad macOS is really slow to start processes anyway, compared to Linux, because the kernel sucks.
I can imagine that there are surely some weak points. Still, I'd like to know whether you can provide some reasonably technical specifics about macOS's kernel suckiness. I expect that you can, given what you wrote about the dyld shared cache. Thanks.