> A linker typically only includes the parts of the library it needs for each binary […]
It is exactly the same with dynamic linking, thanks to the demand paging available in all modern UNIX systems: the dynamic library is not loaded into memory in its entirety; it is mapped into the process's virtual address space.
Initially, no code from the dynamic library is loaded into memory. Only when the process attempts to execute the first instruction of the required code does a page fault occur, and the virtual memory subsystem loads the required page(s) into the process's memory. A dynamic library can be 10 GB in size and appear as 10 GB in the process's memory map, yet only a single page may be physically resident. Moreover, under heavy memory pressure the kernel can evict the page(s) (using LRU or a more advanced page-tracking technique), so a process (especially a background or idle one) may end up with none of the library's code pages resident until it faults them back in.
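The demand-paging behaviour described above can be observed directly with `mmap`, which is the same mechanism the loader uses for library code. A minimal sketch (assuming a filesystem with sparse-file support, e.g. ext4 or APFS): the whole file appears in the address space, but only the touched page is ever faulted in.

```python
import mmap
import os
import tempfile

# Create a sparse 1 GiB file: seek far past the start and write one
# byte. The "hole" occupies no disk blocks on a sparse-capable FS.
fd, path = tempfile.mkstemp()
os.lseek(fd, 1 << 30, os.SEEK_SET)
os.write(fd, b"X")

# Map the whole file read-only. This only reserves virtual address
# space; no page is read from disk until it is actually touched.
m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
mapped_size = len(m)     # the full file size is visible in the mapping...
last_byte = m[1 << 30]   # ...but reading this faults in exactly ONE page

m.close()
os.close(fd)
os.unlink(path)
```

The same asymmetry holds for a mapped library: address-space reservation is essentially free, and physical memory is charged only per touched page.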
Fundamentally, dynamic linking is deferred static linking: the linking work is delegated to the dynamic loader. Dynamic libraries incur a relatively small overhead of slower process startup (compared to statically linked binaries), because the dynamic linker has to load the symbol table and the global offset table from the dynamic library and perform symbol fixups according to the process's own virtual memory layout. It is a one-off step, though. For large and frequently used dynamic libraries, caching can be employed to reduce this overhead.
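The "deferred linking" step can be made explicit from user code: `ctypes` performs the same `dlopen()`/`dlsym()` sequence the loader runs at startup. A sketch (assuming a C library findable under the name "c", true on Linux and macOS); the symbol lookup and fixup happen once, not per call.

```python
import ctypes
import ctypes.util

# dlopen(): locate and map the C library at runtime. On Linux this
# typically resolves to "libc.so.6"; if find_library() returns None,
# CDLL(None) still yields a handle with libc symbols on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# dlsym() + fixup: resolved once; subsequent calls go straight
# through the resolved pointer.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t
```

After this one-off resolution, `strlen(b"hello")` calls directly into the shared libc code mapped into the process.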
Mapping a dynamic library into the virtual address space != loading the dynamic library into memory; they are two distinct things. The entire dynamic library is almost never loaded into memory, because 100% code coverage is exceedingly rare.
Yes, but often a one-off step that sets all your calls to go through a pointer, so each call site in a dynamically linked executable is slower due to an extra indirection.
> For large, very large and frequently used dynamic libraries, caching can be employed to reduce such overhead.
The cache is neither unlimited nor laid out predictably from userspace, and if you have a bunch of calls into a library that end up spread all over the mapped virtual address space, sparse or not, you may evict cache lines more often than you would if the functions were statically linked and sequential in memory.
> as the 100% code coverage is exceedingly rare.
So you suffer more page faults than you otherwise would, just to load one function in a page and ignore the rest.
> Yes, but often a one off step that sets all your calls to call through a pointer, so each call site in a dynamic executable is slower due to an extra indirection.
That is true; however, in tight loops or hot code paths it is unwise to make a call at all (even to a subroutine in close locality). If the overhead of invoking a function in performance-sensitive or critical code is considered high, the code has to be rewritten to do away with the call entirely; that is a micro-optimisation, and it applies equally to static linking.
Dynamic libraries do not cater for micro-optimisations (which are rare) anyway. They offer greater convenience as a trade-off against maximum code performance.
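The micro-optimisation argument can be sketched even in a high-level language (an illustration, not a benchmark; `work` is a hypothetical stand-in for a tiny library routine): the "indirect" version pays a call through a reference on every iteration, the way a call through the PLT/GOT pays an extra indirection per call site, while the "inlined" version removes the call entirely, at which point linkage no longer matters.

```python
def work(x):
    # Hypothetical tiny library routine.
    return x + 1

def hot_loop_indirect(n):
    # Every iteration calls through a reference: the per-call
    # indirection cost exists regardless of how `work` is linked.
    total = 0
    for _ in range(n):
        total = work(total)
    return total

def hot_loop_inlined(n):
    # The micro-optimisation: the call is removed and the body
    # inlined by hand; static vs dynamic linking is now irrelevant.
    total = 0
    for _ in range(n):
        total += 1
    return total
```

Both produce the same result; the point is that eliminating the call, not choosing a linkage model, is what removes the indirection from the hot path.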
> The cache is not unlimited nor laid out obviously in userspace […]
I should have made myself clearer: I was referring to the pre-linked shared library cache, not the CPU cache. The pre-linked shared library cache reduces process startup time and offers a better user experience. It has nothing to do with steady-state runtime performance.
> So you suffer more page faults than you otherwise have to in order to load one function in a page and ignore the rest.
I will experience significantly fewer page faults if my «strlen» code comes from a single address in a single memory page shared by 10k processes invoking it (the dynamic library case), as opposed to 10k copies of the same «strlen» sprawled across 10k distinct memory pages at 10k distinct addresses (the static linking case).
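The sharing is visible in each process's memory map. A Linux-specific sketch (assuming a glibc system where the library's filename contains "libc"; on musl-based systems the path differs and nothing will match): every process that maps libc shares the same file-backed executable pages, and the kernel keeps one physical copy in the page cache.

```python
import sys
import ctypes
import ctypes.util

# Load libc, then inspect our own memory map. The "r-xp" segment is
# the shared, executable portion of libc that every process maps.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc_mappings = []
if sys.platform.startswith("linux"):
    with open("/proc/self/maps") as maps:
        libc_mappings = [line for line in maps if "libc" in line]
    for line in libc_mappings:
        print(line.rstrip())
```

Running this in two different processes shows both mapping the same file; tools like `pmap` or `/proc/<pid>/smaps` can confirm the resident pages are counted once system-wide.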