> A linker typically only includes the parts of the library it needs for each binary […]
It is exactly the same with dynamic linking, thanks to the demand paging available in all modern UNIX systems: the dynamic library is not loaded into memory in its entirety; it is mapped into the process's virtual address space.
Initially, no code from the dynamic library is loaded into memory. Only when the process attempts to execute the first instruction of the required code does a page fault occur, and the virtual memory subsystem loads the required page(s) into the process's memory. A dynamic library can be 10 GB in size and appear as 10 GB in the process's memory map, yet only a single page may be physically resident. Moreover, under heavy memory pressure the kernel can evict the page(s) (using LRU or a more advanced page-tracking technique), so a process (especially a background or idle one) may end up with none of the library's code pages resident until it faults them back in.
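The demand-paging behaviour described above can be observed directly with `mmap`, which is the same mechanism the loader uses for library code. A minimal sketch (assuming a filesystem with sparse-file support, e.g. ext4 or APFS): the whole file appears in the address space, but only the touched page is ever faulted in.

```python
import mmap
import os
import tempfile

# Create a sparse 1 GiB file: seek far past the start and write one
# byte. The "hole" occupies no disk blocks on a sparse-capable FS.
fd, path = tempfile.mkstemp()
os.lseek(fd, 1 << 30, os.SEEK_SET)
os.write(fd, b"X")

# Map the whole file read-only. This only reserves virtual address
# space; no page is read from disk until it is actually touched.
m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
mapped_size = len(m)     # the full file size is visible in the mapping...
last_byte = m[1 << 30]   # ...but reading this faults in exactly ONE page

m.close()
os.close(fd)
os.unlink(path)
```

The same asymmetry holds for a mapped library: address-space reservation is essentially free, and physical memory is charged only per touched page.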
Fundamentally, dynamic linking is deferred static linking: the linking work is delegated to the dynamic loader. Dynamic libraries incur a relatively small overhead of slower process startup (compared to statically linked binaries), because the dynamic linker has to load the symbol table and the global offset table from the dynamic library and perform symbol fixups according to the process's own virtual memory layout. It is a one-off step, though. For large and frequently used dynamic libraries, caching can be employed to reduce this overhead.
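The "deferred linking" step can be made explicit from user code: `ctypes` performs the same `dlopen()`/`dlsym()` sequence the loader runs at startup. A sketch (assuming a C library findable under the name "c", true on Linux and macOS); the symbol lookup and fixup happen once, not per call.

```python
import ctypes
import ctypes.util

# dlopen(): locate and map the C library at runtime. On Linux this
# typically resolves to "libc.so.6"; if find_library() returns None,
# CDLL(None) still yields a handle with libc symbols on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# dlsym() + fixup: resolved once; subsequent calls go straight
# through the resolved pointer.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t
```

After this one-off resolution, `strlen(b"hello")` calls directly into the shared libc code mapped into the process.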
Mapping a dynamic library into the virtual address space != loading the dynamic library into memory; they are two distinct things. The entire dynamic library is almost never loaded into memory, because 100% code coverage is exceedingly rare.
Yes, but often a one-off step that sets all your calls to go through a pointer, so each call site in a dynamically linked executable is slower due to an extra indirection.
> For large, very large and frequently used dynamic libraries, caching can be employed to reduce such overhead.
The cache is neither unlimited nor laid out predictably from userspace, and if you have a bunch of calls into a library that end up spread all over the mapped virtual address space, sparse or not, you may evict cache lines more often than you would if the functions were statically linked and sequential in memory.
> as the 100% code coverage is exceedingly rare.
So you suffer more page faults than you otherwise would, just to load one function in a page and ignore the rest.
> Yes, but often a one off step that sets all your calls to call through a pointer, so each call site in a dynamic executable is slower due to an extra indirection.
That is true; however, in tight loops or hot code paths it is unwise to make a call at all (even to a subroutine in close locality). If the overhead of invoking a function in performance-sensitive or critical code is considered high, the code has to be rewritten to do away with the call entirely; that is a micro-optimisation, and it applies equally to static linking.
Dynamic libraries do not cater for micro-optimisations (which are rare) anyway. They offer greater convenience as a trade-off against maximum code performance.
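The micro-optimisation argument can be sketched even in a high-level language (an illustration, not a benchmark; `work` is a hypothetical stand-in for a tiny library routine): the "indirect" version pays a call through a reference on every iteration, the way a call through the PLT/GOT pays an extra indirection per call site, while the "inlined" version removes the call entirely, at which point linkage no longer matters.

```python
def work(x):
    # Hypothetical tiny library routine.
    return x + 1

def hot_loop_indirect(n):
    # Every iteration calls through a reference: the per-call
    # indirection cost exists regardless of how `work` is linked.
    total = 0
    for _ in range(n):
        total = work(total)
    return total

def hot_loop_inlined(n):
    # The micro-optimisation: the call is removed and the body
    # inlined by hand; static vs dynamic linking is now irrelevant.
    total = 0
    for _ in range(n):
        total += 1
    return total
```

Both produce the same result; the point is that eliminating the call, not choosing a linkage model, is what removes the indirection from the hot path.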
> The cache is not unlimited nor laid out obviously in userspace […]
I should have made myself clearer: I was referring to the pre-linked shared library cache, not the CPU cache. The pre-linked shared library cache reduces process startup time and offers a better user experience. It has nothing to do with steady-state runtime performance.
> So you suffer more page faults than you otherwise have to in order to load one function in a page and ignore the rest.
I will experience significantly fewer page faults if my «strlen» code comes from a single address in a single memory page shared by 10k processes invoking it (the dynamic library case), as opposed to 10k copies of the same «strlen» sprawled across 10k distinct memory pages at 10k distinct addresses (the static linking case).
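The sharing is visible in each process's memory map. A Linux-specific sketch (assuming a glibc system where the library's filename contains "libc"; on musl-based systems the path differs and nothing will match): every process that maps libc shares the same file-backed executable pages, and the kernel keeps one physical copy in the page cache.

```python
import sys
import ctypes
import ctypes.util

# Load libc, then inspect our own memory map. The "r-xp" segment is
# the shared, executable portion of libc that every process maps.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc_mappings = []
if sys.platform.startswith("linux"):
    with open("/proc/self/maps") as maps:
        libc_mappings = [line for line in maps if "libc" in line]
    for line in libc_mappings:
        print(line.rstrip())
```

Running this in two different processes shows both mapping the same file; tools like `pmap` or `/proc/<pid>/smaps` can confirm the resident pages are counted once system-wide.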