LD_PRELOAD: The Hero We Need and Deserve (jessfraz.com)
352 points by ingve on Feb 17, 2019 | 139 comments



Awesome post! `LD_PRELOAD` is a powerful tool for program instrumentation.

It's worth noting, though, that using `LD_PRELOAD` to intercept syscalls doesn't actually intercept the syscalls themselves -- it intercepts the (g)libc wrappers for those calls. As such, an `LD_PRELOAD`ed function for `open(3)` may actually end up wrapping `openat(2)`. This can produce annoying-to-debug situations where one function in the target program calls a wrapped libc function and another doesn't, leaving us to dig through `strace` for who used `exit(2)` vs. `exit_group(2)` or `fork(2)` vs. `clone(2)` vs. `vfork(2)`.
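For anyone who hasn't written one of these, the usual interposition pattern is a shared object that defines the symbol itself and chains to the real one via `dlsym(RTLD_NEXT, ...)`. A minimal sketch (the logging is just for illustration, and note it only sees callers that go through libc's `open` symbol):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Wraps the libc open() wrapper, not the syscall itself: callers that
     * use openat(), open64() or raw syscall(2) are never seen here. */
    int open(const char *path, int flags, ...)
    {
        static int (*real_open)(const char *, int, ...);
        if (!real_open)
            real_open = (int (*)(const char *, int, ...))
                        dlsym(RTLD_NEXT, "open");

        mode_t mode = 0;
        if (flags & O_CREAT) {          /* open() is variadic */
            va_list ap;
            va_start(ap, flags);
            mode = va_arg(ap, mode_t);
            va_end(ap);
        }

        fprintf(stderr, "open(\"%s\")\n", path);
        return real_open(path, flags, mode);
    }

Build with something like `gcc -shared -fPIC -o shim.so shim.c -ldl` and run with `LD_PRELOAD=./shim.so <program>`.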

Similarly, there are myriad cases where `LD_PRELOAD` won't work: statically linked binaries aren't affected, and any program that uses `syscall(3)` or the `asm` compiler intrinsic to make direct syscalls will happily do so without any indication at the loader level. If these are cases that matter to you (and they might not be!), check out this recent blog post I did on intercepting all system calls from within a kernel module[1].

[1]: https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootk...


There is a very recent development in this area -- there is now a way to do it without ptrace and instead entirely using seccomp[1]. It's somewhat more complicated (then again, ptrace is far from simple to get right), but because it avoids ptrace entirely, debuggers and upstart will still work.

It's going to see a lot of use in container runtimes like LXC for faking mounts and kernel module loading (and in tools like remainroot for rootless containers), but it will likely also replace lots of uses of LD_PRELOAD.

[1]: https://youtube.com/watch?v=sqvF_Mdtzgg


There's another way to intercept syscalls without going as far as a kernel module: the ptrace debugging API. There's a pretty neat article about how to implement custom syscalls using ptrace: https://nullprogram.com/blog/2018/06/23/


Yup! I discuss the pros and cons of using `ptrace` within that post.

It's all about the use case: if being constrained to inferior processes and adding 2-3x overhead per syscall doesn't matter, then `ptrace` is an excellent option. OTOH, if you want to instrument all processes and want to keep instrumentation overhead to a bare minimum, you more or less have to go into the kernel.


I've been looking for a while for a way to capture all file opens and network ops to profile unknown production workloads, similar to Process Explorer on Windows (which I believe is implemented using ETW). Unfortunately strace seems to be out of the question purely because of the performance impact. Is the performance impact due to strace or ptrace itself?


It's ptrace itself: every traced syscall requires at least one (but usually 3-4) ptrace(2) calls, plus scattered wait(2)/waitpid(2) calls depending on the operation.
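To make the cost concrete, here's roughly what the inner loop of a ptrace-based tracer looks like (a bare-bones sketch: x86-64 only, single child, no error handling, and signal stops plus the execve edge case are ignored). Each traced syscall means two stops, each with its own waitpid(2) and one or more ptrace(2) calls:

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, 0, 0);
            execlp("ls", "ls", (char *)0);
            return 1;
        }

        int status;
        waitpid(child, &status, 0);            /* initial stop after execve */

        while (!WIFEXITED(status)) {
            ptrace(PTRACE_SYSCALL, child, 0, 0);   /* run until syscall entry */
            waitpid(child, &status, 0);
            if (WIFEXITED(status))
                break;

            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, 0, &regs);
            fprintf(stderr, "syscall %lld\n", (long long)regs.orig_rax);

            ptrace(PTRACE_SYSCALL, child, 0, 0);   /* run until syscall exit */
            waitpid(child, &status, 0);
        }
        return 0;
    }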

If you want to capture events like file opens and network traffic, I'd take a look at eBPF or the Linux Audit Framework.


I recommend bpftrace as an entry point to working with bpf

https://github.com/iovisor/bpftrace


This is really cool. Unfortunately the 4.x kernel requirement wouldn't work for the majority of my work since RHEL is still on 3 :|


If you have RHEL 7.6 or later, you have bpf


Most infamously, Golang programs use their own syscall wrappers, so you can't hook them with LD_PRELOAD.


You can use [1] from Intel to intercept all syscalls made from a given library (by default, only libc). The library works by disassembling libc and replacing all `syscall` instructions with a jump to a global intercept function that you can write yourself. Incidentally, it's also an LD_PRELOADed library.

[1] https://github.com/pmem/syscall_intercept
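If I remember the API correctly (see the project README for the authoritative version), a hook looks roughly like this: you install a callback through the `intercept_hook_point` pointer, return 0 to claim a syscall (with `*result` as its return value) and nonzero to let it run normally:

    #include <libsyscall_intercept_hook_point.h>
    #include <syscall.h>
    #include <errno.h>

    static int
    hook(long syscall_number,
         long arg0, long arg1, long arg2,
         long arg3, long arg4, long arg5,
         long *result)
    {
        (void)arg0; (void)arg1; (void)arg2;
        (void)arg3; (void)arg4; (void)arg5;

        if (syscall_number == SYS_getdents) {
            *result = -ENOTSUP;   /* pretend directory listing is unsupported */
            return 0;             /* we handled it */
        }
        return 1;                 /* let every other syscall through */
    }

    static __attribute__((constructor)) void
    init(void)
    {
        intercept_hook_point = hook;
    }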


Yep, I give them a shout-out at the bottom of that post -- using capstone to instrument system call sites at runtime is great (if a little crazy).

I didn't actually realize it was an Intel project. I wonder how it stacks up against Pin.


LD_PRELOAD is a fantastic tool.

At a previous job, we wanted binary reproducibility - that is to say, building the same source code again should result in the same binary. The problem was, a lot of programs embed the build or configuration date, and filesystems (e.g. squashfs) have timestamps too.

Rather than patch a million different packages and create problems, we put together an LD_PRELOAD which overrode the result of time(). Eventually we faked the build user and host too.
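The core of such a shim is tiny -- something along these lines (a sketch, not the actual code; a real build would take the fixed value from something like SOURCE_DATE_EPOCH rather than hard-coding it):

    #include <time.h>

    /* Pin time() to a fixed epoch so build timestamps are reproducible.
     * 1000000000 is an arbitrary example value. */
    time_t time(time_t *tloc)
    {
        time_t fixed = (time_t)1000000000;
        if (tloc)
            *tloc = fixed;
        return fixed;
    }

Then the build just runs under `LD_PRELOAD=./libfixtime.so`.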

End result: near perfect reproducibility with no source changes.

I've also used it for reasons similar to the GM Onstar example in the article -- adding an "interposer" library to log what's going on.

I've pulled similar stunts with pydbg on a Windows XP virtual machine -- sniffing the traffic between applications and driver DLLs (even going as far as sticking a logger on the ASPI DLLs). That and the manufacturer's debug info got me enough information to write a new Linux driver for a long-unsupported SCSI device which only ever had Win9x/XP drivers.


Thank you! I may yet figure out the protocol of my APS film scanner.


Well if I could figure out the protocol of the Polaroid Digital Palette (specifically the HR-6000 but the ProPalette and CI-5000S use the same SCSI protocol)...

Look for any debug data you can turn on in the driver and correlate that against whatever you see going to the scanner. Try to save timestamps if you can, then merge the two logs.

I was a little surprised that while Polaroid had stripped the DLL symbols, they'd left a "PrintInternalState()" debug function which completely gave away the majority of the DP_STATE structure fields.

After that, I reverse-engineered and reimplemented the DLL (it's a small DLL), swapped the ASPI side for Linux and wrote a tool that loaded a PNG file and spat the pixels at the reimplemented library.

And then someone sent me a copy of the Palette Developer's Kit...

(Incidentally I'd really love to get hold of a copy of the "GENTEST" calibration tool, which was apparently included on the Service disk and the ID-4000 ID Card System disks)


Wow, do you use these for anything?

I shoot 135 film and some medium format, I have tried Super 8, and would love to start shooting 16mm film - but having a film recorder and actually using it for something?!

:-D What can you do, what would you do?

If I was filthy rich I'd project 35mm movies in my living room. :)


That's evil. I like it.


Was this with an operating system?


Embedded Linux "thing".


I'll share my story. I used to work at a popular Linux website hosting control panel company. Back in the early 2000's "frontpage extensions" were a thing that people used to upload their websites.

Unfortunately, frontpage extensions required files to exist in people's Linux home directories, and people would often mess them up or delete them. People would need their frontpage extension files "reset" to fix the problem. Fortunately, Microsoft provided a Linux binary to reset a user's frontpage extension files.

Unfortunately, it required root access to run. Also unfortunately, I discovered that a user could set up symlinks in their home directory to trick the binary into overwriting files like /etc/passwd.

We ended up actually releasing a code change that would override getuid with LD_PRELOAD so that the Microsoft binary would think it was running as root, just to prevent it from being a security hazard.
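That kind of override is about as small as an LD_PRELOAD hack gets -- roughly something like this:

    #include <sys/types.h>
    #include <unistd.h>

    /* Make the process believe it runs as root (uid 0) without actually
     * granting any privileges. geteuid() often needs the same treatment. */
    uid_t getuid(void)  { return 0; }
    uid_t geteuid(void) { return 0; }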


So, it didn’t need root, but insisted on it? A MS binary no less.


It was very much in keeping with the Microsoft of the era. Not out of maliciousness. Just a general lack of interest in or knowledge of any non-Windows platform, but a recognition that if Frontpage was going to be as dominant as they wanted, they at least needed to vaguely support it.

Think the worst case of "Well it works on my machine"


Ah, I remember how bad this era of Microsoft was.

Debian have a similar tool called "fakeroot" which is part of their packaging process.


Was there no way to jail or chroot the binary?


In the 'early 2000s' there was no security-focused containerization available on Linux.


There's a well-known library, libfaketime, that can fake the current system time.

https://github.com/wolfcw/libfaketime

Here's my friend's LD_PRELOAD hack, which pushes the idea further: hooking gettimeofday() to make a program think that time goes faster or slower. Useful for testing.

https://github.com/m13253/clockslow
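The general idea (a sketch of the technique, not the actual clockslow code -- and using clock_gettime() for illustration, since the same approach applies to gettimeofday()): remember the first real reading, then report scaled elapsed time relative to it.

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <time.h>

    #define SPEED 0.5   /* 0.5 = time appears to run at half speed */

    /* Sketch only: a real version would keep a separate base per clock id
     * and worry about thread safety. */
    int clock_gettime(clockid_t clk, struct timespec *tp)
    {
        static int (*real)(clockid_t, struct timespec *);
        static struct timespec base;
        static int have_base;

        if (!real)
            real = (int (*)(clockid_t, struct timespec *))
                   dlsym(RTLD_NEXT, "clock_gettime");

        struct timespec now;
        int ret = real(clk, &now);
        if (ret != 0 || tp == NULL)
            return ret;

        if (!have_base) {            /* remember the first real reading */
            base = now;
            have_base = 1;
        }

        double elapsed = (now.tv_sec - base.tv_sec)
                       + (now.tv_nsec - base.tv_nsec) / 1e9;
        double faked = elapsed * SPEED;

        tp->tv_sec  = base.tv_sec + (time_t)faked;
        tp->tv_nsec = base.tv_nsec
                    + (long)((faked - (time_t)faked) * 1e9);
        if (tp->tv_nsec >= 1000000000L) {
            tp->tv_sec  += 1;
            tp->tv_nsec -= 1000000000L;
        }
        return 0;
    }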


How about an entire business being set up around this little gem: https://www.vornexinc.com/our-overview.htm

EDIT: Not libfaketime, but the LD_PRELOAD recipe.


I believe libfaketime also supports speeding up and slowing down time


>Useful for testing.

And as a speed hack for Quake 2!


Yeah, that hack was a great insight!!


Love it!


I've implemented some sort of "poor man's Docker" using LD_PRELOAD, back in 2011 when Docker wasn't a thing. It works by overriding getaddrinfo (IIRC) and capturing name lookups of "localhost", which are then answered with an IP address taken from an env variable.

The intended use is the parallelization of automated testing of a distributed system: by creating lots of loopback devices with individual IPs and assigning those to test processes (via the LD_PRELOAD hack), I could suddenly test as many instances of the software system next to each other as I wanted, on the same machine (the test machine is some beefy dual-socket server with lots of CPU cores and RAM). Each instance (which consists of clients and several processes that provide server services, and which are therefore by default configured to bind to specific ports on localhost, as is common for dev and test purposes) could then route its traffic over its own loopback device, and I was spared having to untangle the server ports of all the different services just to parallelize them on a single machine, along with the configuration hell that would have come with it.

It helped that processes by default inherit the env variables from their parents that spawned them - that made it a lot easier to propagate the preload path and the env variable containing the loopback IP to use. I just had to provide it to the top-most process, basically.
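The relevant part of the hook boils down to something like this (a heavily simplified sketch, not the production code; the env variable name here is made up):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <netdb.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Redirect lookups of "localhost" to the IP named in TEST_LOOPBACK_IP
     * (hypothetical variable name), e.g. 127.0.0.5. Everything else goes
     * through the real resolver. */
    int getaddrinfo(const char *node, const char *service,
                    const struct addrinfo *hints, struct addrinfo **res)
    {
        static int (*real)(const char *, const char *,
                           const struct addrinfo *, struct addrinfo **);
        if (!real)
            real = (int (*)(const char *, const char *,
                            const struct addrinfo *, struct addrinfo **))
                   dlsym(RTLD_NEXT, "getaddrinfo");

        const char *override = getenv("TEST_LOOPBACK_IP");
        if (override && node && strcmp(node, "localhost") == 0)
            node = override;   /* getaddrinfo happily accepts a numeric IP */

        return real(node, service, hints, res);
    }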

Today, one would use Docker for this exact purpose, putting each test run into its own container (or even multiple containers). But since the LD_PRELOAD hack worked so well, the project in which I implemented the above is still using it (although they're eyeing a switch to Docker, in part because it also makes it easier to separate non-IP-related resources such as files on the filesystem, but mostly because knowledge about Docker is more widespread than about such ancient tech as LD_PRELOAD and how to hack into name resolution of the OS).


Here's my LD_PRELOAD hack: rerouting /dev/random to /dev/urandom -- because I disagree with gpg's fears about entropy. Now generating a gpg private key is as fast as with ssh-keygen or openssl:

https://github.com/matthewaveryusa/dev_random_fix


You can also simply delete /dev/random and symlink it to /dev/urandom. Or delete it and create a character device at /dev/random that uses urandom's major/minor numbers.


My layman's understanding of the two is that /dev/urandom will happily output more bits than it has been seeded with, and so is unsuitable for use in cryptography, as it can output correlated values. Is my understanding here incorrect?


(edit: I see my parent post is being downvoted. How can this be? The commenter is just asking a question...)

It is incorrect. Both /dev/urandom and /dev/random are connected to a CSPRNG. Once a CSPRNG is initialized with SUFFICIENT unpredictable input, it's forever unpredictable for (practically) unlimited output (something like 2^128). If the CSPRNG algorithm is cryptographically secure, and the implementation doesn't leak its internal state, it would be safe to use it for almost all cryptographic purposes.

However, the original design in the Linux kernel was paranoid enough that it blocks /dev/random (even though a CSPRNG can output unlimited random bytes) if the kernel thinks the output has exceeded the estimated uncertainty from all the random events. Most cryptographers believe that if a broken CSPRNG is something you need to protect yourself from, you already have bigger trouble, and that it's unnecessary from a cryptographic point of view to be paranoid about a properly-initialized CSPRNG. /dev/random found on other BSDs is (almost) equivalent to Linux's /dev/urandom.

However, /dev/urandom has its own issues on Linux. Unlike BSD's implementation, it doesn't block even if the CSPRNG is NOT initialized during early boot. If you automatically generate a key for, e.g., SSH at this point, you'll have serious trouble - predictable keys - so reading from /dev/random still has a point, although not for 90% of programs. I think it's a perfect example of being overly paranoid about unlikely dangers, while overlooking straightforward problems that are likely to occur.

The current recommended practice is to call the getrandom() system call (and arc4random()* on BSDs) when it's available, instead of reading from raw /dev/random or /dev/urandom. It blocks until the CSPRNG is initialized; otherwise it always outputs something.

*and no, it's not RC4-based, but ChaCha20-based on new systems.
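For reference, a minimal sketch of using the getrandom() wrapper (available in glibc since 2.25 via <sys/random.h>):

    #include <stdio.h>
    #include <sys/random.h>
    #include <sys/types.h>

    int main(void)
    {
        unsigned char key[32];

        /* Blocks only until the kernel CSPRNG has been initialized, then
         * never again (flags = 0 reads from the urandom pool). */
        ssize_t n = getrandom(key, sizeof key, 0);
        if (n != (ssize_t)sizeof key) {
            perror("getrandom");
            return 1;
        }

        for (size_t i = 0; i < sizeof key; i++)
            printf("%02x", key[i]);
        putchar('\n');
        return 0;
    }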


> /dev/random found on other BSDs is equivalent to Linux's /dev/urandom.

This isn't quite true. The BSDs' random (and urandom) block until initially seeded, unlike Linux's urandom. Then they don't block. (Like the getrandom/getentropy behavior.)

> The current recommended practice is to call getrandom() system call (and arc4random() on BSDs) when it's available, instead of reading from raw /dev/random or /dev/urandom. It blocks when the CSPRNG is initialized, but otherwise it always outputs something.

+1 (I'd phrase that as "blocks until the CSPRNG is initialized," which for non-embedded systems will always be before userland programs can even run, and for embedded should not take long after system start either).


Thanks for the correction, fixed. It was a bit difficult for me to paraphrase it...


> it blocks /dev/random (even if a CSPRNG can output unlimited random bytes) if the kernel thinks the the output has exceeded the estimated uncertainty from all the random events. Most cryptographers believe if a broken CSPRNG is something you need to protect yourself from, you already have a bigger trouble,

Not just that, but if you have a threat model where you actually need information theoretic security (e.g. you're conjecturing a computationally unbounded attacker or at least a quantum computer)-- the /dev/random output is _still_ just a CSPRNG and simply rate limiting it doesn't actually make a strong guarantee about the information theoretic randomness of the output. To provide information theoretic security the function design would need to guarantee that at least some known fraction of the entropy going in actually made it to the output. Common CSPRNGs don't do this.

So you could debate whether information theoretic security is something someone ever actually needs -- but if you do need it, /dev/random doesn't give it to you regardless.

[And as you note, urandom doesn't block when not adequately seeded ... so the decision to make /dev/random block probably actually exposed a lot of parties to exploit and probably doesn't provide strong protection even against fantasy land attacks :(]


> simply rate limiting it doesn't actually make a strong guarantee about the information theoretic randomness of the output. To provide information theoretic security the function design would need to guarantee that at least some known fraction of the entropy going in actually made it to the output. Common CSPRNGs don't do this.

This is an interesting point I hadn't thought about before, so thanks for that. I suppose if you're generating a OTP or something like that, there might be some small advantage to using /dev/random, but the probability of it making a difference is pretty remote.

The one thing I haven't been able to figure out is why Linux hasn't "fixed" both /dev/random and /dev/urandom to block until they have sufficient entropy at boot and then never block again. That seems like obviously the optimal behavior.


Blocking could potentially result in the system getting stuck during boot and simply staying that way. Compatibility is a bear. The getentropy syscall does the reasonable thing.


It's important to note here that Linux's behaviour is broken, plain and simple: /dev/random blocks even if properly seeded, and /dev/urandom doesn't block even if improperly seeded.

The Real Solution™ is to make /dev/random and /dev/urandom the same thing, and make them both block until properly seeded. And replace the current ad-hoc CSPRNG with a decent one, e.g. Fortuna. There were patches almost 15 years ago implementing this (https://lwn.net/Articles/103653/), but they were rejected.

There's simply no good reason not to fix Linux's CSPRNG.


I think getrandom(2) is a fine choice, but if you are using the C library (as opposed to using asm directives to make syscalls), getentropy(3) is even better. No need to think about the third `flags` argument or read a long section about interruption by a signal handler.


Python had an internal fight of its own about how to handle it right, too.

https://www.python.org/dev/peps/pep-0524/


Yes and no. Much of cryptography is based on pseudorandom number generators, which output more bits than they are seeded with. If these PRNGs are not secure, then almost any piece of cryptography you actually use would be insecure, independent of your choice to use random or urandom.

Unless all of your cryptography is information-theoretically secure, there is no problem using a PRNG.

If you happen to be using an information-theoretically secure algorithm, then you are theoretically weaker using a limited-entropy PRNG; but there are no practical implications of this.


The only information theoretically secure encryption algorithm is a one-time pad seeded with true randomness. In fact, you cannot achieve information theoretic security using a pseudorandom generator of any kind.


It's not an encryption algorithm, but Shamir's secret sharing is also information-theoretically secure.


Here's mine:

https://github.com/dsaul/UDELibXprop-Legacy

I was writing a dock program over a decade ago, and java programs didn't put the PID on the window, whereas everything else did.

Had to fix it somehow...


I am curious. How do you know if this is secure or not? Is there any publication or article available for this slightly time-saving but potentially dangerous choice?


1. The official man page.

The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in all use cases, with the exception of applications which require randomness during early boot time; for these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.

2. https://www.2uo.de/myths-about-urandom/


Not that I disagree with you, but which are the official man pages for /dev/urandom? It's my recollection that the advice therein varies from OS to OS.


This page is part of release 4.16 of the Linux man-pages project. A description of the project, information about reporting bugs, and the latest version of this page, can be found at https://www.kernel.org/doc/man-pages/.

And only Linux has /dev/urandom.


BSDs (incl. macOS) have /dev/urandom, but it's the same thing as /dev/random. Both don't ever block after they've been filled initially at boot time.


> this slightly time-saving but potentially dangerous choice?

The one and only danger is during the machine's boot process, because while /dev/random and /dev/urandom use the same data:

* on linux /dev/random has a silly and unfounded entropy estimator and will block at arbitrary points (entropy estimation used to be a fad at some point, but cryptographers have sworn off it; e.g. Yarrow had an entropy estimator but Fortuna dropped it)

* also on linux, /dev/urandom never blocks at all, which includes a cold start, which can be problematic as that's the one point where the device might not be seeded and return extremely poor data

In fact the second point is the sole difference between getrandom(2) and /dev/urandom.

If you're in a steady state scenario (not at the machine boot where the cold start entropy problem exists) "just use urandom" is the recommendation of pretty much everyone: tptacek, djb, etc…

https://www.2uo.de/myths-about-urandom/

https://sockpuppet.org/blog/2014/02/25/safely-generate-rando...

http://blog.cr.yp.to/20140205-entropy.html (see bottom of page)


> In fact the second point is the sole difference between getrandom(2) and /dev/urandom.

AFAIK, there's another important difference: getrandom(2) doesn't use a file descriptor (so it'll work even if you're out of file descriptors, or in other situations where having an open fd is inconvenient), and it doesn't need access to a /dev directory with the urandom device.


https://sockpuppet.org/blog/2014/02/25/safely-generate-rando...

(Note, that's from 2014; today I would recommend getrandom() instead.)


You can just replace the /dev/random device file with /dev/urandom


Why don't you replace it system-wide?


It's a shared system and I'm not going to prohibit the free exercise of other people's religions even though mine is clearly better.


Agree, couldn't this be done with a bind mount or something?


The catch is old programs that depend on the blocking behavior of /dev/random during early boot could be an issue. Unlikely to be a problem on a server, though...


If they're doing the replacement at runtime, the cold start is probably long done with.


A bind mount? Just open up the source code and add a "u" where necessary.


> A bind mount? Just open up the source code and add a "u" where necessary.

So, recompile every program and every subsequent update to use this functionality or...bind mount. One of these sounds easier than the other.


One useful tool for the toolbox, to be used very carefully, after thinking about the consequences:

libeatmydata: https://github.com/stewartsmith/libeatmydata

It disables fsync, O_SYNC, etc., making them no-ops and essentially making the program's writes unsafe. Very dangerous. But very useful when you're trying to bulk load data into a MySQL database, say as preparation of a new slave (followed by manual sync commands, and very careful checksumming of tables before trusting what happened).
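The trick is essentially just defining success-returning no-ops for the sync family -- roughly this (a rough sketch; IIRC the real library also covers things like msync and stripping O_SYNC out of open):

    #include <unistd.h>

    /* Turn the durability calls into no-ops that report success.
     * Data may be lost on crash or power failure -- that's the point. */
    int fsync(int fd)     { (void)fd; return 0; }
    int fdatasync(int fd) { (void)fd; return 0; }
    void sync(void)       { }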


Useful for running tests against a throwaway MySQL database too!


I've been using LD_PRELOAD for fun and profit for a long, long time. Its simplicity is due to the simplicity of the C ABI. Its power is due to dynamic linking.

C is one programming language. C w/ ELF semantics and powerful link-editors and run-time linker-loaders is a rather different and much more powerful language.

I won't be sad to see Rust replace C, except for this: LD_PRELOAD is a fantastic code-injection tool for C that is so dependent on the C ABI being simple that I'm afraid we'll lose it completely.


I don't see the C ABI going anywhere for a while.


Doesn’t rust use the C ABI under the covers?


You can easily write and call functions that abide the C ABI in Rust, but the set of types permitted in those signatures is much smaller (only #[repr(C)]-compatible types) than in ordinary Rust functions. The Rust ABI is more complicated and won't be stabilized anytime soon.


Holy crap, was that article annoying to read! The author should really cut back on meme image macros.


This is Jess’ personality - check out her Twitter account. I don’t mind it, but I’ve followed her for a while so I’m used to it. Honestly I find it to be a refreshing break from the typically stiff writing I see. She’s smart and doesn’t need to hide behind stodgy writing in order to make herself seem smarter.


I don't really mind it on twitter, as twitter is anything but serious, and you can't really have any coherent text there.

But such elements in a regular article simply harm its coherence and readability for anyone that does not spend much of their time in (rather noisy and immature, IMHO) communities which feature "meme image macros" heavily.

(Bonus negative points if some of the images are animated. That makes me think that the author actively hates the readers.)


Since I already knew what LD_PRELOAD did, I was rather amused that the author had a similar epiphany over it as I did many years ago, which made it a great read -- something to relate to.

I agree that the article is emotional, but the annoyance or not is so very subjective.


And the intro goes on for about 1/3 of the total article before we even know what it’s about (yeah, I click on articles when I’m intrigued by the title, when they seem programming related).


I'll also join in and share my projects using LD_PRELOAD. These also work on macOS through its equivalent DYLD_INSERT_LIBRARIES.

https://github.com/d99kris/stackusage measures thread stack usage by intercepting calls to pthread_create and filling the thread stack with a dummy data pattern. It also registers a callback routine to be called upon thread termination.

https://github.com/d99kris/heapusage intercepts calls to malloc/free/etc detecting heap memory leaks and providing simple stats on heap usage.

https://github.com/d99kris/cpuusage can intercept calls to POSIX functions (incl. syscall wrappers) and provide profiling details on the time spent in each call.


I recently gave a small talk about this and listed the applications I had for it in the past few years:

- Test low memory environment

- Add memory tracking

- Ignore double frees (fix broken programs)

- Cache allocations / lookaside lists

- Trace all file operations

- Seamlessly open compressed files with fopen()

- Speed up time() as program sees it

- Offset time() to bypass evaluation periods

- Alternative PRNG

- Intercept/reroute network sockets

- Trace various API calls (useful when debugging graphics APIs)

- Force parameters to some API calls

- Set a custom resolution not supported by program

- Switch between HW and SW cursor rendering

- Framelimiting and FPS reporting

- Replace a library with a different one through a compat layer

- Frame buffer postprocessing (e.g. reshade.me)

- Overlays (e.g. steam)

E: format


For a Windows equivalent:

https://github.com/Microsoft/Detours/wiki

It's a bit more unwieldy to use, because it doesn't just replace all matching symbols (it's not how symbol lookup works for DLLs in Win32) - the injected DLL has to be written specifically with Detours in mind, and has to explicitly override what it needs to override. But in the end, you can do all the same stuff with it.


the injected DLL has to be written specifically with Detours in mind

Does it? I've almost (ie: haven't :P) used Detours but https://github.com/Microsoft/Detours/wiki/OverviewIntercepti... reads like it can rewrite standard function prologues.


I didn't phrase that unambiguously - "injected DLL" in this case means "the DLL with new code that is injected", not "the DLL that the code is being injected into". With LD_PRELOAD, all you need to override a symbol is an .so that exports one with the same name. With Detours, you need to write additional code that actually registers the override as replacing such-and-such function from such-and-such DLL. But yes, the code you're overriding doesn't need to know about any of that.


Ah, I get it. Thanks.


Librespot uses LD_PRELOAD to find and patch the encryption/decryption functions used in Spotify's client so the protocol can be examined in Wireshark (and ultimately reverse engineered). I am not the original author; he wrote a macOS version using DYLD_INSERT_LIBRARIES to achieve something similar.

https://github.com/librespot-org/spotify-analyze/blob/master...


In Ruby world a lot of people use LD_PRELOAD to change the default malloc to jemalloc (or tcmalloc): https://github.com/jemalloc/jemalloc


Another interesting preloaded library is stderred, which turns output to standard error red: https://github.com/sickill/stderred


I once used LD_PRELOAD to utilize an OpenGL "shim" driver (for an automated test suite). The driver itself was generated automatically from the gl.h header file.


If everyone is giving examples of LD_PRELOAD — it has serious production use at scale in HPC, particularly for profiling and tracing. Runtimes such as MPI provide a layer designed for instrumentation to be interposed, typically with LD_PRELOAD (e.g. the standardized PMPI layer for MPI). Another example is the entirely userspace parallel filesystem that OrangeFS (né PVFS2) provides via the "userint" layer interposing on Unix I/O routines. That sort of facility is a major reason for using dynamic linking, despite the overheads of dynamically loading libraries for parallel applications at scale. I'm not sure if a solution could be hooked in with LD_PRELOAD, but Spindle actually uses LD_AUDIT: https://computation.llnl.gov/projects/spindle


I like the use of LD_PRELOAD in this paper: Long et al., Automatic Runtime Error Repair and Containment via Recovery Shepherding, PLDI 2014, http://people.csail.mit.edu/rinard/paper/pldi14.pdf

The authors have a small library that sets up some signal handlers for things like divide by zero and segmentation faults. They LD_PRELOAD this library when starting a buggy binary (they test things like Chromium and the GIMP), and when the program tries to divide by zero or read from a null pointer, their signal handlers step in and pretend that the operation resulted in a value of 0. The program can then carry on without crashing and usually does something meaningful. Tadaa, automatic runtime error repair!


My favorite: https://github.com/musec/libpreopen is a library for adapting existing applications that open() and whatnot from all over everywhere to the super strict capability based Capsicum sandbox on FreeBSD. I'm working on https://github.com/myfreeweb/capsicumizer which is a little wrapper for launching apps with preloaded access to a list of directories from an AppArmor-like "profile".


LD_PRELOAD is extremely helpful in troubleshooting libraries. Around 2007, qsort on RHEL was slower than on SUSE. I raised a case with Redhat along with a test case, but Redhat was not helpful, as it was not reproducible.

So, I copied glibc.so from a SUSE machine to that RHEL machine and ran the test case with LD_PRELOAD, comparing against the RHEL glibc. I showed these results to Redhat. Eventually, a patch was applied to glibc on their side.


I personally just hate LD_PRELOAD because it's very difficult to turn it off and keep it off. I am glad others find uses for it and that's great, but I hate the privilege escalation attack surface it opens up. I get that it has uses, but there needs to be a simple way to disable it for hardened systems.


I feel like I've learned something new from this, I'd never heard of this before.

Would this work with Go or Rust binaries?


LD_PRELOAD only works for binaries that are dynamically linked (LD_PRELOAD is actually handled by the link loader, not the kernel[1]), and you can only use it to override dynamic symbols IIRC.

It definitely doesn't work with Go, and Rust might work, but I'm not sure it uses the glibc syscall wrappers.

[1]: http://man7.org/linux/man-pages/man8/ld.so.8.html


Rust uses glibc by default; you can use MUSL but you have to opt in.


I'm aware of that, I guess my point was that Rust probably doesn't use a lot of glibc (like most C programs would) so the utility of LD_PRELOAD is quite minimal.

I don't know enough about .rlib to know whether you could overwrite Rust library functions, but that's a different topic.


Rust uses glibc to call into the kernel like anything else. The standard library is built on top of it.


Right, but does that mean it's only used as a way of getting syscall numbers (without embedding it like Go does) or is it the case that you could actually LD_PRELOAD random things like nftw(3) and it would actually affect Rust programs? I'll be honest, I haven't tried it, but it was my impression that Rust only used glibc for syscall wrappers?


We don’t provide nftw-like functionality in std, and so you can’t replace it, as it would never even have been called. But for example, malloc and free are used, not sbrk directly: https://github.com/rust-lang/rust/blob/master/src/libstd/sys...


Not that it matters which particular libc you use, any libc is going to have dynamic symbols named 'open', 'read' etc. that can be hooked :)


MUSL is statically linked, and so you don’t have those dynamic symbols. That’s the point!


oh. I understand why that's the default, but it should be possible to dynamically link musl on a distro like Alpine, right?


Musl supports dynamic linking. But it also supports static linking (which glibc doesn't really support because of NSS and similarly fun features) -- hence why Rust requires musl to statically link Rust binaries.


My understanding is that there’s some complications there, but I’m not fully aware of all the details, honestly.


Here's mine: https://github.com/andrewrk/malcheck/

It uses LD_PRELOAD and a custom malloc so that you can find out all the horrible ways that application developers did not plan for running out of memory.


We use a sort of similar trick (not via LD_PRELOAD, though) to inject faults in M_NOWAIT malloc() calls in the FreeBSD kernel. FreeBSD kernel code tends to be a bit better than most userspace code I've seen as far as considering OOM conditions, though it is not perfect.


Isn't LD_PRELOAD's hipness outweighed by the Pandora's box of security issues it gives rise to?


I know that the linker will ignore LD_PRELOAD for suid binaries, what other kinds of issues are there?


What security issues are those? Is there anything you can do with LD_PRELOAD that you cannot do in other ways such as modifying binaries before executing them?


As a regular ol' GNU/Linux user, you cannot modify binaries in /usr/bin (or /bin), but you can definitely influence their behavior by "LD_PRELOAD=blah /usr/bin/thing".


Except you (non-root) can only do that for yourself, and thus you can only make them do things you could make them do anyway.


If you can do that, you can (generally) do `cp /bin/foo ./ && modify foo && ./foo`


It depends on assumptions in the way a system is hardened. For example, a home directory mounted noexec. In theory, LD_PRELOAD will not mmap a file in a noexec area. But if you can find an installed library with functions that mirror some other application you have, and you can LD_PRELOAD that library before executing the target application, you might be able to force the library to call unexpected routines. (That's a stretch, granted)

Another would be possible RCE. Say you can get a server-side app to set environment variables, like via header injection. Then say you can upload a file. Can you make that server-side app set LD_PRELOAD to the file, and then wait for it to execute an arbitrary program?


I needed to calculate the potential output file size tar would produce, so what better way than using tar itself to calculate it? It just required hooking read, write, and close.

https://github.com/G4Vi/tarsize


I’ve found LD_PRELOAD immensely useful for patching new code into binaries; it saves the pain of trying to squeeze code into the existing binary.


Wow, that looks like a security nightmare.


It doesn’t work with setuid binaries


Meh? It only works on your own programs.


What do you mean? There are many examples of people using LD_PRELOAD to patch the behaviour of other's binaries.


Sure, but not across a security boundary.

Being able to override some library function such that running my text editor does $BADTHING isn't very interesting from a security perspective: if I have the capability to do that, I could also just run a program that does $BADTHING directly. Why bother with additional contortions to involve the text editor?


A malicious program without LD_PRELOAD can still copy the binary to another folder and change the menu to point to the copy. Then modify the copy by binary patching to do whatever. Or run it via a modified qemu to do whatever. The main problem is the lack of a proper sandbox and that all programs in a user session generally have the same permissions.


No, I mean programs that you are executing as your own user.


I remember back in the day when LD_PRELOAD first became the go-to userland rootkitting method


audio generation from malloc and read: https://github.com/gordol/ld_preload-sounds

also for OSX: DYLD_INSERT_LIBRARIES


One issue on OSX is multi-level namespaces; I had to recompile with a flag to disable them in order to hook malloc/free, for example.



In case you can’t recompile, you should be able to do the same thing with `DYLD_FORCE_FLAT_NAMESPACE=1`.


Where have you been all my life?


One other useful thing: making a trash can for Linux by wrapping unlink.
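Something along these lines (a sketch only: it assumes ~/.trash already exists, doesn't handle name collisions, rename() won't cross filesystems, and unlinkat()/remove() aren't covered):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <libgen.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Instead of deleting, try to move the file into ~/.trash. */
    int unlink(const char *path)
    {
        const char *home = getenv("HOME");
        if (home) {
            char copy[PATH_MAX], dest[PATH_MAX];
            snprintf(copy, sizeof copy, "%s", path);   /* basename() may modify */
            snprintf(dest, sizeof dest, "%s/.trash/%s", home, basename(copy));
            if (rename(path, dest) == 0)
                return 0;
        }

        /* fall back to the real unlink */
        static int (*real_unlink)(const char *);
        if (!real_unlink)
            real_unlink = (int (*)(const char *))dlsym(RTLD_NEXT, "unlink");
        return real_unlink(path);
    }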


If I were to provide LD_PRELOAD-based security cover (take any binary and secure it with LD_PRELOAD), would that be acceptable to corporates? Or does it increase the attack surface?


How do you plan to secure a binary with LD_PRELOAD?


One feature to break all software. Superb.


Wait until they find out how Go apps work...


Given Jess used to work at Docker, much of which is written in Go, and has spoken at various Go conferences, she already knows.


...what's this a reference to?


Static linking, which makes this trick not work.


And the fact that Go will embed raw syscalls in its binaries, which is also somewhat annoying.


Which is unsupported on anything other than Linux and various BSDs.


On FreeBSD, it's supported but kinda sucks. e.g., porting to a new CPU architecture is hell (I contributed to the FreeBSD/aarch64 go port, someone else picked it up now…)

The libc is the stable ABI on pretty much any OS that's not called Linux, just use it.


Okay, but now you're using a C library as the basis for your non-C programming language. I understand why it is that way, but that kind of sucks.


Ehh… does it kind of suck? The ABI of libc's syscall wrappers is basically "here's some ELF symbols to call with some arguments using the operating system's preferred calling convention". The only really "C" thing about it, other than the name, is struct layouts of various arguments.



