As I've said in other threads https://news.ycombinator.com/item?id=38847750#38862450, I highly recommend writing an ELF by hand at least once. It's a great exercise to understand the basic parts of an executable. It's also helpful if you want to go the opposite direction of this article - bottom-up instead of top-down.
Lots of other great discussion in various threads on that other HN post.
To explain the format to myself and others, I also created an interactive visualization for the bytes in the file. It helps me to click on a byte, see an explanation for it and see related bytes elsewhere in the file highlighted. https://scratchpad.avikdas.com/elf-explanation/elf-explanati...
I’ve had such fun making interactive educational visualisations like this. My life’s work is going into an interactive simulation of the USB protocol. Unfortunately I’m yet to bang it out over a weekend.
Thanks for the feedback. I replied in a sibling comment about how I made it.
For the bug, feel free to email me at avik at avikdas dot com if you'd like. The behavior I verified just now (for me) is that if you click one byte to highlight it, then clicking any other byte in the same group will remove the highlighting.
Similarly, I'd recommend writing a simple ELF loader. There's a fair bit of implementation complexity in dynamic linking, but if you only support static ELFs then it's straightforward.
I've seriously considered writing an ELF loader that uses a special symbol (like _resolve) where dynamic library resolving is done imperatively. The flexibility from libdl always feels underwhelming.
I had written a static ELF loader for reasons, but when I was no longer able to compile a static version of the binary I wanted to load, I found it wasn't too hard to load the system's dynamic loader instead. That's kind of the best of both worlds --- I can run a dynamically linked binary, and I didn't have to do the linking and relocations.
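A static loader's first step is reading the ELF header and then walking the program headers to collect the entry point and the address range the PT_LOAD segments need. Below is a minimal sketch of that step, assuming a 64-bit little-endian ELF read on a little-endian Linux host with glibc's `<elf.h>`. `struct load_plan` and `elf_load_plan` are hypothetical names. The actual mapping (one mmap per segment, zeroing the .bss tail, building the initial stack, jumping to the entry) is left out. PT_INTERP is flagged so that a static-only loader can refuse dynamically linked binaries.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* What a minimal static loader needs to know before mapping anything. */
struct load_plan {
    uint64_t entry;   /* e_entry: where execution should begin */
    uint64_t lo, hi;  /* lowest p_vaddr and highest p_vaddr + p_memsz over PT_LOAD */
    int nload;        /* number of PT_LOAD segments */
    int has_interp;   /* nonzero if PT_INTERP exists, i.e. dynamically linked */
};

/* Returns 0 on success, -1 on I/O error or a file that is not ELF64 little-endian. */
int elf_load_plan(const char *path, struct load_plan *p)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int rc = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_ident[EI_DATA] != ELFDATA2LSB ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    memset(p, 0, sizeof *p);
    p->entry = eh.e_entry;
    p->lo = UINT64_MAX;

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        if (fseek(f, (long)(eh.e_phoff + (uint64_t)i * sizeof ph), SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type == PT_INTERP)
            p->has_interp = 1;   /* a static-only loader would bail out here */
        if (ph.p_type != PT_LOAD)
            continue;
        p->nload++;
        if (ph.p_vaddr < p->lo)
            p->lo = ph.p_vaddr;
        if (ph.p_vaddr + ph.p_memsz > p->hi)
            p->hi = ph.p_vaddr + ph.p_memsz;
    }
    rc = p->nload > 0 ? 0 : -1;
out:
    fclose(f);
    return rc;
}
```

For a given binary, `entry` should match the "Entry point address" printed by `readelf --file-header`, and the segment range should match `readelf --program-headers`.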
Yes, likewise I wrote a reader that simply tried to parse every bit of a complex ELF binary to report its structure and quickly found myself in poorly documented territory. It’s an education if you want it.
I assume there's a better modern source (assuming for some reason you don't want to reference libbfd &c, but really, you at least want to cross-check with it) but BITD there was an AT&T System V book - I think it was https://books.google.com/books?id=mrImAAAAMAAJ but only at about 70% confidence, it's part of a series and might have been one of the others - that was "arcane but true" for ELF on existing platforms (at the time, which was the mid 1990s, which is why I hope there's a better starting point now...)
Modifying existing ELFs can also be extremely educational and fun. It's a bit frustrating at first because it's more or less impossible to debug this stuff when it doesn't work but when things finally start working it's awesome. Turns out it's possible to patch ELFs in all sorts of interesting ways. With the auxiliary vector you can even have introspection at runtime: Linux gives us the address of the program header table and from there you can get to anywhere. Just gotta extend a LOAD segment to cover the whole binary.
For example I wrote tools to embed lisp modules and code right into my lisp's interpreter executable. The embedded segment is loaded from the ELF automatically, the interpreter just finds it and runs the code. I'm so proud of this little feature I wrote an article about it.
These tricks can be done simply and portably, without caring about the underlying executable format.
One trick is that you can reserve some global array in the executable, which is prefixed by a byte sequence that doesn't occur anywhere. A small utility can find that byte sequence and write custom data after it to create a customized executable.
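Here is a sketch of the reserved-array trick in C. It assumes a hypothetical marker `"\x7f" "PAYLOAD-SLOT-v1"`, and all names are illustrative. The program reads whatever follows the marker. The patcher searches the executable's bytes (here just a memory buffer) for the marker and copies data after it. `volatile` matters here: without it, the compiler may constant-fold reads of a `const` array that it believes is all zeros, and a patched file would have no effect.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical marker: any byte sequence that appears nowhere else in the executable. */
#define PAYLOAD_MARKER     "\x7f" "PAYLOAD-SLOT-v1"
#define PAYLOAD_MARKER_LEN (sizeof PAYLOAD_MARKER - 1)
#define PAYLOAD_CAPACITY   256

/* Reserved space: the marker followed by zero-filled room for data. */
static const volatile unsigned char
    payload_slot[PAYLOAD_MARKER_LEN + PAYLOAD_CAPACITY] = PAYLOAD_MARKER;

/* Program side: copy out the (possibly patched) bytes after the marker. */
size_t payload_read(unsigned char *out, size_t n)
{
    if (n > PAYLOAD_CAPACITY)
        n = PAYLOAD_CAPACITY;
    for (size_t i = 0; i < n; i++)
        out[i] = payload_slot[PAYLOAD_MARKER_LEN + i];
    return n;
}

/* Patcher side: find the marker in the executable's bytes and write data
   after it. Returns the marker offset, or -1 if not found or it won't fit. */
long patch_payload(unsigned char *image, size_t len, const void *data, size_t n)
{
    if (n > PAYLOAD_CAPACITY || len < PAYLOAD_MARKER_LEN)
        return -1;
    for (size_t i = 0; i + PAYLOAD_MARKER_LEN <= len; i++) {
        if (memcmp(image + i, PAYLOAD_MARKER, PAYLOAD_MARKER_LEN) == 0) {
            if (i + PAYLOAD_MARKER_LEN + n > len)
                return -1;
            memcpy(image + i + PAYLOAD_MARKER_LEN, data, n);
            return (long)i;
        }
    }
    return -1;
}
```

In practice, the patcher would be a separate utility, because its own copy of the marker literal would otherwise be matched as well.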
I think that, also, many executable formats don't mind if something is appended to the executable. If the executable somehow knows its original size (you can write that size somewhere using the previous trick: no grotty executable format parsing required), it can open itself, seek to that offset and read the data.
I think this might be how CLISP creates an .exe file on Windows; I think it takes the base clisp.exe and combines it with the lispinit.mem image into one file.
This is neat but has a limitation in that it cannot be expanded after the program is compiled and linked. Resizing the array would invalidate all pointers that follow as well as render incorrect any code that takes its size.
This can be solved with a layer of indirection: just append the data to the executable and write its size and file offset in the array. That way the data block can be freely resized. That's the solution people told me to use and indeed the one that I usually see in existing repositories. The problem is you run into some additional complexity later which results in the loss of portability and thus the main reason to choose this method.
> it can open itself
That's the crux of the issue. How does an executable open itself? That's where portability goes out the window. I've seen source code that opens argv[0] which is under the control of the parent program and therefore unreliable. I've seen code which opens /proc/self/exe which is Linux specific. I've seen code that calls Win32 API functions to get the path to the executable. All this just so it can open and load into memory a file which the kernel has already loaded, just so it can read some additional data off of it.
My solution sidesteps that question entirely. It just adds a LOAD segment for the embedded data which instructs the kernel to map it in automatically before the program even runs. There's no need to open, seek or read anything, it will already be there by the time execution begins.
The auxiliary vector contains a pointer to the program's segments table so it can reach the data from there. Then it's just a matter of walking this table looking for the custom descriptor segment. It's all done in a structured way, using the standard magic number locations and ranges. There's no chance of a magic number being recognized by mistake.
The only possible portability issue is the availability of the AT_PHDR, AT_PHENT and AT_PHNUM entries in the auxiliary vector. I'm not sure if they're standard. I know Linux has them and it's all I personally care about but if these entries do turn out to be standard then I can confidently say that my method is portable to any ELF-based operating system.
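Here is a sketch of the runtime side on Linux. It assumes glibc's `getauxval()` (from `<sys/auxv.h>`) and a 64-bit target. It reads AT_PHDR, AT_PHENT and AT_PHNUM, computes the load bias from PT_PHDR (needed for position-independent executables), and looks for a segment of a given type. `PT_MY_PAYLOAD` is a hypothetical type in the OS-specific range. It stands in for whatever custom descriptor segment a patching tool would add, and this is not the commenter's actual code.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Hypothetical custom segment type inside PT_LOOS..PT_HIOS. */
#define PT_MY_PAYLOAD 0x6c697370u

/* Find the first program header of the given type in the running executable.
   *addr_out receives the segment's runtime address (p_vaddr plus load bias). */
const Elf64_Phdr *find_own_segment(uint32_t type, uintptr_t *addr_out)
{
    const unsigned char *base = (const unsigned char *)getauxval(AT_PHDR);
    unsigned long phent = getauxval(AT_PHENT);
    unsigned long phnum = getauxval(AT_PHNUM);
    if (!base || phent != sizeof(Elf64_Phdr))
        return NULL;

    /* PT_PHDR records where the table was linked to live; the difference from
       where the kernel actually put it is the load bias (0 for non-PIE).
       Assumes PT_PHDR is present whenever the binary is PIE. */
    uintptr_t bias = 0;
    for (unsigned long i = 0; i < phnum; i++) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + i * phent);
        if (ph->p_type == PT_PHDR)
            bias = (uintptr_t)base - ph->p_vaddr;
    }

    for (unsigned long i = 0; i < phnum; i++) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + i * phent);
        if (ph->p_type == type) {
            if (addr_out)
                *addr_out = ph->p_vaddr + bias;
            return ph;
        }
    }
    return NULL;
}
```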
> How does an executable open itself? That's where portability goes out the window.
It's a way easier nut to crack than parsing executable formats. Various OSs have a method for this:
Linux: /proc/self/exe
Windows: GetModuleFileNameW(NULL, buf, MAX_PATH)
MacOS: _NSGetExecutablePath(buf, &size)
Solaris: getexecname()
BSDs: ...?
These are just "ask and you shall receive" APIs; nothing to analyze.
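Wrapping those APIs behind one function takes only a few lines. Below is a sketch with a hypothetical function name, `self_exe_path`. BSD support is left out, as in the list. For brevity, Windows uses the ANSI `GetModuleFileNameA` rather than the wide variant. Only one branch can be exercised on any given machine, so treat the non-Linux branches as untested.

```c
#if defined(__linux__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* declares readlink() even under strict -std=c11 */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__APPLE__)
#include <mach-o/dyld.h>
#elif defined(__sun)
#include <stdlib.h>
#elif defined(__linux__)
#include <unistd.h>
#endif

/* Writes the running executable's path into buf (NUL-terminated).
   Returns the path length, or -1 if unsupported or buf is too small. */
long self_exe_path(char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
#if defined(_WIN32)
    DWORD n = GetModuleFileNameA(NULL, buf, (DWORD)cap);
    if (n == 0 || n >= cap)          /* n == cap signals truncation */
        return -1;
    return (long)n;
#elif defined(__APPLE__)
    uint32_t size = (uint32_t)cap;
    if (_NSGetExecutablePath(buf, &size) != 0)  /* -1: buffer too small */
        return -1;
    return (long)strlen(buf);        /* may not be a canonical path */
#elif defined(__sun)
    const char *p = getexecname();   /* may be relative to the startup cwd */
    if (!p || strlen(p) >= cap)
        return -1;
    strcpy(buf, p);
    return (long)strlen(buf);
#elif defined(__linux__)
    ssize_t n = readlink("/proc/self/exe", buf, cap - 1);
    if (n < 0 || (size_t)n >= cap - 1)  /* error, or possibly truncated */
        return -1;
    buf[n] = '\0';                   /* readlink does not NUL-terminate */
    return (long)n;
#else
    (void)buf;
    return -1;                       /* other systems not covered here */
#endif
}
```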
You may want to solve that problem anyway for other reasons: for instance, if you want your programming language to have a relocatable installation which finds related files using its own path.
On a system where you cannot solve this problem at all, you can use trick 1. Have an array somewhere in the program where you write the installation path. Your installer has to do this, and if the program is moved after installation, some utility program has to fix that up. (I am not crazy about the idea because it changes the checksum of the file at the installation site. You can no longer use an off-the-shelf program like sha256sum to check the integrity of the executable against the provisioned materials upstream.)
Put the offset at the end! Famously, the table of contents of a .zip file is at the end rather than the beginning, which has many useful properties (such as being able to patch the contents by only appending to an existing file). And you can concatenate an executable and a .zip and get a file which is both.
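Here is a sketch that combines the two ideas. It appends the payload, then a fixed 16-byte trailer as the very last bytes of the file. The trailer holds a hypothetical 8-byte magic plus the payload length, stored little-endian. The reader only has to seek relative to the end of the file, so it never parses ELF, PE, or Mach-O. The function names and trailer layout are assumptions. Opening the running executable still requires one of the self-path APIs mentioned above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trailer layout: 8-byte magic, then 8-byte little-endian length. */
static const unsigned char TRAILER_MAGIC[8] = {'P','A','Y','L','O','A','D','!'};
#define TRAILER_SIZE 16

static void put_u64le(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t get_u64le(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Append payload + trailer to a file opened for update. Returns 0 on success. */
int append_payload(FILE *f, const void *data, size_t n)
{
    unsigned char trailer[TRAILER_SIZE];
    memcpy(trailer, TRAILER_MAGIC, 8);
    put_u64le(trailer + 8, n);
    if (fseek(f, 0, SEEK_END) != 0 ||
        fwrite(data, 1, n, f) != n ||
        fwrite(trailer, 1, TRAILER_SIZE, f) != TRAILER_SIZE)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the payload back by seeking from the end. Returns its length,
   or -1 if there is no trailer or the payload does not fit in buf. */
long read_payload(FILE *f, void *buf, size_t cap)
{
    unsigned char trailer[TRAILER_SIZE];
    if (fseek(f, -TRAILER_SIZE, SEEK_END) != 0 ||
        fread(trailer, 1, TRAILER_SIZE, f) != TRAILER_SIZE ||
        memcmp(trailer, TRAILER_MAGIC, 8) != 0)
        return -1;
    uint64_t n = get_u64le(trailer + 8);
    if (n > cap || fseek(f, -(long)(TRAILER_SIZE + n), SEEK_END) != 0 ||
        fread(buf, 1, n, f) != n)
        return -1;
    return (long)n;
}
```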
Yes. Cosmopolitan libc has support for exactly this. It contains a lot of platform-specific hacks in order to open the executable though. I went through the implementation.
I think the problem is this notion of a "memory image". It would be so much easier if the kernel just copied the entire file into memory and called it a day.
Doing it is basically hand assembly. One reads the documentation, selects the bytes needed using a processor data sheet, orders them into the various sections, populates the ELF fields, and then it really does boil down to typing them all in.
In pre-ELF times, on, say, an 8-bit Apple II, the machine code monitor allowed the program bytes to be entered directly and then executed.
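To make the hand-assembly step concrete, here is a sketch that lays out the bytes of a complete x86-64 Linux executable in memory. It contains a 64-byte ELF header, one 56-byte PT_LOAD program header covering the whole file, and 12 bytes of machine code for `exit(42)`. It assumes glibc's `<elf.h>` for the struct definitions and a little-endian host, because the structs are copied as-is. `build_tiny_elf` and the 0x400000 load address are illustrative choices. There are no section headers at all.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define LOAD_ADDR 0x400000u /* illustrative non-PIE base; any page-aligned address works */

/* x86-64 machine code for exit(42) via the raw Linux syscall. */
static const unsigned char exit42_code[] = {
    0xBF, 0x2A, 0x00, 0x00, 0x00, /* mov edi, 42  (exit status) */
    0xB8, 0x3C, 0x00, 0x00, 0x00, /* mov eax, 60  (SYS_exit)    */
    0x0F, 0x05,                   /* syscall                    */
};

/* ELF header, one program header and the code, back to back.
   Returns the file size, or 0 if out is too small. */
size_t build_tiny_elf(unsigned char *out, size_t cap)
{
    const size_t code_off = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr); /* 64 + 56 */
    const size_t total = code_off + sizeof exit42_code;
    if (cap < total)
        return 0;

    Elf64_Ehdr eh;
    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS] = ELFCLASS64;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_ident[EI_OSABI] = ELFOSABI_SYSV;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_X86_64;
    eh.e_version = EV_CURRENT;
    eh.e_entry = LOAD_ADDR + code_off;   /* the entry point is just an address */
    eh.e_phoff = sizeof(Elf64_Ehdr);
    eh.e_ehsize = sizeof(Elf64_Ehdr);
    eh.e_phentsize = sizeof(Elf64_Phdr);
    eh.e_phnum = 1;

    Elf64_Phdr ph;
    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;                 /* map the whole file, headers included */
    ph.p_flags = PF_R | PF_X;
    ph.p_offset = 0;
    ph.p_vaddr = ph.p_paddr = LOAD_ADDR;
    ph.p_filesz = ph.p_memsz = total;
    ph.p_align = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + code_off, exit42_code, sizeof exit42_code);
    return total;
}
```

To try it on x86-64 Linux, write the 132 returned bytes to a file, mark the file executable, and run it. It should exit with status 42, which you can check with `echo $?`.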
Storing to disk is only a bit more involved, and there is another opportunity! Disk sector editors allow one to create a file...
ELF is platform agnostic, and has been used in operating systems on nearly every existing CPU platform since the mid-90s (notable exceptions being OS X, AIX, the embedded world and Windows).
> When the program starts running, you might think it starts at main. It doesn’t! It actually goes to _start. This does a bunch of Very Important Things that I don’t understand very well, including calling main. So I won’t explain them.
The way I understand it, the symbol main is a C-specific thing. The symbol _start is a language-agnostic entry point for the binary that will in this case call main.
A convention of, e.g., calling the entry point _start with main's argc/argv would make the format a lot less flexible.
Technically the name _start is not special either. The binary lists its entry point address in a header and that’s where the OS starts execution from. That symbol is just called _start by convention by C and other languages, which is what the linker uses to set the entry point when writing the ELF headers, but if you’re writing your own linker scripts you could call the entry point whatever you want.
Technically it doesn't even need to be in the .text section, it could be anywhere in the address space. You'll get a segfault if it's not somewhere executable though (assuming you're on a system with an appropriately configured MMU)
Yes, and then you'll have a bad time, but at the same time, per convention, _start is where .text begins. You can see where it starts with readelf --file-header <executable> and look at the Entry point address field. You can change it, yes.
No, it's not even a convention, _start is most commonly not where .text begins.
Compiling a static hello world binary on my system (aarch64 fedora 39, gcc -static hello.c -o hello), .text starts at 0x410080, e_entry is at 0x4103c0, and the _start symbol is also at 0x4103c0. This is not unusual at all.
A common hack to reduce ELF size is actually to start the first section (possibly the .text) right on the elf header, as this circumvents the alignment requirements.
Absolutely. And only available on hosted C. Freestanding C lets you have any entry point you want.
> The symbol _start is a language-agnostic entry point for the binary that will in this case call main.
That's just the linker's default. You can set it to a nicer symbol with -Wl,--entry="${symbol}" and GCC even supports setting it directly with no need for the unsightly -Wl.
Also, the entry point is actually a pointer, not a symbol. The linker just takes the address of the symbol you specify and sets the ELF entry point to that.
> calling the entry point _start with main's argc/argv
In addition to argument count and argument vector, the stack also contains the environment vector and the auxiliary vector. The process startup code can be as simple as popping all that stuff off the stack and into the appropriate registers and then calling a C function of your choosing. Note that the entry point is not itself a function: there's nothing to return to. The entry point code finishes with an exit system call to ensure clean process termination when main returns a status code. This is how things work on Linux at least.
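Once something has captured the stack pointer, the decoding step can be written in plain C. This sketch assumes the 64-bit Linux layout described above: argc, the argv pointers, NULL, the envp pointers, NULL, and then auxv type/value pairs ending in AT_NULL. The names are hypothetical. A real `_start` still needs a few lines of assembly to pass `%rsp` in and to make the exit system call afterwards. Because the parser only takes a pointer, it can be tested against a hand-built fake stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the initial process stack (64-bit System V layout). */
struct startup_info {
    long argc;
    const uintptr_t *argv;  /* argv[i] holds a char *, NULL-terminated */
    const uintptr_t *envp;  /* NULL-terminated */
    const uintptr_t *auxv;  /* (type, value) pairs ending with type 0 (AT_NULL) */
};

void parse_initial_stack(const uintptr_t *sp, struct startup_info *s)
{
    s->argc = (long)sp[0];
    s->argv = sp + 1;
    s->envp = s->argv + s->argc + 1;  /* skip argv's NULL terminator */
    const uintptr_t *p = s->envp;
    while (*p != 0)                   /* walk past the environment strings */
        p++;
    s->auxv = p + 1;                  /* skip envp's NULL terminator */
}

const char *startup_arg(const struct startup_info *s, long i)
{
    return i < s->argc ? (const char *)s->argv[i] : NULL;
}

/* Like getauxval(), but over the decoded stack. Returns 0 if absent. */
uintptr_t startup_auxval(const struct startup_info *s, uintptr_t type)
{
    for (const uintptr_t *a = s->auxv; a[0] != 0; a += 2)
        if (a[0] == type)
            return a[1];
    return 0;
}
```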
> Things that I don’t understand very well, including calling main. So I won’t explain them.
It depends on the language runtime, but a common task will be initializing global non-0 statics. For languages like Rust/C/C++ you can also inject variables to be initialized via linker flags. Before _start, if the program is dynamically linked, I believe the dynamic linker runs first to resolve the links and then transfers control to _start.
Basically hacks on hacks on hacks added organically to offer extensibility and the hacks have enough social adoption and are good enough that we stick with them.
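Below is one of those hooks, sketched with the GCC/Clang `constructor` attribute. On current GNU/Linux toolchains, the function's address is placed in `.init_array`, and the C runtime's startup code calls it before `main()`. Lower priority numbers run first, and 0 to 100 are reserved. C++ dynamic initializers for globals typically go through the same mechanism, which is why that code runs before main. The names here are hypothetical.

```c
/* Digits record the order in which the constructors ran. */
static int init_trace;

__attribute__((constructor(101)))
static void first_init(void) { init_trace = init_trace * 10 + 1; }

__attribute__((constructor(102)))
static void second_init(void) { init_trace = init_trace * 10 + 2; }

int init_trace_value(void) { return init_trace; }
```

Called from `main()`, `init_trace_value()` returns 12: both functions have already run, in priority order.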
> Basically hacks on hacks on hacks added organically to offer extensibility and the hacks have enough social adoption and are good enough that we stick with them.
The more I learn about the deep depths of modern computing, the more I realize that they're actually full of inelegant legacy cruft.
The style guide at both my previous and current employers explicitly forbids having global non-0 statics for this exact reason: code that runs before main() is very unusual. Many assumptions do not hold.
A far better way is to use function-local statics. A static variable inside a function is initialized when execution reaches that point when the function is being called. Furthermore, such initialization is thread safe so that one initialization happens despite multiple concurrent calls of the function.
The only exception to that style guide rule is the new constinit in C++20. It is sometimes called linker-initialized to make it even clearer that the program didn't do anything to initialize it, the linker did.
> Furthermore, such initialization is thread safe so that one initialization happens despite multiple concurrent calls of the function.
IIRC, there are some popular compilers in which initialization of static variables inside functions is not thread safe (even though AFAIK the C++ standard said they should be).
The compiler I was thinking of was indeed MSVC. From a quick web search, it seems that more recent versions of MSVC have changed them to be thread safe by default, so if you can make sure that your code will never be compiled on older MSVC versions (and that nobody will ever use the compiler option which disables the thread-safe initialization), it might be fine to depend on it.
True as of MSVC 2015, which is 8.5 years old at this point. I agree it’s weird that you can disable it, but libraries retaining correctness in the face of random compiler flags is hard (e.g. -ffast-math is a common one that can break your floating point library).
Does that mean this doesn't work in a freestanding environment? Yet another reason to assiduously avoid global variables. I suppose that's why I never ran into this issue.
> Does that mean this doesn't work in a freestanding environment? Yet another reason to assiduously avoid global variables. I suppose that's why I never ran into this issue.
Why do you say that? A freestanding environment will still enter through _start and execute the compiler-generated code to initialize globals before main. What I can’t recall is how compile-time non-0 values are initialized - I think it could be part of the bss and initialized by the loader instead (but freestanding environments would implement that too as part of being a target for a language). Either way, both those and the runtime-initialized globals set up between _start and main would work.
Basically freestanding targets might not give you access to runtime APIs (eg POSIX) but the language is still the language and all features defined as language features should work and it’s the responsibility of the compiler and target environment to provide that guarantee.
Your post gave me the impression the compiler generates initializers for static globals and calls them during program initialization. Is this not correct?
> A freestanding environment will still enter through _start and execute the compiler generated code to initialize globals before main.
Freestanding C generally implies not linking in any standard library code though. It doesn't make any sense to choose the freestanding C dialect only to link in the standard hosted C library.
The _start symbol is provided by the C library's so called standard system startup files. The standard _start calls all the initialization functions that the C program expects. If you override it with your own _start, then none of these functions will be called. I thought you meant the static data initialization was handled in one of these functions.
Wow, great read. I worked on the Windows DLL loader and we had to implement similar mechanics for similar reasons. The PE image format makes some part of this a little easier, but the complexity is essentially the same.
I started my blog in 2012, when I shifted my academic career from Mathematics to Computer Science. This topic was literally the first thing that I studied:
Never regretted going down this deep rabbit hole. IIRC, Julia also has a math background. Maybe it's the desire for bottom-up reasoning that leads math folks towards experiments like this. Great to see her making this approachable for a large audience.
Explain that to the German judges that fined some poor fella for finding passwords in a binary by [doing the equivalent of] running strings on it. They claim he 'circumvented' the software's 'security measures'.
You probably know this, but in C, if you wrote the comparison like this
if (password[0] == 'Q'
&& password[1] == 'w'
/* the rest of the letters... */
&& password[8] == '3'
// C strings are 0 terminated
&& password[9] == '\0')
{
...
}
It will probably compile to something like this on x86-64, assuming "password" is a pointer held in RDI (although I didn't look at a compiler's output):
cmp byte [rdi], 0x51 ; 'Q'
jne false
;; the rest of the letters...
cmp byte [rdi + 9], 0x0 ; '\0'
jne false
The benefit is that the strings utility wouldn't see the string "Qwerty123" anywhere in the binary.
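Here is a complete version of that comparison that compiles as-is, with the elided letters filled in from "Qwerty123". `check_password` is a hypothetical name. Whether the bytes really end up scattered across `cmp` instructions depends on the compiler and optimization level, since a compiler could in principle combine adjacent compares. It is worth checking the result with `strings` or a disassembler.

```c
#include <stdbool.h>

/* One byte at a time, so no contiguous "Qwerty123" literal is needed.
   && short-circuits, so a shorter input fails at its '\0' before any
   byte past the end is read. */
bool check_password(const char *password)
{
    return password[0] == 'Q'
        && password[1] == 'w'
        && password[2] == 'e'
        && password[3] == 'r'
        && password[4] == 't'
        && password[5] == 'y'
        && password[6] == '1'
        && password[7] == '2'
        && password[8] == '3'
        && password[9] == '\0'; /* C strings are 0 terminated */
}
```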
Unicode and bounds checking code probably complicate this generated assembly in other languages.
It would be nice if there were some way to write a macro or an inline function so that you could write a comparison like
if inlineStringEqualP(mySecretPassword, "Qwerty123") {
...
And then it would expand into something like the assembly I wrote above, so that way, the string literal "Qwerty123" isn't embedded in the final binary executable anywhere. I bet it's possible with C++ templates somehow, but it would be messy.
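C has no clean way to take a string literal apart at compile time. A common alternative gets a similar effect by storing the password XORed with a key. The compiler computes the scrambled bytes from constant expressions, so the plaintext should never land in `.rodata`. This sketch uses a hypothetical key (0xA5) and function name; it is a swapped-in technique, not the macro the comment imagines. A key with the high bit set pushes every stored byte outside 7-bit ASCII, so `strings` prints nothing for it. This is obfuscation, not security, since the key ships in the same binary.

```c
#include <stdbool.h>
#include <stddef.h>

#define OBF_KEY 0xA5 /* hypothetical key; high bit set keeps bytes non-ASCII */

/* volatile stops the compiler from folding the XOR away and
   re-materializing the plaintext characters as immediates. */
static const volatile unsigned char obf_key = OBF_KEY;

/* "Qwerty123" XOR OBF_KEY, evaluated at compile time. */
static const unsigned char secret_xored[] = {
    'Q' ^ OBF_KEY, 'w' ^ OBF_KEY, 'e' ^ OBF_KEY, 'r' ^ OBF_KEY, 't' ^ OBF_KEY,
    'y' ^ OBF_KEY, '1' ^ OBF_KEY, '2' ^ OBF_KEY, '3' ^ OBF_KEY,
};

bool obf_equals(const char *input)
{
    const unsigned char key = obf_key; /* single volatile read */
    for (size_t i = 0; i < sizeof secret_xored; i++)
        if (((unsigned char)input[i] ^ key) != secret_xored[i])
            return false; /* a '\0' never matches, so no read past the end */
    return input[sizeof secret_xored] == '\0';
}
```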
You can run the `strings` command on most executables (or PDFs) and get an output of the strings represented in the file. Of course you can obfuscate some of those strings if you do things right, but a lot of people who don't know about `strings` could write a password-protected feature in a compiled bit of code and be embarrassed to see how easy it is to find out what the password is.
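The core of `strings` is small: print each run of at least N printable bytes (N defaults to 4 in GNU strings). Here is a rough sketch that treats 7-bit ASCII plus tab as printable. `scan_strings` is a hypothetical name, and other encodings are ignored.

```c
#include <stddef.h>
#include <stdio.h>

/* Printable here means 7-bit ASCII graphic characters, space, or tab. */
static int is_printable(unsigned char c)
{
    return c == '\t' || (c >= 0x20 && c < 0x7f);
}

/* Count every run of at least min_len (>= 1) printable bytes in buf.
   If out is non-NULL, each run is also printed on its own line. */
size_t scan_strings(const unsigned char *buf, size_t len, size_t min_len, FILE *out)
{
    size_t count = 0, start = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i < len && is_printable(buf[i]))
            continue;
        /* the run [start, i) just ended, or we reached the end of buf */
        if (i - start >= min_len) {
            count++;
            if (out)
                fprintf(out, "%.*s\n", (int)(i - start), (const char *)buf + start);
        }
        start = i + 1;
    }
    return count;
}
```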
The other replies are pretty good. You can find all sorts of goodies in string data inside a binary: hostnames, URL fragments, error messages or templates, credentials. Pretty much any string constants that a program might use.
It's even quite interesting to run strings on programs I wrote and I do it regularly. For example it can be faster to get the version string from an .exe using strings if you know what to grep for than to run "wine program.exe --version".
Not a criticism, not even a nit-pick, but a reflection:
"(binaries are kind of the definition of platform-specific, so this is all platform-specific) (this is true!)"
When "Actually Portable Executable" proved to the (geek) world that the same binary could run on a bunch of platforms, that was a surreal moment I still haven't mentally recovered from.
Here we spent decades trying to solve the cross-platform problem, in so many fractals of ways (Java, cross-platform libraries, etc etc) and the solution was right under our noses all this time.
I personally am not convinced that portable binaries are a net positive. I believe in the era of fast computers that source distribution and local compilation is superior to binary distribution. Unfortunately, much of the software we rely on is so large, and compilers so relatively slow, that binary distribution is something of a necessary evil. I'd rather see more effort towards simpler software components (that naturally compile fast) and faster compilers than portable binaries.
I may have misunderstood, but I'm pretty sure APE is not a "binary format" per se.
It is a script that can be executed on any system. That script can then LOAD a binary. IIRC the original needed to decode it from base64 before it could be loaded.
It's a script that starts with an EXE header ("MZ"). Having both EXE and ELF headers at the same location is obviously impossible, since they start with different bytes.
A possible, although very limited way to have an actual binary program execute on different platforms would be to create a DOS .COM file (which has no header, just the raw machine code) with a valid ELF header. It would then also work on 32-bit Windows via its built-in DOS emulator, and presumably on 64-bit Windows with WSL2.
The start bytes for a 32-bit ELF header decode as 16-bit x86 into:
7F 45 JG +45h
4C DEC SP
46 INC SI
01 01 ADD [BX+DI],AX
01 xx ADD ...
The first instruction is a jump past the end of the ELF header, but unfortunately it's conditional. We do have 9 reserved bytes to continue this code, which is enough to undo the effects of the DEC and ADD instructions and then jump to the same address. I've written a 138-byte "Hello world" that works on Linux, DOS and also CP/M-80 that way.
It's possible to have the code that executes under Linux be a small (less than 2K bytes) loader program that creates 16-bit code and data segments and installs a handler for SIGSEGV. It can then jump into the same code that would run under 16-bit DOS, trapping every INT 21h and translating the most important syscalls into their Linux equivalents, kind of like a minimal version of Wine.
I have a proof of concept for that, it only handles the "read", "write" and "exit" syscalls, which is enough to write something like rot13 or hexdump. With a lot more work, it could be possible to produce really non-trivial software that runs in such a restricted environment...
The format of executable files fascinated me back in the early 90s, to the point that I spent weeks writing (in Modula 2) a DOS and Windows executable file viewer that I named VEXE, releasing it as shareware in 1991.
It found a niche following among crackers, even deserving a mention in a +ORC tutorial, https://gist.github.com/callowaysutton/48bdf0245e17e72d41a15..., probably because it could detect various encryption and compression methods used to prevent the reverse engineering of those programs.
For a person with a heavy Python background, can anyone suggest a resource/book that would be a good applied intro to practical low-level programming? I've recently started learning Rust and realized I need to catch up on many things. I haven't taken any compiler course, so maybe that's the reason I am missing so much information. For example, I had no idea that symbols in a binary were a thing or what the difference between ELF and Mach-O was.
For folks interested in this topic who have not seen Cosmopolitan and RedBean, αcτµαlly pδrταblε εxεcµταblε (2020) is a great read too: https://justine.lol/ape.html
Nothing in a computer is magic. It was all designed by humans, every single one of which was once a clueless noob. No one is born understanding this stuff.
> This does a bunch of Very Important Things that I don’t understand very well, including calling main. So I won’t explain them.
Honestly, this line was the best in the whole article. It felt like at that moment I knew the person talking to me wasn't trying to prove that they were some sage (personally guilty here) but instead was someone who wanted to show me something cool that we could both enjoy.
"It is no exaggeration to regard this as the most fundamental idea in programming: The evaluator, which determines the meaning of expressions on a piece of paper, is just another piece of paper." --SICP
The actual /behavior/ of computers, though, tends to emerge from the confluence of complex processes that humans /can't/ understand...our AGI leverages this emergence to enable problem solving in domains where complexity exceeds human capabilities.