How is a binary executable organized? Let's explore it (2014)

jstrieb · 2024-02-02T18:42:18 1706899338

As I've said in other threads https://news.ycombinator.com/item?id=38847750#38862450, I highly recommend writing an ELF by hand at least once. It's a great exercise to understand the basic parts of an executable. It's also helpful if you want to go the opposite direction of this article - bottom-up instead of top-down.

Lots of other great discussion in various threads on that other HN post.

akdas · 2024-02-02T20:27:25 1706905645

Writing an ELF file by hand is something I did recently: https://github.com/avik-das/garlic/blob/master/recursive/elf...

To explain the format to myself and others, I also created an interactive visualization for the bytes in the file. It helps me to click on a byte, see an explanation for it and see related bytes elsewhere in the file highlighted. https://scratchpad.avikdas.com/elf-explanation/elf-explanati...

cqqxo4zV46cp · 2024-02-03T03:42:40 1706931760

I’ve had such fun making interactive educational visualisations like this. My life’s work is going into an interactive simulation of the USB protocol. Unfortunately I’m yet to bang it out over a weekend.

raarts · 2024-02-03T08:23:42 1706948622

Wow. Cool! I’m a CS teacher and definitely going to use this. Thanks for your work! (Anyone aware of a Windows or Mac equivalent?)

akdas · 2024-02-03T18:59:29 1706986769

Thank you! My passion is teaching, and I created interactive visualizations for a CS theory class I taught: https://cstheory.avikdas.com/

If you're interested in collaborating for your classes, reach out to avik at avikdas dot com

bruce343434 · 2024-02-03T00:01:49 1706918509

That's an awesome interactive page! Did you write it by hand, or did you use some sort of generator/tool?

akdas · 2024-02-03T01:28:00 1706923680

I wrote it all by hand :)

Lately, I've been using Svelte for interactive visualizations (see my post on using a tool called Astro with Svelte: https://avikdas.com/2023/12/30/interactive-demos-using-astro... ). But this one is all hand-written JS!

matheusmoreira · 2024-02-03T04:07:35 1706933255

You and your web site are huge inspirations for me.

akdas · 2024-02-03T18:57:34 1706986654

Thank you! That means a lot to me

sunk1st · 2024-02-03T01:10:17 1706922617

I agree it’s very nice. I’d also like to know how it was done.

Also if you click two bytes that are in the same caption group one after another then it bugs out.

akdas · 2024-02-03T03:11:55 1706929915

Thanks for the feedback. I replied in a sibling comment about how I made it.

For the bug, feel free to email me at avik at avikdas dot com if you'd like. The behavior I verified just now (for me) is that if you click one byte to highlight it, then clicking any other byte in the same group will remove the highlighting.

ddingus · 2024-02-03T02:51:49 1706928709

Wow! Very nice work. This is a super educational resource. Again, nice work.

Retr0id · 2024-02-02T18:47:21 1706899641

Similarly, I'd recommend writing a simple ELF loader. There's a fair bit of implementation complexity in dynamic linking, but if you only support static ELFs then it's straightforward.

mkoubaa · 2024-02-02T19:24:37 1706901877

I've seriously considered writing an ELF loader that uses a special symbol (like _resolve) where dynamic library resolving is done imperatively. The flexibility from libdl always feels underwhelming.

toast0 · 2024-02-03T05:50:41 1706939441

I had written a static ELF loader for reasons, but when I was no longer able to compile a static version of the binary I wanted to load, I found it wasn't too hard to load the system's dynamic loader instead. That's kind of the best of both worlds --- I can run a dynamically linked binary, and I didn't have to do the linking and relocations.

Isamu · 2024-02-02T22:53:02 1706914382

Yes, likewise I wrote a reader that simply tried to parse every bit of a complex ELF binary to report its structure and quickly found myself in poorly documented territory. It’s an education if you want it.

eichin · 2024-02-02T23:23:55 1706916235

I assume there's a better modern source (assuming for some reason you don't want to reference libbfd &c, but really, you at least want to cross-check with it) but BITD there was an AT&T System V book - I think it was https://books.google.com/books?id=mrImAAAAMAAJ but only at about 70% confidence, it's part of a series and might have been one of the others - that was "arcane but true" for ELF on existing platforms (at the time, which was the mid 1990s, which is why I hope there's a better starting point now...)

matheusmoreira · 2024-02-03T04:06:50 1706933210

Any good resources on the matter? I'm gonna need to write a fully featured ELF loader for my language soon. I need to prepare.

matheusmoreira · 2024-02-03T04:19:41 1706933981

Modifying existing ELFs can also be extremely educational and fun. It's a bit frustrating at first because it's more or less impossible to debug this stuff when it doesn't work but when things finally start working it's awesome. Turns out it's possible to patch ELFs in all sorts of interesting ways. With the auxiliary vector you can even have introspection at runtime: Linux gives us the address of the program header table and from there you can get to anywhere. Just gotta extend a LOAD segment to cover the whole binary.
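A minimal Python sketch of what the auxiliary vector looks like, assuming the Linux layout of native-word (key, value) pairs terminated by an AT_NULL entry. The constants AT_PHDR = 3, AT_PHENT = 4 and AT_PHNUM = 5 are the Linux values; the function name is made up for illustration, and this inspects a raw dump rather than doing what a C program would do at startup.

```python
import struct

# Linux auxiliary vector keys (values as defined in <elf.h>).
AT_NULL, AT_PHDR, AT_PHENT, AT_PHNUM = 0, 3, 4, 5


def parse_auxv(raw, word_size=struct.calcsize("P")):
    """Decode a raw auxiliary vector dump into a {key: value} dict.

    Assumes native byte order and (key, value) pairs of machine words.
    """
    fmt = "=QQ" if word_size == 8 else "=II"
    entries = {}
    for key, value in struct.iter_unpack(fmt, raw):
        if key == AT_NULL:  # terminator entry, nothing follows
            break
        entries[key] = value
    return entries
```

On Linux, feeding it the contents of /proc/self/auxv (opened in binary mode) should show the interpreter's own AT_PHDR, AT_PHENT and AT_PHNUM entries, which is the runtime starting point described above.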

For example I wrote tools to embed lisp modules and code right into my lisp's interpreter executable. The embedded segment is loaded from the ELF automatically, the interpreter just finds it and runs the code. I'm so proud of this little feature I wrote an article about it.

https://www.matheusmoreira.com/articles/self-contained-lone-...

Would be cool if mainstream languages adopted this method.

kazinator · 2024-02-03T07:01:52 1706943712

These tricks can be easily done simply and portably, without caring about the underlying executable format.

One trick is that you can reserve some global array in the executable, which is prefixed by a byte sequence that doesn't occur anywhere. A small utility can find that byte sequence and write custom data after it to create a customized executable.
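A rough Python sketch of that small utility, under stated assumptions: the marker bytes, the function name and the fixed slot capacity are all invented for illustration, and the program being patched is assumed to have reserved `capacity` bytes right after the marker.

```python
# Hypothetical marker; a real one just needs to be unlikely to occur by chance.
MARKER = b"\x00EMBED-SLOT-7f3a9c\x00"


def patch_slot(image, payload, capacity):
    """Return a copy of `image` with `payload` written after the marker.

    The file size never changes: the payload is zero-padded to `capacity`,
    which is why the reserved array cannot grow after linking.
    """
    start = image.find(MARKER)
    if start < 0:
        raise ValueError("marker not found")
    if image.find(MARKER, start + 1) >= 0:
        raise ValueError("marker is not unique")
    if len(payload) > capacity:
        raise ValueError("payload larger than the reserved slot")
    slot = start + len(MARKER)
    if slot + capacity > len(image):
        raise ValueError("reserved slot runs past the end of the file")
    padded = payload.ljust(capacity, b"\x00")
    return image[:slot] + padded + image[slot + capacity:]
```

Because the patch is a same-size overwrite, nothing else in the executable moves, so the approach does not depend on the executable format at all.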

I think that, also, many executable formats don't mind if something is appended to the executable. If the executable somehow knows its original size (you can write that size somewhere using the previous trick: no grotty executable format parsing required), it can open itself, seek to that offset and read the data.

I think this might be how CLISP creates an .exe file on Windows; I think it takes the base clisp.exe and combines it with the lispinit.mem image into one file.

matheusmoreira · 2024-02-03T10:59:42 1706957982

> reserve some global array in the executable

This is neat but has a limitation in that it cannot be expanded after the program is compiled and linked. Resizing the array would invalidate all pointers that follow as well as render incorrect any code that takes its size.

This can be solved with a layer of indirection: just append the data to the executable and write its size and file offset in the array. That way the data block can be freely resized. That's the solution people told me to use and indeed the one that I usually see in existing repositories. The problem is you run into some additional complexity later which results in the loss of portability and thus the main reason to choose this method.

> it can open itself

That's the crux of the issue. How does an executable open itself? That's where portability goes out the window. I've seen source code that opens argv[0] which is under the control of the parent program and therefore unreliable. I've seen code which opens /proc/self/exe which is Linux specific. I've seen code that calls Win32 API functions to get the path to the executable. All this just so it can open and load into memory a file which the kernel has already loaded, just so it can read some additional data off of it.

My solution sidesteps that question entirely. It just adds a LOAD segment for the embedded data which instructs the kernel to map it in automatically before the program even runs. There's no need to open, seek or read anything, it will already be there by the time execution begins.

The auxiliary vector contains a pointer to the program's segments table so it can reach the data from there. Then it's just a matter of walking this table looking for the custom descriptor segment. It's all done in a structured way, using the standard magic number locations and ranges. There's no chance of a magic number being recognized by mistake.
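The table walk can be sketched offline in Python against the file bytes, assuming a little-endian ELF64 image. The runtime version described above starts from the auxiliary vector instead of the file; the segment type value below is only an illustrative pick from the OS-specific range (0x60000000 to 0x6fffffff), not necessarily the one the project uses.

```python
import struct

PT_LOAD = 1
PT_CUSTOM = 0x6C6F6E65  # illustrative value in the PT_LOOS..PT_HIOS range

# ELF64 program header: type, flags, offset, vaddr, paddr, filesz, memsz, align
PHDR = struct.Struct("<IIQQQQQQ")


def find_segments(image, wanted_type):
    """Return (file offset, virtual address, file size) for matching segments."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    found = []
    for i in range(e_phnum):
        fields = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz, _memsz, _align = fields
        if p_type == wanted_type:
            found.append((p_offset, p_vaddr, p_filesz))
    return found
```

Matching on the p_type field rather than scanning for a byte pattern is what makes the lookup structured: only entries of the program header table are ever compared.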

The only possible portability issue is the availability of the AT_PHDR, AT_PHENT and AT_PHNUM entries in the auxiliary vector. I'm not sure if they're standard. I know Linux has them and it's all I personally care about but if these entries do turn out to be standard then I can confidently say that my method is portable to any ELF-based operating system.

kazinator · 2024-02-03T18:42:41 1706985761

> How does an executable open itself? That's where portability goes out the window.

It's a way easier nut to crack than parsing executable formats. Various OSs have a method for this:

Linux: /proc/self/exe

Windows: GetModuleFileNameW(NULL, buf, MAX_PATH)

MacOS: _NSGetExecutablePath(buf, &size)

Solaris: getexecname()

BSDs: ...?

These are just "ask and you shall receive" APIs; nothing to analyze.

You may want to solve that problem anyway for other reasons: for instance, if you want your programming language to have a relocatable installation which finds related files using its own path.

On a system where you cannot solve this problem at all, you can use trick 1. Have an array somewhere in the program where you write the installation path. Your installer has to do this, and if the program is moved after installation, some utility program has to fix that up. (I am not crazy about the idea because it changes the checksum of the file at the installation site. You can no longer use an off-the-shelf program like sha256sum to check the integrity of the executable against the provisioned materials upstream.)

versteegen · 2024-02-03T11:26:24 1706959584

Put the offset at the end! Famously, the table of contents of a .zip file is at the end rather than the beginning, which has many useful properties (such as being able to patch the contents by only appending to an existing file). And you can concatenate an executable and a .zip and get a file which is both.
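A small Python sketch of the offset-at-the-end idea, assuming a made-up trailer layout (8-byte magic, payload offset, payload length). The magic string and function names are hypothetical; a real .zip uses its own end-of-central-directory record for the same purpose.

```python
import struct

# Hypothetical trailer: magic, payload offset, payload length (little-endian).
TRAILER = struct.Struct("<8sQQ")
MAGIC = b"PAYLOAD1"


def append_payload(exe, payload):
    """Append payload plus a trailer that records where the payload starts."""
    return exe + payload + TRAILER.pack(MAGIC, len(exe), len(payload))


def read_payload(blob):
    """Return the appended payload, or None if no valid trailer is present."""
    if len(blob) < TRAILER.size:
        return None
    magic, offset, length = TRAILER.unpack_from(blob, len(blob) - TRAILER.size)
    if magic != MAGIC or offset + length + TRAILER.size != len(blob):
        return None
    return blob[offset:offset + length]
```

The reader only needs to seek to the last few bytes of the file, so the payload can be any size and the original executable bytes are never modified. The program still has to locate and open its own file, which is the portability question raised elsewhere in this thread.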

matheusmoreira · 2024-02-03T11:33:55 1706960035

Yes. Cosmopolitan libc has support for exactly this. It contains a lot of platform-specific hacks in order to open the executable though. I went through the implementation.

I think the problem is this notion of a "memory image". It would be so much easier if the kernel just copied the entire file into memory and called it a day.

ddingus · 2024-02-03T02:49:00 1706928540

Seconded.

Doing it is basically hand assembly. One reads the documentation, selects the bytes needed using a processor data sheet, orders them into the various sections, populates the ELF fields and then it really does boil down to typing them all in.
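As a rough illustration of how few bytes are involved, here is a Python sketch that packs one ELF64 header, one program header and twelve bytes of x86-64 machine code. It assumes x86-64 Linux, a load address of 0x400000 and the exit system call (number 60); the function name and layout choices are mine, not taken from any of the linked write-ups.

```python
import struct


def build_tiny_elf(exit_code=42):
    """Pack a minimal static x86-64 Linux executable that just exits."""
    base = 0x400000  # assumed load address
    code = (
        b"\xbf" + struct.pack("<I", exit_code)  # mov edi, exit_code
        + b"\xb8" + struct.pack("<I", 60)       # mov eax, 60 (exit)
        + b"\x0f\x05"                           # syscall
    )
    ehsize, phentsize = 64, 56
    entry = base + ehsize + phentsize  # code sits right after the headers
    total = ehsize + phentsize + len(code)
    # magic, ELFCLASS64, little-endian, version 1, then zero padding
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident, 2, 0x3E, 1,       # ET_EXEC, EM_X86_64, EV_CURRENT
        entry, ehsize, 0, 0,       # e_entry, e_phoff, e_shoff, e_flags
        ehsize, phentsize, 1,      # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,                   # no section headers at all
    )
    # One PT_LOAD segment mapping the whole file as readable and executable.
    phdr = struct.pack("<IIQQQQQQ", 1, 5, 0, base, base, total, total, 0x1000)
    return ehdr + phdr + code
```

Writing the result to a file and marking it executable should give a program that exits with status 42 on x86-64 Linux; there are no sections, no symbols and no libc involved.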

In pre-ELF times, on, say, an 8-bit Apple II, the machine code monitor allowed input of the program bytes directly. Those were then executed.

Storing to disk is only a bit more involved, and there is another opportunity! Disk sector editors allow one to create a file...

...and so it goes!

cylinder714 · 2024-02-03T07:57:15 1706947035

Chris Wellons' "A Magnetized Needle and a Steady Hand," a piece on building an ELF executable from scratch:

https://nullprogram.com/blog/2016/11/17/

mito88 · 2024-02-03T00:28:57 1706920137

is ELF unique to x86/x64?

inkyoto · 2024-02-03T00:55:39 1706921739

It is not, and it has never been.

ELF is platform agnostic, and has been used in operating systems on nearly every existing CPU platform since the mid-90s (with a few notable exceptions being OS X, AIX, the embedded world and Windows).

sva_ · 2024-02-02T18:50:16 1706899816

> When the program starts running, you might think it starts at main. It doesn’t! It actually goes to _start. This does a bunch of Very Important Things that I don’t understand very well, including calling main. So I won’t explain them.

The way I understand it, the symbol main is a C-specific thing. The symbol _start is a language-agnostic entry point for the binary that will in this case call main.

A convention of, e.g., calling the entry point _start with main's argc/argv would make the format a lot less flexible.

a-priori · 2024-02-02T19:00:28 1706900428

Technically the name _start is not special either. The binary lists its entry point address in a header and that’s where the OS starts execution from. That symbol is just called _start by convention by C and other languages, which is what the linker uses to set the entry point when writing the ELF headers, but if you’re writing your own linker scripts you could call the entry point whatever you want.
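To make the "entry point is just an address in the header" point concrete, here is a small Python sketch that reads e_entry straight from the ELF header. It relies on e_entry sitting at byte offset 24 in both the 32-bit and 64-bit layouts; the function name is made up.

```python
import struct


def entry_point(image):
    """Return the e_entry address stored in an ELF header."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    ei_class, ei_data = image[4], image[5]
    endian = "<" if ei_data == 1 else ">"    # 1 = little-endian, 2 = big-endian
    width = "Q" if ei_class == 2 else "I"    # 2 = ELF64, 1 = ELF32
    (entry,) = struct.unpack_from(endian + width, image, 24)
    return entry
```

No symbol table is consulted at any point: a binary stripped of every symbol, _start included, still has this field, and it is the same value that readelf reports as the entry point address.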

Keyframe · 2024-02-02T19:19:31 1706901571

to extend on it, _start is where .text begins and address of that is set by linker

a-priori · 2024-02-02T19:23:09 1706901789

The entry point can be anywhere in the .text section, and often won't be at the beginning of the section.

Retr0id · 2024-02-02T20:21:19 1706905279

Technically it doesn't even need to be in the .text section, it could be anywhere in the address space. You'll get a segfault if it's not somewhere executable though (assuming you're on a system with an appropriately configured MMU)

Keyframe · 2024-02-02T21:18:48 1706908728

yes and then you'll have a bad time, but at the same time per convention _start is where .text begins. You can see where it starts with readelf --file-header <executable> and look at Entry point address field. You can change it, yes.

Retr0id · 2024-02-02T22:33:36 1706913216

No, it's not even a convention, _start is most commonly not where .text begins.

Compiling a static hello world binary on my system (aarch64 fedora 39, gcc -static hello.c -o hello), .text starts at 0x410080, e_entry is at 0x4103c0, and the _start symbol is also at 0x4103c0. This is not unusual at all.

logdahl · 2024-02-02T21:30:51 1706909451

A common hack to reduce ELF size is actually to start the first section (possibly the .text) right on the elf header, as this circumvents the alignment requirements.

Keyframe · 2024-02-02T21:42:37 1706910157

probably not even mandatory.. lots of /usr/bin stuff on my ubuntu machine have __libc_start_main only

matheusmoreira · 2024-02-03T04:00:18 1706932818

> the symbol main is a C-specific thing

Absolutely. And only available on hosted C. Freestanding C lets you have any entry point you want.

> The symbol _start is a language-agnostic entry point for the binary that will in this case call main.

That's just the linker's default. You can set it to a nicer symbol with -Wl,--entry="${symbol}" and GCC even supports setting it directly with no need for the unsightly -Wl.

Also, the entry point is actually a pointer, not a symbol. The linker just takes the address of the symbol you specify and sets the ELF entry point to that.

> calling the entry point _start with main's argc/argv

In addition to argument count and argument vector, the stack also contains the environment vector and the auxiliary vector. The process startup code can be as simple as popping all that stuff off the stack and into the appropriate registers and then calling a C function of your choosing. Note that the entry point is not itself a function: there's nothing to return to. The entry point code finishes with an exit system call to ensure clean process termination when main returns a status code. This is how things work on Linux at least.

vlovich123 · 2024-02-02T19:20:32 1706901632

> Things that I don’t understand very well, including calling main. So I won’t explain them.

It depends on the language runtime, but a common task will be initializing global non-0 statics. For languages like Rust/C/C++ you can also inject variables to be initialized via linker flags. Before _start, if the program is dynamically linked, I believe the dynamic linker runs first to resolve the links and then transfers control to _start.

Basically hacks on hacks on hacks added organically to offer extensibility and the hacks have enough social adoption and are good enough that we stick with them.

seanw444 · 2024-02-02T19:54:40 1706903680

> Basically hacks on hacks on hacks added organically to offer extensibility and the hacks have enough social adoption and are good enough that we stick with them.

The more I learn about the deep depths of modern computing, the more I realize that they're actually full of inelegant legacy cruft.

kccqzy · 2024-02-02T20:04:57 1706904297

The style guide at both my previous and current employers explicitly forbids having global non-0 statics for this exact reason: code that runs before main() is very unusual. Many assumptions do not hold.

A far better way is to use function-local statics. A static variable inside a function is initialized when execution reaches that point when the function is being called. Furthermore, such initialization is thread safe so that one initialization happens despite multiple concurrent calls of the function.

The only exception to that style guide rule is the new constinit in C++20. It is sometimes called linker-initialized to make it even clearer that the program didn't do anything to initialize it, the linker did.

cesarb · 2024-02-02T20:34:23 1706906063

> Furthermore, such initialization is thread safe so that one initialization happens despite multiple concurrent calls of the function.

IIRC, there are some popular compilers in which initialization of static variables inside functions is not thread safe (even though AFAIK the C++ standard said they should be).

vlovich123 · 2024-02-02T22:40:50 1706913650

I’m not aware of this problem in MSVC, clang and gcc and those are the most popular afaik.

cesarb · 2024-02-02T23:08:46 1706915326

> I’m not aware of this problem in MSVC

The compiler I was thinking of was indeed MSVC. From a quick web search, it seems that more recent versions of MSVC have changed them to be thread safe by default, so if you can make sure that your code will never be compiled on older MSVC versions (and that nobody will ever use the compiler option which disables the thread-safe initialization), it might be fine to depend on it.

vlovich123 · 2024-02-03T16:25:45 1706977545

True as of MSVC 2015, which is 8.5 years old at this point. I agree it’s weird that you can disable it, but libraries retaining correctness in the face of random compiler flags is hard (e.g. -ffast-math is a common one that can break your floating point library).

secondcoming · 2024-02-02T22:55:39 1706914539

The issue with Meyers Singletons is that every time they're accessed a flag must be checked first. This is bad in hot loops.

vlovich123 · 2024-02-03T16:29:25 1706977765

It’s not as bad as you might think because the CPU should speculate through it pretty easily.

matheusmoreira · 2024-02-03T03:44:44 1706931884

> initializing global non-0 statics

Does that mean this doesn't work in a freestanding environment? Yet another reason to assiduously avoid global variables. I suppose that's why I never ran into this issue.

> Basically hacks on hacks on hacks

And then there's this insanity right here:

https://blogs.oracle.com/solaris/post/init-and-fini-processi...

I like to imagine that I won't have to actually implement this when I write my ELF loader. Someone please tell me no modern software uses this.

vlovich123 · 2024-02-03T16:33:29 1706978009

> Does that mean this doesn't work in a freestanding environment? Yet another reason to assiduously avoid global variables. I suppose that's why I never ran into this issue.

Why do you say that? A freestanding environment will still enter through _start and execute the compiler generated code to initialize globals before main. What I can’t recall is how compile time non-0 values are initialized - I think it could be part of the bss and initialized by the loader instead (but freestanding environments would implement that too as part of being a target for a language) but both them and runtime initialized globals initialized between _start and main would work.

Basically freestanding targets might not give you access to runtime APIs (eg POSIX) but the language is still the language and all features defined as language features should work and it’s the responsibility of the compiler and target environment to provide that guarantee.

matheusmoreira · 2024-02-03T20:11:42 1706991102

> Why do you say that?

Your post gave me the impression the compiler generates initializers for static globals and calls them during program initialization. Is this not correct?

> A freestanding environment will still enter through _start and execute the compiler generated code to initialize globals before main.

Freestanding C generally implies not linking in any standard library code though. It doesn't make any sense to choose the freestanding C dialect only to link in the standard hosted C library.

The _start symbol is provided by the C library's so called standard system startup files. The standard _start calls all the initialization functions that the C program expects. If you override it with your own _start, then none of these functions will be called. I thought you meant the static data initialization was handled in one of these functions.

neerajsi · 2024-02-03T06:05:58 1706940358

Wow, great read. I worked on the Windows DLL loader and we had to implement similar mechanics for similar reasons. The PE image format makes some part of this a little easier, but the complexity is essentially the same.

heinrichhartman · 2024-02-02T21:08:29 1706908109

I started my blog in 2012, when I shifted my academic career from Mathematics to Computer Science. This topic was literally the first thing that I studied:

https://heinrichhartmann.com/archive/Dissecting-Hello-World....

Never regretted going down this deep rabbit hole. IIRC, Julia also has a math background. Maybe it's the desire for bottom-up reasoning that leads math folks towards experiments like this. Great to see her making this approachable for a large audience.

zerotolerance · 2024-02-02T19:07:54 1706900874

Julia's articles are always excellent. I've always had great results teaching people that compiled code doesn't keep secrets by demoing `strings`.

sva_ · 2024-02-02T22:30:33 1706913033

Explain that to the German judges that fined some poor fella for finding passwords in a binary by [doing the equivalent of] running strings on it. They claim he 'circumvented' the software's 'security measures'.

https://www.theregister.com/2024/01/19/germany_fine_security...

actionfromafar · 2024-02-02T19:36:46 1706902606

Can you elaborate?

dilyevsky · 2024-02-02T19:42:46 1706902966

If you put something like

  if mySecretPassword == "Qwerty123" {
     ...

then "Qwerty123" will be easily seen by strings utility. Which is pretty obvious but I'm guessing some junior folks will be surprised.

trealira · 2024-02-04T22:43:56 1707086636

You probably know this, but in C, if you wrote the comparison like this

  if (password[0] == 'Q'
   && password[1] == 'w'
   /* the rest of the letters... */
   && password[8] == '3'
   // C strings are 0 terminated
   && password[9] == '\0')
  {
      ...
  }

It will probably compile to something like this on x86-64, assuming "password" is a pointer held in RDI (although I didn't look at a compiler's output):

  cmp byte [rdi], 0x51 ; 'Q'
  jne false
  ;; the rest of the letters...
  cmp byte [rdi + 9], 0x0 ; '\0'
  jne false

The benefit is that the strings utility wouldn't see the string "Qwerty123" anywhere in the binary.

Unicode and bounds checking code probably complicate this generated assembly in other languages.

It would be nice if there were some way to write a macro or an inline function so that you could write a comparison like

  if inlineStringEqualP(mySecretPassword, "Qwerty123") {
    ...

And then it would expand into something like the assembly I wrote above, so that way, the string literal "Qwerty123" isn't embedded in the final binary executable anywhere. I bet it's possible with C++ templates somehow, but it would be messy.

dilyevsky · 2024-02-10T16:02:53 1707580973

I think that was the very first crackmes i ever solved =)

xutopia · 2024-02-02T20:20:34 1706905234

You can run the `strings` command on most executables (or PDFs) and get an output of the strings represented in the file. Of course you can obfuscate some of those strings if you do things right, but a lot of people who don't know about `strings` could write a password-protected feature in a compiled bit of code and be embarrassed to see how easy it is to find out what the password is.
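For anyone wondering what `strings` actually does, a rough Python approximation fits in a few lines: scan the raw bytes for runs of printable ASCII of at least four characters. This is a sketch of the idea only; the real tool has more options (other encodings, section selection) and the function name is made up.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

A string literal used in a comparison is normally stored as one contiguous run of bytes in the binary, which is exactly what this scan picks up.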

zerotolerance · 2024-02-02T22:19:53 1706912393

The other replies are pretty good. You can find all sorts of goodies in string data inside a binary: hostnames, URL fragments, error messages or templates, credentials. Pretty much any string constants that a program might use.

versteegen · 2024-02-04T06:30:03 1707028203

It's even quite interesting to run strings on programs I wrote and I do it regularly. For example it can be faster to get the version string from an .exe using strings if you know what to grep for than to run "wine program.exe --version".

shzhdbi09gv8ioi · 2024-02-02T19:42:35 1706902955

man strings

actionfromafar · 2024-02-02T19:41:22 1706902882

Not a criticism, not even a nit-pick, but a reflection

"(binaries are kind of the definition of platform-specific, so this is all platform-specific)" (this is true!)

When "Actually Portable Executable" proved that the same binary could run on a bunch of platforms, that was a surreal moment I still haven't mentally recovered from.

Here we spent decades trying to solve the cross-platform problem, in so many fractals of ways (Java, cross-platform libraries, etc etc) and the solution was right under our noses all this time.

norir · 2024-02-02T20:15:59 1706904959

I personally am not convinced that portable binaries are a net positive. I believe in the era of fast computers that source distribution and local compilation is superior to binary distribution. Unfortunately, much of the software we rely on is so large, and compilers so relatively slow, that binary distribution is something of a necessary evil. I'd rather see more effort towards simpler software components (that naturally compile fast) and faster compilers than portable binaries.

csdvrx · 2024-02-02T20:49:18 1706906958

You can have both. APE are generally faster and smaller.

Fat APEs (aarch64 + x86_64) are larger, but interesting in their own way.

saagarjha · 2024-02-03T10:47:22 1706957242

Not in general.

vitiral · 2024-02-02T23:41:05 1706917265

I may have misunderstood, but I'm pretty sure APE is not a "binary format" per se.

It is a script that can be executed on any system. That script can then LOAD a binary. IIRC the original needed to decode it from base64 before it could be loaded.

So... it's an executable binary loader

rep_lodsb · 2024-02-03T16:49:55 1706978995

It's a script that starts with an EXE header ("MZ"). Having both EXE and ELF headers at the same location is obviously impossible, since they start with different bytes.

A possible, although very limited way to have an actual binary program execute on different platforms would be to create a DOS .COM file (which has no header, just the raw machine code) with a valid ELF header. It would then also work on 32-bit Windows via its built-in DOS emulator, and presumably on 64-bit Windows with WSL2.

The start bytes for a 32-bit ELF header decode as 16-bit x86 into:

    7F 45    JG    +45h
    4C       DEC   SP
    46       INC   SI
    01 01    ADD   [BX+DI],AX
    01 xx    ADD   ...

The first instruction is a jump past the end of the ELF header, unfortunately it's conditional. But we have 9 reserved bytes to continue this code, which is enough to undo the effects of the DEC and ADD instructions and then jump to the same address. I've written a 138 byte "Hello world" that works on Linux, DOS and also CP/M-80 that way.
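For reference, the same leading bytes read as ELF identification fields rather than as x86 instructions; a small Python sketch, with a made-up function name. The identification block is 16 bytes, so after the magic, class, data encoding and version there are 9 bytes left, which the original System V layout treats as padding (the current gABI assigns the first two of them to the OS ABI and ABI version).

```python
def describe_ident(ident):
    """Decode the first 16 bytes (e_ident) of an ELF file."""
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF identification block")
    return {
        "class": {1: "ELF32", 2: "ELF64"}.get(ident[4], "unknown"),
        "data": {1: "little-endian", 2: "big-endian"}.get(ident[5], "unknown"),
        "version": ident[6],
        "rest": ident[7:16],  # the 9 bytes after the version field
    }
```

Calling it on bytes 7F 45 4C 46 01 01 01 followed by nine zero bytes describes a 32-bit little-endian file, matching the disassembly listing above byte for byte.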

It's possible to have the code that executes under Linux be a small (less than 2K bytes) loader program that creates 16-bit code and data segments and installs a handler for SIGSEGV. It can then jump into the same code that would run under 16-bit DOS, trapping every INT 21h and translating the most important syscalls into their Linux equivalents, kind of like a minimal version of Wine.

I have a proof of concept for that, it only handles the "read", "write" and "exit" syscalls, which is enough to write something like rot13 or hexdump. With a lot more work, it could be possible to produce really non-trivial software that runs in such a restricted environment...

vitiral · 2024-02-04T13:47:24 1707054444

Don't tell jart this or she'll probably add it to cosmopolitan

actionfromafar · 2024-02-03T01:23:21 1706923401

Same same, but different…

How can the syscalls work the same in Linux, Windows and macOS?!

It’s bananas

vitiral · 2024-02-04T13:48:32 1707054512

They don't. APE loads a different syscall table depending on the platform. Or something like that, I don't claim to fully understand the internals

sergejf · 2024-02-02T21:28:46 1706909326

The format of executable files fascinated me back in the early 90s, to the point that I spent weeks writing (in Modula 2) a DOS and Windows executable file viewer that I named VEXE, releasing it as shareware in 1991.

It found a niche following among crackers, even deserving a mention in a +ORC tutorial, https://gist.github.com/callowaysutton/48bdf0245e17e72d41a15..., probably because it could detect various encryption and compression methods used to prevent the reverse engineering of those programs.

EmilioPeJu · 2024-02-03T11:24:24 1706959464

If you are curious about how small an ELF binary file can be, you might like the following amusing article: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...

setheron · 2024-02-03T04:57:06 1706936226

(Self promotion) Check out my tool which lets you explore ELF using SQL

https://github.com/fzakaria/sqlelf

alkh · 2024-02-03T05:56:18 1706939778

For a person with a heavy Python background, can anyone suggest a resource/book that would be a good applied intro to practical low-level programming? I've recently started learning Rust and realized I need to catch up on many things. I haven't taken any compiler course, so maybe that's the reason I am missing so much information. For example, I had no idea that symbols in a binary were a thing or what the difference between ELF/MACH-O was

fragmede · 2024-02-02T20:27:35 1706905655

cat-ing a binary to the terminal is a recipe for sadness. I like | hd, which is hexdump -C, though that's just as impenetrable to the naked eye.
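For the curious, the layout that hexdump -C prints is simple enough to sketch in Python: an 8-digit hex offset, sixteen bytes in two groups of eight, then the same bytes as ASCII with non-printable ones shown as dots. The function names are made up, and details such as the collapsing of repeated lines are left out.

```python
def hexdump_line(offset, chunk):
    """Format up to 16 bytes in the style of one `hexdump -C` line."""
    hex_parts = [f"{b:02x}" for b in chunk]
    left = " ".join(hex_parts[:8])
    right = " ".join(hex_parts[8:])
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    # Each group of eight bytes is 23 characters wide; pad short final lines.
    return f"{offset:08x}  {left:<23}  {right:<23}  |{text}|"


def hexdump(data):
    """Return one formatted line per 16-byte chunk of `data`."""
    return [hexdump_line(i, data[i:i + 16]) for i in range(0, len(data), 16)]
```

The ASCII column on the right is what makes the ELF magic easy to spot at the start of the first line.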

adolph · 2024-02-02T21:57:06 1706911026

For folks interested in this topic who have not seen Cosmopolitan and RedBean, αcτµαlly pδrταblε εxεcµταblε (2020) is a great read too: https://justine.lol/ape.html

https://redbean.dev/

EmilioPeJu · 2024-02-03T11:23:09 1706959389

A book about this topic which I enjoyed is "Learning Linux Binary Analysis" by Ryan O'Neill.

dishsoap · 2024-02-03T02:15:26 1706926526

I think ELF should absolutely be mentioned in the title.

as1009 · 2024-02-02T22:51:26 1706914286

Great thread

as1009 · 2024-02-02T22:51:48 1706914308

great thread, thank you!

randall · 2024-02-02T21:47:54 1706910474

Amazingly helpful!

lisper · 2024-02-02T19:55:59 1706903759

> Executables aren’t magic.

Nothing in a computer is magic. It was all designed by humans, every single one of which was once a clueless noob. No one is born understanding this stuff.

JoshuaRogers · 2024-02-02T20:28:43 1706905723

> This does a bunch of Very Important Things that I don’t understand very well, including calling main. So I won’t explain them.

Honestly, this line was the best in the whole article. It felt like at that moment I knew the person talking to me wasn't trying to prove that they were some sage (personally guilty here) but instead was someone who wanted to show me something cool that we could both enjoy.

Wonderful write up.

infinite8s · 2024-02-02T21:15:27 1706908527

"It is no exaggeration to regard this as the most fundamental idea in programming: The evaluator, which determines the meaning of expressions on a piece of paper, is just another piece of paper." --SICP

latexr · 2024-02-02T21:16:56 1706908616

> Nothing in a computer is magic.

I think that’s covered by the text, in the sentence right after that one (emphasis mine):

> ELF is a file format like any other!

olsher · 2024-02-02T21:31:18 1706909478

The actual /behavior/ of computers, though, tends to emerge from the confluence of complex processes that humans /can't/ understand...our AGI leverages this emergence to enable problem solving in domains where complexity exceeds human capabilities.