Tiny ELF Files: Revisited in 2021

JoshTriplett · on Oct 13, 2021

It's possible to make the code slightly smaller, by relying on Linux zeroing registers when the program starts. That's part of the Linux ABI, and couldn't be changed without breaking programs, so it's safe to rely on.

Reducing the size of the code allows embedding it in less of the header, giving more options for code layout.

JoshTriplett · on Oct 13, 2021

Using this, I managed to get the file down to 114 bytes, while still printing "Hello, world!\n" and returning 0:

    [bits 64]
    file_load_va: equ 4096 * 40
    
    db 0x7f, 'E', 'L', 'F'
    db 2
    db 1
    db 1
    entry_point:
      mov al, 1
      mov esi, file_load_va + message
      jmp code_chunk_2
    dw 2
    dw 0x3e
    dd 1
    dq entry_point + file_load_va
    dq program_headers_start
    code_chunk_2:
      mov edi, eax
      mov dl, message_length
      syscall
      mov al, 60
      xor edi, edi
      syscall
    db 0 ; usable
    db 0 ; usable
    dw 0x38
    dw 1
    ; We simply deleted the three two-byte fields that used to be here. The only
    ; one that mattered, the number of section headers, will still be zero due to
    ; the upper two bytes of the field at the start of the program header being
    ; zero.
    
    program_headers_start:
    ; These next two fields also serve as the final six bytes of the ELF header.
    dd 1 ; Program header type: must be 1 (loadable segment)
    dd 5 ; Program header flags: must be 5 (readable and executable)
    dq 0 ; Offset of loadable segment in the file
    dq file_load_va ; Address in memory to load the segment into ; could change
    message_length: equ 14
    message:
    db `Hello, w`
    ; size in file then size in memory; can be anything non-zero and equal
    last_bytes: equ `orld!\n`
    dq last_bytes
    dq last_bytes
    dq 0 ; alignment; usable

This compiles and runs, and it's 114 bytes:

    $ nasm -f bin hello.asm -o hello && chmod a+x hello && ./hello
    Hello, world!
    $ ls -l hello
    -rwxr-xr-x 1 josh josh 114 Oct 13 10:28 hello

Getting the file any smaller would require finding a way to overlap the program header further inside the ELF header. As the article observes, that seems challenging given the validation the kernel does.
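To make the header overlap in the 114-byte listing easier to see, here is a rough Python sketch that rebuilds the same bytes by hand and unpacks both structures with `struct`. The opcode bytes are my own hand-assembly and are an assumption (NASM may choose different encodings of the same length). The field names follow the ELF64 layout, and `build_image` and `parse_headers` are made-up helper names.

```python
import struct

FILE_LOAD_VA = 4096 * 40
PHDR_OFFSET = 58      # file offset of program_headers_start, counted by hand
MESSAGE_OFFSET = 82   # file offset of the message label, counted by hand

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes


def build_image() -> bytes:
    """Hand-assembled copy of the 114-byte listing (opcode bytes are assumed)."""
    tail = b"orld!\n".ljust(8, b"\0")       # last_bytes, padded out to a qword
    parts = [
        b"\x7fELF", bytes([2, 1, 1]),       # magic, then the three db bytes
        b"\xb0\x01",                        # mov al, 1
        b"\xbe" + struct.pack("<I", FILE_LOAD_VA + MESSAGE_OFFSET),  # mov esi, imm32
        b"\xeb\x18",                        # jmp code_chunk_2 (offset 16 -> 40)
        struct.pack("<HHI", 2, 0x3E, 1),    # dw 2, dw 0x3e, dd 1
        struct.pack("<QQ", FILE_LOAD_VA + 7, PHDR_OFFSET),  # entry point, phdr offset
        b"\x89\xc7\xb2\x0e\x0f\x05",        # mov edi, eax; mov dl, 14; syscall
        b"\xb0\x3c\x31\xff\x0f\x05",        # mov al, 60; xor edi, edi; syscall
        b"\0\0",                            # two spare bytes
        struct.pack("<HH", 0x38, 1),        # program header entry size and count
        struct.pack("<IIQQ", 1, 5, 0, FILE_LOAD_VA),  # type, flags, offset, vaddr
        b"Hello, w", tail, tail,            # paddr, filesz, memsz slots
        struct.pack("<Q", 0),               # alignment
    ]
    return b"".join(parts)


def parse_headers(image: bytes) -> dict:
    """Unpack the file header at 0 and the program header at e_phoff."""
    eh = EHDR.unpack_from(image, 0)
    ph = PHDR.unpack_from(image, eh[5])
    return {
        "e_entry": eh[4], "e_phoff": eh[5],
        "e_phentsize": eh[9], "e_phnum": eh[10],
        "e_shentsize": eh[11], "e_shnum": eh[12], "e_shstrndx": eh[13],
        "p_type": ph[0], "p_flags": ph[1], "p_offset": ph[2], "p_vaddr": ph[3],
        "p_filesz": ph[5], "p_memsz": ph[6], "p_align": ph[7],
    }


if __name__ == "__main__":
    image = build_image()
    print(len(image), "bytes")
    for name, value in parse_headers(image).items():
        print(f"{name:12} = {value:#x}")
```

The last three two-byte fields of the file header (`e_shentsize`, `e_shnum`, `e_shstrndx`) are read from the same bytes as `p_type` and the low half of `p_flags`, which is why the section header count stays zero without having its own bytes.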

JoshTriplett · on Oct 14, 2021

Managed to get it down to 105 bytes by further overlapping the program header into the ELF header:

    [bits 64]
    file_load_va: equ 4096 * 40
    
    db 0x7f, 'E', 'L', 'F'
    entry_point:
      inc al
      mov esi, file_load_va + message
    pass2:
      xor edi, 1
      jmp code_chunk_2
    dw 2
    dw 0x3e
    code_chunk_2:
      mov dl, message_length
      jmp code_chunk_3
    dq entry_point + file_load_va
    dq program_headers_start
    code_chunk_3:
      syscall
      mov al, 60
      jmp pass2
    db 0 ; usable
    db 0 ; usable
    db 0 ; usable
    program_headers_start:
    dd 1 ; Program header type: must be 1 (loadable segment)
    db 0x5 ; Program header flags: low bits must be 5 (readable and executable); high bytes don't matter
    dw 0x38
    dw 1
    ; High 7 bytes of offset of loadable segment
    db 0
    db 0
    db 0
    db 0
    db 0
    db 0
    db 0
    dq file_load_va ; Address in memory to load the segment into ; could change
    message_length: equ 14
    message:
    db `Hello, w`
    ; size in file then size in memory; can be anything non-zero and equal
    last_bytes: equ `orld!\n`
    dq last_bytes
    dq last_bytes
    dq 0 ; alignment; usable
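The control flow in this version is easy to misread, because `pass2` runs twice: once to set up `write` and once to set up `exit`. Below is a small Python trace of it. It assumes every register starts at zero and that `write` succeeds and returns the byte count. The `trace` function and the register dict are purely illustrative.

```python
MESSAGE_LENGTH = 14


def set_low_byte(reg: int, value: int) -> int:
    """Model an 8-bit register write (mov al / mov dl): only the low byte changes."""
    return (reg & ~0xFF) | (value & 0xFF)


def trace() -> list:
    """Return the (rax, rdi, rdx) triple seen at each syscall instruction."""
    regs = {"rax": 0, "rdi": 0, "rdx": 0}  # assumed zeroed at process start
    calls = []

    # entry_point: inc al
    # (mov esi, ... is left out; the pointer does not affect control flow)
    regs["rax"] = set_low_byte(regs["rax"], regs["rax"] + 1)

    while True:
        regs["rdi"] ^= 1                                          # pass2: xor edi, 1
        regs["rdx"] = set_low_byte(regs["rdx"], MESSAGE_LENGTH)   # code_chunk_2: mov dl, 14
        calls.append((regs["rax"], regs["rdi"], regs["rdx"]))     # code_chunk_3: syscall
        if regs["rax"] == 60:
            return calls                                          # exit does not return
        regs["rax"] = MESSAGE_LENGTH                              # write returned 14
        regs["rax"] = set_low_byte(regs["rax"], 60)               # mov al, 60
        # jmp pass2: loop around for the second pass


if __name__ == "__main__":
    for rax, rdi, rdx in trace():
        print(f"syscall rax={rax} rdi={rdi} rdx={rdx}")
```

The first pass flips `edi` from 0 to 1 (stdout for `write`), and the second pass flips it back to 0 (the exit status), so a single `xor edi, 1` does the work of both `mov edi, eax` and `xor edi, edi` from the 114-byte version.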

yalue · on Oct 13, 2021

Better yet, another commenter [1] found that you can clobber the number of section header entries, as long as the size of a section header entry is 0. So, now the smallest size is two bytes shorter: 112 bytes for a full "Hello, world!", with an 8-byte "alignment" field to spare!

I'll need to update this article. The only annoying part will be scribbling over the hexdump output again.

[1] https://news.ycombinator.com/item?id=28849023

akavel · on Oct 13, 2021

I wonder if there are some nice tools for "scribbling over hexdump" somewhere, and also rendering pretty output based on that. It tends to be really helpful both when synthesizing/assembling some binary formats and when debugging/decoding/disassembling existing ones (and then ideally also writing blogposts based on that). I saw some "annotation" tool like this in one disassembler I tried once, but it wasn't super great, and didn't allow for easy tweaking & moving of annotation groups after doing some changes in the output. I'm pretty sure this is something that's done very often by reverse-engineering people, so I'd assume tools like this should already be popular; I just don't know how to find them. I know there's also some Lua API with support for disassembling many protocols in Wireshark, but I don't suppose it's easy to prototype & quickly iterate new formats in it (?)

For some really beautiful hand-made annotated binary format hexdump, see e.g.: https://github.com/corkami/pics/blob/master/binary/DalvikEXe...

If someone knows of tools fitting more or less what I described above, I'd be super grateful for some recommendations!!

mistrial9 · on Oct 13, 2021

chuckles did that daily in 1986-87 on 68020 asm

bregma · on Oct 13, 2021

It irks me that articles like this (and the originals it is based on) claim to be creating tiny ELF files when they are, in fact, just tricking a particular version of the Linux kernel into loading non-ELF binaries that claim to be valid. It's clever and all, but it's not doing what's claimed.

There are many OS kernels out there that use the ELF format for executable binaries. Many of them are not one particular Linux kernel. The fact that there is a series of articles based on the changing Linux kernel no longer accepting the old "tiny ELF" files is important.

I'll give kudos to the authors of these articles because they're doing something clever and fun. I just feel the urge to clarify to the internet that they're poking at exploits in a Linux kernel loader by offering it non-standard binary files and not creating tiny ELF files at all.

Hello71 · on Oct 13, 2021

The original article is extremely clear on this fact.

The title:

> A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux

as opposed to ELF Executables for non-Linux systems, or a.out executables for Linux. In the middle of the article, when the illegal shenanigans start:

> Unless, that is, we could change the contents of the structures to make them match even further....

> How many of these fields is Linux actually looking at, anyway? For example, does Linux actually check to see if the e_machine field contains 3 (indicating an Intel 386 target), or is it just assuming that it does?

At the end of the article:

> Of course, half of the values in this file violate some part of the ELF standard, and it's a wonder that Linux will even consent to sneeze on it, much less give it a process ID. This is not the sort of program to which one would normally be willing to confess authorship.

I don't see how the author could possibly be any clearer that this is Linux-specific and will probably not work (unmodified) on non-Linux systems.

yalue · on Oct 13, 2021

What about the file in the article makes it a "non-ELF binary"? The only thing I can think of is putting junk data in place of bytes the ELF spec designates as "padding" and expects to be 0. Other than that, it seems totally reasonable that putting garbage in place of a section-header offset with no headers, a physical address, and an alignment field wouldn't make it an unacceptable ELF.

It's entirely on the Linux kernel that it doesn't verify these fields. However, its failure to verify them doesn't make the file not an ELF. It just makes it an ELF with a stupid alignment requirement that Linux happens to ignore.

LargoLasskhyfv · on Oct 13, 2021

Somewhere else Justine probably lols by being αcτµαlly pδrταblε εxεcµταblε with hello.bin at 55 bytes.

[1] https://justine.lol/ape.html

moonchild · on Oct 13, 2021

> mov rsi, file_load_va + message: [...] This instruction ends up taking 10 bytes: two bytes for the opcode, and a full 8 bytes for the address. We can replace this with mov esi, file_load_va + message to save 5 bytes

Nasm should perform this optimization automatically.

    $ cat t.s
    bits 64
    mov rsi, 0xffffff
    $ nasm -o t.o t.s
    $ ndisasm -b 64 t.o
    00000000  BEFFFFFF00        mov esi,0xffffff
    $ nasm --version
    NASM version 2.15.05 compiled on Sep 24 2020

yalue · on Oct 13, 2021

Odd. For some reason, my version of nasm didn't do that, and instead opted for the lengthier 10-byte instruction shown in the article's objdump output. Maybe it's just an older version of nasm.

moonchild · on Oct 13, 2021

Additionally, there is an instruction ‘mov r64,imm32’ which nasm would select even if it did not have this crazy peephole; that is only 7 bytes, not 10 (rex, op, modr/m, id).
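For reference, the three encodings being compared (10, 7 and 5 bytes) can be spelled out byte by byte. This is a sketch from my reading of the x86-64 encoding rules, not output from NASM, and the function names are made up for illustration.

```python
import struct


def mov_rsi_imm64(value: int) -> bytes:
    """REX.W + (B8+r) with a full 64-bit immediate; rsi is register number 6."""
    return b"\x48\xbe" + struct.pack("<Q", value)


def mov_rsi_imm32_sign_extended(value: int) -> bytes:
    """REX.W + C7 /0 with a 32-bit immediate that the CPU sign-extends to 64 bits."""
    return b"\x48\xc7\xc6" + struct.pack("<i", value)


def mov_esi_imm32(value: int) -> bytes:
    """(B8+r) with a 32-bit immediate; writing esi zeroes the upper half of rsi."""
    return b"\xbe" + struct.pack("<I", value)


if __name__ == "__main__":
    value = 0xFFFFFF
    for encode in (mov_rsi_imm64, mov_rsi_imm32_sign_extended, mov_esi_imm32):
        code = encode(value)
        print(f"{encode.__name__:30} {len(code):2} bytes  {code.hex()}")
```

The shortest form is only valid when the value fits in an unsigned 32-bit immediate, and the 7-byte form only when it fits in a signed one, which holds for a load address like `file_load_va + message`.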

abaines · on Oct 13, 2021

I had a go at this myself a few years ago [0]. But I wanted a dynamically linked ELF instead of a static one so that I could load SDL, OpenGL, etc. That requires extras like a DYNAMIC section which takes up quite a bit more space.

I ended up at 728 bytes without any self-extracting techniques. It played a nice animation though.

I have not tested it recently, and I expect it won't run any more as it used "bad things", like relying on ecx having a specific value when the program started, but the ideas should still be relevant.

[0]: https://github.com/baines/demostuff

breadbox · on Oct 13, 2021

Nice to see another approach to this subject! Kudos to the author.

I have a couple of responses to specific points brought up in the article.

* The author suggests that the original 45-byte executable no longer works on modern systems. If so, this is news to me. Admittedly my current machine is a bit behind the cutting edge (4.15), but what's there should still work. If people are finding the current version to fail for them, I'd appreciate some details on their setup.

* I respectfully disagree that 32-bit executables are "less relevant" today; I suspect they will continue to be supported for many, many years to come. Of course for a new explorer 64-bit executables are far more interesting, but when you're shaving bytes at a time, you can't beat a 32-bit executable.

* Many people are unaware that my original essay is only the first of a series that I wrote. (All of the essays are linked at the bottom of the original.) I note that my smallest 64-bit ELF executable without introducing invalid fields is also 120 bytes, so that's cool.

* However, by taking advantage of unvalidated fields, I was able to produce a working 64-bit ELF executable that is 84 bytes in size. The overlapping is a bit tricky, but I've verified that it continues to work on my box. See http://www.muppetlabs.com/~breadbox/software/tiny/return42.h... -- all variations of my return-42 executables are collected there.

* My smallest 64-bit ELF executable that prints "hello, world\n" (no punctuation: I always use the string from K&R) is 98 bytes. I don't have the assembly for that one posted on my site, but it uses the same layout as the 84-byte executable.

an-unknown · on Oct 13, 2021

PlaidCTF 2020 had an interesting challenge, where you had to write a minimal 64bit shared object file which had to spawn a shell when preloaded to any (dynamically linked) program. There are write-ups describing how people did it: https://ctftime.org/task/11305

Naturally, the files from that challenge were bigger than the minimal ELF file presented here, because dynamic linking requires more sections.

colatkinson · on Oct 13, 2021

Funnily enough, I tried the same thing a while back [0], and got basically the same minimum size (112 bytes in my case). Though it's not nearly as impressive as the article, since all mine did was _exit(42).

I suppose I have no choice but to spend a few hours to try and shave off a couple more bytes now.

[0]: https://github.com/colatkinson/tiny_x64

yalue · on Oct 13, 2021

Interesting! So, it looks like you can clobber the number of section header entries, because, with this alignment, the _size_ of a section header is 0. Cool!

roca · on Oct 13, 2021

One tiny correction: the return value of write() is the number of bytes written. That's less than 256 so overwriting the low byte of rax still works.

breadbox · on Oct 13, 2021

Unless `write()` returns an error, in which case rax will contain a negative value. I've had to ditch that shortcut many times because of this.
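A quick way to see both cases is to model `mov al, 60` as a low-byte overwrite of a 64-bit value. This is a sketch only: `mov_al` is a made-up helper, and `EBADF` (errno 9, what you would get if stdout were closed) is just one example of an error `write` could return.

```python
MASK64 = (1 << 64) - 1
EBADF = 9  # Linux errno for "bad file descriptor"


def mov_al(rax: int, imm8: int) -> int:
    """Replace only the low byte of a 64-bit register value."""
    return (rax & ~0xFF & MASK64) | (imm8 & 0xFF)


def as_signed(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    return value - (1 << 64) if value >> 63 else value


if __name__ == "__main__":
    ok = mov_al(14, 60)                  # write succeeded and returned 14
    bad = mov_al(-EBADF & MASK64, 60)    # write failed and returned -EBADF
    print(f"after success: rax = {ok}")
    print(f"after error:   rax = {bad:#x} ({as_signed(bad)})")
```

After a successful write the upper bytes of rax are already zero, so rax becomes exactly 60. After an error the upper bytes are all ones, so rax is not 60 and the second `syscall` does not request `exit`.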

unwind · on Oct 13, 2021

Wow, that is crazy. It seems to rely on having the headers mapped into the address space of the process they describe, is that behavior specified somewhere?

When I've worked with loading ELFs, in embedded contexts, the loader just used the headers for itself, but they never ended up in the final code being run so a trick like this would just crash. Interesting.

JoshTriplett · on Oct 13, 2021

> It seems to rely on having the headers mapped into the address space of the process they describe, is that behavior specified somewhere?

There's a field in the header for the offset of the loadable segment in the file. This binary sets that offset to zero, so the kernel loads the entire file into memory, headers and all.
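As a rough illustration of that mapping, the address of any byte in the file is just the segment's load address plus its distance from the segment's file offset. The sketch below uses `file_load_va` from the listings in this thread. The two file offsets (7 for `entry_point`, 82 for `message` in the 114-byte version) are hand-counted assumptions, and `file_offset_to_va` is a made-up helper.

```python
FILE_LOAD_VA = 4096 * 40  # address the segment is loaded at
P_OFFSET = 0              # segment starts at the very first byte of the file


def file_offset_to_va(file_offset: int,
                      p_offset: int = P_OFFSET,
                      p_vaddr: int = FILE_LOAD_VA) -> int:
    """Map a byte's position in the file to its address in the running process."""
    if file_offset < p_offset:
        raise ValueError("byte lies before the start of the loaded segment")
    return p_vaddr + (file_offset - p_offset)


if __name__ == "__main__":
    print(f"ELF magic   -> {file_offset_to_va(0):#x}")
    print(f"entry_point -> {file_offset_to_va(7):#x}")
    print(f"message     -> {file_offset_to_va(82):#x}")
```

With the offset at zero, the ELF magic itself ends up at `file_load_va`, which is why labels inside the headers can be used directly as code and data addresses.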

unwind · on Oct 13, 2021

Nice! Thanks.

Edit: That sounds like either an exploit or a murky corner case, interesting.

lmm · on Oct 13, 2021

Not really. The header tells the kernel "load this byte range into memory" and it does, that's how running any program works. Overlapping the program with the headers is the trick, but that was already mentioned.

dfox · on Oct 13, 2021

In non-embedded contexts it is usual for ELF executables to have a program header that covers the ELF header and the program headers and maps them to the beginning of the address space used by that binary. At least on Linux, the address where it is mapped is accessible as the symbol __ehdr_start.

5- · on Oct 13, 2021

see also http://www.sizecoding.org/wiki/Linux for the 'practical' (demoscene) application of the original 32-bit results.

lifthrasiir · on Oct 13, 2021

Note that Linux demos typically use much smaller x86-32 instead of modern x86-64 as in the article.