Best of show – abuse of libc

Groxx · on Jan 8, 2021

>Format specifiers can take extra “arguments”. - "%hhn": store the number of bytes written mod 256 to the char pointer ...

Oh boy. I'll put that down for my "thing I don't think I wanted to know" of the day.

rightbyte · on Jan 8, 2021

There is some innocent beauty in the twistedness of printf - especially with GNU extensions.

flatiron · on Jan 8, 2021

So I can port doom to gnu printf?

npongratz · on Jan 8, 2021

GNU's printf is Turing complete[0]... so "yes."

[0] Mentioned (but not directly linked) by TFA:

https://www.usenix.org/conference/usenixsecurity15/technical...

stavros · on Jan 9, 2021

I've always wanted a Turing-complete printing function.

josefx · on Jan 9, 2021

We already have PostScript for that.

bluGill · on Jan 9, 2021

I'd start by creating an LLVM backend for printf. Should be fun.

anthk · on Jan 9, 2021

Port a zmachine first ;).

tomjakubowski · on Jan 8, 2021

GNU's printf specifier language is Turing complete, I believe.

alisonkisk · on Jan 8, 2021

There's a great example of what you can do with this, submitted and discussed on HN here: https://news.ycombinator.com/item?id=25690319

notretarded · on Jan 8, 2021

>Format specifiers can take extra “arguments”. - "%hhn": store the number of bytes written mod 256 to the char pointer ...

Oh boy. I'll put that down for my "thing I don't think I wanted to know" of the day.

snerp · on Jan 9, 2021

There is some innocent beauty in the twistedness of printf - especially with GNU extensions.

dfcowell · on Jan 9, 2021

So I can port doom to gnu printf?

kortilla · on Jan 9, 2021

GNU's printf specifier language is Turing complete, I believe.

a1369209993 · on Jan 9, 2021

GNU's printf is Turing complete[0]... so "yes."

[0] Mentioned (but not directly linked) by TFA:

https://www.usenix.org/conference/usenixsecurity15/technical...

SahAssar · on Jan 9, 2021

GNU's printf specifier language is Turing complete, I believe.

gu5 · on Jan 9, 2021

What is up with this thread? These comments are duplicated from the top thread...

dang · on Jan 9, 2021

Is it possible that it's a joke based on the material of the OP? Maybe we're trapped in some sort of International Obfuscated Internet Thread Contest...

Edit: ok, I'm guessing it's a joke about Turing completeness. Loops, you know.

saagarjha · on Jan 9, 2021

It’s a joke about the recursion introduced here: https://news.ycombinator.com/item?id=25691615.

Bootvis · on Jan 9, 2021

Well executed jokes on HN. 2021 already is a crazy year.

gu5 · on Jan 9, 2021

Ohhhh... Now I get it.

exikyut · on Jan 9, 2021

(The usernames are different)

FartyMcFarter · on Jan 9, 2021

Presumably it needs a loop around it, so it's not Turing-complete by itself?

npongratz · on Jan 9, 2021

No need to use a loop around it, printf can take care of that pesky detail for you! To quote [0] (my emphasis):

> To achieve full Turing-complete computation, we need a way to loop a format string. This is possible by overwriting the pointer inside printf() that tracks which character in the format string is currently being executed. The attacker is unlucky in that at the time the “%n” format specifier is used, this value is saved in a register on our 64-bit system. However, we identify one point in time in which the attacker can always mount the attack. The printf() function makes calls to puts() for the static components of the string. When this function call is made, all registers are saved to the stack. It turns out that an attacker can overwrite this pointer from within the puts() function. By doing this, the format string can be looped.

> An attacker can cause puts() to overwrite the desired pointer. Prior to printf() calling puts(), the attacker uses “%n” format specifiers to overwrite the stdout FILE object so that the temporary buffer is placed directly on top of the stack where the index pointer will be saved. Then, we print the eight bytes corresponding to the new value we want the pointer to have. Finally, we use more “%n” format specifiers to move the buffer back to some other location so that more unintended data will not be overwritten.

[0] https://www.usenix.org/system/files/conference/usenixsecurit..., Appendix B "Printf is Turing-complete".

saagarjha · on Jan 9, 2021

Or, you know, you can just use printf to overwrite the return address and ROP your way to a shell.

xyproto · on Jan 9, 2021

Herein lies madness

moonchild · on Jan 9, 2021

Not portable, but if you can get the address of the stack you can force printf to overwrite the return address, obviating the loop.

shakna · on Jan 9, 2021

It does not need the loop. It might be easier to understand by looking at something like this. [0] That printf allows for this kind of Turing complete control flow is well known [1].

[0] https://github.com/HexHive/printbf

[1] http://nebelwelt.net/publications/files/15SEC.pdf

moonchild · on Jan 9, 2021

That code is wrapped in a loop - https://github.com/HexHive/printbf/blob/master/src/pbf_pre.c...

fulafel · on Jan 9, 2021

The implementation is accidentally Turing-complete because you can exploit it to get arbitrary memory writes. But the language as specified is not Turing-complete.

londons_explore · on Jan 9, 2021

Are there any scanners out there that will detect user input ending up as the format string of a printf?

Perhaps a scanner that I can run against all of github, and then rank results by the number of times that code is exposed on a high value server connected to the internet...?

Someone · on Jan 9, 2021

Any modern C compiler will already warn you if the format string isn’t a string literal (https://stackoverflow.com/questions/32362918/error-format-st...)

I don’t think it’s worth the effort to extend that to look for tainted strings, not because it wouldn’t be useful, but because it would be hard to do (as an extreme example: is data read from a file user input? It could be a file containing internationalization info)

The (relatively) few programs that construct format strings on the fly will have to add pragmas to disable these warnings.

blue-dragonfly · on Jan 9, 2021

A winner from 1993 is very interesting too:

https://www.ioccc.org/years.html#1993_dgibson

It implements Conway's Game of Life by creating a DSL using the C preprocessor and printf. The output is a program (several initial boards are supplied to bootstrap) which is the input program to be compiled and run to create the next generation. This is the program for a second generation:

    LIFE

    L _ _ _ _ _
    L _ _ O _ _
    L _ _ _ O _
    L _ O O O _
    L _ _ _ _ _

    GEN 2 STAT 328960
    END

Each symbol like "LIFE" is a macro, the board is the program.

dang · on Jan 8, 2021

General thread here: https://news.ycombinator.com/item?id=25651942

navaati · on Jan 8, 2021

Thanks for all, dang !

aftbit · on Jan 8, 2021

Personally I find it amusing that the 0-signal comment "Thanks for all dang" is upvoted while the opposite 0-signal comment "Thanks for nothing dang" is downvoted. I mean, I think dang is chill, but neither of these really contributes to the discussion any more than the other, so shouldn't they have the same score? Upvotes really are a popularity contest these days.

dang · on Jan 9, 2021

This was addressed by pg over a decade ago: https://news.ycombinator.com/newswelcome.html

Empty comments can be ok if they're positive. There's nothing wrong with submitting a comment saying just "Thanks." What we especially discourage are comments that are empty and negative—comments that are mere name-calling.

If you think in terms of what's good/bad for community it may make more sense.

(I hope it's clear this applies whether or not the mods were mentioned in either a positive or negative way.)

saagarjha · on Jan 8, 2021

It's not clear what the votes are. However, the latter was written by someone who has been banned for what appears to be their habit of leaving low-value comments.

thedufer · on Jan 9, 2021

The negative comment is from a hellbanned account (which is, frankly, unsurprising). It's not even possible to downvote it, as far as I can tell, due to being DOA.

michaelcampbell · on Jan 9, 2021

Always have been.

acekingspade · on Jan 9, 2021

It's hard to believe that this is the same person with multiple widely-cited ML papers[0]. It's jaw-dropping how talented someone can be.

[0] https://scholar.google.com/citations?user=q4qDvAoAAAAJ&hl=en...

onurgu · on Jan 9, 2021

Thank you for the pointer, it is really amazing.

saagarjha · on Jan 9, 2021

For those interested in more Turing complete format strings, look no further than the "sprint" challenge from this year's Google CTF Quals: https://ctftime.org/task/12834. It's sprintf in a loop this time and the program simulates a maze: https://github.com/google/google-ctf/tree/master/2020/quals/...

enedil · on Jan 9, 2021

The author works at Google, so I suspect he's the same who created this challenge. Really enjoyable, although I didn't manage to solve it during contest.

rramadass · on Jan 9, 2021

The printf format string is actually a little language. In the book The Practice of Programming, Kernighan and Pike show how you can devise a similar format string to pack/unpack network packets.

badsectoracula · on Jan 8, 2021

Up next: a C compiler that compiles to printf statements :-P

hahajk · on Jan 8, 2021

https://github.com/HexHive/printbf

well this is a brainfuck interpreter inside printf. I’m pretty sure there are plenty of c-to-bf transpilers.

felixr · on Jan 8, 2021

This is by the same author as the IOCCC entry and also one of the authors of the paper showing the Turing completeness of printf http://nebelwelt.net/publications/#15SEC

lifthrasiir · on Jan 9, 2021

Ah, that explains everything. I have already seen this technique before and wondered why this entry has to be the best of show---I don't doubt it is worth the prize, just that it didn't sound very novel. But it all makes sense if the technique is not well known and authors tried to revitalize that.

klyrs · on Jan 8, 2021

That's fun, but esoteric languages in general and brainfuck in particular tend to lack things you'd want out of C: file system access, system calls, etc.

archi42 · on Jan 9, 2021

Hm, I think you could add numeric syscalls, similar to what happens at the asm level. E.g. put the syscall id and some parameters on the "stack", then let the interpreter run the syscall with a new "instruction" e.g. '!'. This could even substitute '.' (putchar) and ',' (getchar), since these are very much just syscalls. So that would reduce the number of instructions by one (to 7).

Oh, getting to 6 would also be fun: One might replace '[' and ']' with a conditional branch '?'. It just needs two parameters: condition and (signed) number of instructions to jump. As a bonus (much like normal asm), it lets you write much more ~~horribly abusive~~ flexible control flow than a structured "while(*ptr)".

enedil · on Jan 9, 2021

It's already implemented: https://github.com/ajyoon/systemf There is even an HTTP server built with it.

archi42 · on Jan 9, 2021

Why am I not even surprised...? I thought about writing a sentence about how (relatively) easy it would be to build a verified compiler (think CompCert-for-brainfuck); I'd guess the outcome is one of (a) "someone already did that as well, here is the link" or (b) "I spent the weekend with that, here is the project on github". The Internet is awesome, as are people :)

jcande · on Jan 9, 2021

That was my approach for an analogous program that uses memcpy instead of printf. I didn't go with the jit-style you describe, however. If you're curious, here's how I set up the syscalls https://github.com/jcande/xenocryst/blob/master/src/gadgets.... and here's the main loop https://github.com/jcande/xenocryst/blob/master/src/exec.c#L...

lathiat · on Jan 9, 2021

Somewhere there is a compiler that outputs to all sorts of crazy languages including awk, sed, printf, etc., but I can't find it right now. Hopefully someone knows what I'm talking about.

I feel like it did LLVM IR to a bunch of languages or something like that.. but my memory is faulty.

lifthrasiir · on Jan 9, 2021

You are looking for ELVM: https://github.com/shinh/elvm/ (I have seen many others, but in terms of activity it seems the most maintained one.)

lathiat · on Jan 9, 2021

Thank you, that is the one! :) I love projects like that.

GrumpySloth · on Jan 9, 2021

I'm writing my first C99 compiler. IOCCC sounds like a great source of test material.

olliej · on Jan 9, 2021

%n has frequently been used as an attack vector - generally in the context of the other poor practice of printf(<attacker controlled string>, ...)

It’s actually intentionally disallowed in some libc implementations.

lxe · on Jan 8, 2021

How did printf end up here in the first place? Decades of feature additions, or were these features a part of an early spec?

wahern · on Jan 8, 2021

%n was defined in C89, the first C standard: http://port70.net/~nsz/c/c89/c89-draft.html#4.9.6.1

Looking at old source code, the earliest implementation I found is 4.3BSD Tahoe (1988). See https://www.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Tahoe/usr/... Second oldest I found was Tenth Edition [Research] Unix (1989). See ocvt_n at https://www.tuhs.org/cgi-bin/utree.pl?file=V10/libc/stdio/vf... I couldn't find support in earlier implementations archived on that site.

segfaultbuserr · on Jan 9, 2021

> Looking at old source code, the earliest implementation I found is 4.3BSD Tahoe (1988).

You are the HN historian of the day.

wahern · on Jan 9, 2021

Another interesting factoid is that macOS only supports %n if the format string is located in read-only memory. Per printf(3) on macOS:

> For this reason, a format argument containing %n is assumed to be untrustworthy if located in writable memory (i.e. memory with protection PROT_WRITE; see mprotect(2)) and any attempt to use such an argument is fatal. Practically, this means that %n is permitted in literal format strings but disallowed in format strings located in normal stack- or heap-allocated memory.

The manual page seems correct:

  % cat test.c
  #include <stdio.h>
  int main(void) {
    printf((char[]){ "%n" }, &(int){ 0 });
    return 0;
  }
  % cc -o test test.c
  % ./test
  zsh: abort      ./test

saagarjha · on Jan 9, 2021

Someone should inform The Open Group about this violation of POSIX ;)

Another fun fact: glibc does this too, if you compile with -D_FORTIFY_SOURCE=2. However, since Linux lacks the nice vm_region APIs the code opens up /proc/self/maps :/

astrange · on Jan 9, 2021

dyld on Darwin has an API to ask if any pointer is to a read-only section of a binary. It’s useful because you can e.g. skip strcpys and other allocations.

saagarjha · on Jan 9, 2021

Hmm, can you tell me more? I can't think of any situation where skipping on a strcpy is legal, since you provide the second buffer and so the copy must occur. And I know that there is heavy uniquing going on for things like selectors and CFStrings at compile time, but where is the dyld API being used at runtime?

astrange · on Jan 9, 2021

Oh, I meant strdup. Look for strdupIfMutable() calls in libobjc.

pjmlp · on Jan 9, 2021

That is the beauty of POSIX, write once, debug everywhere, fix with plenty of spaghetti #ifdefs.

Narishma · on Jan 9, 2021

I'm pretty sure the compilers from Microsoft and Borland supported %n earlier than that. The earliest one I have easy access to that supports it is Microsoft C 4.0 from 1986.

gambiting · on Jan 9, 2021

Does anyone know why it was introduced in the first place? I mean.....the return value of printf gives you the exact same information, no? Why give printf the ability to write anything in the first place?

gambiting · on Jan 9, 2021

Ah, I found out why - %n stores the number of characters printed up to the point where the %n appears, whereas printf returns the total number of characters printed.

segfaultbuserr · on Jan 8, 2021

Fun fact, on glibc, an extension feature is that you can define your own custom conversion specifiers for printf().

kderbyma · on Jan 8, 2021

awesome. I didn't know about that printf hack....time for some fun experiments

saagarjha · on Jan 8, 2021

Be careful, though, you don't want anyone to hack you through printf ;)

segfaultbuserr · on Jan 9, 2021

In the late 90s, looking for "printf(string)" [0] in the code was a great way to discover remote code execution 0days ;-)

[0] should be "printf("%s", string)".

stevekemp · on Jan 9, 2021

Very much so, it took a long time for this to become obvious as a security problem.

From memory, wuftpd was the first big program to suffer from this class of attacks.

segfaultbuserr · on Jan 9, 2021

It reminds me of a talk by infosec researcher "The Grugq" about opsec techniques used by blackhat hackers. Its subtitle was "because jail is only for wuftpd"; I couldn't stop laughing at it.