Best of show – abuse of libc (ioccc.org)
361 points by mooreds on Jan 8, 2021 | 77 comments



>Format specifiers can take extra “arguments”. - "%hhn": store the number of bytes written mod 256 to the char pointer ...

Oh boy. I'll put that down for my "thing I don't think I wanted to know" of the day.
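
For anyone who has not met %hhn before, here is a minimal sketch of what the quoted line means. The function name is made up for illustration, and it assumes a libc that still honours %n (some deliberately do not). It pads the output to a chosen width and lets %hhn store the running byte count into a single char, so only the count mod 256 survives.

```c
#include <stdio.h>

/* Hypothetical helper: produce `width` padding characters (0..1023) into a
 * scratch buffer and return the count that %hhn stored, i.e. width mod 256. */
unsigned char count_mod_256(int width)
{
    char scratch[1024];
    signed char count = 0;   /* %hhn expects a pointer to signed char */

    /* "%*s" pads an empty string to `width` characters; "%hhn" then writes
     * the number of characters produced so far, truncated to one byte. */
    snprintf(scratch, sizeof scratch, "%*s%hhn", width, "", &count);
    return (unsigned char)count;
}
```

Calling count_mod_256(300) should give 44 on common implementations, since 300 characters were produced and 300 mod 256 is 44.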


There is some innocent beauty in the twistedness of printf - especially with GNU extensions.


So I can port doom to gnu printf?


GNU's printf is Turing complete[0]... so "yes."

[0] Mentioned (but not directly linked) by TFA:

https://www.usenix.org/conference/usenixsecurity15/technical...


I've always wanted a Turing-complete printing function.


We already have PostScript for that.


I'd start by creating a llvm backend for printf. Should be fun.


Port a zmachine first ;).


GNU's printf specifier language is Turing complete, I believe.


There's a great example of what you can do with this, submitted and discussed on HN here: https://news.ycombinator.com/item?id=25690319


>Format specifiers can take extra “arguments”. - "%hhn": store the number of bytes written mod 256 to the char pointer ...

Oh boy. I'll put that down for my "thing I don't think I wanted to know" of the day.


There is some innocent beauty in the twistedness of printf - especially with GNU extensions.


So I can port doom to gnu printf?


GNU's printf specifier language is Turing complete, I believe.


GNU's printf is Turing complete[0]... so "yes."

[0] Mentioned (but not directly linked) by TFA:

https://www.usenix.org/conference/usenixsecurity15/technical...


GNU's printf specifier language is Turing complete, I believe.


What is up with this thread? These comments are duplicated from the top thread...


Is it possible that it's a joke based on the material of the OP? Maybe we're trapped in some sort of International Obfuscated Internet Thread Contest...

Edit: ok, I'm guessing it's a joke about Turing completeness. Loops, you know.


It’s a joke about the recursion introduced here: https://news.ycombinator.com/item?id=25691615.


Well executed jokes on HN. 2021 already is a crazy year.


Ohhhh... Now I get it.


(The usernames are different)


Presumably it needs a loop around it, so it's not Turing-complete by itself?


No need to use a loop around it, printf can take care of that pesky detail for you! To quote [0] (my emphasis):

> To achieve full Turing-complete computation, we need a way to loop a format string. This is possible by overwriting the pointer inside printf() that tracks which character in the format string is currently being executed. The attacker is unlucky in that at the time the “%n” format specifier is used, this value is saved in a register on our 64-bit system. However, we identify one point in time in which the attacker can always mount the attack. The printf() function makes calls to puts() for the static components of the string. When this function call is made, all registers are saved to the stack. It turns out that an attacker can overwrite this pointer from within the puts() function. By doing this, the format string can be looped.

> An attacker can cause puts() to overwrite the desired pointer. Prior to printf() calling puts(), the attacker uses “%n” format specifiers to overwrite the stdout FILE object so that the temporary buffer is placed directly on top of the stack where the index pointer will be saved. Then, we print the eight bytes corresponding to the new value we want the pointer to have. Finally, we use more “%n” format specifiers to move the buffer back to some other location so that more unintended data will not be overwritten.

[0] https://www.usenix.org/system/files/conference/usenixsecurit..., Appendix B "Printf is Turing-complete".


Or, you know, you can just use printf to overwrite the return address and ROP your way to a shell.


Herein lies madness


Not portable, but if you can get the address of the stack you can force printf to overwrite the return address, obviating the loop.


It does not need the loop. It might be easier to understand by looking at something like this. [0] That printf allows for this kind of Turing complete control flow is well known [1].

[0] https://github.com/HexHive/printbf

[1] http://nebelwelt.net/publications/files/15SEC.pdf



The implementation is accidentally Turing-complete because you can exploit it to get arbitrary memory writes. But the language as specified is not Turing-complete.
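
To make the "memory writes" part concrete, here is a small sketch of the building block, using only standard format features and a benign target. It is not the construction from the paper or from printbf, and both function names are invented for illustration. It assumes the libc honours %n and counts output even when snprintf is given no buffer, which the common ones do.

```c
#include <stdio.h>

/* Store `value` (>= 0) into *dst using only a format string: "%*s" produces
 * `value` padding characters and "%n" records how many have been produced.
 * snprintf(NULL, 0, ...) discards the output but still counts it. */
void store_via_printf(int *dst, int value)
{
    snprintf(NULL, 0, "%*s%n", value, "", dst);
}

/* Add two non-negative ints by "printing" a + b padding characters and
 * reading the total back through %n. */
int add_via_printf(int a, int b)
{
    int sum = 0;
    snprintf(NULL, 0, "%*s%*s%n", a, "", b, "", &sum);
    return sum;
}
```

With these, add_via_printf(20, 22) yields 42 without a single arithmetic operator in the calling code. The exploits discussed above go further by aiming such writes at memory the program never meant to expose.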


Are there any scanners out there that will detect user input ending up as the format string of a printf?

Perhaps a scanner that I can run against all of GitHub, and then rank results by the number of times that code is exposed on a high value server connected to the internet...?


Any modern C compiler will already warn you if the format string isn’t a string literal (https://stackoverflow.com/questions/32362918/error-format-st...)

I don’t think it’s worth the effort to extend that to look for tainted strings, not because it wouldn’t be useful, but because it would be hard to do (as an extreme example: is data read from a file user input? It could be a file containing internationalization info)

The (relatively) few programs that construct format strings on the fly will have to add pragmas to disable these warnings.
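
A related trick for code that wraps printf: GCC and Clang let you mark your own variadic function as printf-like, so the same format checks apply at its call sites. The sketch below assumes one of those compilers; the macro and function names are made up, and on other compilers the macro expands to nothing.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension; silently disabled elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Hypothetical wrapper: parameter 3 is the format string and the variadic
 * arguments start at parameter 4, so the compiler can check calls to it. */
int format_message(char *out, size_t out_size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int format_message(char *out, size_t out_size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(out, out_size, fmt, args);
    va_end(args);
    return written;
}
```

With the attribute in place, a call such as format_message(buf, sizeof buf, user_string) should draw the same kind of warning as a bare printf(user_string) when -Wformat-security is enabled.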


A winner from 1993 is very interesting too:

https://www.ioccc.org/years.html#1993_dgibson

It implements Conway's Game of Life by creating a DSL using the C preprocessor and printf. The output of each run is itself a program (several initial boards are supplied to bootstrap), which is then compiled and run to create the next generation. This is the program for a second generation:

    LIFE

    L _ _ _ _ _
    L _ _ O _ _
    L _ _ _ O _
    L _ O O O _
    L _ _ _ _ _

    GEN 2 STAT 328960
    END

Each symbol like "LIFE" is a macro, the board is the program.



Thanks for all, dang !


Personally I find it amusing that the 0-signal comment "Thanks for all dang" is upvoted while the opposite 0-signal comment "Thanks for nothing dang" is downvoted. I mean, I think dang is chill, but neither of these really contributes to the discussion any more than the other, so shouldn't they have the same score? Upvotes really are a popularity contest these days.


This was addressed by pg over a decade ago: https://news.ycombinator.com/newswelcome.html

Empty comments can be ok if they're positive. There's nothing wrong with submitting a comment saying just "Thanks." What we especially discourage are comments that are empty and negative—comments that are mere name-calling.

If you think in terms of what's good/bad for community it may make more sense.

(I hope it's clear this applies whether or not the mods were mentioned in either a positive or negative way.)


It's not clear what the votes are. However, the latter was written by someone who has been banned for what appears to be their habit of leaving low-value comments.


The negative comment is from a hellbanned account (which is, frankly, unsurprising). It's not even possible to downvote it, as far as I can tell, due to being DOA.


Always have been.


It's hard to believe that this is the same person with multiple widely-cited ML papers[0]. It's jaw-dropping how talented someone can be.

https://scholar.google.com/citations?user=q4qDvAoAAAAJ&hl=en...


Thank you for the pointer, it is really amazing.


For those interested in more Turing complete format strings, look no further than the "sprint" challenge from this year's Google CTF Quals: https://ctftime.org/task/12834. It's sprintf in a loop this time and the program simulates a maze: https://github.com/google/google-ctf/tree/master/2020/quals/...


The author works at Google, so I suspect he's the same person who created this challenge. Really enjoyable, although I didn't manage to solve it during the contest.


The printf format string is actually a little language. In the book The Practice of Programming, Kernighan and Pike show how you can devise a similar format string to pack/unpack network packets.
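
A rough sketch of that idea, written from the description above rather than copied from the book: a tiny format language where each character names the width of the next field. The function name and the choice of big-endian byte order are assumptions made for illustration.

```c
#include <stdarg.h>

/* Pack arguments into buf according to fmt:
 *   'c' = 1 byte, 's' = 2 bytes, 'l' = 4 bytes (multi-byte fields big-endian).
 * Returns the number of bytes written, or -1 on an unknown format character.
 * The caller must make sure buf is large enough. */
int pack_fields(unsigned char *buf, const char *fmt, ...)
{
    va_list args;
    unsigned char *p = buf;
    int ok = 1;

    va_start(args, fmt);
    for (; ok && *fmt != '\0'; fmt++) {
        switch (*fmt) {
        case 'c': {                      /* passed as int after promotion */
            int v = va_arg(args, int);
            *p++ = (unsigned char)v;
            break;
        }
        case 's': {
            int v = va_arg(args, int);
            *p++ = (unsigned char)(v >> 8);
            *p++ = (unsigned char)v;
            break;
        }
        case 'l': {                      /* caller must pass unsigned long */
            unsigned long v = va_arg(args, unsigned long);
            *p++ = (unsigned char)(v >> 24);
            *p++ = (unsigned char)(v >> 16);
            *p++ = (unsigned char)(v >> 8);
            *p++ = (unsigned char)v;
            break;
        }
        default:
            ok = 0;                      /* unknown field type */
        }
    }
    va_end(args);
    return ok ? (int)(p - buf) : -1;
}
```

For example, pack_fields(buf, "csl", 0x01, 0x0203, 0x04050607UL) writes the seven bytes 01 02 03 04 05 06 07. A matching unpack would walk the same format string and read the fields back out.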


Up next: a C compiler that compiles to printf statements :-P


https://github.com/HexHive/printbf

well this is a brainfuck interpreter inside printf. I’m pretty sure there are plenty of c-to-bf transpilers.


This is by the same author as the IOCCC entry and also one of the authors of the paper showing the Turing completeness of printf http://nebelwelt.net/publications/#15SEC


Ah, that explains everything. I have already seen this technique before and wondered why this entry has to be the best of show---I don't doubt it is worth the prize, just that it didn't sound very novel. But it all makes sense if the technique is not well known and authors tried to revitalize that.


That's fun, but esoteric languages in general and brainfuck in particular tend to lack things you'd want out of C: file system access, system calls, etc.


Hm, I think you could add numeric syscalls, similar to what happens at the asm level. E.g. put the syscall id and some parameters on the "stack", then let the interpreter run the syscall with a new "instruction" e.g. '!'. This could even substitute '.' (putchar) and ',' (getchar), since these are very much just syscalls. So that would reduce the number of instructions by one (to 7).

Oh, getting to 6 would also be fun: One might replace '[' and ']' with a conditional branch '?'. It just needs two parameters: condition and (signed) number of instructions to jump. Adds the bonus (much like normal asm) to write much more ~~horribly abusive~~ flexible control flow than a structured "while(*ptr)".


It's already implemented: https://github.com/ajyoon/systemf There is even an HTTP server built with it.


Why am I not even surprised...? I thought about writing a sentence about how (relatively) easy it would be to build a verified compiler (think CompCert-for-brainfuck); I'd guess the outcome is one of (a) "someone already did that as well, here is the link" or (b) "I spent the weekend with that, here is the project on github". The Internet is awesome, as are people :)


That was my approach for an analogous program that uses memcpy instead of printf. I didn't go with the jit-style you describe, however. If you're curious here's how I set up the syscalls https://github.com/jcande/xenocryst/blob/master/src/gadgets.... and here's the main loop https://github.com/jcande/xenocryst/blob/master/src/exec.c#L...


Somewhere there is a compiler that outputs to all sorts of crazy languages including awk, sed, printf, etc., but I can't find it right now. Hopefully someone knows what I'm talking about.

I feel like it did LLVM IR to a bunch of languages or something like that, but my memory is faulty.


You are looking for ELVM: https://github.com/shinh/elvm/ (I have seen many others, but in terms of activity it seems the most maintained one.)


Thank you, that is the one! :) I love projects like that.


I'm writing my first C99 compiler. IOCCC sounds like a great source of test material.


%n has frequently been used as an attack vector - generally in the context of the other poor practice of printf(<attacker controlled string>, ...)

It’s actually intentionally disallowed in some libc implementations.


How did printf end up here in the first place? Decades of feature additions, or were these features a part of an early spec?


%n was defined in C89, the first C standard: http://port70.net/~nsz/c/c89/c89-draft.html#4.9.6.1

Looking at old source code, the earliest implementation I found is 4.3BSD Tahoe (1988). See https://www.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Tahoe/usr/... Second oldest I found was Tenth Edition [Research] Unix (1989). See ocvt_n at https://www.tuhs.org/cgi-bin/utree.pl?file=V10/libc/stdio/vf... I couldn't find support in earlier implementations archived on that site.


> Looking at old source code, the earliest implementation I found is 4.3BSD Tahoe (1988).

You are the HN historian of the day.


Another interesting factoid is that macOS only supports %n if the format string is located in read-only memory. Per printf(3) on macOS:

> For this reason, a format argument containing %n is assumed to be untrustworthy if located in writable memory (i.e. memory with protection PROT_WRITE; see mprotect(2)) and any attempt to use such an argument is fatal. Practically, this means that %n is permitted in literal format strings but disallowed in format strings located in normal stack- or heap-allocated memory.

The manual page seems correct:

  % cat test.c
  #include <stdio.h>
  int main(void) {
    printf((char[]){ "%n" }, &(int){ 0 });
    return 0;
  }
  % cc -o test test.c                                          
  % ./test                                                     
  zsh: abort      ./test


Someone should inform The Open Group about this violation of POSIX ;)

Another fun fact: glibc does this too, if you compile with -D_FORTIFY_SOURCE=2. However, since Linux lacks the nice vm_region APIs the code opens up /proc/self/maps :/


dyld on Darwin has an API to ask if any pointer is to a read-only section of a binary. It’s useful because you can e.g. skip strcpys and other allocations.


Hmm, can you tell me more? I can't think of any situation where skipping on a strcpy is legal, since you provide the second buffer and so the copy must occur. And I know that there is heavy uniquing going on for things like selectors and CFStrings at compile time, but where is the dyld API being used at runtime?


Oh, I meant strdup. Look for strdupIfMutable() calls in libobjc.


That is the beauty of POSIX, write once, debug everywhere, fix with plenty of spaghetti #ifdefs.


I'm pretty sure the compilers from Microsoft and Borland supported %n earlier than that. The earliest one I have easy access to that supports it is Microsoft C 4.0 from 1986.


Does anyone know why it was introduced in the first place? I mean.....the return value of printf gives you the exact same information, no? Why give printf the ability to write anything in the first place?


Ah, I found out why - %n stores (through its pointer argument) the number of characters printed up to the point where the %n is. Printf returns the total number of characters printed.
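
A quick sketch of that difference, with a made-up function name: %n captures an offset partway through the output, while the return value only reports the final length.

```c
#include <stdio.h>

/* Formats "<key>: <value>" into out. Returns the total length (as snprintf
 * does) and stores the length of the key part alone in *key_len via %n. */
int format_pair(char *out, size_t out_size, const char *key, int value,
                int *key_len)
{
    return snprintf(out, out_size, "%s%n: %d", key, key_len, value);
}
```

For key "ab" and value 123 the output is "ab: 123", the return value is 7, and *key_len is 2. Without %n you would need a second call or a strlen to learn where the key ended.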


Fun fact, on glibc, an extension feature is that you can define your own custom conversion specifiers for printf().
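
For the curious, a sketch of what that looks like, assuming glibc and its <printf.h> header (this will not build against other libcs). The %Y letter and all function names here are made up for illustration; the handler ignores width, precision and flags to stay short.

```c
#include <printf.h>   /* glibc-specific: register_printf_specifier */
#include <stdio.h>

/* Handler for a hypothetical %Y conversion that prints an int in binary.
 * args[0] points at the argument value; returns characters written. */
static int print_binary(FILE *stream, const struct printf_info *info,
                        const void *const *args)
{
    (void)info;
    unsigned int value = *(const unsigned int *)args[0];
    char digits[sizeof value * 8];
    int len = 0;
    int written = 0;

    do {                                  /* collect bits, lowest first */
        digits[len++] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    while (len > 0) {                     /* emit highest bit first */
        if (fputc(digits[--len], stream) == EOF)
            return -1;
        written++;
    }
    return written;
}

/* Tells printf that %Y consumes exactly one int argument. */
static int binary_arginfo(const struct printf_info *info, size_t n,
                          int *argtypes, int *size)
{
    (void)info;
    if (n > 0) {
        argtypes[0] = PA_INT;
        size[0] = sizeof(int);
    }
    return 1;
}

/* Returns 0 on success, -1 on failure. */
int install_binary_specifier(void)
{
    return register_printf_specifier('Y', print_binary, binary_arginfo);
}
```

After install_binary_specifier() succeeds, printf("%Y\n", 5) should print 101. Expect -Wformat to complain about the unknown conversion, since the compiler has no idea the specifier was registered at run time.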


awesome. I didn't know about that printf hack....time for some fun experiments


Be careful, though, you don't want anyone to hack you through printf ;)


In the late 90s, looking for "printf(string)" [0] in the code was a great way to discover remote code execution 0days ;-)

[0] should be "printf("%s", string)".
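
Spelled out as a minimal sketch (the function name is hypothetical): the fix is to make the untrusted text an argument, never the format.

```c
#include <stdio.h>

/* Copies untrusted text into out without interpreting it.
 * Returns the length snprintf reports. */
int echo_safely(char *out, size_t out_size, const char *untrusted)
{
    /* Vulnerable version: snprintf(out, out_size, untrusted);
     * any %x or %n in the input would be executed as a directive. */
    return snprintf(out, out_size, "%s", untrusted);
}
```

Feeding it the string "%x%n" simply produces the four characters %x%n in the output buffer, with no stack reads and no writes through stray pointers.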


Very much so, it took a long time for this to become obvious as a security problem.

If memory serves, wuftpd was the first big program to suffer from this class of attacks.


It reminds me of a talk by infosec researcher "The Grugq" about opsec techniques used by blackhat hackers. Its subtitle was "because jail is only for wuftpd"; I couldn't stop laughing at it.



