Is it possible that it's a joke based on the material of the OP? Maybe we're trapped in some sort of International Obfuscated Internet Thread Contest...
Edit: ok, I'm guessing it's a joke about Turing completeness. Loops, you know.
No need to use a loop around it, printf can take care of that pesky detail for you! To quote [0] (my emphasis):
> To achieve full Turing-complete computation, we need a way to loop a format string. This is possible by overwriting the pointer inside printf() that tracks which character in the format string is currently being executed. The attacker is unlucky in that at the time the “%n” format specifier is used, this value is saved in a register on our 64-bit system. However, we identify one point in time in which the attacker can always mount the attack. The printf() function makes calls to puts() for the static components of the string. When this function call is made, all registers are saved to the stack. It turns out that an attacker can overwrite this pointer from within the puts() function. By doing this, the format string can be looped.
> An attacker can cause puts() to overwrite the desired pointer. Prior to printf() calling puts(), the attacker uses “%n” format specifiers to overwrite the stdout FILE object so that the temporary buffer is placed directly on top of the stack where the index pointer will be saved. Then, we print the eight bytes corresponding to the new value we want the pointer to have. Finally, we use more “%n” format specifiers to move the buffer back to some other location so that more unintended data will not be overwritten.
It does not need the loop. It might be easier to understand by looking at something like this. [0] That printf allows for this kind of Turing complete control flow is well known [1].
The implementation is accidentally Turing-complete because you can exploit it to get arbitrary memory writes. But the language as specified is not Turing-complete.
Are there any scanners out there that will detect user input ending up as the format string of a printf?
Perhaps a scanner that I can run against all of GitHub, and then rank results by the number of times that code is exposed on a high-value server connected to the internet...?
I don’t think it’s worth the effort to extend that to look for tainted strings, not because it wouldn’t be useful, but because it would be hard to do (as an extreme example: is data read from a file user input? It could be a file containing internationalization info)
The (relatively) few programs that construct format strings on the fly will have to add pragmas to disable these warnings.
It implements Conway's Game of Life by creating a DSL using the C preprocessor and printf. The output is a program (several initial boards are supplied to bootstrap) which is the input program to be compiled and run to create the next generation. This is the program for a second generation:
LIFE
L _ _ _ _ _
L _ _ O _ _
L _ _ _ O _
L _ O O O _
L _ _ _ _ _
GEN 2 STAT 328960
END
Each symbol like "LIFE" is a macro, the board is the program.
Personally I find it amusing that the 0-signal comment "Thanks for all dang" is upvoted while the opposite 0-signal comment "Thanks for nothing dang" is downvoted. I mean, I think dang is chill, but neither of these really contributes to the discussion any more than the other, so shouldn't they have the same score? Upvotes really are a popularity contest these days.
Empty comments can be ok if they're positive. There's nothing wrong with submitting a comment saying just "Thanks." What we especially discourage are comments that are empty and negative—comments that are mere name-calling.
If you think in terms of what's good/bad for community it may make more sense.
(I hope it's clear this applies whether or not the mods were mentioned in either a positive or negative way.)
It's not clear what the votes are. However, the latter was written by someone who has been banned for what appears to be their habit of leaving low-value comments.
The negative comment is from a hellbanned account (which is, frankly, unsurprising). It's not even possible to downvote it, as far as I can tell, due to being DOA.
The author works at Google, so I suspect he's the same person who created this challenge. Really enjoyable, although I didn't manage to solve it during the contest.
The printf format string is actually a little language. In the book The Practice of Programming, Kernighan and Pike show how you can devise a similar format string to pack/unpack network packets.
This is by the same author as the IOCCC entry and also one of the authors of the paper showing the Turing completeness of printf: http://nebelwelt.net/publications/#15SEC
Ah, that explains everything. I have already seen this technique before and wondered why this entry has to be the best of show---I don't doubt it is worth the prize, just that it didn't sound very novel. But it all makes sense if the technique is not well known and authors tried to revitalize that.
That's fun, but esoteric languages in general and brainfuck in particular tend to lack things you'd want out of C: file system access, system calls, etc.
Hm, I think you could add numeric syscalls, similar to what happens at the asm level. E.g. put the syscall id and some parameters on the "stack", then let the interpreter run the syscall with a new "instruction" e.g. '!'. This could even substitute '.' (putchar) and ',' (getchar), since these are very much just syscalls. So that would reduce the number of instructions by one (to 7).
Oh, getting to 6 would also be fun: One might replace '[' and ']' with a conditional branch '?'. It just needs two parameters: condition and (signed) number of instructions to jump. Adds the bonus (much like normal asm) to write much more ~~horribly abusive~~ flexible control flow than a structured "while(*ptr)".
Why am I not even surprised...? I thought about writing a sentence about how (relatively) easy it would be to build a verified compiler (think CompCert-for-brainfuck); I'd guess the outcome is one of (a) "someone already did that as well, here is the link" or (b) "I spent the weekend with that, here is the project on github". The Internet is awesome, as are people :)
There is somewhere a compiler that outputs to all sorts of crazy languages including awk, sed, printf, etc.. but I can't find it right now. Hopefully someone knows what I'm talking about.
I feel like it did LLVM IR to a bunch of languages or something like that.. but my memory is faulty.
Another interesting factoid is that macOS only supports %n if the format string is located in read-only memory. Per printf(3) on macOS:
> For this reason, a format argument containing %n is assumed to be untrustworthy if located in writable memory (i.e. memory with protection PROT_WRITE; see mprotect(2)) and any attempt to use such an argument is fatal. Practically, this means that %n is permitted in literal format strings but disallowed in format strings located in normal stack- or heap-allocated memory.
The manual page seems correct:
% cat test.c
#include <stdio.h>
int main(void) {
    printf((char[]){ "%n" }, &(int){ 0 });
    return 0;
}
% cc -o test test.c
% ./test
zsh: abort ./test
Someone should inform The Open Group about this violation of POSIX ;)
Another fun fact: glibc does this too, if you compile with -D_FORTIFY_SOURCE=2. However, since Linux lacks the nice vm_region APIs the code opens up /proc/self/maps :/
dyld on Darwin has an API to ask if any pointer is to a read-only section of a binary. It’s useful because you can e.g. skip strcpys and other allocations.
Hmm, can you tell me more? I can't think of any situation where skipping on a strcpy is legal, since you provide the second buffer and so the copy must occur. And I know that there is heavy uniquing going on for things like selectors and CFStrings at compile time, but where is the dyld API being used at runtime?
I'm pretty sure the compilers from Microsoft and Borland supported %n earlier than that. The earliest one I have easy access to that supports it is Microsoft C 4.0 from 1986.
Does anyone know why it was introduced in the first place? I mean.....the return value of printf gives you the exact same information, no? Why give printf the ability to write anything in the first place?
Ah, I found out why - %n doesn't print anything; it stores the number of characters written so far (up to the point where the %n appears) into the int its argument points to. printf's return value is the total number of characters printed.
It reminds me of a talk by infosec researcher "The Grugq" about opsec techniques used by blackhat hackers. Its subtitle was "because jail is only for wuftpd"; I couldn't stop laughing at it.
Oh boy. I'll put that down for my "thing I don't think I wanted to know" of the day.