Hacker News
Now you C me, now you don't (securitylab.github.com)
67 points by pcw888 on Aug 29, 2020 | 59 comments



> In a nutshell, format string bugs are a class of bugs in which an attacker provides their own format string data into a formatting function, e.g. printf(attacker_controlled)

It might be an odd thing to have a strong opinion about, but I consider it poor form to use printf in a C hello world, for this reason. Stick with puts unless you need to use format-specifiers in the string.


You think the idiomatic C hello-world is poor form because of format string attacks?


Yes. Only use the more dangerous function when you have cause to do so.


printf is only problematic when you have attacker-controlled data in the format string. Using a literal is perfectly safe and the intended usage.


Right, in this instance the behaviour is identical, but that wasn't my point.

In the interests of defensive programming, it makes sense to stick with the simpler and safer function unless there's a good reason to use the more complex and more dangerous function. printf should be used only when necessary. In a hello world, it's not necessary.

> Using a literal is perfectly safe and the intended usage.

It's not strictly true that it's perfectly safe. The literal could accidentally contain %n, which would cause undefined behaviour. A little contrived, but consider "Something has gone really #$@&%n£ wrong!"

It's roughly the intended usage of printf, but it's precisely the intended usage of puts / fputs. Using them closes the door completely on format-specifier bugs.


I'd agree with you in most anything but a hello world. Showcasing a few basic power tools as a starting point is probably better than introducing "subtle copies" of the same functionality.


In that case, you could use printf("%s", "Hello, world!").


Printf is perfectly acceptable when a string is being printed without format-specifiers. You are abstracting your limited experience to be true for all situations.

For a simple example, how will puts print a string without a line termination?


Hence, use write()!

Not completely joking here, I think every time I've had to use strings to make two C programs communicate over a network, write() was as easy to use as printf(). Obviously it's not true when communicating with a user.


That's not a bad point for many cases. I was pointing out that the coding rule in the upper comment to always use puts, and never printf, is bad advice.

I am glad to hear you mention network coding. More people should do that, so fewer people make statements like, "always use puts for writing strings, never printf."


write requires a length, it’s not specifically designed for working with strings like printf is.


> You are abstracting your limited experience to be true for all situations.

You aren't qualified to comment on my experience.

> For a simple example, how will puts print a string without a line termination?

Use fputs. If we're still just writing a simple string, there's no need to reach for a function with format-specifier interpretation.


> You aren't qualified to comment on my experience.

I think the language barrier made you believe I was personally attacking you. You literally constructed a general argument from a specific statement. That is a logical fallacy called fallacy of composition.

If you find yourself printing a string with printf and getting a different result from fputs, it has nothing to do with a newline but with not having checked the string's contents. Ignoring that doesn't fix code with unvalidated input.


And do you recommend checking the return value on printf and puts?


What for? Say printf returns an error, what would you do in that case?


It's bad to just ignore errors. You could immediately terminate, perhaps after printing an error-message to stderr. Depending on the program, this might be better than continuing on with a broken output stream.

In C++ you can have iostream throw on error, avoiding the problem of missing manual checks.


But be careful because stdout might be too short to receive the output.


That is actually a very good point.


It's still a minefield, and nobody is doing anything against it.

printf %n has been known to be dangerous for ages, and the secure variant printf_s, which forbids it, is implemented almost nowhere because of politics. Almost nobody uses the -Wformat attributes in its declarations, and the wide variants __attribute__((format(wprintf))) and wscanf have been waiting to be implemented since 2008. Patches have existed for ages.


> printf %n has been known to be dangerous for ages, and the secure variant printf_s, which forbids it, is implemented almost nowhere because of politics

printf("...%n...",...) is fine[0], it's printf(not_a_literal,...) that's a problem, and forbidding %n just papers over it.

> Almost nobody uses the -Wformat attributes in its declarations

This is completely at odds with my experience (at least for any project serious enough to actually use -Wstuff options), although I don't claim that's necessarily representative.

> the wide variants __attribute__((format(wprintf))) and wscanf have been waiting to be implemented since 2008

So? Wide characters are evil and need to die; good riddance; use a real character encoding. This has nothing to do with printf, which operates on chars.

0: or least as fine as anything else that takes a pointer to a local variable to store results into; I guess you could argue that it ought to return everything as a struct/tuple, but C doesn't support variable-width structs.


Well, FORTIFY_SOURCE >= 2 will reject "%n" unless it's stored in read-only memory. And Ubuntu at least makes this a gcc default, so I wouldn't say nobody is doing anything against it:

    $ cat tp.c 
    #include <stdio.h>
    #include <string.h>

    int main() {
      char a[3];
      printf(strcpy(a, "%n"));
      return 0;
    }
    $ gcc -O3 tp.c ; ./a.out
    *** %n in writable segment detected ***
    Aborted (core dumped)


Does this work if you mprotect a read-only region writable?


It uses the __readonly_area glibc function which actually parses /proc/self/maps:

https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/lin...

So no, I don't believe it will prevent using "%n" in memory that you have made writable. But this isn't really an issue because typically a format string exploit would have no way to arbitrarily call mprotect.

That said, there are other ways to abuse format string bugs apart from %n, for example "Direct Parameter Access" to get stack cookies, and the like:

https://cs155.stanford.edu/papers/formatstring-1.2.pdf


For those of us who forgot this odd behavior:

%n: Prints nothing, but writes the number of characters successfully written so far into an integer pointer parameter.


I'm just a perpetual amateur in C and I've never heard of this. TIL. Thanks.


Now promptly forget about it again, so you're never tempted to use it.


Perpetual amateur.

I like that because it describes me perfectly. Saving this.


That's frightening. Is there any legit use case for this in the real world that would stop, e.g., glibc from removing it altogether?


I got curious and looked through sources. I'm not sure where it was added, but it appeared between 4.3BSD and the 4.3BSD Tahoe edition according to the man pages. It did not seem to be present in System III. Searching for actual uses, I didn't find any that weren't testing the functionality (unit tests) or hacking the stack.


Perhaps you could print non-tabulated numbers until you run close to the end of the line, counting the characters with %n. Then you can print a newline, reset your counter to zero, and continue.


You might want to know where in the output string different elements of your format string were placed. I've used %n with sscanf, but not *printf.


Some programs use it for formatting purposes. macOS enabled “no format strings from writable memory” a while back and it broke these.


> [...]and the secure variant printf_s which forbids it is nowhere implemented because politics.

I’m curious. What’s the reason behind not implementing a safer version of a function?


That the _s functions are part of Annex K, which is optional, often not very good[0], and whose reliance on runtime checking is often considered problematic given we're talking about C developers. The politics referred to are because the annex is basically a bunch of Microsoft extensions pasted in the standard.

Take printf_s for instance, its main job is to forbid %n, at runtime[1]. It's much safer and more reliable to perform static analysis to mandate literal format strings and forbid %n.

[0] aka not very well designed, and not really good at leading developers away from mistakes. They also deviated from some "unsafe" functions in ways that could be quite critical: e.g. strncpy_s infamously does not zero the destination buffer like strncpy does, which might not be expected by people catering to codebases knowingly taking advantage of this property

[1] it will also check for %s null pointer which is somewhat more helpful, but again we're talking about C people here


Standard C didn’t even have a good way to do string copying safely until memccpy in C23. Most “safe” functions have unacceptable performance tradeoffs, including most extensions.


Well, safeclib memcpy_s (POSIX standard, Annex K) is, on good compilers, faster than glibc memcpy, whilst also being safe. Unacceptable is just gcc, mostly.

Haven't seen a safe memset yet, other than mine.


Usually the unacceptable performance tends to be argued with developer personal point of view, instead of profile data in the actual production deployment.


> The politics referred to are because the annex is basically a bunch of Microsoft extensions pasted in the standard.

So... M$FT bad, therefore anything they do is bad? That makes no sense.


> M$FT bad, therefore anything they do is bad

That's backward; it's actually "anything [rather, most things] Microsoft does is bad, therefore Microsoft is bad".

And (most of) Annex K is an example of that.

(The strncpy/_s one is a bit odd though, because strncpy was specifically for initializing a fixed-length buffer, which is not nul-terminated, but rather nul-padded. strncpy works fine for that purpose, although nobody uses nul-padded buffers anymore. Using it as a size-aware strcpy was always wrong; you should use strlen+memcpy or a separate str[lzc...]cpy (or strcpy_s I suppose) instead.)


Basically, almost no compiler vendor implemented it, and it also had the issue that you still need to handle pointer and length as separate arguments.

So instead of fixing the security issues, e.g. by introducing some struct-based type, the whole annex was made optional, and everyone who was against it simply doesn't offer any alternative.


No, it’s actually “this set of extensions Microsoft wrote is bad and they forced it into the standard”.


Still waiting for the counter proposal from anyone else in WG14, though.

Instead, OS vendors seem to be going with hardware memory-tagging solutions rather than keeping up a lost war, as we have discussed multiple times.


> it will also check for %s null pointer

Glibc will now print something like "(null)" in this case instead of segfaulting.


Take a look at the C standard library. Most of it is string handling. Add a new "safe" version of each function and you just doubled the size of the library. All this for the questionable safety that those "safe" functions provide - you still have to keep track and provide the correct parameters yourself most of the time; mess up and at best you get a false sense of security.

Safe and C strings do not belong into the same sentence unless it also includes words like: hazard, dumpster fire or rofl.


CVE database is full of questionable safety exploits regarding C practices.

The more the better, after all security researchers, hackers and security tool vendors have families to feed.


> CVE database is full of questionable safety exploits regarding C practices.

I'm pretty sure josefx agrees with that given their second paragraph, they're mostly saying Annex K doesn't really make things better: it follows the usual (bad) C handling practices, and largely just adds more parameters to mess up when passing in.


And what was the solution? Deprecate everything without any alternative.


I'm confused about what you're asking about. What was the solution to what? The stdlib being crap? Annex K being crap as well? C existing?


All three of them, actually.

Now given that UNIX and POSIX clones aren't going away anytime soon, it would be interesting that WG14 actually cared to have papers to improve C's current lack of safety.

Instead we get even more papers about how to exploit UB in name of ultimate performance.


What’s wrong with a bigger standard library? If you dynamically link, it makes no difference, and if you statically link, any decent linker won’t bring in what you don’t use (so it also makes no difference).

Also, why can’t C have safe stuff? Because it’s always been possible to shoot yourself in the foot, we shouldn’t try to change that?


> What’s wrong with a bigger standard library?

Maintainer time mostly and for some uses bloat.

> If you dynamically link, it makes no difference

I had projects that were limited by the time it took to load and initialize the program binary and loading/linking the C standard library was the largest dependency it had.

> and if you statically link, any decent linker won’t bring in what you don’t use

Santa, a sufficiently smart compiler and a decent linker walk into a bar. Here the joke segfaults because someone passed the wrong buffer size into sprintf_s.


Mentality, among the C community there is a deep belief that only bad programmers, that shouldn't be using C to start with, write unsafe code, despite 50 years of proofs.


I’ve found that one of the best ROIs for any existing large C/C++-using-C codebase is meticulously going through the compiler warning options and enabling all of the things that didn’t make the cut into -Wall or even -Wextra. After getting strict warnings under control and making them CI errors, again meticulously go attribute all varargs printf-style functions with the appropriate format attributes. Those two things have always been such a reachable goal even for one person, and they prevent everyone else touching the code from introducing sneaky bugs that slip past review. The newer compilers now check and can error if the maximum range for a value would exceed the buffer space you’ve given it, even if you think those values could never occur in practice.


> Almost nobody uses the -Wformat

That surprises me. Everywhere I've worked has used -Wallthethings or the equivalent, back to when it was `lint`.


Without the declarations -Wformat will not help much. Who adds __attribute__((format(...))) declarations? Almost nobody.


Worse, WG14 has shown zero interest in improving C's security story. Annex K was a disaster, died among compiler implementation politics.

At least WG21 is striving to reduce the known cases of UB and providing the necessary tooling to security conscious developers to make use of.


> WG21 is striving to reduce the known cases of UB

Citation? The last I heard, the standard was fully defending compilers that actively seek out undefined behaviour as an excuse to inject bugs and vulnerabilities into working code under the pretense of optimization. It would be very good news if they were walking that back towards "undefined behaviour is solely for cross-platform compatibility".


Just one example to start from, https://github.com/jbcoe/iso-cpp-breaking-UB


Ah, I see; that's talking about changing certain behaviours from undefined to well-defined; I was talking about requiring the compiler to pick sane realizations of behaviours that are not well-defined.



