Finding bugs in Tarsnap

cperciva · on July 2, 2015

In LibreSSL, we went big hammer and clear all the bignums, regardless of sensitivity. And so it was that when I bumped into Colin at BSDCan 2014 two weeks later, failure to clear secret bignums from memory was fresh in my mind.

I bumped into Ted again three weeks ago at BSDCan 2015 and he must have had memory leaks on his mind, because the day after the conference ended he sent me an email pointing out a memory leak in Tarsnap. (Harmless, since it's in a we're-about-to-exit error path, but worth $10 anyway.)

programmernews3 · on July 2, 2015

Aren't free()'s senseless before code exits?

cperciva · on July 2, 2015

Yes, if you know that the process is going to exit. In this case, the missing free is in a function which is about a dozen calls down in the stack, and I know that it's harmless right now because I've looked at the code and verified that an error there will always result in tarsnap exiting; but if I don't fix this now it's quite possible that in the future I'll have code which uses that function but can tolerate it failing.

programmernews3 · on July 2, 2015

I think I understand. Function, which could be used independently should free all used resources.

If a function is very project-specific and has no purpose to being used independently it may be worth considering declaring it static, right?

cperciva · on July 3, 2015

"static" is a different concept -- that marks a function as not being used from outside of that compilation unit (aka .c file). This particular function was used from outside of that file, but was unlikely to be reused in a different project.

syllogism · on July 2, 2015

It's best to be rigorous about it in case you refactor or export the function, and to make sure you get a green light from valgrind.

derekp7 · on July 2, 2015

So does anyone know of a good list of "worst practices" when coding in C? Basically usage that results in undefined behavior which can come back to bite you when run on another architecture?

For example, doing a strcpy(dest, dest + 1) will work in most cases, but if done on 64-bit Linux with a CPU that has sse4 optimizations, you will get random corruption on certain string lengths. (The C standard says that the behavior in this case is undefined). I'd like to see a list of items such as this to watch out for when auditing code.

johnwfinigan · on July 2, 2015

https://www.securecoding.cert.org/confluence/display/c/SEI+C...

Has a lot of rules of the form "Do Not ..."

technion · on July 3, 2015

Running clang with the undefined sanitizer[0] goes a lot of the way towards identifying this sort of thing. I don't see why it isn't used more often.

[0] http://blog.llvm.org/2013/04/testing-libc-with-fsanitizeunde...

bch · on July 2, 2015

What would be good is a C Haters Handbook, like the The Unix-Haters Handbook[0], a conversational/anecdotal book of sticking points and solutions, be they "don't do that", "here's a recipe/workalike", or "this is subtle -- pay attention".

[0] https://en.wikipedia.org/wiki/The_Unix-Haters_Handbook

gregnavis · on July 3, 2015

If I recall correctly the source and destination of strcpy cannot overlap. If they do it's an undefined behaviour.

derekp7 · on July 3, 2015

Yes, you are definitely correct, that is what the standard specifies. However, many programmers assume that if the source string points to a higher memory location than the destination string, then it is ok (i.e., they want to shift a string of X bytes to the left by some amount). And they've gotten away with it for many years, because most strcpy() functions are implemented very simply (they copy one byte at a time). The problem is when 15 year old code is run on a newer platform that has an SSE4 optimized implementation of strcpy -- then they find out what is meant by "undefined behavior".

pmalynin · on July 2, 2015

Coding in C.

java-man · on July 2, 2015

Hear hear!

thu · on July 2, 2015

When I was working for OpenERP, I thought it was surely possible for a user of one database to access other databases (when OpenERP run in multi-tenant mode, which was the case of the official OpenERP SaaS offer) by poking around the connection pool. And indeed in less than an hour I found a way to connect to any database. The fact that you can quickly find bugs when you have an idea of what you can look for is quite scary.

hrbrtglm · on July 3, 2015

Well, that's really interesting for all of us who work with multi-tenant webapp.

Maybe you took some notes.

Could you share some more info if you are free to disclose how you poked with the connection pool. I haven't look at OpenERP sources, but I think the database in use is PostgreSQL.

> "for a user of one database to access other database"

Were OpenERP using a database for each tenant or 1 database with multiple schemas ? In case it was the former case, were all databases accessed with the same credentials ? Which connection pool was it ? Pgpool / Pgbouncer ? Where was the vulnerability, code, db, pool, config setup ?

Sorry for all this questions, I'm indeed very interested and involved in this topic.

thu · on July 3, 2015

I reported it a long time ago and it was eventually fixed a few months later. The bug report is on Launchpad but I don't know if they opened it to the public. Here is a similar issue filled on GitHub (they moved there from Launchpad and they're now called Odoo): https://github.com/odoo/odoo/issues/7243

OpenERP, at least back then, was using a single PostgreSQL cluster for a given OpenERP server. Each tenant has its own PostgreSQL database. In multiple places, OpenERP offers the administrator role of a tenant to customize OpenERP with Python code. Basically you can inject Python code for execution, and it is run in the same processes used for every tenant. OpenERP tries to limit what the Python expressions can do.

The idea of what I did was, starting from the objects I had access to, to find a way to get a database cursor to another database than mine.

The connection pool in OpenERP is custom and part of the Python code base.

To get the cursor, I poked at some object, took its class and reinstanciated (this is all straightforward to do in Python) it with the name of another database (listing other databases was another longstanding issue of OpenERP).

hrbrtglm · on July 3, 2015

Thank you.

So I imagine, given their specifications, they had to give each tenant its own full PG database. I understand they have strong business logics but it does not seem very efficient. Salesforce is known to use a single oracle instance for all its tenants and I don't think their business logics is less demanding.

Giving an administrator or elevated role to the tenant connection looks like a really bad idea from the start, especially if the tenant instance is not sandboxed in a container or virtual server.

But this things are hard and a small mistake can have big consequences.

tokenrove · on July 2, 2015

One of the better ways to find bugs in other software is to first encounter them in your own. (Even if the bugs, as mentioned in this article, aren't actually severe.)

In particular, I remember, while writing an ASN.1 library, commenting to a friend that "most ASN.1 implementations are probably full of dangerous bugs" (since mine certainly was), and about a year or two later all those ASN.1-related vulnerabilities came out.

I guess that in general, bugs appear in patterns that recur again and again in independent code, as people try to solve the same problems with the same tools and make the same assumptions.

Flowdalic · on July 3, 2015

> tarsnap had a signal handler that was reading from a constant array. So what, how could this matter? The standard says thou shalt not.

An references to the standard (I assume he is referring to POSIX) where this is disallowed?

lokedhs · on July 3, 2015

I believe this is in reference to the following quote: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2...

The relevant quote is:

    the behavior is undefined if the signal handler refers to any object
    other than errno with static storage duration other than by assigning
    a value to an object declared as volatile sig_atomic_t, or if the
    signal handler calls any function defined in this standard other than
    one of the functions listed in the following table.