Valgrind is much more than a leak checking tool (logdown.com)
242 points by mallyvai on Dec 7, 2014 | 46 comments



That's a great article! As the article says, "Valgrind should be your tool of first resort". Running clean under Valgrind (as well as without warnings under -Wall -Wextra) is a good requirement for quality code. And it's worth mentioning that although C++ is the example the author chose, Valgrind works on any binary regardless of the language it's written in.

Leaked memory at close might be OK, but "uninitialized value" or "illegal access" almost never is. And if for some reason you think it is OK in your particular case, don't just ignore the warnings. Either change your code, or use a suppression file (http://valgrind.org/docs/manual/mc-manual.html#mc-manual.sup...), or the macros in valgrind.h (https://raw.githubusercontent.com/svn2github/valgrind/master...), so that others don't waste time on it. And realize that more often than not Valgrind is right and you are wrong. :)
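For reference, a Memcheck suppression entry looks roughly like the following; the suppression name, function, and library here are made up for illustration:

```
# suppressions.supp -- pass to valgrind with --suppressions=suppressions.supp
{
   libfoo-uninit-cond
   Memcheck:Cond
   fun:foo_internal_helper
   obj:*/libfoo.so*
}
```

Running with --gen-suppressions=all prints a ready-made entry for each reported error, which you can paste straight into the file.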

The small changes I might suggest to the article are to add "-g" to the command line flags you always use (it makes debugging a lot easier, and while it makes the executable larger, it almost never makes the code slower); to emphasize more strongly that Valgrind is not a static analyzer and will only catch errors in your code if the execution actually triggers them; to say that "-pedantic" is a good choice if you want code portable to all standards compliant compilers but may not be a good choice otherwise; and to suggest that if you are on Linux, 'perf' is a better first-line profiler than 'callgrind'.
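For comparison, the two profiling workflows look roughly like this (command names as documented in the perf and Valgrind manuals; ./prog is a placeholder):

```
# perf: sampling profiler, low overhead, Linux only
perf record ./prog
perf report

# callgrind: instrumenting profiler, much slower but exact call counts
valgrind --tool=callgrind ./prog
kcachegrind callgrind.out.<pid>
```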


> As the article says, "Valgrind should be your tool of first resort".

I used to be of this opinion, but nowadays I believe Address Sanitizer and related tools (https://code.google.com/p/address-sanitizer/) to be a better, easier-to-use first line of defense. Like Valgrind, this suite has different modes/tools, but they have far less impact on program runtime. It's sometimes even possible to do production canaries of ASAN-enabled services, unlike with Valgrind.


(Author writing): You're probably correct. Unfortunately ASAN isn't an option at my university because we use an older version of g++ (though they do seem to update it now and again). I've seen ASAN put to great use in industry, though.


I also tend to reach for ASAN/TSAN and the others more (due to their runtime performance and ease of use), but these tools are still in beta/experimental and don't yet have all of Valgrind's features. Another drawback is that they require recent compilers.


Generally I agree. One exception is code generated at runtime (i.e. JIT) or third party libs which you cannot recompile. ASan is planning to handle this but is not there yet.


I'd also suggest adding --track-origins=yes for uninitialized scans - it might be a bit slower, but is very useful for nailing these things quickly.

We make sure all of our unit tests run clean under valgrind, including the unit testing framework itself - it has caused some devs to grind their teeth making the tests themselves work cleanly, but it saves everyone's sanity in the long run.

The one thing that bugs me about valgrind (well, other than support for non-Linux, non-x86 targets - although that's always improving) is that it lags behind in support for newer additions to instruction sets, like AVX2 - the core instructions are there, but not all the semi-documented forms. (I've got a TODO to try to work through a few of the VEX errors I get, to make some of the things I work on valgrind'able again - although the latest release notes look positive.)


I typically run my C projects' test suites through Valgrind memcheck and fix all errors, leaks, and still-reachables before releasing any new versions. It's an essential part of a C developer's toolbox, IMO.


valgrind proper is actually a much more general binary instrumentation/execution engine that supports a number of different useful tools; most of what the article is actually describing is memcheck, the tool it runs by default when you don't specify another. While the article does mention helgrind and callgrind, it glosses over the true generality of valgrind in offering the "plugin" structure via which those are implemented.


This! You are still underestimating Valgrind until you have written your own plugin.

For example, it is a nice tool for compiler writers to check optimizations. You could write a plugin which counts dynamic method invocations and check how good your devirtualization optimization works.


Should there be some mention of the need for caution with the results of Valgrind?

https://www.schneier.com/blog/archives/2008/05/random_number...


Amazing example. Blindly doing anything because of Valgrind is bad, but I think a better rule might be "Don't comment out lines you don't understand in a standard cryptography package and then distribute it to users without warning". I don't think a mere "mention of the need for caution" could have saved anyone here. Handcuffs and a straitjacket on the Debian developer might have been required in this case.

This link covers it well:

  What can we learn from this? Firstly, vendors should not be 
  fixing problems (or, really, anything) in open source 
  packages by patching them locally – they should contribute 
  their patches upstream to the package maintainers. Had 
  Debian done this in this case, we (the OpenSSL Team) would 
  have fallen about laughing, and once we had got our breath 
  back, told them what a terrible idea this was. But no, it 
  seems that every vendor wants to “add value” by getting in 
  between the user of the software and its author.

  Secondly, if you are going to fix bugs, then you should 
  install this maxim of mine firmly in your head: never fix a 
  bug you don’t understand. I’m not sure I’ve ever put that 
  in writing before, but anyone who’s worked with me will 
  have heard me say it multiple times.
http://www.links.org/?p=327


"...we (the OpenSSL Team) would have fallen about laughing..."

Note this attitude of derision and arrogance. It's interesting that this was before the quality of the OpenSSL code became common knowledge, before Heartbleed, and finally the major work of LibreSSL. I'm not sure whether I just want this kind of attitude to be a good indicator of trouble coming, or whether it really is one.

Anyway be excellent to each other!


From my memory, the Debian developer did email upstream regarding it - and never got a reply...


The behavior involved here was a bizarre edge case: OpenSSL was intentionally using uninitialized data as a (very low quality) source of entropy.


There's a bit more to it than that. The Debian maintainer didn't just delete one bad call; they also deleted a second, non-buggy call.

http://research.swtch.com/openssl


Valgrind's diagnostic here was more or less correct. Using uninitialized memory as an entropy source is pretty useless. Uninitialized memory isn't filled with randomness, it's filled with stuff that's not guaranteed. This makes it harder to predict, but if an attack depended on it, you can bet that a smart attacker will figure out what the uninitialized memory would actually contain in your case. (Edit: I think I should clarify here. Adding uninitialized memory to your entropy pool doesn't hurt, as any good CSPRNG will be robust to adding data to the pool that's known to an attacker. It just doesn't help very much, because it's not very random.)

The Valgrind diagnostic was a good one and it was worth fixing. The problem wasn't Valgrind, but rather the fact that the fix inadvertently broke the code in a way that was difficult to detect.

For a poor analogy, imagine that you ask for an assessment of the structural integrity of a building. The assessment comes back saying that some supports are weak and should be replaced. Based on this report, you replace the supports. However, instead of replacing just the weak supports, you replace all of them, and you replace them with supports that look solid but are completely rotten. Then the building falls down. This is not the fault of the report, but rather the fault of the response to it.


I love valgrind, and it has saved my ass countless times when tracking memory leaks or overflows. The main issue I've always had with it is filtering the output easily. When you use external libraries (PortAudio or Qt, for example, but also proprietary libraries I had to use at work), a lot of the debug messages drown out what is really yours, and your leak becomes a needle in a haystack.


You know about suppressions, right? Particularly the --gen-suppressions option? See http://valgrind.org/docs/manual/manual-core.html#manual-core... and http://valgrind.org/docs/manual/mc-manual.html#mc-manual.sup....


I remember trying multiple solutions with suppression files and such, but not having much luck filtering out the reports I didn't care about while keeping my own errors displayed. It's clearly user error on my end, but I always gave up and rewrote smaller samples without the external dependency when I could.


I tried to run a very small (2k CLOC) C program under Valgrind; unfortunately, I was using OpenSSL before the layman knew what a ghetto it could be. I tried a myriad of things to suppress all the insane illegal pointer dereferencing and processor-instruction probing by OpenSSL, to no avail, and couldn't find the bugs in my own code owing to all of this noise!

I learned too late (blame my feeling overwhelmed whilst playing outside my preferred toolbox) that there's a solution, quoting the OpenSSL docs:

> When OpenSSL's PRNG routines are called to generate random numbers the supplied buffer contents are mixed into the entropy pool: so it technically does not matter whether the buffer is initialized at this point or not. Valgrind (and other test tools) will complain about this. When using Valgrind, make sure the OpenSSL library has been compiled with the PURIFY macro defined (-DPURIFY) to get rid of these warnings. (http://www.openssl.org/support/faq.html#PROG14)

Unfortunately that didn't help me either: I couldn't get -DPURIFY to work, the compile phase took forever, I'm not skilled with cross-compiling, and my target platform was a Raspberry Pi. Apparently I hadn't set myself up for success.

I hope someone else can appreciate Valgrind and avoid stumbling into the issues that I faced, and perhaps learn something about OpenSSL compile flags along the way!


I have been following the LibreSSL development and the number of memory leaks fixed is nothing short of amazing. From what I have gathered, OpenSSL PRNG routines are being too smart for their own good. Randomness should be provided by the OS (which takes care of gathering entropy from various sources).

The documentation snippet you quoted actually hides nasty details: I recall one of the early commits in LibreSSL that fixed one of these issues, namely feeding the private key into the entropy pool.

I stumbled upon an interesting tweet the other day [1]: "Re-linked #CFEngine against #libressl instead of #openssl...valgrind output went from 600KB to 2.5KB."

Perhaps you can give that a shot?

[1] https://twitter.com/worr/status/540839911369105408


Unfortunately, both ASAN and Valgrind are broken on FreeBSD. ASAN at least has the decency to tell you it doesn't support FreeBSD. Valgrind doesn't even build on FreeBSD and doesn't list it as a supported platform; a FreeBSD port is available, but it's completely borked when testing clang-compiled software (which is everything on FreeBSD these days).

I have to compile my FreeBSD-specific code on Linux to find memory leaks, while praying to God the leak isn't in one of the ifdef'd sections.


I wrote my first sizable professional C program a few years ago. There were issues with some slow memory leaks in my code. I'd only vaguely heard of valgrind before, but I decided to give it a try. A day of debugging with it caught issues that would have taken months to shake out otherwise, if at all. (It also helped me track down the memory leaks.)


I use it on Mac OS X even if it's not as good as under Linux. Is there any chance it will get improved under Mac OS X?


It's fixed, but not packaged. You need to compile it yourself and add some dyld suppressions.


Nice, I always compile it myself anyway… I will give it a try… At the moment I keep an old 10.6 machine just for that because under that version Valgrind was running fine.


Nice article, although I'm not convinced by the first example (as it is now, and in first position): detecting uninitialized variables is a job for compilers: `gcc -Wuninitialized -Wmaybe-uninitialized`. That works well within a single scope, as is the case in the example. In more complex situations, like an uninitialized field inside a linked-list structure, valgrind will be very useful indeed! By the way, `gcc -O0` sets my uninitialized variables to 0 without a warning, and valgrind is happy as well ...


Valgrind is amazing, but I can't ever figure out how they name these things. It seems like the only rule is computer jargon followed by the word "grind".

    valgrind - a multipurpose tool
    kcachegrind - a gui for callgrind
    helgrind - data race analyzer
Am I the only one that thought "what the fuck" when they heard kcachegrind for the first time?

EDIT: just read this, explained everything: http://valgrind.org/info/tools.html


On a similar (superficial) note: almost everyone pronounces the name incorrectly. See http://valgrind.org/docs/manual/faq.html#faq.pronounce.


A "grind" is a sort of gate, for example a port through a fence that keeps livestock contained. "Val" comes from "Valhalla" (as mentioned in the link).

I still thought it was pronounced "val-grined".


Back in the days when I still did Windows software, VS6 reigned supreme, and BoundsChecker and Purify did more or less what Valgrind does (Valgrind was already very helpful; Purify was available on Unixes and Windows, and was more capable than Valgrind at the time).

What's the situation in Windows land these days with respect to these tools?


We use Dr. Memory on Windows; it's not strictly equivalent, but it can also track down hard-to-find problems... http://www.drmemory.org


Valgrind is wonderful for catching bugs in my C code. It's a shame you even have to use a tool like this, but such are the realities of C programming.


Why is that a shame? Instead of including a tool like this on every single program invocation (which is essentially what happens if the language forced array bounds checking and other things) you only do it when you need to.

It's a perfect split - do this when developing the program, and not when running the program.

When you are doing the edit/compile/run cycle of developing you should always run via valgrind. It has options to make it silent except if there are errors so it's not annoying.

If you are running your program normally and only using valgrind sometimes for an extra check you are doing it wrong.

i.e.

    make program && valgrind -q ./program


Bounds checking is "essentially" binary instrumentation? That's an... interesting claim.


It's a shame because in other languages, if there's a bug in my program which goes uncaught, it won't create a massive security vulnerability or a segfault.


You can protect against that by running your vulnerable program in a VM without large performance loss. However, you cannot easily bring your heavily-checked/cache-inefficient/no-inlining Java/Javascript/Python program up to the same C/C++ speed.

This is not a dig against these languages, rather a justification why people are still using C/C++.


There is no language in the world which, in general, prevents bugs becoming security vulnerabilities.


Yes, but certain specific types of security issue are impossible in other languages.


People say that and it's not true. Any bug that can be caught by a change in the language can also be caught by valgrind.

And if you are still worried about bugs, you can use -fsanitize=....


Valgrind can only catch bugs that are exposed while the program is running under it. In most cases (when you have a good test suite), this is sufficient. However, especially in the case of security vulnerabilities, you might have a bug that is only exposed on specific or malformed input that none of your test suites check for. For example, suppose you are parsing input from an untrusted source, and that input has a length-prefixed field. Unless your test suite includes a message whose length prefix is longer than the actual length, Valgrind will not tell you that you could potentially overflow, because you never do so in any of your tests.


I understood all that, that's why I mentioned -fsanitize


Which only works in two C compilers, and those aren't available on every OS out there.


vgdb (the GDB interface binary) plus --tool=massif is like magic for finding resource hogs: hit a breakpoint, dump a heap profile, step, run another dump, compare.

If you often find yourself staring at the massive massif "massif" (triple pun FTW) and wondering why things are going up AND down, you need vgdb in your life.
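A rough sketch of that workflow (flags and monitor commands as documented in the Valgrind manual; ./prog and the breakpoint are placeholders):

```
# terminal 1: run under massif with the embedded gdbserver, stopped at startup
valgrind --tool=massif --vgdb=yes --vgdb-error=0 ./prog

# terminal 2: attach gdb through vgdb
gdb ./prog
(gdb) target remote | vgdb
(gdb) break some_allocating_function
(gdb) continue
(gdb) monitor snapshot before.massif    # dump a heap profile here
(gdb) next
(gdb) monitor snapshot after.massif     # dump another, then compare with ms_print
```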


"Valgrind" would be an excellent name for a skateboard manufacturer.


Not a fan. These tools over-report to the point of uselessness. I get 100 pages of things allocated once and released on exit - which are NOT leaks and not useful to fix. Except to satisfy an OCD impulse.

I imagine there's a way to get only a report of repeated allocations, but somehow I can never find it. I have to wonder: why doesn't the tool produce that report by default? It's what I always want.

Anyway I can count on 1 hand the times this class of tool has found anything of value. They are not worth the effort 99% of the time.



