Fuzzing a DNS parser written in Go (cloudflare.com)
165 points by jgrahamc on Aug 6, 2015 | 39 comments



CloudFlare has a constant stream of quality articles about Go, I've really been enjoying them.


They are probably pushing more traffic through Go than any other group on the planet. It's nice of them to share what they come across, and it's also good PR for them. Hell, even I want to work for them now.


We're hiring in San Francisco, London, and Singapore:

https://www.cloudflare.com/join-our-team


I'm in Taiwan and moving to Japan in a few months, so that's out of the question even though I've got 2.5 years of Go under my belt. :(


On that note:

Hypothetically speaking, what would it take for a would-be candidate who knows Go but doesn't have relevant work experience in CloudFlare's areas of expertise to get up to speed?


Apply. We have a team from very diverse backgrounds. Getting up to speed isn't a problem.


Hi. Could I speak to you about this?


The CloudFlare blog is not written in Chinese :)

https://www.google.com/trends/explore#q=golang


With the exception of Google.


The possible exception of Google. I wouldn't be surprised if Cloudflare had more traffic exercising Go programs.


Possible, yeah. The highest-throughput Go usage I'm aware of within Google is https://github.com/youtube/vitess, YouTube's MySQL load balancer: "a fundamental component of YouTube's MySQL infrastructure, serving thousands of QPS per server".

Not sure how that compares to Cloudflare's global traffic though, and what other high-throughput Go usages exist at Google.


Apparently dl.google.com is powered by Go. I'm guessing that gets a fair bit of traffic too.

http://talks.golang.org/2013/oscon-dl.slide


Thanks for this, it'll be nice to go through it a bit and learn how Go is used inside Google.


Dropbox's block storage system is written in Go, and we're pushing terabit+ bandwidth around in it; I'd say we're in the running. :-)


Gah, I really should hit myself over the head, I read about that and I totally forgot to take you into account.

Storage systems obviously push a ton of bits.

https://blogs.dropbox.com/tech/2014/07/open-sourcing-our-go-...


Oooooh, good point. I had forgotten you used Go, actually.


I'd also recommend introducing mutation testing in addition to fuzz testing.

There is some overlap between the two techniques, but they approach the problem from completely different directions. Fuzz testing randomly generates inputs and uses that to find bugs where the input is improperly handled. Mutation testing randomly changes the code to make sure the tests themselves cover all the code that is written.

A good approach is to use mutation testing to make sure the existing code is fully covered by the tests. Then use fuzz testing to identify new failing test cases that need to be written. Once the implementation handles the new test cases, mutation test that code to make sure there aren't any untested code paths.
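The fuzz-testing half of that workflow is easy to sketch. Below is a minimal, self-contained Go version of the "randomly mutate a valid input" approach; note that real tools like afl-fuzz and go-fuzz are coverage-guided, which this toy loop is not, and `parseName` is a hypothetical DNS-style name parser standing in for whatever code is under test.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// parseName decodes a DNS-style length-prefixed name, e.g.
// "\x07example\x03com\x00" -> "example.com". It is a stand-in for the
// parser under test, with the bounds checks that fuzzing would exercise.
func parseName(data []byte) (string, error) {
	var name []byte
	i := 0
	for {
		if i >= len(data) {
			return "", errors.New("truncated name")
		}
		n := int(data[i]) // label length prefix
		i++
		if n == 0 {
			return string(name), nil // zero length terminates the name
		}
		if i+n > len(data) {
			return "", errors.New("label overruns buffer")
		}
		if len(name) > 0 {
			name = append(name, '.')
		}
		name = append(name, data[i:i+n]...)
		i += n
	}
}

func main() {
	seed := []byte("\x07example\x03com\x00") // a known-good input
	rng := rand.New(rand.NewSource(1))
	for trial := 0; trial < 10000; trial++ {
		// Mutate a copy of the seed: flip a few random bytes.
		input := append([]byte(nil), seed...)
		for j := 0; j < 1+rng.Intn(3); j++ {
			input[rng.Intn(len(input))] = byte(rng.Intn(256))
		}
		// The parser may succeed or return an error; a panic here
		// would be exactly the kind of bug fuzzing is meant to find.
		parseName(input)
	}
	fmt.Println("survived 10000 mutated inputs")
}
```

Each crash found this way becomes a new fixed test case, after which mutation testing can confirm the fix is actually covered.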

I used this same approach on a Relational Algebra + SQL generator lib I wrote a while ago and the result was something that was very stable -- the only bugs were due to flaws in my own understanding of the specification, not bugs in the implementation.


I'd be interested to read a more detailed write up of the process.


It's amusing that one of the bugs they found and fixed was a buffer reuse bug -- similar to what caused heartbleed -- will help in my 'memory safe languages are not magic bullets' rebuttals.


A big mistake I made early on was assuming garbage collected meant no memory leaks. But alas, you can keep pointers to objects you don't need anymore.

For Go in particular, I make mistakes when passing slices to multiple objects. Safety slips because slices are passed by value, but underneath they point to the same backing data. Easy to forget.
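A minimal sketch of that slice-aliasing trap (invented example, not from the article):

```go
package main

import "fmt"

func main() {
	packet := []byte{1, 2, 3, 4}

	// A slice is passed by value, but the copied header still points
	// at the same backing array, so "copies" alias each other.
	view := packet[:2]
	view[0] = 99
	fmt.Println(packet[0]) // 99: writing through the view changed packet

	// The defensive pattern is an explicit copy when handing bytes to
	// another owner -- the fix for the buffer-reuse class of bug.
	owned := make([]byte, 2)
	copy(owned, packet[:2])
	owned[0] = 7
	fmt.Println(packet[0]) // still 99: owned no longer aliases packet
}
```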


So is your rebuttal that since there are some reuse bugs, reading out of bounds is ok?


How does one go from "memory safe languages are not magic bullets" to "reading out of bounds is ok"? That's not a type safe argument.


I think the point is just that "memory safety is a magic bullet" is a straw man version of the actual argument, "memory safety improves security significantly over not having it".


Alas, I see the strawman version trotted out at least as frequently as the more reasonable version.


I've seen people believe that memory leaks can't happen in programs written in garbage-collected languages. So, yeah, the belief that garbage collection and/or bounds checking is a magic bullet really needs to be addressed.


They are damn good bullets. Just not silver bullets.

Restricting mutability would help somewhat. (Another bullet.)


Never listen to anyone who tells you that parsers are easy.

Parsers that work on simple input are easy. Parsers that get all the edge cases and diabolical input right are stupidly difficult.


> Parsers that work on simple input are easy. Parsers that get all the edge cases and diabolical input right are stupidly difficult.

Not to mention parsers which work on a vague and totally unspecified grammar - like "real world" HTTP and HTML parsers, which have to account for misbehaving clients and servers or malformed web pages (is this what you meant by 'diabolical input'?).


> is this what you meant by 'diabolical input'

Not exactly what I meant, but that's a good point too!

By diabolical input, I mean things specifically designed to stress the parser, like ("{" * 100000) in JSON.


Parsers are easy. It's the grammars that are complicated.

Seriously: seemingly simple things designed and described informally, in prose, can easily be incredibly complex when you expand all their details. That says a lot about those things, a little bit about the process and mindset that went into them and virtually nothing about parsing in general.


> Parsers are easy. It's the grammars that are complicated.

What is this supposed to mean? It sounds something like "Programming is easy. It's the classes/objects that are complicated."


What I'm saying is that parsing is not inherently hard. But, just like anything else that isn't inherently hard, you can make it as hard as you like by making the input arbitrarily complex and convoluted.

Given a grammar, making a parser is largely trivial. A computer could do it. Making a fast parser, or one with good error messages, or one that's modular and composable are all harder, but still generally not hard.

The problem comes up when you have a bad, overly complex, ill-thought-out, incomplete or vague specification. But anything would be hard with a specification like that, not just parsing. (This seems to be something standards bodies enjoy producing, though.)

Perhaps it's a bit like saying "programming is easy, the world is complicated" except that the world could be so much less complicated if relevant formal languages were simpler and better-defined.
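To illustrate the "given a grammar, making a parser is largely trivial" claim: here is an invented two-production grammar translated almost line-for-line into a recursive-descent parser, one function per production.

```go
package main

import (
	"errors"
	"fmt"
)

// Grammar (one function per production):
//   Expr -> Term ('+' Term)*
//   Term -> digit | '(' Expr ')'
type parser struct {
	input string
	pos   int
}

// expr implements: Expr -> Term ('+' Term)*
func (p *parser) expr() (int, error) {
	n, err := p.term()
	if err != nil {
		return 0, err
	}
	for p.pos < len(p.input) && p.input[p.pos] == '+' {
		p.pos++ // consume '+'
		m, err := p.term()
		if err != nil {
			return 0, err
		}
		n += m
	}
	return n, nil
}

// term implements: Term -> digit | '(' Expr ')'
func (p *parser) term() (int, error) {
	if p.pos >= len(p.input) {
		return 0, errors.New("unexpected end of input")
	}
	c := p.input[p.pos]
	switch {
	case c >= '0' && c <= '9':
		p.pos++
		return int(c - '0'), nil
	case c == '(':
		p.pos++ // consume '('
		n, err := p.expr()
		if err != nil {
			return 0, err
		}
		if p.pos >= len(p.input) || p.input[p.pos] != ')' {
			return 0, errors.New("missing ')'")
		}
		p.pos++ // consume ')'
		return n, nil
	}
	return 0, fmt.Errorf("unexpected %q", c)
}

func main() {
	p := &parser{input: "(1+2)+3"}
	n, err := p.expr()
	fmt.Println(n, err) // 6 <nil>
}
```

The structure of the code is the structure of the grammar; the hard part was already done by whoever wrote the grammar down precisely.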


> The problem comes up when you have a bad, overly complex, ill-thought-out, incomplete or vague specification. But anything would be hard with a specification like that, not just parsing.

Nearly every real text format in the world would fall under your definition of "bad, overly complex, ill-thought-out", thanks to practical considerations always leading to some amount of complexity. You may wish that languages could be nice and clean, but in practice they rarely end up that way.

But aside from that, your statement isn't true. Serialization is an order of magnitude easier than parsing, because for a serializer all you have to do is emit something that stays within the lines delimited by the spec.

For a parser, you have to correctly and safely handle absolutely any construct or variation allowed by the spec.


Parsers can be simple, if the language you are parsing is simple.

That's why you should make your languages / protocols as 'weak' as possible. Prefer a finite language over a regular language over a context-free language, and avoid context-sensitive languages like the plague.
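A sketch of that advice in Go, using an invented KEY=VALUE line format: because the language is kept regular (no nesting, no escapes), a single regular expression is a complete parser, and Go's RE2-based regexp package even guarantees linear-time matching, so there is no input that drives it pathologically deep.

```go
package main

import (
	"fmt"
	"regexp"
)

// A deliberately regular line format: KEY=VALUE, where the value may
// not itself contain '='. No recursion, no stack, no edge cases beyond
// what the regexp states.
var kvLine = regexp.MustCompile(`^([A-Za-z_]+)=([^=\n]*)$`)

func parseKV(line string) (key, value string, ok bool) {
	m := kvLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	k, v, ok := parseKV("retries=3")
	fmt.Println(k, v, ok) // retries 3 true

	_, _, ok = parseKV("a=b=c")
	fmt.Println(ok) // false: a second '=' is simply outside the language
}
```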


For dnslib [1] I use a much simpler fuzzer as part of the test suite [2]. It's nowhere near as sophisticated as afl-fuzz or go-fuzz in terms of understanding and exercising code paths (it just takes a DNS record and randomly mutates it), but it still found a number of bugs.

Does anyone know if there is anything similar to afl-fuzz/go-fuzz for python?

[1] https://bitbucket.org/paulc/dnslib/ [2] https://bitbucket.org/paulc/dnslib/src/22cff0cd3e13098c00107...



There really aren't many freely available DNS servers that have seen serious production usage. I'm really excited to see RRDNS open-sourced as previously hinted at; does anyone know if there is a timeline for this?


A Prometheus contributor also fuzzed our query language parser using go-fuzz and amongst other things discovered a really interesting Unicode-related parsing bug:

https://twitter.com/PrometheusIO/status/626187068356624384


Is there anything available to fuzz parsers written in JavaScript? (I found tunz/afl-fuzz-js and some others but they seem to be for fuzzing parsers of JavaScript.)



