Fuzzing a DNS parser written in Go (cloudflare.com)
165 points by jgrahamc on Aug 6, 2015 | 39 comments



CloudFlare has a constant stream of quality articles about Go, I've really been enjoying them.


They are probably pushing more traffic through Go than any other group on the planet. It's nice of them to share what they come across, and it's also good PR for them. Hell, even I want to work for them now.


We're hiring in San Francisco, London, and Singapore:

https://www.cloudflare.com/join-our-team


I'm in Taiwan and moving to Japan in a few months, so that's out of the question even though I've got 2.5 years of Go under my belt. :(


On that note:

Hypothetically speaking, what would it take for a would-be candidate who knows Go but doesn't have relevant work experience in CloudFlare's areas of expertise to get up to speed?


Apply. We have a team from very diverse backgrounds. Getting up to speed isn't a problem.


Hi. Could I speak to you about this?


The CloudFlare blog is not written in Chinese :)

https://www.google.com/trends/explore#q=golang


With the exception of Google.


The possible exception of Google. I wouldn't be surprised if Cloudflare had more traffic exercising Go programs.


Possible, yeah. The highest-throughput Go usage I'm aware of within Google is https://github.com/youtube/vitess, YouTube's MySQL load balancer: "a fundamental component of YouTube's MySQL infrastructure, serving thousands of QPS per server".

Not sure how that compares to Cloudflare's global traffic though, and what other high-throughput Go usages exist at Google.


Apparently dl.google.com is powered by Go. I'm guessing that gets a fair bit of traffic too.

http://talks.golang.org/2013/oscon-dl.slide


Thanks for this, it'll be nice to go through it a bit and learn how Go is used inside Google.


Dropbox's block storage system is written in Go, and we're pushing terabit+ bandwidth around in it; I'd say we're in the running. :-)


Gah, I really should hit myself over the head, I read about that and I totally forgot to take you into account.

Storage systems obviously push a ton of bits.

https://blogs.dropbox.com/tech/2014/07/open-sourcing-our-go-...


Oooooh, good point. I had forgotten you used Go, actually.


I'd also recommend introducing mutation testing in addition to fuzz testing.

There is some overlap between the two techniques, but they approach the problem from completely different directions. Fuzz testing randomly generates inputs and uses that to find bugs where the input is improperly handled. Mutation testing randomly changes the code to make sure the tests themselves cover all the code that is written.

A good approach is to use mutation testing to make sure the existing code is fully covered by the tests. Then use fuzz testing to identify new failing test cases that need to be written. Once the implementation handles the new test cases, mutation test that code to make sure there aren't any untested code paths.
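The fuzz-testing half of that workflow is easy to sketch. Below is a minimal, self-contained Go version of the "randomly mutate a valid input" approach; note that real tools like afl-fuzz and go-fuzz are coverage-guided, which this toy loop is not, and `parseName` is a hypothetical DNS-style name parser standing in for whatever code is under test.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// parseName decodes a DNS-style length-prefixed name, e.g.
// "\x07example\x03com\x00" -> "example.com". It is a stand-in for the
// parser under test, with the bounds checks that fuzzing would exercise.
func parseName(data []byte) (string, error) {
	var name []byte
	i := 0
	for {
		if i >= len(data) {
			return "", errors.New("truncated name")
		}
		n := int(data[i]) // label length prefix
		i++
		if n == 0 {
			return string(name), nil // zero length terminates the name
		}
		if i+n > len(data) {
			return "", errors.New("label overruns buffer")
		}
		if len(name) > 0 {
			name = append(name, '.')
		}
		name = append(name, data[i:i+n]...)
		i += n
	}
}

func main() {
	seed := []byte("\x07example\x03com\x00") // a known-good input
	rng := rand.New(rand.NewSource(1))
	for trial := 0; trial < 10000; trial++ {
		// Mutate a copy of the seed: flip a few random bytes.
		input := append([]byte(nil), seed...)
		for j := 0; j < 1+rng.Intn(3); j++ {
			input[rng.Intn(len(input))] = byte(rng.Intn(256))
		}
		// The parser may succeed or return an error; a panic here
		// would be exactly the kind of bug fuzzing is meant to find.
		parseName(input)
	}
	fmt.Println("survived 10000 mutated inputs")
}
```

Each crash found this way becomes a new fixed test case, after which mutation testing can confirm the fix is actually covered.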

I used this same approach on a Relational Algebra + SQL generator lib I wrote a while ago and the result was something that was very stable -- the only bugs were due to flaws in my own understanding of the specification, not bugs in the implementation.


I'd be interested to read a more detailed write up of the process.


It's amusing that one of the bugs they found and fixed was a buffer reuse bug -- similar to what caused heartbleed -- will help in my 'memory safe languages are not magic bullets' rebuttals.


A big mistake I made early on was assuming garbage collected meant no memory leaks. But alas, you can keep pointers to objects you don't need anymore.

For Go in particular, I make mistakes when passing slices to multiple objects. Safety slips because slices are passed by value, but underneath they point to the same backing data. Easy to forget.
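A minimal sketch of that slice-aliasing trap (invented example, not from the article):

```go
package main

import "fmt"

func main() {
	packet := []byte{1, 2, 3, 4}

	// A slice is passed by value, but the copied header still points
	// at the same backing array, so "copies" alias each other.
	view := packet[:2]
	view[0] = 99
	fmt.Println(packet[0]) // 99: writing through the view changed packet

	// The defensive pattern is an explicit copy when handing bytes to
	// another owner -- the fix for the buffer-reuse class of bug.
	owned := make([]byte, 2)
	copy(owned, packet[:2])
	owned[0] = 7
	fmt.Println(packet[0]) // still 99: owned no longer aliases packet
}
```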


So is your rebuttal that since there are some reuse bugs, reading out of bounds is ok?


How does one go from "memory safe languages are not magic bullets" to "reading out of bounds is ok"? That's not a type safe argument.


I think the point is just that "memory safety is a magic bullet" is a straw man version of the actual argument, "memory safety improves security significantly over not having it".


Alas, I see the strawman version trotted out at least as frequently as the more reasonable version.


I've seen people believe that memory leaks can't happen in programs written in garbage-collected languages. So, yeah, the belief that garbage collection and/or bounds checking is a magic bullet really needs to be addressed.


They are damn good bullets. Just not silver bullets.

Restricting mutability would help somewhat. (Another bullet.)


Never listen to anyone who tells you that parsers are easy.

Parsers that work on simple input are easy. Parsers that get all the edge cases and diabolical input right are stupidly difficult.


> Parsers that work on simple input are easy. Parsers that get all the edge cases and diabolical input right are stupidly difficult.

Not to mention parsers which work on a vague and totally unspecified grammar - like "real world" HTTP and HTML parsers, which have to account for misbehaving clients and servers or malformed web pages (is this what you meant by 'diabolical input'?).


> is this what you meant by 'diabolical input'

Not exactly what I meant, but that's a good point too!

By diabolical input, I mean things specifically designed to stress the parser, like ("{" * 100000) in JSON.


Parsers are easy. It's the grammars that are complicated.

Seriously: seemingly simple things designed and described informally, in prose, can easily be incredibly complex when you expand all their details. That says a lot about those things, a little bit about the process and mindset that went into them and virtually nothing about parsing in general.


> Parsers are easy. It's the grammars that are complicated.

What is this supposed to mean? It sounds something like "Programming is easy. It's the classes/objects that are complicated."


What I'm saying is that parsing is not inherently hard. But, just like anything else that isn't inherently hard, you can make it as hard as you like by making the input arbitrarily complex and convoluted.

Given a grammar, making a parser is largely trivial. A computer could do it. Making a fast parser, or one with good error messages, or one that's modular and composable are all harder, but still generally not hard.

The problem comes up when you have a bad, overly complex, ill-thought-out, incomplete or vague specification. But anything would be hard with a specification like that, not just parsing. (This seems to be something standards bodies enjoy producing, though.)

Perhaps it's a bit like saying "programming is easy, the world is complicated" except that the world could be so much less complicated if relevant formal languages were simpler and better-defined.
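To illustrate the "given a grammar, making a parser is largely trivial" claim: here is an invented two-production grammar translated almost line-for-line into a recursive-descent parser, one function per production.

```go
package main

import (
	"errors"
	"fmt"
)

// Grammar (one function per production):
//   Expr -> Term ('+' Term)*
//   Term -> digit | '(' Expr ')'
type parser struct {
	input string
	pos   int
}

// expr implements: Expr -> Term ('+' Term)*
func (p *parser) expr() (int, error) {
	n, err := p.term()
	if err != nil {
		return 0, err
	}
	for p.pos < len(p.input) && p.input[p.pos] == '+' {
		p.pos++ // consume '+'
		m, err := p.term()
		if err != nil {
			return 0, err
		}
		n += m
	}
	return n, nil
}

// term implements: Term -> digit | '(' Expr ')'
func (p *parser) term() (int, error) {
	if p.pos >= len(p.input) {
		return 0, errors.New("unexpected end of input")
	}
	c := p.input[p.pos]
	switch {
	case c >= '0' && c <= '9':
		p.pos++
		return int(c - '0'), nil
	case c == '(':
		p.pos++ // consume '('
		n, err := p.expr()
		if err != nil {
			return 0, err
		}
		if p.pos >= len(p.input) || p.input[p.pos] != ')' {
			return 0, errors.New("missing ')'")
		}
		p.pos++ // consume ')'
		return n, nil
	}
	return 0, fmt.Errorf("unexpected %q", c)
}

func main() {
	p := &parser{input: "(1+2)+3"}
	n, err := p.expr()
	fmt.Println(n, err) // 6 <nil>
}
```

The structure of the code is the structure of the grammar; the hard part was already done by whoever wrote the grammar down precisely.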


> The problem comes up when you have a bad, overly complex, ill-thought-out, incomplete or vague specification. But anything would be hard with a specification like that, not just parsing.

Nearly every real text format in the world would fall under your definition of "bad, overly complex, ill-thought-out", thanks to practical considerations always leading to some amount of complexity. You may wish that languages could be nice and clean, but in practice they rarely end up that way.

But aside from that, your statement isn't true. Serialization is an order of magnitude easier than parsing, because for a serializer all you have to do is emit something that stays within the lines delimited by the spec.

For a parser, you have to correctly and safely handle absolutely any construct or variation allowed by the spec.


Parsers can be simple, if the language you are parsing is simple.

That's why you should make your languages / protocols as 'weak' as possible. Prefer a finite language over a regular language over a context-free language, and avoid context-sensitive languages like the plague.
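A sketch of that advice in Go, using an invented KEY=VALUE line format: because the language is kept regular (no nesting, no escapes), a single regular expression is a complete parser, and Go's RE2-based regexp package even guarantees linear-time matching, so there is no input that drives it pathologically deep.

```go
package main

import (
	"fmt"
	"regexp"
)

// A deliberately regular line format: KEY=VALUE, where the value may
// not itself contain '='. No recursion, no stack, no edge cases beyond
// what the regexp states.
var kvLine = regexp.MustCompile(`^([A-Za-z_]+)=([^=\n]*)$`)

func parseKV(line string) (key, value string, ok bool) {
	m := kvLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	k, v, ok := parseKV("retries=3")
	fmt.Println(k, v, ok) // retries 3 true

	_, _, ok = parseKV("a=b=c")
	fmt.Println(ok) // false: a second '=' is simply outside the language
}
```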


For dnslib [1] I use a much simpler fuzzer as part of the test suite [2]. It's nowhere near as sophisticated as afl-fuzz or go-fuzz in terms of understanding and exercising code paths (it just takes a DNS record and randomly mutates it), but it still found a number of bugs.

Does anyone know if there is anything similar to afl-fuzz/go-fuzz for python?

[1] https://bitbucket.org/paulc/dnslib/ [2] https://bitbucket.org/paulc/dnslib/src/22cff0cd3e13098c00107...



There really aren't many freely available DNS servers that have seen serious production usage. I'm really excited to see RRDNS open-sourced as previously hinted at; does anyone know if there is a timeline for this?


A Prometheus contributor also fuzzed our query language parser using go-fuzz and amongst other things discovered a really interesting Unicode-related parsing bug:

https://twitter.com/PrometheusIO/status/626187068356624384


Is there anything available to fuzz parsers written in JavaScript? (I found tunz/afl-fuzz-js and some others but they seem to be for fuzzing parsers of JavaScript.)



