They are probably pushing more traffic through Go than any other group on the planet. It's nice of them to share what they come across, and it's also good PR for them. Hell, even I want to work for them now.
Hypothetically speaking, what would it take for a wannabe candidate who knows Go but doesn't have relevant work experience in CloudFlare's area of expertise to get up to speed?
Possible, yeah. The highest-throughput Go usage I'm aware of within Google is https://github.com/youtube/vitess, Youtube's MySQL load balancer: "a fundamental component of YouTube's MySQL infrastructure, serving thousands of QPS per server".
Not sure how that compares to Cloudflare's global traffic though, and what other high-throughput Go usages exist at Google.
I'd also recommend introducing mutation testing in addition to fuzz testing.
There is some overlap between the two techniques, but they approach the problem from opposite directions. Fuzz testing randomly generates inputs to find bugs where input is handled improperly. Mutation testing randomly changes the code to make sure the tests themselves cover all the code that is written.
A good approach is to use mutation testing to make sure the existing code is fully covered by the tests. Then use fuzz testing to identify new failing test cases that need to be written. Once the implementation handles the new test cases, mutation test that code to make sure there aren't any untested code paths.
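For reference, a dvyukov/go-fuzz harness is just an exported function with a fixed shape. The signature and return values below are the go-fuzz convention; Record, ParseRecord and Bytes are hypothetical stand-ins for the code under test:

    // A minimal go-fuzz (github.com/dvyukov/go-fuzz) harness sketch.
    package parser

    import "errors"

    type Record struct{ raw []byte }

    func (r *Record) Bytes() []byte { return r.raw } // stand-in serializer

    func ParseRecord(data []byte) (*Record, error) { // stand-in parser
    	if len(data) == 0 {
    		return nil, errors.New("empty input")
    	}
    	return &Record{raw: data}, nil
    }

    // Fuzz is the entry point go-fuzz drives. Returning 1 tells go-fuzz
    // the input was interesting (keep mutating it), 0 otherwise; bugs are
    // signaled by panicking.
    func Fuzz(data []byte) int {
    	rec, err := ParseRecord(data)
    	if err != nil {
    		return 0 // rejected input: fine, just don't prioritize it
    	}
    	// Round-trip check: anything we accepted should re-serialize
    	// and re-parse cleanly.
    	if _, err := ParseRecord(rec.Bytes()); err != nil {
    		panic("round-trip failed: " + err.Error())
    	}
    	return 1
    }

go-fuzz saves every crashing input it finds; turning those crashers into regression tests and then mutation-testing the fixed code closes the loop described above.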
I used this same approach on a Relational Algebra + SQL generator lib I wrote a while ago, and the result was very stable -- the only bugs were due to flaws in my own understanding of the specification, not bugs in the implementation.
It's amusing that one of the bugs they found and fixed was a buffer reuse bug -- similar to what caused Heartbleed. It will help in my 'memory safe languages are not magic bullets' rebuttals.
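For the curious, the bug class looks roughly like this (a contrived Go sketch, not CloudFlare's actual code): a buffer is reused across requests and sliced by a caller-supplied length rather than the actual message length, so the program never reads out of bounds yet still echoes another request's data.

    package main

    import "fmt"

    var buf = make([]byte, 1024) // shared buffer, reused across requests

    // handle copies msg into the shared buffer and returns n echoed bytes.
    // Bug: it trusts the caller-supplied length n instead of len(msg).
    func handle(msg []byte, n int) []byte {
    	copy(buf, msg)
    	return buf[:n] // memory safe (in bounds), yet leaks stale bytes
    }

    func main() {
    	handle([]byte("top secret password"), 19)
    	// The second caller asks for 19 bytes but only sent 2; the other
    	// 17 are leftovers from the previous request.
    	fmt.Printf("%q\n", handle([]byte("hi"), 19)) // "hip secret password"
    }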
A big mistake I made early on was assuming garbage collected meant no memory leaks. But alas, you can keep pointers to objects you don't need anymore.
For Go in particular, I make mistakes when passing slices to multiple objects. Safety slips because slices are 'passed by value' but underneath point to the same backing data. Easy to forget.
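A minimal illustration of the trap:

    package main

    import "fmt"

    func main() {
    	data := []byte("hello")
    	alias := data // copies the slice header, not the backing array
    	alias[0] = 'J'
    	fmt.Println(string(data)) // "Jello": the original sees the write
    }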
I think the point is just that "memory safety is a magic bullet" is a straw man version of the actual argument, "memory safety improves security significantly over not having it".
I've seen people believe that memory leaks can't happen in programs written in garbage-collected languages. So, yeah, the belief that garbage collection and/or bounds checking is a magic bullet really needs to be addressed.
> Parsers that work on simple input are easy. Parsers that get all the edge cases and diabolical input right are stupidly difficult.
Not to mention parsers which work on a vague and totally unspecified grammar - like "real world" HTTP and HTML parsers, which have to account for misbehaving clients and servers or malformed web pages (is this what you meant by 'diabolical input'?).
Parsers are easy. It's the grammars that are complicated.
Seriously: seemingly simple things designed and described informally, in prose, can easily be incredibly complex when you expand all their details. That says a lot about those things, a little bit about the process and mindset that went into them and virtually nothing about parsing in general.
What I'm saying is that parsing is not inherently hard. But, just like anything else that isn't inherently hard, you can make it as hard as you like by making the input arbitrarily complex and convoluted.
Given a grammar, making a parser is largely trivial. A computer could do it. Making a fast parser, or one with good error messages, or one that's modular and composable are all harder, but still generally not hard.
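For example, given the toy grammar expr := digit | "(" expr "+" expr ")", a recursive-descent parser falls out almost mechanically, one function per production (a Go sketch; real parsers add tokenizing, better errors, etc.):

    package main

    import (
    	"errors"
    	"fmt"
    )

    type parser struct {
    	in  string
    	pos int
    }

    func (p *parser) eat(c byte) error {
    	if p.pos >= len(p.in) || p.in[p.pos] != c {
    		return fmt.Errorf("want %q at position %d", c, p.pos)
    	}
    	p.pos++
    	return nil
    }

    // expr := digit | "(" expr "+" expr ")"
    func (p *parser) expr() error {
    	if p.pos < len(p.in) && p.in[p.pos] >= '0' && p.in[p.pos] <= '9' {
    		p.pos++
    		return nil
    	}
    	if err := p.eat('('); err != nil {
    		return err
    	}
    	if err := p.expr(); err != nil {
    		return err
    	}
    	if err := p.eat('+'); err != nil {
    		return err
    	}
    	if err := p.expr(); err != nil {
    		return err
    	}
    	return p.eat(')')
    }

    func parse(s string) error {
    	p := &parser{in: s}
    	if err := p.expr(); err != nil {
    		return err
    	}
    	if p.pos != len(p.in) {
    		return errors.New("trailing input")
    	}
    	return nil
    }

    func main() {
    	fmt.Println(parse("(1+(2+3))")) // <nil>
    	fmt.Println(parse("(1+2"))     // want ')' at position 4
    }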
The problem comes up when you have a bad, overly complex, ill-thought-out, incomplete or vague specification. But anything would be hard with a specification like that, not just parsing. (This feels like something standards bodies seem to enjoy producing though.)
Perhaps it's a bit like saying "programming is easy, the world is complicated" except that the world could be so much less complicated if relevant formal languages were simpler and better-defined.
> The problem comes up when you have a bad, overly complex, ill-thought-out, incomplete or vague specification. But anything would be hard with a specification like that, not just parsing.
Nearly every real text format in the world would fall under your definition of "bad, overly complex, ill-thought-out", thanks to practical considerations always leading to some amount of complexity. You may wish that languages could be nice and clean, but in practice they rarely end up that way.
But aside from that, your statement isn't true. Serialization is an order of magnitude easier than parsing, because for a serializer all you have to do is emit something that stays within the lines delimited by the spec.
For a parser, you have to correctly and safely handle absolutely any construct or variation allowed by the spec.
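The asymmetry in miniature, for a made-up length-prefixed string format (one length byte, then that many bytes): the serializer only has to emit one valid encoding, while the parser has to defend against every invalid one.

    package main

    import (
    	"errors"
    	"fmt"
    )

    // serialize emits one valid encoding: a length byte, then the bytes.
    // (Sketch: assumes len(s) < 256.)
    func serialize(s string) []byte {
    	return append([]byte{byte(len(s))}, s...)
    }

    // parse must check every field of every possible input.
    func parse(b []byte) (string, error) {
    	if len(b) == 0 {
    		return "", errors.New("missing length byte")
    	}
    	n := int(b[0])
    	if len(b)-1 < n {
    		return "", errors.New("length exceeds available bytes")
    	}
    	if len(b)-1 > n {
    		return "", errors.New("trailing garbage")
    	}
    	return string(b[1 : 1+n]), nil
    }

    func main() {
    	fmt.Println(parse(serialize("hi")))       // hi <nil>
    	fmt.Println(parse([]byte{200, 'h', 'i'})) // length exceeds available bytes
    }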
Parsers can be simple, if the language you are parsing is simple.
That's why you should make your languages/protocols as 'weak' as possible. Prefer a finite language over a regular language over a context-free language, and avoid context-sensitive languages like the plague.
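Concretely (toy Go examples, not from the article): a regular construct can be recognized in constant memory with no recursion, while even the simplest context-free construct, balanced parentheses, forces the recognizer to count -- and the depth limit becomes load-bearing against hostile input.

    package main

    import (
    	"fmt"
    	"regexp"
    	"strings"
    )

    // Regular: a dotted hostname-ish label, checkable in constant memory.
    var label = regexp.MustCompile(`^[a-z0-9]+(\.[a-z0-9]+)*$`)

    // Context-free: balanced parentheses can't be matched by any regular
    // expression; the recognizer has to count, and the depth limit is
    // what keeps pathological nesting from doing damage.
    func balanced(s string, maxDepth int) bool {
    	depth := 0
    	for _, c := range s {
    		switch c {
    		case '(':
    			if depth++; depth > maxDepth {
    				return false
    			}
    		case ')':
    			if depth--; depth < 0 {
    				return false
    			}
    		}
    	}
    	return depth == 0
    }

    func main() {
    	fmt.Println(label.MatchString("www.example.com"))   // true
    	fmt.Println(balanced("(())", 64))                   // true
    	fmt.Println(balanced(strings.Repeat("(", 1e6), 64)) // false
    }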
For dnslib [1] I use a much simpler fuzzer as part of the test suite [2]. It's nowhere near as sophisticated as afl-fuzz or go-fuzz in terms of understanding and exercising code paths (it just takes a DNS record and randomly mutates it), but it still found a number of bugs.
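The mutate-and-replay idea is tiny to implement. In Go terms it is roughly the following sketch, where parseMessage and the seed bytes are stand-ins for the parser under test and a known-good DNS message:

    package main

    import (
    	"fmt"
    	"math/rand"
    )

    // mutate flips a few random bytes of a known-good input -- the whole
    // trick behind a naive mutation fuzzer. No coverage feedback like
    // afl-fuzz/go-fuzz, just cheap randomized corruption of valid data.
    func mutate(seed []byte, flips int) []byte {
    	out := append([]byte(nil), seed...) // copy so the seed stays pristine
    	for i := 0; i < flips; i++ {
    		out[rand.Intn(len(out))] = byte(rand.Intn(256))
    	}
    	return out
    }

    // parseMessage is a stand-in for the parser under test.
    func parseMessage(b []byte) {}

    func main() {
    	seed := []byte{0x12, 0x34, 0x01, 0x00} // stand-in for a real DNS message
    	for i := 0; i < 100000; i++ {
    		in := mutate(seed, 1+rand.Intn(4))
    		func() {
    			defer func() { // any panic is a bug worth a regression test
    				if r := recover(); r != nil {
    					fmt.Printf("crash on %x: %v\n", in, r)
    				}
    			}()
    			parseMessage(in)
    		}()
    	}
    }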
Does anyone know if there is anything similar to afl-fuzz/go-fuzz for python?
There really aren't many freely available DNS servers that have seen serious production usage. I'm really excited to see RRDNS open sourced as previously hinted at -- does anyone know if there is a timeline for this?
A Prometheus contributor also fuzzed our query language parser using go-fuzz and amongst other things discovered a really interesting Unicode-related parsing bug:
Is there anything available to fuzz parsers written in JavaScript? (I found tunz/afl-fuzz-js and some others but they seem to be for fuzzing parsers of JavaScript.)