There are 13 root nameserver IPs, but there are actually about 1500 root nameservers. Anycast addressing is used so that multiple root nameservers can use the same IP (requests to a root nameserver IP will be routed to the “nearest” nameserver using that IP).
It is trivial to write a recursive resolver. It is stupid hard to write a recursive resolver that can successfully talk to all the authoritative servers on the internet.
An often-overlooked function of recursive resolver implementers and operators is that they are the glue that patches around bugs in client implementations and authoritative servers.
Why is talking to all authoritative servers so hard? As long as you cover all the cases, such as CNAMEs, and have the IPs of the roots, you seem to be good to go.
You say "cover all the cases" like it is trivial. The RFCs only define about 60% of the behavior, with the rest left up to the implementer. As a recursive resolver, you have to deal with every difference of opinion among authoritative implementers and every bug they might have introduced.
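To make the "trivial" version concrete, here is roughly what the naive iterative loop described in the question looks like, as a Python sketch (not anyone's real resolver). `Response`, `naive_resolve` and the in-memory `FAKE_NET` are all hypothetical, with 192.0.2.1 standing in for a root server. Everything the docstring lists as missing is where the real work starts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Response:
    # Hypothetical, heavily simplified view of a DNS response.
    answer_a: List[str] = field(default_factory=list)      # A records for the query name
    cname: Optional[str] = None                            # CNAME target, if the name is an alias
    referral_ns: List[str] = field(default_factory=list)   # NS names from the authority section
    glue: Dict[str, str] = field(default_factory=dict)     # NS name -> IPv4 from the additional section


def naive_resolve(qname: str, root_ips: List[str],
                  query: Callable[[str, str], Response]) -> List[str]:
    """The 'trivial' iterative loop: start at a root, follow referrals, restart on CNAMEs.

    Deliberately missing: loop limits, retries on other servers, glueless
    referrals, lame delegations, and every other quirk real servers exhibit.
    """
    servers = list(root_ips)
    while True:
        resp = query(servers[0], qname)          # only ever asks the first server
        if resp.answer_a:
            return resp.answer_a
        if resp.cname:                           # alias: start over from the roots
            qname, servers = resp.cname, list(root_ips)
            continue
        if resp.referral_ns:                     # delegation: move one level down
            servers = [resp.glue[ns] for ns in resp.referral_ns if ns in resp.glue]
            if not servers:
                raise LookupError(f"referral for {qname} came without glue")
            continue
        raise LookupError(f"no answer for {qname}")


# A tiny in-memory "internet" using documentation addresses (all hypothetical).
FAKE_NET = {
    ("192.0.2.1", "www.example.com."): Response(
        referral_ns=["ns.example.com."], glue={"ns.example.com.": "192.0.2.2"}),
    ("192.0.2.2", "www.example.com."): Response(cname="web.example.org."),
    ("192.0.2.1", "web.example.org."): Response(
        referral_ns=["ns.example.org."], glue={"ns.example.org.": "192.0.2.3"}),
    ("192.0.2.3", "web.example.org."): Response(answer_a=["203.0.113.7"]),
}

print(naive_resolve("www.example.com.", ["192.0.2.1"],
                    lambda ip, name: FAKE_NET[(ip, name)]))   # prints ['203.0.113.7']
```

It works on a well-behaved toy namespace, which is exactly the trap: the same loop spins forever on a CNAME cycle and gives up on any referral that lacks glue.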
Very neat write-up, and a good gentle introduction to DNS.
This is a topic near to my heart: about a year ago I completed what I called a "software pilgrimage", where I wrote my own recursive resolver + authoritative nameserver using only the Java standard library, Netty, and a few odds and ends like YAML and CLI argument parsers, but nothing DNS-related at all. And I only allowed myself the DNS RFCs 1034 and 1035.
This was partly a pilgrimage (a spiritual software journey) and partly scratching an itch: I wanted a local DNS server that let me host my own internal TLD, and I wanted a nice web UI. At the time, a) I didn't really like Pi-hole's web UI and b) Pi-hole didn't provide easy local DNS names, so I was like, this seems like a great candidate for a nice side project.
So, off I went, writing a library for parsing DNS messages and RRs, and so on. I implemented the RFCs, and for some reason I decided I wanted my DNS resolution to depend on a functioning Postgres database, so I used that to store everything. And I wrote a Rails web UI to be the nice front-end.
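For a sense of what parsing DNS messages involves, one of the fiddlier bits of RFC 1035 is decoding names that use message compression (section 4.1.4): a name can end in a two-byte pointer back to an earlier name in the packet. This is a minimal Python sketch, not the author's Java code; `read_name` is a hypothetical helper, and it also rejects pointer loops, which buggy or hostile packets can contain.

```python
from typing import Tuple


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Decode a domain name at `offset` in a raw DNS message (RFC 1035, 4.1.4).

    Returns the name in dotted form with a trailing dot, plus the offset of the
    byte right after the name *where it started*, which is where the caller
    continues parsing (a pointer always ends the in-place encoding).
    """
    labels = []
    resume_at = None   # set when we follow the first compression pointer
    seen = set()       # pointer targets already followed, to reject loops
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past the end of the message")
        length = msg[offset]
        if length & 0xC0 == 0xC0:                     # 11xxxxxx: 14-bit pointer
            if offset + 1 >= len(msg):
                raise ValueError("truncated compression pointer")
            target = ((length & 0x3F) << 8) | msg[offset + 1]
            if target in seen:
                raise ValueError("compression pointer loop")
            seen.add(target)
            if resume_at is None:
                resume_at = offset + 2
            offset = target
        elif length & 0xC0:                           # 01/10 prefixes are reserved
            raise ValueError(f"unsupported label type {length:#04x}")
        elif length == 0:                             # zero-length root label ends the name
            if resume_at is None:
                resume_at = offset + 1
            return ".".join(labels) + ".", resume_at
        else:                                         # ordinary label of `length` bytes
            label = msg[offset + 1: offset + 1 + length]
            if len(label) != length:
                raise ValueError("truncated label")
            labels.append(label.decode("ascii"))      # assumes ASCII labels
            offset += 1 + length


# "example.com." at offset 0, then "www" plus a pointer back to offset 0.
MSG = b"\x07example\x03com\x00" + b"\x03www\xc0\x00"
print(read_name(MSG, 0))    # ('example.com.', 13)
print(read_name(MSG, 13))   # ('www.example.com.', 19)
```

The byte string here is only the name region of a message, not a full packet with a header; a real parser would call this for every question and RR owner name.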
And so, after a fashion, I finished, and lo and behold the resolver worked. I could resolve the majority of simple queries I'd try. And so I deployed it and let it run for a while.
I was unsurprised when I eventually found names that wouldn't resolve. Turns out DNS is old and creaky and there are many misbehaving nameservers out there and not everyone agrees on what various things should mean.
Some things off the top of my head that took nontrivial tweaking to get right:
console.aws.amazon.com has an interesting lookup chain including some intermediate server that responds NXDOMAIN despite serving you up records that do move you closer to your answer.
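One defensive way to handle this (an assumption on my part, not necessarily what the author did, and not a description of what that AWS chain actually returns): walk the CNAME chain in the answer section before believing the RCODE. RFC 6604 later clarified how the RCODE relates to CNAME chains, and servers have historically disagreed, so here a reply that moves you to a name it has no data for is treated as a lead to follow rather than a dead end. `Reply` and `interpret` are hypothetical, and names are assumed to be already lower-cased.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

NOERROR, NXDOMAIN = 0, 3   # RCODE values from RFC 1035


@dataclass
class Reply:
    # Hypothetical, simplified view of one authoritative reply.
    rcode: int
    cnames: Dict[str, str]            # answer-section CNAMEs: owner -> target
    addresses: Dict[str, List[str]]   # answer-section A records: owner -> IPs


def interpret(qname: str, reply: Reply,
              max_hops: int = 16) -> Tuple[str, Union[str, List[str]]]:
    """Walk any CNAME chain in the reply before looking at the RCODE.

    Heuristic (an assumption, not a rule from the RFCs): if the chain leads to
    a name this reply holds no data for, the RCODE is not trusted; the caller
    resolves the chain's last name from scratch instead of giving up.
    """
    name = qname
    for _ in range(max_hops):
        if name in reply.addresses:
            return "answer", reply.addresses[name]
        if name not in reply.cnames:
            break
        name = reply.cnames[name]
    else:
        raise LookupError(f"CNAME chain from {qname} is too long or loops")
    if name != qname:
        return "follow", name        # moved closer: keep going, whatever the RCODE says
    if reply.rcode == NXDOMAIN:
        return "nxdomain", qname     # no chain at all: the NXDOMAIN is about qname itself
    return "nodata", qname
```

For example, `interpret("console.example.com.", Reply(NXDOMAIN, {"console.example.com.": "edge.example.net."}, {}))` returns `("follow", "edge.example.net.")` instead of reporting the name as nonexistent.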
It took quite a bit of creativity to stop ending up in infinite resolution loops. I will admit that the only thing that probably saved me from being blackholed in the beginning is that eventually I'd hit the max Java heap size and seize up due to GC thrashing. But I finally put a couple of ironclad safeguards in place.
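The comment doesn't say what the safeguards were, so this is only a guess at a common shape for them: a hard cap on upstream queries per client lookup, plus a stack of names currently being resolved, so that "I need X's nameserver address to resolve X" is caught immediately instead of eating the heap. `ResolutionBudget`, `ResolutionLoopError` and the limits (100 queries, depth 8) are hypothetical, illustrative values.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ResolutionLoopError(Exception):
    """Raised when a lookup trips one of the safeguards."""


class ResolutionBudget:
    """Per-lookup safeguards against runaway resolution (hypothetical design).

    One instance is created per client query and passed down through every
    nested lookup (e.g. resolving an NS name in order to follow a referral).
    The limits are illustrative, not tuned values.
    """

    def __init__(self, max_queries: int = 100, max_depth: int = 8) -> None:
        self.max_queries = max_queries
        self.max_depth = max_depth
        self.queries = 0
        self.in_progress: List[str] = []   # names currently being resolved, outermost first

    def spend_query(self) -> None:
        """Call before every upstream packet; fails once the budget is used up."""
        self.queries += 1
        if self.queries > self.max_queries:
            raise ResolutionLoopError(f"more than {self.max_queries} upstream queries")

    @contextmanager
    def resolving(self, name: str) -> Iterator[None]:
        """Mark `name` as in progress for the duration of a nested lookup."""
        name = name.lower()
        if name in self.in_progress:               # we need X to resolve X: a cycle
            chain = " -> ".join(self.in_progress + [name])
            raise ResolutionLoopError(f"resolution cycle: {chain}")
        if len(self.in_progress) >= self.max_depth:
            raise ResolutionLoopError(f"nesting deeper than {self.max_depth}")
        self.in_progress.append(name)
        try:
            yield
        finally:
            self.in_progress.pop()


budget = ResolutionBudget()
try:
    with budget.resolving("ns1.a.example."):          # to reach a.example we need ns1.b.example
        with budget.resolving("ns1.b.example."):      # ...which in turn needs ns1.a.example
            with budget.resolving("ns1.a.example."):
                pass
except ResolutionLoopError as err:
    print(err)   # resolution cycle: ns1.a.example. -> ns1.b.example. -> ns1.a.example.
```

The two checks cover different failures: the stack catches tight cycles early, while the query cap is the backstop for loops that never repeat a name exactly.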
There's some goofiness where nameservers will tell you to go ask nameservers that are in their own zone, without providing the glue A records that would break the loop. Example: a query for bar.foo.com tells you to go ask ns1.bar.foo.com for bar.foo.com's IP. Great, thanks, highly helpful. It's fine if you also give me an A record for ns1.bar.foo.com in the additional (AR) section.
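A sketch of how a resolver might triage such a referral before recursing into it, assuming nothing useful is already cached: NS names with glue are usable right away, out-of-zone names get resolved as separate lookups, and in-zone names without glue are dropped, because finding them would mean asking the very servers you can't reach yet. `split_referral` and `in_zone` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def _norm(name: str) -> str:
    # Compare names case-insensitively and ignore the trailing root dot.
    return name.lower().rstrip(".")


def in_zone(name: str, zone: str) -> bool:
    """True if `name` is `zone` itself or anywhere beneath it."""
    name, zone = _norm(name), _norm(zone)
    return zone == "" or name == zone or name.endswith("." + zone)


def split_referral(zone: str, ns_names: List[str],
                   glue: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Sort a referral's NS names into (addresses usable now, names to resolve first).

    An NS name inside the delegated zone with no glue is skipped: finding its
    address would mean asking the very servers we are trying to reach.
    Assumes nothing useful is already cached for those names.
    """
    glue = {_norm(k): v for k, v in glue.items()}
    addresses: List[str] = []
    to_resolve: List[str] = []
    for ns in ns_names:
        if _norm(ns) in glue:
            addresses.extend(glue[_norm(ns)])
        elif in_zone(ns, zone):
            continue                      # glueless in-zone NS: a guaranteed dead end
        else:
            to_resolve.append(ns)         # out-of-zone NS: resolve it as a separate lookup
    if not addresses and not to_resolve:
        raise LookupError(f"delegation to {zone} has only glueless in-zone nameservers")
    return addresses, to_resolve
```

For the bar.foo.com case with no glue, `split_referral("bar.foo.com.", ["ns1.bar.foo.com."], {})` raises right away rather than recursing into a lookup that can only loop.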
A slight variation of the above: they give you the NS record and the glue A record, but the A record's TTL is shorter than the NS record's, so the A expires and you're left with just the NS. The only way to recover the address is to go back and fetch the glue records again.
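One way around this (hypothetical, not necessarily the author's fix): cache the NS set and its glue as one unit that expires at the shortest TTL among them, so the glue can never expire out from under the NS, and a cache miss sends you back to the parent zone, which returns both again. `DelegationCache` and `Delegation` are illustrative names, and the clock is injectable so expiry can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Delegation:
    ns_names: List[str]
    glue: Dict[str, List[str]]   # NS name -> addresses
    expires_at: float


class DelegationCache:
    """Caches a referral's NS set and its glue as a single unit (hypothetical design).

    The unit expires at the *shortest* TTL among the NS record and the glue, so
    a short-lived glue A record can never leave behind an NS set whose in-zone
    address is gone. Once the unit expires, the next lookup misses and the
    resolver goes back to the parent zone, which hands out NS and glue together.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Delegation] = {}

    def store(self, zone: str, ns_names: List[str], ns_ttl: int,
              glue: Dict[str, Tuple[List[str], int]]) -> None:
        """`glue` maps each NS name to (addresses, TTL of those A records)."""
        ttl = min([ns_ttl] + [t for _, t in glue.values()])
        self._entries[zone] = Delegation(
            ns_names=list(ns_names),
            glue={name: list(addrs) for name, (addrs, _) in glue.items()},
            expires_at=self._clock() + ttl,
        )

    def lookup(self, zone: str) -> Optional[Delegation]:
        """Return the cached delegation, or None if absent or expired."""
        entry = self._entries.get(zone)
        if entry is not None and self._clock() >= entry.expires_at:
            del self._entries[zone]       # expired as a unit: re-ask the parent
            entry = None
        return entry
```

The cost is that a long-lived NS set gets refetched as often as its shortest-lived glue; the alternative is to keep the NS set and go back to the parent only when an in-zone nameserver's glue has expired.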
DNS is fascinating, and I love/hate/love it. It continues to fascinate me how close just those two initial RFCs get you to a working, usable resolver, but also how far they leave you from something you could really depend on. After a year-ish of post-initial-version tweaking, I think I mostly have it pretty solid. But I can also guarantee you I will notice at least one site that won't resolve within, say, the next 6 months.
My project is open-source, and I'd link to it, but it's under my real name, and I'm loath to link to my realsona from the internet. And I've been kind of eyeing some of the job postings at Cloudflare for the 1.1.1.1 resolver, and I've been considering using this as a cute cover letter topic.