Hacker News
Serverless DNS: Self-hosted DNS resolver at the edge (github.com/serverless-dns)
262 points by saltymimir on July 30, 2022 | 83 comments



> Cloudflare Workers and Deno Deploy are ephemeral, as in, the process that serves client request is not long-lived, and in fact, two back-to-back requests may be served by two different isolates (processes)

I suspect this would impact latency. Have any benchmarks been done to compare Cloudflare Workers, Deno and fly.io for this specific application (I don't think ping alone is fair)? I'm guessing fly.io is more suitable here. Also, DoH clients generally maintain a pool of connections to the DoH server; I'm not completely sure how this is handled with something like Cloudflare Workers.


> I suspect this would impact latency.

Why is that?


I guess parent is referring to cold starts. With enough requests/sec and a good scaling mechanism it shouldn't be a problem though.


Cold starts for Deno Deploy and Cloudflare Workers are much shorter than something like AWS Lambda or Google Cloud Functions.


That doesn't invalidate the cold start argument though.


Spinning up a process takes some time, which adds up across many requests.


Cloudflare Workers uses isolates, not processes.[0] They start much faster, typically in single-digit milliseconds.

In fact, Workers can usually spin up an isolate in parallel with the TLS handshake.[1] After receiving the TLS ClientHello with the SNI hostname, it'll go start up the isolate for that host, if it isn't running already. TLS typically needs another round trip from there to do a key exchange before application data starts flowing; by that time the isolate is ready to serve. In that case, there is no added latency.

(I am the tech lead for Workers.)

[0] https://blog.cloudflare.com/cloud-computing-without-containe...

[1] https://blog.cloudflare.com/eliminating-cold-starts-with-clo...


Thanks for the article link. It is quite interesting to me that this is only possible because we all need to trust the V8 sandboxing anyway. It makes sense, since it should not be compromised on the other end of the connection either. However, one should still probably be aware that any exploit would probably be much more practical than, e.g., a Spectre attack.


Very interesting, thanks for sharing this @kentonv. After four years it might warrant a fresh follow-up conversation, so I submitted:

https://news.ycombinator.com/item?id=32289979

I hope to learn if anyone else has been using Isolates to great, or any, effect.


That is crazy cool - thanks for sharing!


Wow hadn’t heard about isolates at all, really cool! thanks for sharing!


Cool, thanks for pointing this out :).


Any tips on getting a job at Cloudflare as a new grad? It's one of my dream companies.



Processes, threads, isolates, who cares? It’s all so small, right?


This is an incredibly small workload in terms of compute. I bet compute usage averages 1ms on Cloudflare Workers, even with cold starts.


Does this have a clear advantage over Pihole? I see the Android app and that's nice, but not enough of a killer feature (for me to want) to switch.

Pihole still offers nice things that a cloud solution can't, like local network resolution and DHCP.


Most importantly you can use it transparently outside of your network; you have a single DNS service with a single configuration available everywhere. You can of course use this server as your upstream resolver on local networks, with a local resolver like CoreDNS too, which gives you the best of both worlds: CoreDNS can serve local IPs with normal DHCP configuration, and any other requests can go upstream (securely) to your cool custom DNS-over-HTTPS server.

Not everyone sees this as an advantage, and even if you do see it that way, you still might not need it. If PiHole works for you, keep using it. I actually want this because I want to share my adblocking/secure DNS setup with my less-technical friends and family, none of whom live with me/share a network/VPN. So needing no new software on their end, just a new resolver to be configured, is very appealing. It can work everywhere on all their devices and it's very easy to configure.

Taking it further: you can customize the DNS path as you wish with your own code in these designs. It's definitely not for everyone and if you like the convenience DHCP/local resolution provides, I wouldn't necessarily switch. But once you actually can like, use your DNS endpoint as an API, and you can configure your DNS resolver programmatically from any language anywhere via HTTP APIs, a lot of neat things become possible. I actually configure my custom DNS resolvers with custom service names pointing to my local devices already; I don't rely on just DHCP+hostname to provide the right resolvable name. And doing this can be as simple as a POST request to a custom endpoint I wrote; the resolver can then just serve custom A/AAAA records for those entries. So if you want flexibility/a custom DNS network, it's very appealing. But if you don't want it, I wouldn't worry about it much.
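To make the "POST request to a custom endpoint" idea concrete, here is a minimal sketch of the resolver-side bookkeeping, assuming a hypothetical JSON body with `name` and `address` fields. The function names, the payload shape and the example hostname are all made up for illustration; this is not the commenter's actual endpoint and not part of serverless-dns.

```python
import ipaddress
import json
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory override table: (name, record type) -> addresses.
overrides: Dict[Tuple[str, str], List[str]] = {}


def register(body: str) -> Tuple[str, str]:
    """Handle the JSON body of a hypothetical POST /records request."""
    payload = json.loads(body)
    name = payload["name"].rstrip(".").lower()
    # ip_address() raises ValueError on malformed input, so bad data is rejected.
    addr = ipaddress.ip_address(payload["address"])
    rtype = "A" if addr.version == 4 else "AAAA"
    overrides.setdefault((name, rtype), []).append(str(addr))
    return name, rtype


def lookup(name: str, rtype: str) -> Optional[List[str]]:
    """Return custom records if any exist; None means 'forward upstream'."""
    return overrides.get((name.rstrip(".").lower(), rtype))
```

A resolver built this way would consult `lookup()` first and only fall through to the upstream DoH server when it returns `None`.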


> Most importantly you can use it transparently outside of your network; you have a single DNS service with a single configuration available everywhere.

As an aside: for a generic replacement for pi-hole inside and outside one’s local network, NextDNS (it’s not self-hosted) works fine. It allows setting up ad-blocking and tracker-blocking filters from common filter lists, allows custom allow and deny lists, and provides 300k queries a month in the free plan (when this limit is exceeded, the DNS works but not the filter lists).


For comparison to 300k queries, I use pihole and ublock, we basically have 4 users, we're in UK and have one desktop computer, 4 mobile phones, a laptop, a game console, a TV with Netflix (no other pay TV). Most daytimes there's only one person home.

We do an average of ~7k DNS requests per day, somehow. Our max day was 26000 requests ... I assume everyone is abusing DNS for tracking (heartbeat of some sort?).
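As a rough back-of-the-envelope check of those figures against a 300k/month allowance (assuming a 30-day month; the variable names are just for illustration):

```python
daily_average = 7_000   # average requests per day quoted above
busiest_day = 26_000    # maximum requests seen in one day
monthly_quota = 300_000 # free-plan allowance being compared against
days_in_month = 30      # assumption: a 30-day month

monthly_at_average = daily_average * days_in_month  # 210,000
monthly_at_peak = busiest_day * days_in_month       # 780,000
headroom = monthly_quota - monthly_at_average       # 90,000

print(f"average month: {monthly_at_average:,} queries, headroom {headroom:,}")
print(f"month of peak days: {monthly_at_peak:,} queries")
```

So a household like this fits under 300k at its average rate, but a sustained run of busy days would blow through it.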

Microsoft seem to hit the pihole rate limits often, presumably if you block their tracking domains they think making hundreds of requests a minute is the way forward ... it's almost like they're trying to DoS me with my own computer.


Tailscale + pihole is another way to use the same config everywhere. It's pretty easy to set up (by an HN definition of easy; I would not suggest it for a typical user). https://tailscale.com/kb/1114/pi-hole/

NextDNS/Adguard-DNS are more user friendly options.


Have you looked at Consul? It does exactly what you describe for the DNS functionality.


Yes, I have evaluated Consul (but not deployed it), though of course it was originally designed for a bit of a different use case for server-side environments; though I guess there's nothing that would prohibit it from doing this exact thing. Custom and programmable DNS resolution has a lot of points in the design space...


> "Telling a programmer there's already a library to do X is like telling a songwriter there's already a song about love." -- Pete Cordell

You may be right, but a serverless resolver ain't anything I've seen before! Awesome project, and I'm glad to see that it all came together; not all projects do!


On my Android phone I use Wireguard to route DNS traffic to my PiHole server at home so I get all the benefits on the go. Also run AdAway at the same time for double protection.


But what's the latency?


Naturally this depends primarily on how far away from home you happen to be.

If you take a base case of a ~1 Gbps symmetric fiber link at your house and you are within tens of miles, the added latency will probably be noticeable but not horrendous (back of the napkin says 20-100ms, mostly due to 5G wireless hiccups and cross-network carrier transit).

Browsing HN: May not notice

Watching YouTube or other more bandwidth-intensive activities: Probably could notice a little lag and/or longer load times. Even if you have 1 Gbps upload at home, in some (or many) cases you may only achieve 100 Mbps or possibly even less to your device (I've tested this manner of Wireguard PTP extensively IRL).

Mileage will vary.


I haven't seen any noticeable problems. Got a 400/20 connection in southern California.


Or just don't route all traffic through your WG tunnel. Change AllowedIPs from 0.0.0.0/0 to the local ranges you want to reach, e.g. 10.6.0.0/24.


Would this likely break the goal of having / leveraging pi-hole?


No, it just tunnels your DNS traffic and traffic to the 10.6.0.* IPs. You could also add 192.168.1.0/24, separated by a comma, for other local IPs you want to route through WG.
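A small sketch of why this still keeps Pi-hole in the loop: with a split tunnel, only destinations inside the allowed ranges go through WireGuard, and the DNS server is one of them. It assumes, hypothetically, that the Pi-hole answers on 10.6.0.1 inside the tunnel; the function name is made up and this only models the routing decision, not WireGuard itself.

```python
import ipaddress
from typing import List


def routed_through_tunnel(destination: str, allowed_ips: List[str]) -> bool:
    """True if the destination falls inside any of the allowed CIDR ranges."""
    dest = ipaddress.ip_address(destination)
    return any(dest in ipaddress.ip_network(cidr) for cidr in allowed_ips)


full_tunnel = ["0.0.0.0/0"]                      # everything goes via home
split_tunnel = ["10.6.0.0/24", "192.168.1.0/24"]  # only DNS and LAN traffic

pihole = "10.6.0.1"       # hypothetical Pi-hole address inside the tunnel
video_cdn = "203.0.113.7"  # documentation address standing in for the internet

print(routed_through_tunnel(pihole, split_tunnel))     # DNS still tunnelled
print(routed_through_tunnel(video_cdn, split_tunnel))  # bulk traffic goes direct
```

The phone's DNS setting has to point at the tunnelled address for this to work; otherwise queries bypass the Pi-hole entirely.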


Try using NextDNS. NextDNS + their CLI tool/proxy running locally is super powerful.


There's no info here on how you secure these servers? Couldn't someone just start using your resolver and end up costing you money?


Unfortunately, neither DoT nor DoH has any great features for client authorization. Client certificates would have been great.

On DoH you could put an API token in the URL. On DoT you could encode something similar in the TLS SNI hostname (though this isn't really secure, as the SNI is sent in the clear, so it's questionable how effective it really is, and I'm not even sure this is achievable on the edge runtimes).
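For the DoH case, a minimal sketch of what a token-in-URL check could look like, assuming a hypothetical path layout of `/<token>/dns-query`. The layout and function name are assumptions for illustration, not something serverless-dns is known to implement.

```python
import hmac


def is_authorized(path: str, expected_token: str) -> bool:
    """Accept only paths of the form /<token>/dns-query with a matching token."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[1] != "dns-query":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(parts[0].encode(), expected_token.encode())


print(is_authorized("/s3cret/dns-query", "s3cret"))
print(is_authorized("/guess/dns-query", "s3cret"))
```

The token ends up in every client's resolver configuration, so it works more like a shared secret per user than real client authentication.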

Adding the DoH-token feature could still make sense, I guess? Unfortunately AFAIK Android supports only DoT, not DoH.

EDIT: This just in, DoH3 in Android: https://security.googleblog.com/2022/07/dns-over-http3-in-an...



I was also wondering about the /configure endpoint. There was no mention of access control.


Awesome! Need more alternatives to pihole. Going to make an installer for this for our home hosting hardware now :)


I moved from pihole to Adguard when I replaced my firewall. The UI doesn't offer as much, but for the common case of loading blacklists for the network, I like it better.


All these solutions do very little for privacy, and DNS resolution is a big privacy hole.

I'll stick with the Tor Browser where I can, but we really need Tor-backed local resolvers.


This isn't strictly true: since I control the DNS, I can block tracking domains, malware, etc. for every device on my network.

My P30 Pro phones home to a lot of domains.

Now that they are blocked via Pi-hole, my phone can no longer send that data.

I would count this as a plus for my privacy.


Sure, but it'd still be an improvement if your pi-hole could use Tor or another anonymized way to connect to outside resolvers.


I have computers today with enough storage space to hold entire multi-GB public zone files. The storage capability keeps increasing. However I only use a small fraction of that data. In fact, I have computers that can hold the DNS data for every domain name I will ever use in a lifetime.

Of that data, a relatively small fraction changes periodically. Most of it is static. Generally, I only do remote DNS data retrieval periodically, not immediately preceding each and every HTTP request when accessing a www site.

Every user is different but by controlling what RFC 1035 calls the "Master File" of DNS data I can avoid remote DNS lookups altogether. This speeds up www use for me, greatly. YMMV.

The point that gets missed in these discussions, IMHO, is that DNS is not just an issue of speed.^1 (And users can improve speed without help from third parties.) DNS is also an issue of control. Controlling DNS allows me as a user to disable the www's dark patterns where the user selects a domain name to access and the "browser" connects to various domain names to which the user had no intention of connecting.^2 I can easily thwart unnecessary, unwanted phoning home, telemetry, tracking and online advertising because they all rely on using DNS that is, to some degree if not wholly, outside the user's control.

1. For example, Google can undoubtedly win the race for DNS speed however the www user will always lose the contest over _control_.

2. Originally this auto-fetching feature may not have been intended to support "dark patterns". However its usage today is a key element of those practices. There are companies today whose vision for the www is shaped by a need for programmatic advertising and the privacy invasion that this requires. They push for standards and protocols optimised to support "complex" web pages comprised of many components, potentially controlled by various third parties, the most important of which are related to _advertising_. A www user might have a different vision. For example, I am able to use the www quite effectively for information retrieval (not commerce) without using auto-fetching.^3 I treat www pages as "simple" ones with only one significant component and none controlled by third parties. This allows me to consume larger quantities of information more rapidly, with less distraction. "Simple" www pages are more valuable to me than complex ones. Though they might be less valuable to "tech" companies seeking to sell advertising services.

3. Common Crawl, the source for much-hyped "AI" projects such as GPT-3, uses the www in a similar way. There are no components for "complex" websites such as Javascript files in the archives.


Is there a torrent that gets updated regularly, or where/how do you download the zone files for all the TLDs? And what DNS server software do you use?


Yes, I'd love to know more about how you implemented your setup.


"I have computers today with enough storage space to hold entire multi-GB public zone files. The storage capability keeps increasing. However I only use a small fraction of that data."

What this means is that I do not need to store entire zone files. I only need to store the data for the domain names I will use. The point about storage capability is that this is no longer a limiting factor. When I started using the www, storage space was a limiting factor. I could not store the DNS data for every name I would ever use on a personal computer. Even the RAM on today's computers can be larger than the size of HDDs from the time when I started using the www. Everything has changed.

"For example, I am able to use the www quite effectively for information retrieval (not commerce) without using auto-fetching.^3 I treat www pages as "simple" ones with only one significant component and none controlled by third parties."

What this means is that the set of names I will use is (generally) deterministic. For example, if I aim to access the index.html page at https://example.com, I only retrieve the DNS data for example.com. The set of names for which I must retrieve DNS data is known, a priori.^1 To give a more practical example, I start with a list of all the domain names represented in HN submissions (cf. comments). I retrieve DNS data for those names only. (NB. A small minority of www sites submitted to HN do change hosting providers occasionally or change IP addresses relatively frequently.)

Thus when I read HN submissions, I am not performing any remote DNS queries. At an earlier point, I have performed bulk DNS data retrieval for all domain names in HN submissions. The DNS data is stored in the memory of a localhost forward proxy or in custom zone files served by a localhost authoritative nameserver.
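A rough sketch of the "resolve in bulk ahead of time, then answer locally" idea. Note the substitution: instead of zone files or a forward proxy as described above, it emits plain hosts-file style lines, simply because that is the shortest thing to show. The function names are made up, and the default resolver just asks the operating system via `socket.getaddrinfo` (IPv4 only).

```python
import socket
from typing import Callable, Iterable, List


def system_resolve(name: str) -> List[str]:
    """Look up IPv4 addresses for a name using the OS resolver."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def build_hosts_lines(
    names: Iterable[str],
    resolve: Callable[[str], List[str]] = system_resolve,
) -> List[str]:
    """Resolve each distinct name once and return 'address name' lines."""
    lines: List[str] = []
    for name in sorted(set(names)):
        try:
            addresses = resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError; skip names that fail.
            continue
        lines.extend(f"{addr} {name}" for addr in addresses)
    return lines
```

Run periodically against a known list of names, the output can be fed to whatever local resolver you prefer; the trade-off is that addresses which change between runs go stale.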

Another example might be domains found in Google Scholar search results. I collect these names from a series of searches then retrieve the DNS data in bulk. Then I can search and retrieve papers from many sources found through Scholar without making remote DNS queries.

There are a variety of sources for bulk DNS data. Some potential sources are:

Public zone file access programs (Contact the registry. Many zones are available through ICANN's CZDS program.) https://czds.icann.org

Public scan data (Sadly, Rapid7 recently stopped publishing their forward DNS data.)

DoH open resolvers (Using HTTP/1.1 pipelining.)

Common Crawl archives (By extracting WARC-TARGET-IP.)

1. In contrast to using browser auto-fetching where I have no idea what other domain names might be automatically looked up when I visit example.com.


The biggest thing this is missing to make it turnkey is DDR, "Discovery of Designated Resolvers". I have deployed multiple iterations of my own custom DNS setup for my home network, and I keep coming back to these "Serverless" things for DNS, because they fit the usage profile very, very well, and don't need any extra work for your home network vs a WAN, and in some ways can actually be more reliable, since availability is critical and these per-request service models abstract those concerns away a bit (I have more than once had to unfuck a lot of stuff after a CoreDNS outage on my network.) I've been waiting for this for a while now, because it means I can finally make a custom, secure DoH deployment available to all my friends and family: https://techcommunity.microsoft.com/t5/networking-blog/makin...

The TL;DR is that these serverless offerings require you to use the actual HTTPS hostname they expect, so it can actually, you know. Work. They are often run on cloud servers so you have to have a proper 'Host:' field configured when doing HTTP requests to resolve the service correctly and begin doing secure queries. But then how do you do the initial bootstrap and find the HTTPS hostname to use?

So if you want this turnkey, like, "I could configure my non-technical family PC to use it", you really need one extra piece: an ordinary DNS server on port 53 UDP. You actually configure your users to use this DNS server, but its only real job is to then point them to the real DoH server, with the hostname given, thus bootstrapping the connection. (Read the blog post about how this initial query is secured, I'll leave that to you.)
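For the curious, this is roughly what that bootstrap question looks like on the wire. As I read the DDR draft, the client asks the plain port-53 resolver for SVCB records at the special name `_dns.resolver.arpa`; the sketch below only builds that query packet by hand (no sending, no response parsing), and the helper name is made up.

```python
import struct

TYPE_SVCB = 64  # DNS RR type number for SVCB
CLASS_IN = 1    # Internet class


def build_query(name: str, rtype: int, query_id: int = 0) -> bytes:
    """Build a minimal DNS query: 12-byte header plus a single question."""
    # Flags 0x0100 sets only RD (recursion desired); QDCOUNT is 1.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", rtype, CLASS_IN)


ddr_query = build_query("_dns.resolver.arpa", TYPE_SVCB)
print(len(ddr_query), ddr_query.hex())
```

The SVCB answer is what carries the DoH hostname and path back to the client, which is how the encrypted endpoint gets discovered without anyone typing it in.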

This kind of throws a wrench in the serverless thing, because you need some DNS service sitting on port 53 somewhere. But this initial bootstrap is much less latency sensitive than normal DNS and it is needed infrequently, so you could probably do this fine with CoreDNS and a shit $1 VPS on the internet. As a bonus, if you have clients that do not support DDR, you could configure this resolver to transparently use your serverless DoH resolver as a backend (so there's no difference in resolved names, just the features available.)

It looks like Deno is the only serverless offering I can see that offers UDP support, which means you could, for their platform only, avoid the intermediate VPS and have an entire DoH+DDR capable stack all at once. That's very appealing; maybe I should sign up...


> The biggest thing this is missing to make it turnkey is DDR

It’s considered polite to define any terms you use:

https://datatracker.ietf.org/doc/draft-ietf-add-ddr/


I was able to fix my original post in time, thanks.


DNS over 53 is implementable on Fly [0]. And in the near future, Cloudflare Workers should support it too (at least over TCP if not UDP) [1].

[0] https://github.com/serverless-dns/serverless-dns/issues/67

[1] https://blog.cloudflare.com/introducing-socket-workers/


Node-based Javascript code that calls the shell on my Linux server?

Nope, thank you.

Living dangerously is one thing, being suicidal is another.

https://github.com/serverless-dns/serverless-dns/blob/main/s...


I was curious about this and it's apparently needed on Fly.io specifically, only to enable swap space. I assume this is due to a "quirk" of the Node.js runtime: many runtimes, not just node, tend to commit lots and lots of virtual address space for various (good) reasons to fulfill various roles, but don't necessarily need to back all that with physical memory. However, Linux will require swap space to back some of these commitments in practice (e.g. private memory mapped allocations will need to spill somewhere, so they can only be backed by swap, IIRC.) The node.js runtime almost certainly requires more virtual memory commitment than the default Fly.io runtime you pick (128MB, I'd guess,) and it doesn't seem to provide swap out of the box, because...

Fly.io simply boots a Linux kernel, and the Docker image you give it is then treated as if it were the full filesystem for that "instance". Whatever is in the container image is all you get. Many container environments aren't going to have init-like frameworks to initialize swap before booting the application; in fact the container only having say, node.js, a few .js files, and starting that immediately after kernel boot is actually a pretty good startup time optimization. So if you need swap space, because the runtime makes anonymous memory commitments larger than physical memory -- the only way to guarantee that is in the application path itself, like done here. You could also do a shell script I guess. You'd have to do this in any similar low-memory environment, really, if the init system didn't.

Seems reasonable; nothing too suspicious or odd in the long run. It's nothing you're going to need outside of these new-fangled container-like environments (besides, in this case, the shell commands are easily auditable and not in any way user-controllable through user input, vastly reducing defect surface area.) Took 10 minutes to audit and figure out if you've had some previous exposure to runtime tuning; comments to explain this would have been nice, though...
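To illustrate what "enable swap from the application path" usually amounts to on Linux, here is a generic sketch. It is not the project's actual code; the swap file path and size are made-up defaults, and the commands (`fallocate`, `chmod`, `mkswap`, `swapon`) are the standard util-linux/coreutils ones, which need root to succeed.

```python
import subprocess
from typing import List


def swap_setup_commands(path: str = "/swapfile", size_mb: int = 128) -> List[List[str]]:
    """Return the usual command sequence for creating and enabling a swap file."""
    return [
        ["fallocate", "-l", f"{size_mb}M", path],  # reserve the space
        ["chmod", "0600", path],                   # swap must not be world-readable
        ["mkswap", path],                          # write the swap signature
        ["swapon", path],                          # enable it
    ]


def enable_swap(path: str = "/swapfile", size_mb: int = 128) -> None:
    """Run the commands in order; raises CalledProcessError on the first failure."""
    for argv in swap_setup_commands(path, size_mb):
        # Fixed argv lists and no shell: nothing here comes from user input.
        subprocess.run(argv, check=True)
```

Keeping the arguments as fixed lists is also what makes this sort of thing quick to audit.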


> Fly.io simply boots a Linux kernel, and the Docker image you give it

There is nothing "simply" in a Rube Goldberg machine of such magnitude.


The app specifically creates swap iff it detects it is running on Fly.io: https://github.com/serverless-dns/serverless-dns/blob/9cb3f4...


The code you reference just invokes a shell on a worker in the cloud, not your VPS or whatever.


It is for caching DNS queries on the fly.io container, I believe.


I wonder what this can be used for?


Not the author of this app, but I found this to be very useful for circumventing domain blocks made by ISPs / sovereign entities[0].

Let's say that the government / some central entity takes the blocking a step further by blocking Cloudflare's DNS-over-HTTPS (DoH) endpoint. I could just spin up a new instance on fly.io (or really any other service of your choosing), and use the new endpoint as the new DoH endpoint.

What I like about this service is the fact that I can still use a blocklist to block trackers & ads, just like how you would with NextDNS. Most of the services listed in the example page are pretty generous with their free plans, so the whole setup may end up being cheaper than the Pro plan[1] of NextDNS.

[0]: A number of quite essential services just got blocked by the government where I live, so this is a very real possibility.

[1]: https://nextdns.io/pricing


My ISP analyzes the SNI headers. I really need Encrypted Client Hello.


For others not familiar with SNI vs ECH, Cloudflare has a post on it:

https://blog.cloudflare.com/encrypted-client-hello/


I really really like how your username looks.


Cloudflare Workers supports ECH out of the box. Also, one can deploy serverless-dns against any sub-domain that's available with the underlying provider (mydoh.workers.dev, yourdoh.deno.dev, dohapp.fly.dev, etc) and keep changing the sub-domain for free to defeat SNI-based censorship.


Not this project specifically, but a DoH resolver of your own is pretty nice. It's almost impossible for someone to reliably block it by filtering DNS packets (many public networks do for some reason), and you can do your own crazy levels of caching (I ignore TTLs and serve stale responses for speed). In general my setup for this just works very pleasantly.
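A toy version of the "ignore TTLs and serve stale" caching, to show the shape of it. The class and parameter names are invented, the upstream lookup is whatever callable you hand in, and a real resolver would refresh in the background rather than inline as done here.

```python
import time
from typing import Callable, Dict, Tuple


class StaleCache:
    """Answer from cache whenever possible; refresh old entries afterwards."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        refresh_after: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch                  # upstream lookup (hypothetical)
        self._refresh_after = refresh_after  # our own age limit, not the record TTL
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is None:
            # Cold miss: the only case where the caller waits on upstream.
            answer = self._fetch(name)
            self._entries[name] = (answer, now)
            return answer
        answer, stored_at = entry
        if now - stored_at > self._refresh_after:
            try:
                self._entries[name] = (self._fetch(name), now)
            except OSError:
                pass  # upstream down: keep serving what we have
        # Always return the previously cached answer, even if it was stale.
        return answer
```

The cost is that a changed record is only seen one lookup late, which is usually a fine trade for a personal resolver.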


If you want PiHole at all times - at home, while traveling - but don't have a Raspberry Pi.

Use cases: Block ads and tracking domains. Block malware domains. Parental control.

Bonus: Do all that over DoH/DoT to avoid ISP/government/hotel snooping or censoring.


Probably more for privacy reasons. And maybe if you set it to resolve to an adguard or pihole instance it could be for adblocking on the DNS level. Which really is quite effective against a lot of the more spammy ads, even though it can't really do anything about YouTube (since they use the same domains for content and ads, so blocking ads blocks content too).


Hosting your own DNS resolver.


Why on earth would you want serverless dns?


Compared to a self-hosted pi-hole, this would allow you to take advantage of ad and other content blocking outside of your home wifi.


Use Wireguard and you can use your home PiHole setup anywhere.


You could host pihole on a free-tier AWS or other cloud provider instance and wouldn't need to worry about the startup latency of something like Workers or Lambda.


On my pi-hole, I also host pi-vpn (Wireguard): https://www.pivpn.io/


This is a good reason and exactly what I get out of using Tailscale.


We're calling code running on Cloudflare, Deno, or Fly.io "self-hosted" now?


And serverless, for that matter.


[flagged]


To state the blindingly obvious: It's a perfectly valid term for what OP / the github repo refers to and your response just goes to show that you probably don't know much about these things.

Let me spoil it for you: The cloud is also not an actual cloud, it's just someone else's server.


This seems like a very pedantic and harsh comment? I’d expect most people at HN to understand what “serverless” means, and it appears that tight integration with this architecture is the whole USP of this project, and as such I’d say the author made an excellent choice of putting it in the name: it makes it immediately clear what its key differentiator is.


A lie is a lie. You can blab on like that for pages, but that doesn't change anything.


These kinds of rants were popular five years ago when serverless was still new, nowadays it really doesn’t add any value and I fail to see what you’re trying to achieve here.

It’s just a term that stuck, as did many other terms, let’s just move on.

Do you also get angry that a firewall isn’t a literal wall of fire?


Wall of fire... Cracked me up


Words evolve and change meaning as society evolves.

Words also have more than one meaning. You're being pedantic for no reason.


“Serverless” has, for quite some time, meant something other than “without a server behind it.”

> a method of providing backend services on an as-used basis

As opposed to renting a dedicated server or VPS instance


Correct me if I’m wrong but it’s also becoming (or maybe always has been) synonymous with edge deployment/computing as well. So instead of going through the trouble to set up a CDN and load balance traffic you get it out of the box.


I’d say that “edge workers” etc are almost always serverless, but there are plenty of serverless architectures that are not edge (e.g. I know of more than a few ETL pipelines built entirely around AWS Lambda).


Reason?



