
I have computers today with enough storage space to hold entire multi-GB public zone files. The storage capability keeps increasing. However I only use a small fraction of that data. In fact, I have computers that can hold the DNS data for every domain name I will ever use in a lifetime.

Of that data, a relatively small fraction changes periodically. Most of it is static. Generally, I only do remote DNS data retrieval periodically, not immediately preceding each and every HTTP request when accessing a www site.

Every user is different, but by controlling what RFC 1035 calls the "Master File" of DNS data, I can avoid remote DNS lookups altogether. This speeds up www use for me, greatly. YMMV.
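To make the "Master File" point concrete, below is a minimal sketch (Python, assuming the dnspython package) of loading RFC 1035 master-file data and answering lookups from it locally, with no remote resolver involved. The zone contents, the example.com origin and the addresses are placeholders, not my actual data.

    # Minimal sketch: serve lookups from locally controlled master-file data.
    import dns.zone
    import dns.rdatatype

    MASTER = """
    $TTL 86400
    @   IN SOA ns.example.com. hostmaster.example.com. 1 7200 3600 1209600 3600
    @   IN NS  ns.example.com.
    @   IN A   93.184.216.34
    www IN A   93.184.216.34
    """

    zone = dns.zone.from_text(MASTER, origin="example.com.", relativize=True)

    def lookup_a(hostname):
        """Return the locally stored A records for a name in the zone, or None."""
        rdataset = zone.get_rdataset(hostname, dns.rdatatype.A)
        return [r.address for r in rdataset] if rdataset else None

    print(lookup_a("www"))      # ['93.184.216.34']
    print(lookup_a("tracker"))  # None -- a name I never stored simply never resolves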

The point that gets missed in these discussions, IMHO, is that DNS is not just an issue of speed.^1 (And users can improve speed without help from third parties.) DNS is also an issue of control. Controlling DNS allows me as a user to disable the www's dark patterns, where the user selects a domain name to access and the "browser" connects to various domain names to which the user had no intention of connecting.^2 I can easily thwart unnecessary, unwanted phoning home, telemetry, tracking and online advertising because they all rely on using DNS that is, to some degree if not wholly, outside the user's control.

1. For example, Google can undoubtedly win the race for DNS speed; however, the www user will always lose the contest over _control_.

2. Originally this auto-fetching feature may not have been intended to support "dark patterns". However, its usage today is a key element of those practices. There are companies today whose vision for the www is shaped by a need for programmatic advertising and the privacy invasion that this requires. They push for standards and protocols optimised to support "complex" web pages composed of many components, potentially controlled by various third parties, the most important of which are related to _advertising_. A www user might have a different vision. For example, I am able to use the www quite effectively for information retrieval (not commerce) without using auto-fetching.^3 I treat www pages as "simple" ones with only one significant component and none controlled by third parties. This allows me to consume larger quantities of information more rapidly, with less distraction. "Simple" www pages are more valuable to me than complex ones, though they might be less valuable to "tech" companies seeking to sell advertising services.

3. Common Crawl, the source for much-hyped "AI" projects such as GPT-3, uses the www in a similar way. There are no components of "complex" websites, such as Javascript files, in the archives.




Is there a torrent that gets updated regularly, or where/how do you download the zone files for all the TLDs? And what dns server software do you use?


Yes, I'd love to know more about how you implemented your setup.


"I have computers today with enough storage space to hold entire multi-GB public zone files. The storage capability keeps increasing. However I only use a small fraction of that data."

What this means is that I do not need to store entire zone files. I only need to store the data for the domain names I will use. The point about storage capability is that this is no longer a limiting factor. When I started using the www, storage space was a limiting factor. I could not store the DNS data for every name I would ever use on a personal computer. Even the RAM on today's computers can be larger than the size of HDDs from the time when I started using the www. Everything has changed.

"For example, I am able to use the www quite effectively for information retrieval (not commerce) without using auto-fetching.^3 I treat www pages as "simple" ones with only one significant component and none controlled by third parties."

What this means is that the set of names I will use is (generally) deterministic. For example, if I aim to access the index.html page at https://example.com, I only retrieve the DNS data for example.com. The set of names for which I must retrieve DNS data is known, a priori.^1 To give a more practical example, I start with a list of all the domain names represented in HN submissions (cf. comments). I retrieve DNS data for those names only. (NB. A small minority of www sites submitted to HN do change hosting providers occasionally or change IP addresses relatively frequently.)
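As a rough illustration of that workflow, the sketch below (Python, standard library only) collects hostnames from recent HN story submissions via the Algolia HN Search API and resolves each one a single time, up front. The endpoint, field names and page count are assumptions on my part; substitute whatever name source you prefer.

    # Collect names first, then do all DNS retrieval in one bulk pass.
    import json
    import socket
    import urllib.request
    from urllib.parse import urlsplit

    def hn_submission_hosts(pages=3):
        """Yield unique hostnames from recent HN story submissions."""
        seen = set()
        for page in range(pages):
            url = f"https://hn.algolia.com/api/v1/search_by_date?tags=story&page={page}"
            with urllib.request.urlopen(url) as resp:
                hits = json.load(resp)["hits"]
            for hit in hits:
                host = urlsplit(hit.get("url") or "").hostname
                if host and host not in seen:
                    seen.add(host)
                    yield host

    def bulk_resolve(hosts):
        """Resolve every hostname once, up front; return {host: [addresses]}."""
        table = {}
        for host in hosts:
            try:
                infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
                table[host] = sorted({info[4][0] for info in infos})
            except socket.gaierror:
                table[host] = []  # record names that did not resolve, too
        return table

    if __name__ == "__main__":
        print(json.dumps(bulk_resolve(hn_submission_hosts()), indent=2))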

Thus when I read HN submissions, I am not performing any remote DNS queries. At an earlier point, I have performed bulk DNS data retrieval for all domain names in HN submissions. The DNS data is stored in the memory of a localhost forward proxy or in custom zone files served by a localhost authoritative nameserver.
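One possible way to turn a {host: [addresses]} table like the one from the previous sketch into such custom zone files (a hedged sketch, not my actual tooling): write one tiny RFC 1035 master file per name, which the localhost authoritative nameserver of your choice can then serve. The SOA/NS values and the output directory are placeholders.

    # Emit one small master file per name from the bulk-resolved table.
    import pathlib

    ZONE_DIR = pathlib.Path("zones")  # hypothetical output directory

    TEMPLATE = """\
    $ORIGIN {name}.
    $TTL 86400
    @ IN SOA ns.localhost. hostmaster.localhost. 1 7200 3600 1209600 3600
    @ IN NS  ns.localhost.
    {records}
    """

    def write_zone_files(table):
        """Write one master file per name from a {host: [addresses]} table."""
        ZONE_DIR.mkdir(exist_ok=True)
        for name, addresses in table.items():
            records = "\n".join(f"@ IN A {addr}" for addr in addresses
                                if ":" not in addr)  # IPv4 only, for brevity
            if not records:
                continue  # nothing usable to write for this name
            path = ZONE_DIR / f"{name}.zone"
            path.write_text(TEMPLATE.format(name=name, records=records))

    # e.g. write_zone_files(bulk_resolve(hn_submission_hosts()))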

Another example might be domains found in Google Scholar search results. I collect these names from a series of searches then retrieve the DNS data in bulk. Then I can search and retrieve papers from many sources found through Scholar without making remote DNS queries.

There are a variety of sources for bulk DNS data. Some potential sources are:

Public zone file access programs (Contact the registry. Many zones are available through ICANN's CZDS program.) https://czds.icann.org

Public scan data (Sadly, Rapid7 recently stopped publishing their forward DNS data.)

DoH open resolvers (Using HTTP/1.1 pipelining; see the sketch after this list.)

Common Crawl archives (By extracting WARC-TARGET-IP.)
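Regarding the DoH item above: I use raw HTTP/1.1 pipelining, but the simpler Python sketch below just reuses a single HTTPS connection and issues the queries sequentially against the JSON interface that dns.google exposes. Treat the endpoint and response fields as assumptions and verify them before relying on this.

    # Bulk A-record retrieval from a public DoH resolver over one connection.
    import http.client
    import json

    def doh_bulk(names, server="dns.google"):
        """Query A records for many names, reusing a single HTTPS connection."""
        conn = http.client.HTTPSConnection(server)
        results = {}
        try:
            for name in names:
                conn.request("GET", f"/resolve?name={name}&type=A")
                reply = json.loads(conn.getresponse().read())
                results[name] = [a["data"] for a in reply.get("Answer", [])
                                 if a.get("type") == 1]  # type 1 == A record
        finally:
            conn.close()
        return results

    print(doh_bulk(["example.com", "news.ycombinator.com"]))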

1. In contrast to using browser auto-fetching, where I have no idea what other domain names might be automatically looked up when I visit example.com.



