Show HN: DNS-based alternative to the web for structured data

1vuio0pswjnm7 · on Sept 4, 2020

There has never been any limit to what one can put in a DNS RR.

From the year 2000, 42 ways to distribute arbitrary data over the internet, including DNS:

http://web.archive.org/web/20001207014600/http://decss.zoy.o...

djbdns has always allowed arbitrary characters in TXT records. I have put HTML in TXT records to serve mini web pages over UDP. People have put speex files in RRs. Not sure "BIND" or other popular DNS servers of the day allowed such flexibility. Nowadays it is easy to find such software.
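For readers wondering how arbitrary bytes such as a small HTML page fit into a TXT record: on the wire, TXT RDATA is a sequence of character-strings, each a length byte followed by up to 255 bytes. The sketch below is not djbdns code, only an illustration of that framing; the function names and the sample page are made up.

```python
def txt_rdata(payload: bytes) -> bytes:
    """Encode payload as TXT RDATA: repeated <length byte><up to 255 bytes>."""
    if not payload:
        return b"\x00"  # a single empty character-string
    out = bytearray()
    for i in range(0, len(payload), 255):
        chunk = payload[i:i + 255]
        out.append(len(chunk))  # one length byte per character-string
        out += chunk
    return bytes(out)


def txt_payload(rdata: bytes) -> bytes:
    """Reverse of txt_rdata: concatenate the character-strings."""
    out = bytearray()
    i = 0
    while i < len(rdata):
        n = rdata[i]
        out += rdata[i + 1:i + 1 + n]
        i += 1 + n
    return bytes(out)


# Hypothetical mini web page; any byte values are allowed in the payload.
page = b"<html><body><h1>hello from DNS</h1></body></html>"
assert txt_payload(txt_rdata(page)) == page
```

The only cost of the framing is one extra byte per 255 bytes of content.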

Someone will probably comment on tunneling over DNS or Wikipedia over DNS.

This is why ICANN DNS and its use of DNSSEC always seemed wrong to me. If it's someone else's data, ICANN should not have the power to instantly "invalidate" it. Resurgence of DNSSEC was in part a response to the idea that shared caches are vulnerable, but DNS is more than just shared caches; and the way DNSSEC is used in practice relies on the strange assumption that ICANN is the self-appointed arbiter of "official" domain names. But as you can see DNS, the protocol, can be used for more than simply "official" ICANN-approved domain names.

jeroenhd · on Sept 4, 2020

> But as you can see DNS, the protocol, can be used for more than simply "official" ICANN-approved domain names.

The fact that it can be used to transfer HTML or Speex files doesn't mean it should be used to do so. DNS is developed to serve one purpose, and that's to resolve domain names and domain information. Arbitrary data transfers over DNS were never its intended purpose so of course the designers of DNS extensions don't take DNS tunneling into account.

For example, the iCal protocol allows for arbitrary amounts of information through a comment on each calendar item. However, if I start sending calendar events between devices to stream video, I don't expect the next iteration of the standard to support my use case. The presence of a general purpose information field does not necessarily mean that this field should be expected to serve any purpose unrelated to the problem the protocol was designed to solve.

There's little preventing you from using your own DNS client with different DNSSEC keys for validation of your own records if you're using an alternate domain name registration system. ICANN has always been the de facto standard for TLDs and therefore domain names so putting the general DNSSEC keys with them makes sense.

inopinatus · on Sept 4, 2020

Quite so. Or as one eminent commentator put it, back in 2011:

https://twitter.com/DEVOPS_BORAT/status/137607246036213760

elliottinvent · on Sept 4, 2020

> From the year 2000, 42 ways to distribute arbitrary data over the internet, including DNS

This is very cool, thanks for the link.

Would be interested to hear more about your experience in storing HTML in DNS.

1vuio0pswjnm7 · on Sept 4, 2020

People have now wrapped DNS RR in TLS, then in HTTP and they are thinking about putting DNS RR into HTTP headers or into HTML tags.

https://www.potaroo.net/ispcol/2020-06/row.html

In the late 2000's I took the opposite approach, because back then, DNS, i.e., traditional 512K UDP packets, was much faster for data retrieval than TCP/SSL/HTTP. TLS has gotten quite fast but using UDP is still faster, which is why we see UDP-based "reliable transport" protocols like (lesser known) CurveCP and (better known) QUIC. I put the content into the DNS RR instead of putting the DNS RR, e.g., IP address, into the content.

Retrieving content, e.g., a web page, has become a two-step process, thanks to DNS. It could be a one step process if we used IP addresses instead of names; I still do this where possible. However to use names, first we have to get the IP address (step 1), and only then can we retrieve the content (step 2). If we combine the IP address with the content then we can eliminate a step.

Putting everything into DNS instead of into web pages gives more control to the user, IMO. Web browsers are controlled by companies that rely on the survival of the online ad industry. DNS software does not have this problem.

1vuio0pswjnm7 · on Sept 4, 2020

s/512K/512/

caymanjim · on Sept 4, 2020

What are the implications of this for things like DNS caching, replication, root servers, negative lookup, etc? If you run your own DNS and want to create gigabytes of NUM records, that's great. But I wouldn't expect Cloudflare, Google, etc. to want to cache that data, so that could create a situation where most of your DNS is cached somewhere, so it's distributed, fast, reliable, and local, but your NUM records aren't cached, so now you've got servers that are out of sync, or are falsely returning negative results, or you have to implement a mechanism that treats NUM records differently in clients.

I don't know if any of these concerns are applicable, but I'm curious.

eat_veggies · on Sept 4, 2020

If recursive resolvers like Cloudflare, Google, or your ISP stopped caching _num.* records, the servers wouldn't go out of sync or falsely return negative results, since DNS caching and updates are more of a "pull" model than "push" with consensus (so it's just like your computer's L1, L2, L3, memory, and disk caches, but with eviction based on TTL).
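A rough sketch of the "pull" model described above, assuming a hypothetical `fetch` callback that stands in for asking the authoritative server. It is not how any particular resolver is implemented. The `cacheable` hook models a resolver that declines to cache some names: answers stay correct, and every lookup simply goes upstream.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Pull-through cache: entries are filled on demand and expire by TTL."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, float]],  # name -> (value, ttl seconds)
        clock: Callable[[], float] = time.monotonic,
        cacheable: Callable[[str], bool] = lambda name: True,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._cacheable = cacheable
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry)

    def lookup(self, name: str) -> str:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh, served from cache
        value, ttl = self._fetch(name)  # miss or expired: go upstream
        if ttl > 0 and self._cacheable(name):
            self._entries[name] = (value, now + ttl)
        return value
```

Nothing is ever pushed into the cache, so there is no state to fall out of sync. The worst case is an extra round trip.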

So _num records would end up forwarding all the way to the authoritative DNS server for the zone you're trying to access. You'd just have a slower experience than with your normal DNS queries, but you wouldn't observe anything weird.

elliottinvent · on Sept 4, 2020

I think it's only caching that is a factor here. It's certainly possible that DNS resolvers could refuse to cache, or somehow restrict caching (e.g. ignore TTLs) for:

1. any record from a _num. zone (independent records)

2. any record from *.num.net (hosted records)

3. any TXT record starting "@n=X" (all NUM records)

4. any TXT record over X bytes
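To make the four cases above concrete, here is a sketch of how a resolver operator could test for them. This is purely illustrative and no resolver is known to do it; the function name, the rule labels, and the 255-byte default for "X" are assumptions.

```python
from typing import List


def cache_policy_hits(name: str, rtype: str, text: str = "",
                      max_txt_bytes: int = 255) -> List[str]:
    """Return the labels of the restrictions (1-4 above) that a record matches."""
    normalized = name.lower().rstrip(".")
    labels = normalized.split(".")
    hits: List[str] = []
    if "_num" in labels:                       # 1. record from a _num. zone
        hits.append("independent")
    if normalized.endswith(".num.net"):        # 2. record under *.num.net
        hits.append("hosted")
    if rtype.upper() == "TXT":
        if text.startswith("@n="):             # 3. TXT starting "@n=X"
            hits.append("num-record")
        if len(text.encode("utf-8")) > max_txt_bytes:  # 4. TXT over X bytes
            hits.append("oversize")
    return hits
```

For example, `cache_policy_hits("1._num.example.com.", "TXT", "@n=1;placeholder")` matches rules 1 and 3 (the record text here is a placeholder, not a real NUM record).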

Since DNS is becoming more and more consolidated with DoH and DoT, one decision from a player like Google or Cloudflare could have a big impact here.

I think it depends how the records are being used. If one domain has gigabytes of NUM records then I think resolvers are more likely to act against that domain in particular rather than the protocol as a whole.

If gigabytes of NUM records were stored in cache from millions of different domains, then I'd hope that DNS resolvers would see the benefit of caching these records for users. If Cloudflare cached NUM records and Google didn't, then Cloudflare would be quicker for many operations (e.g. dialling a domain, fetching useful data with Siri / Alexa, etc).

rohan1024 · on Sept 4, 2020

Introduction: https://www.num.uk/

NUM Record viewer: https://tools.num.uk/

NUM Record creator: https://app.numserver.com/tools/editor/add

pastage · on Sept 4, 2020

Seems like the TXT record creator fails to escape `. I can not find anything about money/licensing on there.

I love DNS, and terse text formats, but both are pretty arcane for most people. You might as well have people post compressed base85 encoded messages or something similar. I will use this, but will the Thai place on the corner? Probably not.

elliottinvent · on Sept 4, 2020

In MODL [1] graves are used to quote strings since DNS implementations vary in how they handle escaped double quotes. That said, the record editor should have escaped it if you entered it into an input field so I'll take a look at this, thanks for highlighting it.

We're united in our love for DNS and terse text formats and I agree that most people don't care about either. However, that's the point of the NUM Server service [2] – it's a front end to create and manage NUM records.

Early next year we're going to populate the DNS with millions of NUM records for businesses based on their company website. So once the data is in DNS and developers start to consume it and build interesting things with it, we hope that companies will want to claim their record and that's where we can make money – from NUM record hosting, as protocol adoption grows.

Licensing for all NUM libraries is Apache 2.0, there's no charge or limitations on records from our NUM Server service, we just ask that a link is provided where a business can claim their populated record.

1. https://www.modl.uk

2. https://app.numserver.com

maaarghk · on Sept 4, 2020

I feel the same way, the record editor is pretty esoteric and reminds me of schema.org really. For any hope of adoption by small business I think it needs to be something registrars buy into with a more user-focused interface which is more like "describe your business to us" than "populate our data format". Large corps, DNS is surprisingly often managed by marketing departments, and you're going to find them asking questions like "so we can't advertise our complementary services to someone looking for our phone number? we can't style it? Why would we want that? We can't track conversion?" All of these are probably positives to the user, but in direct conflict with the interests of the people actually in control of the domain.

edit/ of course the only way around this is for apple and google to make it mandatory to appear in their mobile dialer app or something, but google at least will never do that because in the "way things are" example we can see many, many user-unfriendly situations that are great for google's metrics and revenue

elliottinvent · on Sept 4, 2020

> the record editor is pretty esoteric and reminds me of schema.org really.

This is our first version of the record editor and it'll get much more user-friendly over time. I agree that our module system has some similarities with schema.org. I think what's lacking with structured data formats for the web is a simple way that a small business can adopt the technology. That's what we're trying to offer with the NUM Server – fill in a simple form and we take care of publishing the data.

> For any hope of adoption by small business I think it needs to be something registrars buy into with a more user-focused interface which is more like "describe your business to us" than "populate our data format"

We're about to build in an integration with GoDaddy and 1&1 (IONOS) – this will make it easy for domain registrants to delegate their independent NUM zone (_num.example.com) to the NUM Server. Longer term, registrars might want registrants to build NUM records using tools offered by the registrars. In my opinion, registrars have historically done a pretty bad job of making tools user-friendly and I'm sure we can do a better job.

> Large corps, DNS is surprisingly often managed by marketing departments

This is an interesting point. I've never known of a reliable DNS zone for a large corporation being managed by a marketing department but with more and more services requiring DNS verification records this is becoming more and more common. We actually have a module to address that point – the Custodians module [1]

> ... you're going to find them asking questions like "so we can't advertise our complementary services to someone looking for our phone number? we can't style it? Why would we want that? We can't track conversion?" All of these are probably positives to the user, but in direct conflict with the interests of the people actually in control of the domain.

With the contacts module a company can advertise a range of methods (e.g. social media) alongside their telephone number but you're right they're not in control of how developers use the data or display it. I think this is something companies have already gotten used to – they're not in charge of how Facebook, Google or Yelp display their data. At least with NUM they're in control of the data itself.

We see user anonymity and the absence of tracking (from the resolver to the authoritative server at least) as a big plus point for NUM and a step in the right direction.

I think Twitter is a great example of a technology which would seem to be at odds with a company's goals: (i) complaints for the world to see; (ii) dealing with customer service by Tweet with restricted characters!; (iii) anonymity of users. But businesses use it because users love it.

I really appreciate the feedback.

1: https://www.numprotocol.com/specification#example-modules

elliottinvent · on Sept 4, 2020

Thanks for pulling these links out and highlighting them.

elliottinvent · on Sept 4, 2020

A small team and I built this and we're excited to hear feedback – good or bad. Thanks for taking a look.

sybercecurity · on Sept 4, 2020

From the spec: "NUM imposes no limit on the number of records in a set but some DNS server and client implementations may have difficulties processing very large record sets."

There are still some places where packet size is an issue in DNS - mostly over IPv6. This was a big issue when DNSSEC first started being deployed. Not so much now, but I've still seen instances where large DNS responses get dropped over IPv6 for MTU issues. 1260 is the safe limit for MTU size most of the time even if they advertise a higher limit.

fanf2 · on Sept 4, 2020

You are right that it is wise to keep DNS response sizes less than 1280 bytes if you want to fit in UDP. But the DNS can support responses of up to 65535 bytes over TCP, which servers are required to support. https://tools.ietf.org/html/rfc7766
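The 65535-byte ceiling comes from the framing: over TCP each DNS message is preceded by a two-byte length field, and a server that cannot fit an answer in UDP sets the TC (truncated) bit so the client retries over TCP. A small sketch of both pieces, with made-up helper names:

```python
import struct

MAX_TCP_MESSAGE = 0xFFFF  # largest value a two-byte length prefix can carry


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its length as a 16-bit big-endian integer."""
    if len(message) > MAX_TCP_MESSAGE:
        raise ValueError("DNS message too large for TCP framing")
    return struct.pack("!H", len(message)) + message


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set in the header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("shorter than a DNS header")
    # Byte 2 holds QR, Opcode, AA, TC, RD; TC is the 0x02 bit.
    return bool(message[2] & 0x02)
```

A client would check `is_truncated()` on the UDP reply and, if set, resend the same query over TCP using `frame_for_tcp()`.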

reificator · on Sept 4, 2020

> But the DNS can support responses of up to 65535 bytes over TCP, which servers are required to support

Requirements are only as effective as their enforcement...

elliottinvent · on Sept 4, 2020

Interestingly, Google Cloud DNS has a fixed limit of 1012 characters per resource record (counting the quote marks that separate fixed length 255 char TXT strings). They claim you can have 10,000 resource records in a resource record set, so over 10 MB of data.

fanf2 · on Sept 4, 2020

Golly! Well, the minimum practical RR size is 2 bytes for the name (a compression pointer), 2x2 bytes for class and type, 4 bytes for TTL, 2 bytes for length, and some data, e.g. 4 bytes for an A record: total, 16 bytes. So you can't fit more than about 4k records in a DNS response.
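The arithmetic above, spelled out as a quick check. The 12-byte header is standard; leaving out the question section is a simplifying assumption (including it shaves off a record or two).

```python
# Field sizes in bytes for the smallest practical resource record.
NAME_POINTER = 2   # compression pointer instead of a full name
TYPE = 2
CLASS = 2
TTL = 4
RDLENGTH = 2
A_RDATA = 4        # an IPv4 address

MIN_RR = NAME_POINTER + TYPE + CLASS + TTL + RDLENGTH + A_RDATA

MAX_MESSAGE = 65535  # largest DNS message over TCP
HEADER = 12          # fixed DNS header


def max_records(question_len: int = 0) -> int:
    """How many minimum-size A records fit in one maximum-size response."""
    return (MAX_MESSAGE - HEADER - question_len) // MIN_RR


print(MIN_RR)         # 16
print(max_records())  # 4095
```

So an RRset of 10,000 records cannot come back in a single response, even at the minimum record size.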

It is actually possible to have a zone with an RRset much bigger than that, and it will transfer successfully with AXFR or IXFR, and you can make the RRset bigger or smaller with UPDATE - but you won't be able to query for it... unless the server has some nonstandard shenanigans for returning partial data, as many of them do.

elliottinvent · on Sept 4, 2020

Thanks for your input. We've put character efficiency at the very top of our priorities for this reason. We can fit a lot of structured data in ~450 bytes (to stay under original 512 limit) if the record uses a module.
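As a rough sanity check on the ~450 byte figure, here is a budget calculation for a single TXT answer in a 512-byte response. It assumes one question, one answer whose name is a compression pointer, and no EDNS or other extra records. The function names are hypothetical and this is not part of any NUM library.

```python
def encoded_name_len(name: str) -> int:
    """Wire length of a domain name: a length byte per label plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label.encode("ascii")) for label in labels) + 1


def max_txt_payload(qname: str, limit: int = 512) -> int:
    """Largest TXT payload that fits in a response of `limit` bytes."""
    header = 12
    question = encoded_name_len(qname) + 4   # QTYPE + QCLASS
    answer_fixed = 2 + 2 + 2 + 4 + 2         # pointer, TYPE, CLASS, TTL, RDLENGTH
    rdata_room = limit - header - question - answer_fixed
    if rdata_room <= 0:
        return 0
    # Each character-string costs one length byte per up to 255 data bytes.
    full, rest = divmod(rdata_room, 256)
    return full * 255 + max(rest - 1, 0)


print(max_txt_payload("_num.example.com"))  # 464
```

Longer query names eat into the budget byte for byte, which is one reason to aim a little below the theoretical maximum.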

butz · on Sept 4, 2020

So in theory I could host a minimal website on DNS?

elliottinvent · on Sept 4, 2020

Thanks for your comment. Yes, with a client side app to show and format the content stored in DNS you could do that. I guess it depends on the content you're looking to host.

NUM (and DNS) is not well suited for storing lots of content, but does work well for storing key structured data.

It seems that many small business websites follow similar formats – it's just the key information that's different. For example, contact data, location of store etc. So NUM works well to split that data into a structured machine readable format that can be interrogated by devices, apps and services.

pastage · on Sept 4, 2020

You can already do that with DNS in the same way, and no browser supports DNS lookups in any sane way except DNS over HTTPS.

So no.

elliottinvent · on Sept 4, 2020

I think DoH is an important development here – our "NUM Record Viewer" makes all queries over DoH but of course you need to load a web app to run the queries.
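For anyone curious what a TXT lookup over DoH looks like from code, here is a minimal sketch using Cloudflare's public JSON endpoint. It is not taken from the NUM Record Viewer, whose implementation is not described here. The helper names are made up, and the quote handling is a simplification that only covers single-string TXT answers.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"  # public JSON DoH endpoint
TXT_TYPE = 16  # numeric RR type for TXT


def doh_url(name: str, rtype: str = "TXT", endpoint: str = DOH_ENDPOINT) -> str:
    """Build the query URL for a DoH JSON lookup."""
    return endpoint + "?" + urlencode({"name": name, "type": rtype})


def txt_answers(body: str) -> List[str]:
    """Pull the TXT data strings out of a DoH JSON response body."""
    doc = json.loads(body)
    out: List[str] = []
    for answer in doc.get("Answer", []):
        if answer.get("type") != TXT_TYPE:
            continue
        data = answer.get("data", "")
        # Some resolvers wrap the value in double quotes; strip one pair.
        if len(data) >= 2 and data[0] == '"' and data[-1] == '"':
            data = data[1:-1]
        out.append(data)
    return out


def fetch_txt(name: str) -> List[str]:
    """Perform the lookup over HTTPS (needs network access)."""
    request = Request(doh_url(name), headers={"accept": "application/dns-json"})
    with urlopen(request, timeout=5) as response:
        return txt_answers(response.read().decode("utf-8"))
```

Calling `fetch_txt("_num.example.com")` would issue the query. The same request can be made from a web page with `fetch`, which is why a web app can do this today without browser support for custom DNS queries.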

I'm hoping browsers will support custom DNS queries at some point. There was some talk of it being supported in Chromium a couple of years ago but it got shelved I think. Hopefully creative uses of the DNS like NUM will encourage them to consider it again.

thrownaway954 · on Sept 4, 2020

so basically stuffing DNS with a lot of TXT records.

elliottinvent · on Sept 4, 2020

Hopefully people will use NUM to add lots of useful structured data in TXT records but importantly these records are at their own DNS names and not polluting the main zone

e.g: dig target.com TXT

(and countless others)

edsemail123 · on Sept 4, 2020

NUM looks to me like a great improvement over the de facto 'status quo' of DNS, Search Engines, and 'Site Sifting' for useful info.

I do have some concerns about the plan to make the owners of various domains that much easier to locate and/or name in lawsuits, as at least here in the US, I could see that info being rather easily abused, along with the initial focus on 'contacts' (see my further comments/concerns below).

That said, given you asked for feedback/suggestions, and what looks to me the focus and high level of usefulness of NUM, especially on streamlining the overall process for 'inter-entity transactions' (whether personal, commercial, or whatever), I believe that a rather useful 'module' (and likely better yet, some number of modules) would cover Services, Products, and/or Solutions.

Each of those can be seen as either Standard or Custom or perhaps even involve both (ie, a standard Solution for xyz market typically includes abc standard products as well as def custom services or whatever)

This could easily include info about various products, as well as entire 'product lines', along with direct connect to marketing/sales materials and/or contacts, list/actual pricing, specific support resources, whether contacts and/or documentation (manuals) and/or even ways or sites that their organization prefers for handling certain interactions (phone calls, texts, chat, or even say direct (and perhaps non-disclosed) 'click to connect' methods, whereby entering a 'client id' (or having some security certificate) that then perhaps creates a direct connection, or maybe provides a custom 'menu' of options directly available, or whatever, might become possible

Also, given that many companies, groups, governments could also likely use something like this Internally as well, perhaps create the ability to 'federate' the NUM info (both up and down).

Taking that to the next logical step, there could be NUM data/records flagged for different 'audiences'

These 'audience' entries then could be used to auto-magically publish 'internal', 'external', 'vendor', 'client', 'employee', or whatever type records in appropriate places and ways, in NUM, thus helping to maintain appropriate access, security, permissions, etc.

I do really like the option to include public keys as well, as that opens up avenues to directly and easily establish programmatic methods for fully encrypted communications, transactions, file transfers, and whatever else.

In fact, using an organizational public key, along with an employee-designated key (plus whatever other factors) could then be used to instantly create say a Wireguard connection to whichever resources (perhaps including additional NUM/DNS records, data, etc) that that individual has been provided with access to, thus creating a fairly easy way to establish 'Zero Trust', yet fully functional [net]work environments, allowing equal access, no matter where one might happen to be located

That could simultaneously allow for a reduced, if not single, set of security protocols/parameters per organization, and given that simplification effectively tends to increase overall organizational security, similar to how Wireguard is seen as so revolutionary, due to its simplicity when compared to legacy VPN technologies

That said, I do believe that, additionally, especially for personal contacts/sites/details, and/or organizational units, there really ought to be methods (put) in place to allow for some level of anonymous yet authenticated access, such that NUM doesn't inadvertently disclose info that ends up creating yet more 'attack surface' for 'bad actors'

A simple example might be what happens by 'scraping' sites, winnowing down that info, and then publishing it (in clear text).

That would of course be done in an effort to 'help', though I could see that rather easily causing inadvertent complexities, or even outright disasters, especially given how much 'less than skilled' disclosure of info, whether at the individual/family level or at various organizational entities/levels, I have seen happening time and again on Many web-sites world-wide.

Those bits of info Currently tend to be obscured by exactly the nature of how the web has developed (and that NUM seems to be well positioned to address and effectively resolve moving forward) and Yet, at the same time, taking all those juicy bits of info, boiling them all down, and 'canning' them, such that Any script kiddie could then (far more easily And programmatically) utilize all that 'condensed goodness' to then target Anyone or Any group just about Anywhere, simply using NUM's (assuming publicly accessible) data, could well cause some unintended back-lash, if not handled with care.

I do realize that this last one could be an area where there is no simple answer, at least not yet, and I believe I would be remiss if I didn't mention my concerns here as well

elliottinvent · on Sept 18, 2020

I really appreciate the detailed feedback here, I somehow missed it.

I don't think site ownership data is something to be concerned about since we'll only publish that if we find it on the website. So if a company doesn't have a website or has no company details on their website then we won't populate a record for it. So it's unlikely NUM would make it any easier to name a domain registrant in a lawsuit than the website would.

I think a module that lists a company's products or services could have some really interesting applications.

NUM is of course compatible with all DNS implementations, so a local DNS zone mycompany.local could hold its NUM records in _num.mycompany.local – I think this has got a lot of potential for large companies and public sector organisations.

You're right that great care needs to be taken when scraping website data and publishing it to the DNS, to avoid inadvertently publishing data that was intended to be private or was published to the web before spam was such an issue, and also for GDPR reasons. It's unavoidable that making machine-readable data open and freely available will result in it being consumed through automated means and it's likely that some of this data will be used in ways which are undesirable.