
> WHOIS is actually an... interesting protocol to make a parser for.

It's actually impossible. Responses are essentially free-form (if the server responds at all). I tried my hand at this; you can make an ad-hoc "parser" that works for 90% of addresses/domains (or you could, ten years ago when I tried). But the remainder are intractable.
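For anyone curious what such an ad-hoc "parser" tends to look like, here is a minimal sketch. It assumes the server happens to emit `Key: value` lines, which many do and many don't; the function name, the comment prefixes it skips, and the sample response are all made up for illustration.

```python
import re
from collections import defaultdict

# Matches "Key: value" lines. Keys may contain letters, digits, spaces,
# slashes, underscores and dashes. This is a heuristic, not a grammar.
LINE_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 /_\-]*?)\s*:\s*(.+?)\s*$")


def parse_whois(text):
    """Best-effort extraction of key/value pairs from a WHOIS response."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Assumption: comment and notice lines start with %, # or >>>.
        if line.lstrip().startswith(("%", "#", ">>>")):
            continue
        m = LINE_RE.match(line)
        if m:
            # Keys are lowercased; repeated keys accumulate in a list.
            fields[m.group(1).lower()].append(m.group(2))
    return dict(fields)


# Made-up response, not taken from any real server.
SAMPLE = """\
% This is a made-up response for illustration only
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
Creation Date: 1995-08-14T04:00:00Z
"""

if __name__ == "__main__":
    for key, values in parse_whois(SAMPLE).items():
        print(key, values)
```

Anything that doesn't follow the `Key: value` shape (indented blocks, bracketed section headers, localized field names) falls straight through, which is exactly the intractable remainder.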

Nowadays it's much worse; nearly everything is hidden behind privacy shields, which purport to protect PII. But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.

This is ICANN's doing, I'm afraid. ICANN had a rule that networks should provide public WHOIS servers. They never enforced the rule, and now they've scrapped it.




> But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.

Doesn't a WHOIS record have to include an email address, phone number, and physical address? For a company that's not really PII, but I don't understand how it wouldn't be considered personal information in the WHOIS record for my personal website.


RDAP exposes some of the WHOIS information in a machine-readable format (JSON). Not everything, but I think it's better.

Not everything runs an RDAP server, though; I do wish ICANN/IANA or whoever would enforce that.
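To give a sense of the difference, here is a rough sketch of pulling a few fields out of an RDAP domain response. The member names (`ldhName`, `status`, `events`, `eventAction`, `eventDate`, `nameservers`) are the ones used by the RDAP JSON format; the helper name and the sample document are invented for illustration, and real responses carry many more fields (and often omit some of these).

```python
import json


def summarize_rdap_domain(doc):
    """Pick a handful of commonly wanted fields out of an RDAP domain object."""
    # Events are a list of {"eventAction": ..., "eventDate": ...} objects.
    events = {e.get("eventAction"): e.get("eventDate")
              for e in doc.get("events", [])}
    return {
        "name": doc.get("ldhName"),
        "status": doc.get("status", []),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
        "nameservers": [ns.get("ldhName")
                        for ns in doc.get("nameservers", [])],
    }


# Made-up response body, trimmed to the fields used above.
SAMPLE_RDAP = json.loads("""
{
  "objectClassName": "domain",
  "ldhName": "example.com",
  "status": ["active"],
  "events": [
    {"eventAction": "registration", "eventDate": "1995-08-14T04:00:00Z"},
    {"eventAction": "expiration", "eventDate": "2030-08-13T04:00:00Z"}
  ],
  "nameservers": [
    {"objectClassName": "nameserver", "ldhName": "ns1.example.com"},
    {"objectClassName": "nameserver", "ldhName": "ns2.example.com"}
  ]
}
""")

if __name__ == "__main__":
    print(summarize_rdap_domain(SAMPLE_RDAP))
```

No regexes and no guessing at field labels, at least for the fields that are actually present.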

> Nowadays it's much worse; nearly everything is hidden behind privacy shields, which purport to protect PII. But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.

Network operator info can also be PII. My info is PII, but I have a domain name, so putting my info into WHOIS is putting PII into WHOIS.

The privacy guard just forwards everything to me, minus spam.

(If it's a corporation, I don't think there's a good reason to permit privacy guards. But not all domains are owned by BigCos, yet.)


Regarding RDAP, it actually is enforced by ICANN: it’s been mandatory for a few years for gTLDs (not sure if it is, or can be made mandatory for ccTLDs). All registrars handling gTLDs should now have an RDAP server; otherwise they’re in breach of ICANN rules.

RDAP has the benefit of being JSON, but even then it’s a reaaally crappy format. For example, contacts are represented by the jCard pseudo-standard, which is a JSON version of vCard, and it’s completely awful and hard to deal with. Basically instead of a nice JSON object, it’s arrays in arrays in arrays…
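To illustrate the "arrays in arrays" complaint: a jCard is a two-element array, `["vcard", [...properties]]`, and each property is itself an array of `[name, parameters, value_type, value, ...]`. A small sketch of flattening that into a plain dict follows; the helper name and the sample contact are made up, and it deliberately throws away the parameters and value types.

```python
def flatten_jcard(jcard):
    """Turn ["vcard", [[name, params, type, value, ...], ...]] into a dict.

    Each property name maps to a list of values, since a name can repeat
    (for example, several "email" entries).
    """
    tag, properties = jcard
    if tag != "vcard":
        raise ValueError("not a jCard: first element is %r" % (tag,))
    out = {}
    for name, _params, _value_type, *values in properties:
        # A property normally has one value; keep a list if it has several.
        value = values[0] if len(values) == 1 else values
        out.setdefault(name, []).append(value)
    return out


# Made-up contact for illustration only.
SAMPLE_JCARD = [
    "vcard",
    [
        ["version", {}, "text", "4.0"],
        ["fn", {}, "text", "Example Registrant"],
        ["email", {"type": "work"}, "text", "admin@example.com"],
        ["adr", {}, "text",
         ["", "", "123 Example St", "Exampletown", "EX", "00000", "XX"]],
    ],
]

if __name__ == "__main__":
    contact = flatten_jcard(SAMPLE_JCARD)
    print(contact["fn"][0], contact["email"][0])
```

Note that the address is yet another positional array inside the property, so you still have to know that index 2 is the street and index 3 is the city.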

RDAP should get better in the future versions, but I’m not sure registrars will follow in good faith because the initial specs were a bit of a shit show.


> not sure if it is, or can be made mandatory for ccTLDs

ccTLDs don't. Proof by counter-example: .co, .io.

I don't know if that can be enforced, but it'd make my life easier.

(Technically, too, AIUI, it's the registries that run the RDAP servers, not the registrars.)

Yeah, the JSON is a bit of a crap-shoot, but I think it's maybe marginally better than trying to parse raw WHOIS text…? IDK. Probably depends on the exact datum you want to pull out of it.


> you can make an ad-hoc "parser" that works for 90% of addresses/domains (or you could, ten years ago when I tried). But the remainder are intractable.

Could generative AI help out these days? "Here's a whois, give me [the info I want]:"



