DigiCert Revocation Incident (CNAME Domain Validation) (digicert.com)
141 points by vitaliyf 4 months ago | 49 comments



> The underscore prefix ensures that the random value cannot collide with an actual domain name that uses the same random value. While the odds of that happening are practically negligible, the validation is still deemed as non-compliant if it does not include the underscore prefix.

That's not the rationale for mandating the underscore prefix. The actual reason is so services that allow users to create DNS records at subdomains (e.g. dynamic DNS services) can block users from registering subdomains starting with an underscore. It serves the same purpose that /.well-known does.

For example, if an attacker requests a certificate for dyndns.example and DigiCert gives them a record without an underscore prefix like da39a3ee5e6b4b0d3255bfef95601890afd80709.dyndns.example, they can register that subdomain with the dynamic DNS provider, publish the required record, and get the certificate for dyndns.example. It doesn't matter how much entropy DigiCert put in the record name.
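To make the provider side of this concrete, here is a minimal sketch of the check the underscore rule enables. It assumes a hypothetical dynamic DNS service; the function name `is_label_allowed` and the policy are illustrative, not taken from any real provider.

```python
def is_label_allowed(label: str) -> bool:
    """Return True if a tenant may register `label` as a subdomain.

    Hypothetical policy for a dynamic DNS provider: refuse any label that
    starts with an underscore, so tenants can never publish the
    underscore-prefixed records a CA uses for domain validation.
    """
    return bool(label) and not label.startswith("_")


# A CA-issued name without the prefix looks like any other tenant subdomain.
unprefixed = "da39a3ee5e6b4b0d3255bfef95601890afd80709"
prefixed = "_" + unprefixed
```

With this policy the unprefixed value is registrable by any tenant, while the prefixed one is refused no matter how much entropy follows the underscore.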

I definitely commend DigiCert for pledging to revoke the certificates within 24 hours and not having a delayed revocation or trying to language lawyer their way to a 5 day revocation as other CAs have tried. Nevertheless, this post severely minimizes the security impact of their mistake, and provides an excellent example of why CAs should always be required to strictly adhere to the rules and not be permitted to excuse noncompliance based on their own security analysis.


> For example, if an attacker requests a certificate for dyndns.example

Shouldn't that get caught by the Public Suffix List?

I would hope DigiCert has checks in place to prevent someone domain-validating ownership of the entirety of co.uk under any circumstances :)

(They should still revoke the mis-issued certificates though)


There's no prohibition against issuing certificates for names on the Public Suffix List.

BR 3.2.2.6 prohibits issuing a wildcard certificate for an entire public suffix unless the "Applicant proves its rightful control of the entire Domain Namespace" (without specifying how this should be done - arguably, publishing a DNS record would qualify) but also says that CAs should use the "ICANN DOMAINS" section of the PSL only, not the "PRIVATE DOMAINS" section, so domains for dynamic DNS providers and the like wouldn't be included. [https://github.com/cabforum/servercert/blob/main/docs/BR.md#...]


PSL has a couple of sections - ICANN and PRIVATE. PRIVATE can be a little more flexible/ignorable. If they implement a hard rule, then occasionally they'd have to make exceptions when the real Dyn comes along and wants (legitimately) a wildcard for their name.


> Shouldn't that get caught by the Public Suffix List?

PSL is a best-effort sort of thing, so it's good but not definitive. It would be dangerous to rely on it when issuing certs imo.


> I definitely commend DigiCert for pledging to revoke the certificates within 24 hours and not having a delayed revocation or trying to language lawyer their way to a 5 day revocation as other CAs have tried.

It seems I spoke too soon: https://bugzilla.mozilla.org/show_bug.cgi?id=1910322#c8


Having a restraining order served on them is going to slow things down a bit.


That only affects 72 certificates. They've delayed revocation for all 83,000+.


https://bugzilla.mozilla.org/show_bug.cgi?id=1910322

for more background. The short story is that when doing CNAME-based validation, they were supposed to put an underscore at the start of the random string you add to your DNS records. They still generated sufficiently random strings but didn't include a _ before them, which violates the CA/Browser Forum Baseline Requirements. The rationale is that some sites might give you control of yourusername.example.com, and the rule keeps random users from registering the random string as a username and publishing the validation record themselves. If a site doesn't allow users to create any hostname with a leading underscore, those users can't pass the domain validation.
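As a rough illustration of the CA side, the sketch below builds a validation label with the required prefix. The function names and the 20-byte size are assumptions made for the example; this is not DigiCert's implementation.

```python
import secrets


def make_validation_label(num_bytes: int = 20) -> str:
    """Build a DNS label for CNAME-based validation.

    The leading underscore is the part the incident was about; the hex
    string after it is the random value. `num_bytes` is an assumed size.
    """
    return "_" + secrets.token_hex(num_bytes)


def validation_name(label: str, domain: str) -> str:
    """Full record name the applicant would publish the CNAME at."""
    return f"{label}.{domain}"
```

Dropping the `"_" +` is the whole bug: the output stays just as random, but it becomes a name an ordinary user could register.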


Also, while a DNS name can contain an underscore, a host name, even in DNS, cannot. So if you have a user named "haha_funny" you already aren't allowed to give them the hostname "haha_funny.somesite.example" - and on some systems it will just silently not work because it's invalid.

So even if you are completely oblivious to this work, and don't care about security at all, your "Give everybody a hostname" code should already avoid underscore characters as desired because otherwise stuff breaks.
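A minimal sketch of the kind of check a "give everybody a hostname" feature could run, assuming the usual letter-digit-hyphen rule for hostname labels. The function name and the length limits used here are illustrative choices for the example.

```python
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen;
# at most 63 characters. Underscore is deliberately absent.
_LABEL_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(name: str) -> bool:
    """Check a dotted name against the letter-digit-hyphen hostname rule."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL_RE.match(label) for label in name.rstrip(".").split("."))
```

A service that applies this check to user-chosen names rejects "haha_funny" as a side effect, which also keeps users away from underscore-prefixed validation names.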

Several current systems use DNS names (but not hostnames) which feature underscores but it's pretty unlikely that you've got (for example) a service where users can pick their own TCP/IP service name and port and issued appropriate records for it in DNS. If you have done this weird thing you probably want to use the existing mechanism (in DNS of course, the CAA record) to tell most CAs that they should not issue for your names even if they think they've received permission. You can then cut a suitable deal with a for-profit CA to do whatever crazy extra checks you want (e.g. Meta's CA has to contact actual people in the appropriate security team at Meta, so that "mistakes" which give somebody a certificate for facebook.com never happen without some pretty drastic real world errors).


> So if you have a user named "haha_funny" you already aren't allowed to give them the hostname "haha_funny.somesite.example" - and on some system it will just silently not work because it's invalid.

Not long ago I actually did come across a site that had an underscore in its domain name, and it worked both for me and apparently Google, because it indexed and showed a (relevant) page from that site. I only remembered it was on a *.tripod.com subdomain, and can't find that exact site now since I don't remember what I was searching for (it was a highly obscure and technical topic), but there do appear to be others there with underscores, e.g.:

http://computer_collector.tripod.com/

http://hattori_striker.tripod.com/

http://forgotten_dark_angel.tripod.com/


In 2019 the CAs agreed not to issue certs to underscored subdomains making this less useful.

As evidenced by all your links being http.

(as an aside, it looks really weird seeing a bare http link in the wild - crazy that was the old norm!)


My browser is configured to auto-upgrade such links and I get a full screen interstitial when the upgrade fails (as of course it did for these)

This is now at a place where I'd recommend such configuration more broadly, it's not suitable for everybody, but many could benefit from just knowing all links are secured.


Wildcard certs match subdomains with underscores, as pointed out by a sibling comment. Example: https://_.4a.si./
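A simplified sketch of why that works: wildcard matching compares labels and, in this model, puts no character restriction on the label the `*` stands for. Real TLS clients differ in the details, and the function name is made up for the example.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified check: does a cert name like '*.example.com' cover hostname?

    Only a whole left-most '*' label is handled, and it matches exactly one
    label. No character restrictions are applied to that label, which is why
    an underscore label is covered in this sketch.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels
```

Under this model `*.4a.si` covers `_.4a.si.` even though no CA would issue a certificate naming `_.4a.si` directly.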


A wildcard would still work for these fwiw


Google also indexed my site http://_.4a.si as seen here http://google.com/search?q=site:_.4a.si


A live proof that CNAME records starting with _ exist.


There are.

There shouldn't be, but there are.


One of the impacted companies filed a restraining order, because they believe their incompetence is more important than basic functionality of the PKI. Can't wait to hear how they expect to respond if they ever encounter a cert compromise or actual misissuance; maybe they'll demand 24-hour revocation in that case?

Honestly my opinion is that this should trigger the company being banned by all CAs.

The company in question is Alegeus Technologies LLC: https://www.courtlistener.com/docket/68995396/alegeus-techno...

From basic googling it looks like a healthcare provider, so exactly the kind of company you would want to have shitty IT and security infrastructure. A++ work. Absolutely stellar.


I just want to call out both CrowdStrike and DigiCert for being one of "those" companies that insist on publishing critical support information behind a login with the clock ticking on a global outage of their own making.

There are no polite words that I can use to accurately convey the depth of my disappointment at this kind of inconsiderate behaviour during a crisis, so I won't say anything more.


What critical support information? What global outage? How are these two events or companies remotely equivalent?

If I'm not a Digicert customer, what do I care about the details of how to redo a validation on Digicert? If I am a Digicert customer I have been emailed already and I will obviously have to log in to do anything at all with my domain.

They say this affects 0.4% of Digicert customers who are what % of the world? Actually not even 0.4% of Digicert customers, but 0.4% of those particular validations. What does that actually work out to? Just who all is actually down?

I fail to see any equivalence.


Are you referring to the list of affected certificates?


If you’re not a customer, your domain isn’t affected.


You may be a customer of a customer (e.g. Azure) so you could be affected peripherally via that route.


24h notice to change certificates in who knows how many systems, at the world's largest companies, while everyone is on vacation.

This will be interesting.


Respect to them for actually abiding by the BRs. Most CAs just shrug [1] and [2] say [3] it's [4] too [5] complicated [6], or just lie and claim planes will start crashing [7]. It's really disheartening that publicly trusted CAs just ignore their contractual obligations however they see fit.

Ideally these companies should have response plans in place to prioritize certificate rotation. They can use this as a fire drill for what would happen if there were a key compromise.

Alternatively, if companies cannot handle the rotation, then they likely should re-evaluate if WebPKI is even appropriate for their use-case.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1885568

[2]: https://bugzilla.mozilla.org/show_bug.cgi?id=1898848

[3]: https://bugzilla.mozilla.org/show_bug.cgi?id=1910237

[4]: https://bugzilla.mozilla.org/show_bug.cgi?id=1896053

[5]: https://bugzilla.mozilla.org/show_bug.cgi?id=1896553

[6]: https://bugzilla.mozilla.org/show_bug.cgi?id=1877388

[7]: https://bugzilla.mozilla.org/show_bug.cgi?id=1903066#c48


"Alternatively, if companies cannot handle the rotation, then they likely should re-evaluate if WebPKI is even appropriate for their use-case."

I hate hearing this awful take, as if every IT organization has the same neat and tidy systems deployed as they do. Never had to deal with 3rd party SaaS vendors certificate pinning requiring service tickets to change, don't have any hardware devices or appliance based software images each with their own web interface to update certs...

Yes, companies should have a plan to do their minimum yearly certificate rotations. Yes, those companies should have a security plan to rotate certificates affected by a security issue, but in those cases the business users are ok with an outage to remediate a real security issue.

But what happened here is that Digicert invalidated the entire domain's worth of certs. All those service.companyname.com certs or duplicates under that domain validation were affected in bulk. In some companies there could be thousands of certs under that domain. Digicert screwed up their system implementation and made their customers suffer.

"It's really disheartening that publicly trusted CAs just ignore their contractual obligations however they see fit."

It's also disheartening to see browsers in the CA consortium ignore the CA resolutions as well. Like how everyone voted for 2 year certs and Apple did their own thing anyways. Any punishment for Apple come? So why pick on the others?


Stuff like this is why some parties have been calling for increasingly-shorter cert validity. When a cert is valid for several years it allows companies to develop an increasingly complex workflow around deploying them, sometimes taking weeks and involving dozens of parties to roll them out. This is in turn used as an excuse by CAs to completely ignore the industry standards.

Those SaaS vendors probably shouldn't be doing cert pinning to begin with. If you don't trust your root store either implement support for CAA or DANE, no need to roll out your own workflow. Those hardware devices should either 1) not use publicly trusted certs, 2) renew their own certs, or 3) have an API to automatically update certs.

The only reason they're still getting away with it is because doing it manually once a year isn't horribly painful. If 90-day validity becomes the industry standard, pain-free certificate renewal turns into a must-have for all new contracts.


> Stuff like this is why some parties have been calling for increasingly-shorter cert validity. When a cert is valid for several years it allows companies to develop an increasingly complex workflow around deploying them, sometimes taking weeks and involving dozens of parties to roll them out. This is in turn used as an excuse by CAs to completely ignore the industry standards.

"several years"? The certs we are getting have one-year lifetimes. It used to be two years, but was reduced to one year some time ago (I don't remember exactly when).

Also, I don't think the problem is cert lifetimes, I think the problem is having so many certs expiring all at the same time. A lot of IT folks are coming off the major pain of the CrowdStrike crash. This is similar: You suddenly have a very large number of certificates that are going to stop working in less than 24 hours, and you have to respond.

Sure, you could say "Well, companies should be resourced to be able to handle that at any point." Except that's not the reality right now.


I think they're suggesting that 1 year certificates are still at the point where people can just manually rotate them as they expire. If you keep reducing the lifespan, to say 90 days, that starts to tip the scale. You'll be spending so much human time manually rotating certificates that it will make financial sense to just automate the process.

If the process is automated then revocation can be automatically handled as well (so long as ARI gains traction).
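A rough sketch of what that automated decision could look like. The `ca_requests_renewal` flag stands in for whatever signal the CA provides (ARI or otherwise), and the two-thirds threshold is an assumed policy, not something from the thread or a standard.

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_before: datetime, not_after: datetime, now: datetime,
                 ca_requests_renewal: bool = False,
                 renew_fraction: float = 2 / 3) -> bool:
    """Decide whether an automated job should replace a certificate.

    `ca_requests_renewal` stands in for a signal from the CA (for example
    an upcoming revocation); `renew_fraction` is an assumed policy knob.
    """
    if ca_requests_renewal:
        return True
    lifetime = not_after - not_before
    return now >= not_before + lifetime * renew_fraction
```

Once a job like this runs on a schedule, a mass revocation is just another early renewal and not a 24-hour scramble.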


90 day certificates will be here soon, and moving to shorter lifetimes from there.


Heck, subscribers could go to 10 day certs today (soon 7) and be immune from revocation entirely.


I work with customers that typically take 3 or 4 days to either acquire or renew a cert. Even though they are on one of the major cloud providers with automated certs, they refuse to use those mechanisms due to policy. They would rather send everything, including private keys, through email. They also take several days, sometimes weeks, to update a DNS entry. Welcome to modern IT.


Trying to deploy SaaS apps for customers it sometimes takes 3-4 weeks to get them to make any DNS changes, then at the last minute they CC us into an email with SquareSpace support for some reason (their DNS is on Cloudflare...)


I believe it. It's insane how long some of this stuff takes.


I think the issue is less with SaaS vendors doing cert pinning and more that many SaaS vendors offering deployment on customer domains often rely on those same customers to make the DNS changes for validation, and whenever you introduce another party like that it's exponentially more difficult to actually get things done in a timely manner.

IMO they should just use HTTP challenges to avoid this whole thing, but it's a pretty common pattern I see with a lot of SaaS vendors, even major fintechs.


That's one option. Alternatively, they could just delegate the _acme-challenge with a CNAME.

If clientportal.somebank.com is actually run by somesaas.com, they can define CNAME _acme-challenge.clientportal.somebank.com --> [some_key].domainvalidations.somesaas.com

When the SaaS vendor needs to request a new cert, they set the appropriate TXT record on [some_key].domainvalidations.somesaas.com.
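A small sketch of the one-time record the customer would add, rendered in zone-file style. The function name and the example key are made up for illustration; the hostnames are the ones from the example above.

```python
def acme_delegation_cname(customer_host: str, key: str, saas_zone: str) -> str:
    """Render the one-time CNAME the customer adds, in zone-file style.

    After this record exists, the vendor can answer DNS challenges by
    publishing TXT records in its own zone, with no customer DNS changes.
    """
    owner = f"_acme-challenge.{customer_host.rstrip('.')}."
    target = f"{key}.{saas_zone.rstrip('.')}."
    return f"{owner} IN CNAME {target}"
```

For example, `acme_delegation_cname("clientportal.somebank.com", "abc123", "domainvalidations.somesaas.com")` gives the line the bank's DNS team adds once; every later renewal only touches the vendor's zone.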


"Never had to deal with 3rd party SaaS vendors certificate pinning requiring service tickets to change"

I think this tends to fall into "probably shouldn't have been using Web PKI". I can't immediately think of a reason why you'd need a publicly trusted certificate if you're pinning a specific public key. At that point, who cares who signed it?

I do agree that there are real costs with rotating certificates that ultimately may make it impossible for an organization to complete that work in the revocation window. That is very much an area that needs further automation developed and more importantly, for it to actually be adopted. I believe that's what ACME Renewal Information is attempting to address.

"but in those cases the business users are ok with an outage to remediate a real security issue"

Ideally yes, but that might be the same point you find out the certificate was used in some critical system (let's say Air Traffic Control like a previous CA tried to claim). They still may very well not be okay with the revocation despite the security issue. _Those_ are the people that need to stop using these certificates and there's really no way to weed them out until a revocation actually needs to occur.

"Digicert screwed up their system implementation and made their customers suffer."

And those customers are right to be mad at DigiCert. They probably don't have a legal basis to challenge as the subscriber agreement explicitly permits immediate revocation without prior notice, but they can certainly take their business elsewhere.

"It's also disheartening to see browsers in the CA consortium ignore the CA resolutions as well. Like how everyone voted for 2 year certs and Apple did their own thing anyways. Any punishment for Apple come? So why pick on the others?"

Admittedly I'm not very familiar with the various root programs and the obligations they have with CAs, but it doesn't seem unreasonable that root programs would be free to impose stricter requirements than the BRs.

Though I do find it two-faced for Apple to vote for Ballot 193 only to then impose a stricter requirement. At the very least they should have abstained.


"I can't immediately think of a reason why you'd need a publicly trusted certificate if you're pinning a specific public key"

Inter-finance systems mostly, some government. Sometimes they pin the CA issuer, sometimes IP based although with dynamic cloud IPs that is disappearing, sometimes inside a VPN, and other times just the cert issues themselves. Same service handling public users while making bidirectional API calls to other interfaces that are more locked down.

Not everyone is a monolithic copy and paste Wordpress hosting site, a new cloud native cash rich startup, or a massive Google/Amazon/Microsoft with huge teams to orchestrate everything using their own architecture and systems they developed themselves. Private PKI? Even more orchestration layers for enrollment especially in places with BYOD.

There is no point to low expiry certs anyways. If a server is hacked, the primary concern is what data were they able to exfiltrate and for how long - not that a keypair was maybe stolen to be used in a very complicated and unlikely attack to intercept some of the same data they already stole.

Your ATC comment seems to continue your theme that everyone should run a private PKI instead. Airports are full of interconnections between themselves, other airports, airlines, ground crews, satellite relays, and weather monitoring systems. So then all these parties need to do all the same actions as the public PKI - root key signing, cert issue logging, secure interface for issuing certs, develop a trust across all parties and make them install your root in all their systems ..... or, just use the public PKI services, which already do that. You are just reinventing the wheel and probably will get it wrong. Maybe for some strictly backend systems, or things like server out of band management it works well, but not anything involving multiple companies.

The CAs that work with large and complex businesses understand these complexities and voted for a 2-year duration. The owners of the browsers just wanted to further their own cloud bottom lines.


"Your ATC comment seems to continue your theme that everyone should run a private PKI instead."

Not the OP you replied to, but I want to add some nuance: there's a vast solution space between using the WebPKI and rolling your own. The enterprise focused CAs have non-WebPKI CAs and CA-as-a-service offerings, both with way longer certificate lifetimes and way longer revocation periods.

If you don't need WebPKI-compatible certs (because you're not offering services to the general public) and your org cannot abide by the WebPKI rules requiring 24 hours max before revocation, you are doing something very wrong when you use the WebPKI.


I think part of the issue could be with the naming - 'public PKI'. I'd argue that doesn't really exist anymore - the nomenclature in use for some time now is 'web PKI'.

It's now ostensibly an ecosystem for use by modern, updated clients - browsers and OSs - for TLS. clientAuth will be gone from the webPKI soon, too, I hope.

It's fast becoming a more fluid, shifting ecosystem. We'll be on 90-day leaf certs very soon, shorter after that. Roots and intermediates will have much reduced lifetimes. New guidelines and regulations change things rapidly. Mass revocation events like this one.

In the ATC example - all parts of that ecosystem should be managed to the point that distributing a private root is relatively easy. It shields them from events like this. As another commenter has pointed out - running a private CA (or what might be known as an 'ecosystem CA' like we see in IoT with Matter, airlines with CertiPath, wireless with WinnForum) can be done 'as-a-service' easily, be it from a cloud vendor or CA or similar provider.

If folks continue to use the web PKI for non-web purposes, then they have to be in a position to deal with challenges like short-lifetime certs, 24-hour revoke/reissuance windows, and frequently-updated trust stores.

Most of the agreements and T&Cs for public CAs already forbid use in 'critical' systems anyway, so you're effectively agreeing to these kind of 24-hour changes from the start.




Spoke too soon... seems like subscriber(s?) issued DigiCert a Temporary Restraining Order to not revoke: https://bugzilla.mozilla.org/show_bug.cgi?id=1910322#c8

Bold.


> While we had regression testing in place, those tests failed to alert us to the change in functionality because the regression tests were scoped to workflows and functionality instead of the content/structure of the random value. [...]

> Unfortunately, no reviews were done to compare the legacy random value implementations with the random value implementations in the new system for every scenario.

In other words, they didn't do proper testing. At the bottom of the article they suggest they're going to improve it.
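A hedged sketch of what a content-level check could look like, as opposed to one scoped to workflows. Both generators are stand-ins invented for the example; it assumes, purely for illustration, that the older path added the prefix and the rewritten one did not, and the regex is an assumed shape for a compliant label.

```python
import re
import secrets

# Assumed shape: underscore, then at least 32 hex characters.
COMPLIANT_LABEL = re.compile(r"^_[0-9a-f]{32,}$")


def legacy_random_label() -> str:
    """Stand-in for the old generator, which added the prefix."""
    return "_" + secrets.token_hex(16)


def new_random_label() -> str:
    """Stand-in for a rewritten generator that dropped the prefix."""
    return secrets.token_hex(16)


def noncompliant_generators(generators):
    """Return the names of generators whose output fails the content check."""
    return [g.__name__ for g in generators if not COMPLIANT_LABEL.match(g())]
```

Running every code path's generator through one structural check like this catches a dropped prefix even when the surrounding workflow still succeeds end to end.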


Is this a potential cause of the current Azure outages hitting western europe? I know DigiCert are used by Azure extensively...


Unlikely. Microsoft operates their own CAs. Some of their CAs have been cross-signed by a DigiCert root, but Microsoft is responsible for the domain validation. I don't think they extensively use certificates issued directly by a DigiCert CA.


They have almost 300 in the affected list from DigiCert, so who knows?!


Can someone explain why this issue deserves a 24h notice?

Seems more reasonable to me to have a much longer deprecation notice.


As far as I can tell, this issue would be a problem where all of the following conditions are met:

1. Tenants are allowed to create arbitrary subdomains with arbitrary CNAME values

2. Tenants are not authorized to act on behalf of the TLD directly, only on their respective subdomain

3. Tenants are ostensibly prevented from TLD cert issuance by being explicitly blocked from creating subdomains that start with underscores

For most entities these conditions probably do not hold true anyway. But it could conceivably apply to certain free/dynamic dns providers, for example afraid.org and noip both allow arbitrary CNAMEs (though I checked my noip account and it wouldn't work anyway because of length limits on subdomains).

I would guess that in actual fact there are very few entities in existence for which this actually represents a potential threat against them, since it requires a very specific delineation of zone authorizations, but there might be a few.

For most of Alegeus's customers I doubt any of this applies, though; they're probably lucky to know their GoDaddy login to add any sort of DNS record, let alone have a whole system in place for less privileged users to create arbitrary CNAME records subject to controls over the use of underscores.



