Starlink was down because of expired certificate (twitter.com/elonmusk)
59 points by croes on April 8, 2023 | hide | past | favorite | 38 comments



An oldie but a goodie.

I really don't like the short cert expiration and cynically think it's an example of security practices defaulting to whatever makes money for a very niche group that charges per cert issued.

I'm extra annoyed because my internal CA at my org requires 1-year expirations and won't extend them, even though these are internal-facing-only servers that are already on our network. I don't know what the odds are of a cert being compromised and used to front a new server on our network, but I don't think those odds are meaningfully reduced by a 1-year expiration period over a 5-year one.

I’ve had the same GPG key for 15 years or so and plan on never changing it unless the key is compromised. Same for the locks on my house.


https://www.chromium.org/Home/chromium-security/root-ca-poli...

> *a reduction of TLS server authentication subscriber certificate maximum validity from 398 days to 90 days*. Reducing certificate lifetime encourages automation and the adoption of practices that will drive the ecosystem away from baroque, time-consuming, and error-prone issuance processes. These changes will allow for faster adoption of emerging security capabilities and best practices, and promote the agility required to transition the ecosystem to quantum-resistant algorithms quickly. Decreasing certificate lifetime will also reduce ecosystem reliance on “broken” revocation checking solutions that cannot fail-closed and, in turn, offer incomplete protection. Additionally, shorter-lived certificates will decrease the impact of unexpected Certificate Transparency Log disqualifications.

Edit to add that the above is under the following,

> *In a future policy update or CA/Browser Forum Ballot Proposal, we intend to introduce:*


Like most engineering choices it has upsides and downsides.

I'm personally "pro choice" here and in favour of allowing system operators to decide how much and what kind of automation makes sense for them.

TLS vulns in general are likely to need a server software or config upgrade in response; automated re-issuance is nice but doesn't significantly reduce overall toil.

Revocation being "generally broken" sucks, but is not on my personal "top 50 security problems" that I care about.

What very short lived certs do is create a tight runtime dependency between all deployed infrastructure working and JIT cert issuance systems & refresh code.

IMHO this has analogous risks and benefits to JIT supply chains in real-world contexts.

For some op models IMHO it IS better for a human to just put the thing on the server every year or two.


Shorter, free certs are a good thing. They make this sort of outage less likely by encouraging automation of renewal.

It's the long-lived, infrequently expiring certificates, the knowledge of which gets lost within a company, that are the problem.

Can you even describe how you would react if your GPG key was compromised?


I don’t want to let an external entity know about my internal servers. So free isn’t an option.

I disagree about expiring knowledge on long internal certs. I want to avoid any network snooping, so I use TLS. I don't worry much about server impersonation since it's an internal server. There's no special knowledge required to generate a cert and put it on a server. So if it expires 10 years from now, and everyone involved is gone, then I'd just generate a new cert and let it run for 10 more years.
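For illustration, this is roughly all that "generate a cert and put it on a server" amounts to, sketched with Python's cryptography package (the hostname and the 10-year lifetime are just placeholders):

    import datetime
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    # New key pair plus a self-signed cert valid for ~10 years.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "wiki.internal.example")])
    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)  # self-signed: issuer == subject
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=3650))
        .sign(key, hashes.SHA256())
    )

    # PEM files for the server to load.
    with open("server.crt", "wb") as f:
        f.write(cert.public_bytes(serialization.Encoding.PEM))
    with open("server.key", "wb") as f:
        f.write(key.private_bytes(
            serialization.Encoding.PEM,
            serialization.PrivateFormat.TraditionalOpenSSL,
            serialization.NoEncryption()))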

If my GPG key was compromised, I would generate a new key, and notify everyone who uses my key of my new key. It’s not a huge group and the only way they know my public key is me telling them. Fingers crossed, I’ll never need to regenerate and will die with this key.

Rotating my private key would reduce my risk from an unknown compromise. But not sure by how much. And the changeover to a new key would likely introduce a known risk of having partners change stuff and get confused every n months, rather than run the risk of someone getting my key.

Non-expiring keys are good enough for billion-dollar Bitcoin wallets; they're good enough for my GPG. And I wish the same were true of my Wiki Server, which makes me rotate keys every 11 months.


> Rotating my private key would reduce my risk from an unknown compromise. But not sure by how much.

Therein lies the rub: you index security on knowing you're compromised, whereas most compromise is imperceptible to the user. Rotating credentials reduces risk.


I don't think this is fruitful reasoning. Never sharing keys reduces the risk too.

How much does rotating credentials reduce risk? Should I rotate once a year? Once a month?

Rotating every day would be even more secure, right? But how much more? I think not very much.

I must accept some risk to communicate. I accept that I can manage my private key and keep it safe. It’s my identity, so it’s important to prevent leaking that credential. I think it’s better to protect the credential than to frequently rotate with all the potential errors there.

The risk is that I don’t know if I’m compromised. But I think that risk is less than the errors involved in rotating keys according to some arbitrary schedule.


> Never sharing keys reduces the risk too.

That's not up to you; it's up to your adversary.

> How much does rotating credentials reduce risk? Should I rotate once a year? Once a month?

As frequently as possible. Signal for example uses ephemeral keys for each message.

> Rotating every day would be even more secure, right? But how much more? I think not very much.

Stop guessing and use empirical evidence to support your reasoning.

> The risk is that I don’t know if I’m compromised. But I think that risk is less than the errors involved in rotating keys according to some arbitrary schedule.

It's arbitrary because you are making a strawman argument to support a foregone conclusion.


> Signal for example uses ephemeral keys for each message.

There’s a big difference between identity keys and session keys. It makes total sense to use lots of throw away keys (this is how tls works) but making a new identity key for every message is madness.

There is no empirical evidence for how frequently to rotate your identity keys.

A few years ago NIST started recommending never changing passwords unless they are compromised [0]. Identity keys aren’t exactly the same as passwords but I think they are similar.

I don’t think anyone quantifies how much of a benefit there is to changing your password nor how frequently to change it. “As frequently as possible” is not useful advice as that could be every minute or never. I need more actionable guidance so I can weigh it against other priorities

[0] https://pages.nist.gov/sp800-63-3.html


> There’s a big difference between identity keys and session keys. It makes total sense to use lots of throw away keys (this is how tls works) but making a new identity key for every message is madness.

That's not what happens (there is no new identity key for each message), and compromise of a Signal identity key has no impact on message security, unlike GPG. Also, it's not how all TLS works; it's only how TLS works with forward-secret cipher suites.

> There is no empirical evidence for how frequently to rotate your identity keys.

Certainly not if you refuse to look for it.

> A few years ago NIST started recommending never changing passwords unless they are compromised

Passwords derive session keys (cookies) which rotate very frequently. You have a lot to learn about computer security, I'm happy to make some reading recommendations if you're sincerely interested.


I'm not a part of this conversation, but I'd love to see those recommendations if you're willing to share.


What are the odds an attacker can get keys once but isn't able to steal them again after they're rotated?

It makes more sense for user credentials that might have been phished but I'm not sure that converts directly to machine managed keys.


How is free not an option?


Ditto for “smart things” hardware.

I manufacture a non internet connected wifi device, and I basically can’t use HTTPS, WebBluetooth, and a host of other things because I can’t get a long 30+ year cert anywhere. 1 year is max for the web.

Do they really prefer forcing people to use HTTP? It’s bone-headed.

At the very least, they need to make an exception for LAN devices, but they don’t.


Can't you use a self-signed cert and set the expiry to 2049 or something?


Clients won't recognize the self-signed cert, so instead of an expired-cert warning the user will get an unknown-cert warning.


That's mainly a problem for generic web clients, though. If it's a mobile or other app, you can package the CA in.

I think web clients generally wouldn't be able to connect to an IoT device over a private or public IP anyway since afaik public CAs won't issue certs for IPs.


Agreed. I have seen several cases where the recent short cert lifetimes have made sites/services less secure, as the devs either skip encryption entirely or run with expired certificates. (I'm referring to small, seldom-maintained, or internal/non-public-facing projects where the goal is simply to not be using clear text.)

Another example of this backfiring: I've seen several smaller sites/services make use of much more centralized SSL services (such as Cloudflare), just to avoid having to maintain the SSL certificate. (Whereas prior to the shorter expiration change, these would have been individual, domain-specific certificates.) I'm aware that this does improve security in many cases as well.


Given the average turnover for IT professionals hopping orgs I would personally require shorter certificate durations as a business continuity measure. Not even getting into automation, it keeps the processes fresh in the minds of administrators and should in theory reduce the chance of losing all organizational knowledge of how and where to renew certificates in the environment.


Or just have a decent process that generates certs every 100 years.

I’m a fan of good practices and automation and whatnot, but don’t think making decisions just to require chaos and turnover in the hopes that it makes practices better is a good idea.


Counterpoint:

A couple of years ago I moved my WHMCS instance to LE. 90 days later the cert lapsed because it wasn't renewed. So I punched in, re-issued, and everything was fine. For 90 days.

That repeated for about a year, until I finally gave up and spent 15 minutes (compared to 3 for the manual re-issue) digging up and fixing the reason the auto-renew didn't work. It has worked fine ever since.

It was just a botched cron entry.


> I’ve had the same GPG key for 15 years or so and plan on never changing it unless the key is compromised

GPG has no forward secrecy, so 15 years of your "encrypted" communications are at risk. Not a great model for secure messaging.


It’s pretty good. If my private key is compromised all those 15 years of comms are compromised no matter how I slice it, right?

How would I protect against this risk?


You use encryption schemes with forward secrecy, which has been the standard for years.

Compromising my permanent Signal private key, for example, does not allow previous messages to be decrypted.
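To make that concrete, here's a minimal sketch of the ephemeral-key idea (not Signal's actual protocol, just an illustration of why past session keys can't be recovered), using Python's cryptography package:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    # Each side generates a throwaway key pair for this session only.
    alice_eph = X25519PrivateKey.generate()
    bob_eph = X25519PrivateKey.generate()

    # Both sides derive the same session key from the ephemeral exchange.
    shared = alice_eph.exchange(bob_eph.public_key())
    session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                       salt=None, info=b"session").derive(shared)

    # Once the ephemeral private keys are discarded, nothing that remains
    # on disk can re-derive session_key -- a later leak of either party's
    # long-term identity key doesn't expose past traffic.
    del alice_eph, bob_eph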


1. You can run ACME internally.

2. If "alerting for a failing process" is too hard for your org, you have serious other problems to address.

3. "I care about TLS for privacy but not for server impersonation" is such a great reminder of how incompletely people are able to think through security-related scenarios. Big whoops.

"Cert expiriration got us, too" is just another form of "Oh I guess we should have used unique passwords", or "Oh, I guess we should have had backups. Or put the expiration dates on a shared calendar. Or spent the few hours of engineering time to setup ACME and wire it to your existing alerting. It's not hard to do things right, it just requires understanding that a small bit of cost/learning today will save you being embarrassingly blind-sided by your own carelessness down the line.

Since apparently Elon doesn't see it as prudent to "scrub for SPOFs" until the excrement hits the fan, maybe he should go ahead and order the same for his other companies, which have a higher potential for collateral damage.


Observation: If a Certificate Authority (CA) is down or unreachable over the Internet for whatever reason -- then effectively all transactions which require that CA to work are down as well...

If all CA's are simultaneously down or unreachable -- then effectively any protocol that requires a CA -- is also down...

This includes SSL, TLS, and everything which requires HTTPS to work -- not just all HTTPS web pages -- but all HTTPS web services as well...

A "single point vulnerability" indeed!


A CA doesn't need to be online for HTTPS to work. If it's offline, it won't be able to issue new certificates, but existing ones will be just fine. In fact, Root CAs are often kept offline as a best practice to limit their exposure.


I stand corrected!

My original cursory understanding of HTTPS -- was that it required SSL/TLS to make it work, which in turn required CA's issuing certificates and the validation of those certificates to make that work...

Now, all of the above is true -- but the subtle distinction I realized after reading your comment (and doing some more research on the matter) is that everything above is apparently anchored in local Root Certificates -- data contained in local files (i.e., no need to reach out to a CA server for validation) -- which act as the "Trust Anchor" for all other SSL/TLS certificates...

I.e., new certificates handed to a user's browser by a new website do not need to be validated by making an Internet connection to the CA and asking the CA to validate them -- instead, they are validated locally, by checking their cryptographic signatures against the user's local CA Root Certificates...
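That matches how, for example, Python's standard library behaves: the default TLS context loads the trust anchors from local files and does the chain/signature validation on the client, without contacting the issuing CA (revocation checking aside; example.com is just a placeholder):

    import socket, ssl

    # Loads the locally installed root certificates (the "trust anchors").
    ctx = ssl.create_default_context()

    # Chain building and signature checks happen locally during the handshake;
    # the issuing CA's servers are never contacted here.
    with socket.create_connection(("example.com", 443), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.getpeercert()["issuer"])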

These CA Root Certificates are apparently X.509 certificates -- or follow that format:

https://en.wikipedia.org/wiki/X.509

Now, that's a good thing!

It means that there's no dependency (one less point of possible failure) that a CA be up and running -- for all of those HTTPS/TLS/SSL transactions to work...

So, you are correct -- and I stand corrected!


spend less time in the comments section and more time building bro


Question #1: Building what exactly, "bro"?

Question #2: How would that change anything?

Question #3: How do I know that you're not:

a) A GPT-3 or other AI powered chat bot?

b) A Troll?

c) A paid disinformant and/or propagandist and/or someone else with an agenda -- foreign or domestic?

?

Question #4: What value do you genuinely believe your comments add to the discussion?


I assume Starlink has monitoring tools? Nagios, for example, has NRPE plugins to check certificate expiration on any domain it can reach. Most infrastructure monitoring tools have something along these lines. It's not clear to me why he is mentioning single points of failure. Does he want to load-balance across multiple names and certs, perhaps?


I get that LE is a single point of failure, but is it really that hard to automate cert renewals?


My experience is that these questions are scratching at the wrong side of the problem: is it hard to automate? For sure not. Is it hard to monitor effectively and alert someone when it falls over? That's where the rabbit hole starts, because there are a lot of "no, but"s hiding in there, not to mention the age-old "who watches the watchers?"

Please don't misunderstand: I'm not arguing against the concept of doing something more often to drive down risk, but I do think it's not the "how hard can it be?" exercise that folks make it out to be. It is trading one set of risks for another, and when the ACME setup goes off the rails, it evidently takes out your global Internet provider.


Every single time we deploy updates to our production code, we also run a check against the main TLS certificates we use to see when they expire -- it is built into the scripts used to verify the deployment. We also have certificate checks running weekly that alert to Slack channels.
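A check like that can be tiny; here's a sketch of the idea in Python (the hostname and the 30-day threshold are placeholders, not our actual setup):

    import datetime, socket, ssl

    def days_until_expiry(host, port=443):
        # Fetch the served certificate and read its notAfter timestamp.
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
        expires = datetime.datetime.utcfromtimestamp(ssl.cert_time_to_seconds(not_after))
        return (expires - datetime.datetime.utcnow()).days

    if days_until_expiry("example.com") < 30:
        print("WARNING: certificate expires within 30 days")  # or post to a Slack webhook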

While we cannot automate the select few on Digicert, we can automate any of the self-signed certificates (you can still use your own CA in 2023) or any of the Let's Encrypt ones.


While I'm just one person, the only issues I've had with ACME stem from people registering false domains pointed at my IP, and from an over-enthusiastic WAF that caused legitimate requests to get blocked.


Eh, I worked at the biggest Telco in Canada (50k employees), and we had failures all the time due to certs expiring.

It's incredible what happens when an organization is big enough, the years roll by, teams change, people leave and stuff just gets forgotten about. Even stuff that has tens of thousands of customers bringing in millions in revenue a year just doesn't have an owner or a person accountable. Happens every week.


No. It is really hard to convince the "Product Owner" that this automation is more important than the HR half-year review, though (or whatever scrummy reason that has nothing to do with providing IT services and everything to do with building software products).


Funny that he considers certificate expiry a vulnerability.



