
You seem to be continuing to warn against a proposal that isn't the one that was made. What specifically is dangerous about using cached records only in the case of the upstream servers failing to reply?



It doesn't take much imagination to attack this.

The older I get in tech, the more I realize we just go in circles, re-implementing every bad idea for the same exact reasons each "generation". Ah well.

TTL is TTL for a reason. It's simple: the publisher is in control. If they set their TTL to 60 seconds, presumably they have robust DNS infrastructure they are confident in. They are also signaling with such low TTLs that they technically require them for things like load balancing, HA, or a DR plan.

Now I get a timeout. Or a negative response. What is the appropriate thing to do? Serve the last record I had? Are you sure? Maybe by doing so I'm redirecting traffic they are trying to drain, and I've now increased load at the exact point that is contributing to the problem rather than helping. How many queries do I get to serve out of my "best guess" cache before I ask again? How many minutes? Obviously a busy resolver (millions of qps at many ISPs) can't be re-checking on every request, so where do you draw the line?
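
To make those questions concrete, here's a minimal sketch of the serve-stale-on-failure idea in Python, using the dnspython library. The cache dict, the MAX_STALE_SECONDS cap, and the resolve() wrapper are all illustrative choices of mine, not any real resolver's code:

    import time

    import dns.exception
    import dns.resolver   # third-party "dnspython" package

    MAX_STALE_SECONDS = 3600   # arbitrary cap on how far past expiry we'll serve
    _cache = {}                # qname -> (answer, expires_at)

    def resolve(qname):
        now = time.time()
        cached = _cache.get(qname)
        if cached and now < cached[1]:
            return cached[0]   # still fresh: honor the TTL as published
        try:
            answer = dns.resolver.resolve(qname, "A", lifetime=2.0)
            _cache[qname] = (answer, now + answer.rrset.ttl)
            return answer
        except (dns.exception.Timeout, dns.resolver.NoNameservers):
            # Upstream failed to reply at all. The contested question is
            # whether serving the expired record here is safe, and for how
            # long. Note an authoritative NXDOMAIN does NOT land here; it
            # propagates to the caller as usual.
            if cached and now < cached[1] + MAX_STALE_SECONDS:
                return cached[0]
            raise

Every branch of that fallback is a policy the record's publisher never agreed to, which is exactly the objection above.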

It's just arrogant, I suppose. The publisher of that DNS record could set a 30-day TTL if they wanted to and completely avoid this. But they didn't, and they usually have a reason for that, which should be respected. We have standards for a reason.


Assume we serve the last known record after the TTL expires.

Here's the attack:

- Compromise IP (maybe facebook.com)

- DDoS nameservers

- facebook removes IP from rotation

- Users still connect to bad actor even though TTL expired

"We have standards for a reason" is absolutely correct, and we can't start ignoring the standards because someone can't imagine why we need them _at this moment_


Yes, but there's one piece missing.

> Here's the attack:

> - Compromise IP (maybe facebook.com)

- Attacker generates or acquires a counterfeit facebook.com certificate.

> - DDoS nameservers

> - facebook removes IP from rotation

> - Users still connect to bad actor even though TTL expired

I understand what you are saying, but this attack scenario is extraordinarily difficult as a means of attacking users who have opted to configure their local DNS resolver to retain a last-known-good IP resolution. It involves commandeering an IP and counterfeiting Facebook's SSL/TLS certificate. As I have said elsewhere in this thread, all sites are vulnerable to such an attack today for the duration of their TTL window. So if this is a plausible attack vector, we could plausibly see it used now.


You're right! Completely, absolutely, 100% right. If this were a plausible attack vector, we could see it used now. And you know what? We do!

This is why some people are concerned about technical decisions that make this vector more dangerous. Systems that attack by, say, injecting DNS responses already exist and are deployed in real life. The NSA has one: Quantum. Why make cache poisoning worse?


Kalium, I really appreciate your responses.

If my adversary can steal an IP from Facebook, create a valid certificate for facebook.com, and provide bogus DNS resolution for facebook.com, I feel it's game over for me. My home network is forfeit to such an adversary.

But I get your point. It's about layering on mitigating factors. The lower the TTL, the lower the exposure. Still, my current calculus is that the risk of being attacked by such an adversary is fairly low (well, I sure hope so), and I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

All that said, I have to hand it to you and others like you, those who keep the needle balanced between security and convenience.


Now that I think about it more, it's even worse than that. A bogus non-DNSSEC resolution and a forged cert, both of which are real-life attacks that have actually happened, and you're done for. Compromising an IP isn't really necessary if you're going to hang on to a bad record forever, but it's a nice add-on: it removes the need to take out the DNS provider, though we can clearly see that that is possible too.

Keeping the balance between security and convenience is difficult on the best of days. Today is not one of them. :/


If you can forge certs for HTTPS-protected sites, this is not what you would use them for.


It's part of what I would use them for. A big, splashy attack distracts a bunch of people while you MITM something important with a forged cert? Great way to steal a bunch of credentials with something that leaves relatively few traces while the security people are distracted.


> - Attacker generates or acquires a counterfeit facebook.com certificate.

So you enabled an attack vector that has to be nullified by a deeper layer of defense? And one that, in some cases, depends on a user doing the right thing when presented with a security warning.

Why would you willingly do that?

Also, I do find your assumption of ubiquitous TLS rather alarming - facebook is a poor example here; there are far softer and more valuable targets against which such an attack vector could succeed.

Edit: Also to keep my replies down...

> I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

You can! All these tools are open source, and there are a number of simple stub resolvers that run on Linux (I'd imagine OS X as well) which you can configure to ignore TTLs. They may not be as configurable as you'd like, but again, they are open source and I'm sure they would welcome a pull request :)
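
For example, a reasonably recent Unbound (a full caching resolver rather than a stub, but it runs fine as a local one) has serve-expired options that implement roughly this behavior. A minimal excerpt; the values here are illustrative, not recommendations:

    # unbound.conf (excerpt)
    server:
        # Allow expired cached records to be served with a short TTL.
        serve-expired: yes
        # Only fall back to a stale answer if upstream hasn't replied
        # within this many milliseconds.
        serve-expired-client-timeout: 1800
        # Stop serving a record once it is this many seconds past expiry.
        serve-expired-ttl: 3600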


The policy that caused so much pain before was to take DNS records, ignore their TTLs, and apply some other arbitrarily selected policy instead. I confess I don't understand how the proposal at hand is different in ways that prevent the previous pains from recurring.

Maybe you can enlighten me on key differences I've overlooked? How do you define "failing to reply"? Do you ever stop serving records for being stale, or do you store them indefinitely?



