DDoS Attack Against Dyn Managed DNS (dynstatus.com)
1563 points by owenwil on Oct 21, 2016 | 674 comments



Out of curiosity, why do caching DNS resolvers, such as the DNS resolver I run on my home network, not provide an option to retain last-known-good resolutions beyond the authority-provided time to live? In such a configuration, after the TTL expiration, the resolver would attempt to refresh from the authority/upstream provider, but if that attempt fails, the response would be a more graceful failure of returning a last-known-good resolution (perhaps with a flag). This behavior would continue until an administrator-specified and potentially quite generous maximum TTL expires, after which nodes would finally see resolution failing outright.

Ideally, then, the local resolvers of the nodes and/or the UIs of applications could detect the last-known-good flag on resolution and present a UI to users ("DNS authority for this domain is unresponsive; you are visiting a last-known-good IP provided by a resolution from 8 hours ago."). But that would be a nicety, and not strictly necessary.

Is there a spectacular downside to doing so? Since the last-known-good resolution would only be used if a TTL-specified refresh failed, I don't see much downside.
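
To make the idea concrete, here is roughly the knob I'm imagining, sketched as a hypothetical resolver configuration; the option names are invented for illustration rather than taken from any existing resolver:

    # hypothetical caching-resolver options (names invented for illustration)
    serve-stale: yes             # on refresh failure, answer from the expired cache
    serve-stale-max-ttl: 86400   # administrator-specified ceiling: stop after 1 day
    serve-stale-retry: 60        # keep retrying the authority every 60 seconds
    serve-stale-flag: yes        # mark such answers so clients can tell they are stale

Normal answers would be completely unaffected; these options would only change what happens when the post-TTL refresh attempt fails.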


OpenDNS does this: https://support.opendns.com/hc/en-us/articles/227987767-Dyna...

It's called SmartCache.


I do this, too.

It's called HOSTS and djb's cdb constant database.

And one does not need to use a recursive cache to get the IP addresses. Fetching them non-recursively and dumping them to a HOSTS and a cdb file can sometimes be faster; I have a script that does that. Fetching them from scans.io can be even faster.

    #!/bin/sh
    # Build a cdb constant database from /etc/hosts and query it.
    # usage: $0              compile $0.cdb from /etc/hosts and dump it
    # usage: $0 domainname   look up domainname; exit 0 if present, 100 if not
    cd||exit
    # create a local "null" device if it is missing (device numbers are system-specific)
    [ -c null ]||mknod null c 2 2

    case $# in
    0)
    {
     # keep only uncommented /etc/hosts lines that begin with an address
     sed '
          /#/d;
          /^[0-9]/!d;
     ' /etc/hosts \
      |{
         # emit cdbmake records of the form +keylen,datalen:hostname->address
         while read a b c d;
         do
         echo +${#b},${#a}:$b-\>$a;
         done;
       }
     # cdbmake input must end with a blank line
     echo;
    } \
     |exec awk '!($0 in a){a[$0];print}' \
     |exec cdbmake $0.cdb $0.t||exit
    exec cdbdump < $0.cdb
    ;;
    1)
    # a two-character script name prints the stored value; any other name only
    # reports presence via the exit status
    test ${#0} = 2 ||
    exec cdbget $1 < $0.cdb >null;
    exec cdbget $1 < $0.cdb;
    ;;
    esac
The first usage compiles the database and dumps it to the screen. The second checks for the presence of domainname and exits 0 if present, otherwise exits 100. The third: if $0 is only two characters long, it checks for the presence of domainname and, if present, prints the IP and domainname in HOSTS format.

http://cr.yp.to/cdb.html

With all due respect to the enormous reliance on it that has built up over the past decades, DNS is not the internet. It is just a service heavily used for things like email and web. This does not mean that, in an emergency, email and web cannot work without DNS. They once did, and they still can.

The internet runs just fine without DNS. Some software may refuse to honour HOSTS and rely solely on DNS. But that is a vulnerability of the software, not the internet. (And in such cases, e.g., qmail, I just serve my own zone via tinydns, which again is just a mirror of HOSTS.)


What you are doing and claiming is ridiculous.

For one, how are you going to deal with stale records?


Awesome! Is this available as software I can install on my network? Sorry, probably a dumb question.


Nope, just point your machine or router's DNS to use opendns resolvers instead of your regular ones: 208.67.222.222 and 208.67.220.220
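
On a typical Linux box, for instance, that is just two lines in /etc/resolv.conf (many systems regenerate this file via DHCP or a network manager, so you may need to set it in your router or network settings instead):

    # /etc/resolv.conf
    nameserver 208.67.222.222
    nameserver 208.67.220.220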


Do you have a link on the OpenDNS website that refers to those specific IPs?



One can go even further and install DNSCrypt:

https://dnscrypt.org/


Any downsides to using this? I'm tempted to start using it, but I'm not really sure if there's any particular thing I should consider first.


Be aware that some things (Netflix, Comcast, YouTube) expect you to use your local DNS server so that they can route you to the nearest media server. Using a centralized resolver like the ones mentioned here can result in unsatisfactory video streaming... at least that's what I found with our Apple TV.


OpenDNS sends your "EDNS client subnet" to some CDNs including Google, though maybe not Apple.

https://www.opendns.com/enterprise-security/technology/globa...
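
If your dig is new enough to support the +subnet option, you can experiment with this yourself by sending an explicit client-subnet hint from two different documentation prefixes and comparing the answers (whether and how the resolver forwards the hint is up to the resolver):

    dig +subnet=198.51.100.0/24 www.google.com @208.67.222.222
    dig +subnet=203.0.113.0/24  www.google.com @208.67.222.222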


Yes, but beware, they (at least used to) resolve unknown names to a page filled with ads.



That's good to know - the ads are the reason I reluctantly switched from OpenDNS to google.

(Reluctantly in that Google already has enough of my data, thanks, through gmail, search, maps, docs and other services, not because it doesn't work well.)


Google DNS doesn't store any identifiable/private data, as far as I understand?

https://developers.google.com/speed/public-dns/privacy


Yeah, but it's also plaintext. Super easy to tap, if I understand correctly.

Still, I prefer it to ISPs snooping.


Anyone know if Google Public DNS does?


It doesn't (first result is OpenDNS, second is Google):

    $ dig -tA twitter.com @208.67.222.222

    ; <<>> DiG 9.8.3-P1 <<>> -tA twitter.com @208.67.222.222
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63973
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;twitter.com.			IN	A

    ;; ANSWER SECTION:
    twitter.com.		0	IN	A	199.59.148.82
    twitter.com.		0	IN	A	199.59.149.198
    twitter.com.		0	IN	A	199.59.148.10
    twitter.com.		0	IN	A	199.59.150.7

    ;; Query time: 14 msec
    ;; SERVER: 208.67.222.222#53(208.67.222.222)
    ;; WHEN: Fri Oct 21 11:53:40 2016
    ;; MSG SIZE  rcvd: 93

    $ dig -tA twitter.com @8.8.8.8

    ; <<>> DiG 9.8.3-P1 <<>> -tA twitter.com @8.8.8.8
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47295
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;twitter.com.			IN	A

    ;; Query time: 13 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Fri Oct 21 11:53:47 2016
    ;; MSG SIZE  rcvd: 29


A shame OpenDNS used to redirect me to some spam webpage every time I tried to resolve a domain that didn't exist; they earned a spot on my blacklist forever. :(


It's been years since we did that, they were not spam pages, and it was easy to opt out.


Well, the fact that people still remember goes to show what a truly terrible idea it really was and that it probably did permanent damage to your brand.


I'm not sure what metric you use to judge it as terrible.

I thought it was great. 10,000 companies pay for my service today. 65 million people use my infrastructure today. Cisco bought the company for more than $650m. It continues to innovate on the decades-old DNS in secure and useful ways.

So let me know what part is terrible.


The part where you repeated Verisign's mistake in breaking a fundamental protocol.

NXDOMAIN. Kind of a thing, and important to protocols other than HTTP.


The point is that the company did just fine even having made a mistake. Ignoring that is just being difficult.


No, the idea that a company doing just fine somehow excuses its actions is exactly why we can't have nice things.


"I got mine."


I used OpenDNS for a long time. I eventually switched to Google DNS mostly because its IPs are shorter and easier to remember, and I didn't use any of the power user features for OpenDNS. I remember the page full of ads and to be honest I don't begrudge it. We all expect everything given to us for free these days, and then we don't even want the company to make money showing us an ad on the rare occasion that we mistype a URL. It's hard to get paid these days.

Ironically, those unrealistic expectations are probably a significant factor in the growth of data mining and reselling; how else is a free-to-use website that doesn't have any ads (or whose users mostly block ads) going to get paid? You may say "not my problem", but it affects you when you leave the company no option but to resell data on the behaviors they observe from you.


That is not an appropriate tone for someone representing OpenDNS to take.


Why not? It's blunt, but to the point, honest, and passionate. Who cares about tone?


And seems very appropriate for the founder of OpenDNS. Pretty authoritative.


Because it's dismissive.


Everyone has preferences, I guess. I far prefer honest and curt to the kind of anodyne, contentless word-payloads pumped out by so many corporate communications departments.

Say, generating corporate communications seems like a promising direction for neural networks. A Markov chain comes close...


They don't do that any more, for what it's worth. I think for a while that was the only revenue stream for what was otherwise a free service. https://www.opendns.com/no-more-ads/


This attitude only promotes the idea that "well, we might as well just continue like this then". If you can never forgive a company for doing wrong when they corrected themselves years ago and now have a track record of doing nothing else that's irked you, then what's the point in them ever bothering to make the change?

If what they do is useful to you but has a feature or bug or something else you don't like, then you absolutely should forgive them if they fix that feature or bug to work in a way you like. They may as well never bother fixing things if they can never be forgiven after repenting their internet sins.

If you've since found something that does do what you want then fair play, fill your boots. Otherwise you're being petty for the sake of being petty.


Historically, doing this has been a source of a truly awe-inspiring amount of pain.


Aw, don't leave us hanging like that. What problems did it cause?


Imagine migrating your website to a new host. A month later, you learn that a major ISP has decided that its customers don't need to know about the move, because they hold on to last-known-good records as they like. So half your traffic and business is gone. Or maybe you can't use anything run on Heroku, because the dynamism there doesn't play nice with your resolver's policies.

That's the kind of world we used to live in when TTLs were often treated as vague suggestions.


The scenario I was describing was one where a last-known-good resolution would be used if and only if a refresh attempt fails after the authority-provided TTL expires.

I believe the scenario you are describing is a rogue ISP ignoring that authoritative TTL wholesale, caching resolutions according to its own preferences regardless of whether the authority is able to provide a response after the authoritative TTL expires.


The rogue ISPs thought they were helping people by serving stale data. After all, better something past its use-by date than failing, right? A low tolerance for DNS response times, and suddenly large chunks of the internet are failing a lot...

Among other problems, this enables attacks. Leak a route, DDoS a DNS provider, and watch as traffic everywhere goes to an attack server because servers everywhere "protect" people by serving known-stale data rather than failing safe.

Be very, very careful when trying to be "safer". It can unintentionally lead somewhere very different.


> A low tolerance for DNS response times, and suddenly large chunks of the internet are failing a lot...

Hang on a second. I feel that you're piling on other resolver changes in order to make a point. I'm not suggesting that the tolerance for DNS response times be reduced. Nor am I suggesting a scenario where the authority gets one shot after their TTL, after which they're considered dead forever. I would expect my caching DNS resolver to periodically re-attempt to resolve with the authority once we've entered the period after the authority's TTL.

> Leak a route, DDoS a DNS provider, and watch as traffic everywhere goes to an attack server because servers everywhere "protect" people by serving known-stale data rather than failing safe.

I think you're suggesting that someone could commandeer an IP and then prevent the rightful owner from correcting their DNS to point to a temporary new IP.

Isn't the real problem in this scenario the ability to commandeer an IP? The malicious actor would also need to be able to provide a valid certificate at the commandeered IP. And at that point, I feel we've got a problem way beyond DNS resolution caching. Besides, if what you have proposed is possible, isn't it also possible against any current domain for the duration of their authoritative TTL? That is, a domain that specifies an 8-hour TTL is vulnerable to exactly this kind of scenario for up to an 8-hour window. Has this IP commandeering and certificate counterfeiting happened before?


> Hang on a second. I feel that you're piling on other resolver changes in order to make a point.

Yes. The point I am making is the additional failure modes that need to be considered and the pain they can cause. Historically have caused.

At no point did I ever think you were suggesting that one failure to respond renders a server dead to your resolver forever. Instead, I expect that your resolver will see a failure to respond from a resolver a high percentage of the time, leading to frequent serving of stale data.

> Isn't the real problem in this scenario the ability to commandeer an IP?

You're absolutely right! The real problem here is the ability to commandeer an IP.

However, that the real problem is in another castle does not excuse technical design decisions that compound the real problem and increase the damage potential.


> Instead, I expect that your resolver will see a failure to respond from a resolver a high percentage of the time, leading to frequent serving of stale data.

If this were true, the current failure mode would have end users receiving NXDOMAIN a "high percentage of the time," which obviously is not happening.

{edit: To be clear, I'm reading the quote as you stating that "failure to resolve" currently happens a high percentage of the time, and therefore this new logic would result in extended TTLs more often than the original post would assume they would happen}

> However, that the real problem is in another castle does not excuse technical design decisions that compound the real problem and increase the damage potential.

It's fair to point out that this change, combined with other known issues, could create a "perfect storm," but as was pointed out, this exploit is already possible within the current authoritative TTL window. Exploiting the additional caching rules would just be a method of extending that TTL window.

On the other hand, where do you draw the line here? If you had to make sure that no exploits were possible most of the systems that exist today would never have gotten off the ground. It seems a bit like complaining that the locks to the White House can be exploited (picked), while missing the fact that they are only supposed to slow someone down before the "men with guns" can react.


Based on the highly unscientific sample of the set of questions asked by my coworkers in my office today, the failure mode of end users receiving NXDOMAIN has happened much more than on most days.

I don't need to make sure no exploits are possible. However, if at all possible, I'd like to help ensure that things aren't accidentally made more dangerous. It's one thing to consider and make a tradeoff. It's quite another to be ignorant of what the price is.


Well, it obviously happens when the resolver is down, but that's the situation that this logic is being proposed to smooth over. The normal day-to-day does not see a high percentage of resolvers failing to respond, or else people would be getting NXDOMAIN for high-profile domains much more often.


I'm just trying to make sure we don't wind up making DNS poisoning nastier in an effort to be more user-friendly.


All the attacks mentioned here seem to be of the following shape:

1. Let's somehow get a record that points at a host controlled by us into many resolvers (by compromising a host or by actually inserting a record).

2. Let's prolong the time this record is visible to many people by denying access to authoritative name servers of a domain.

(1) is unrelated to caching-past-end-of-TTL, so you need to be able to do (1) already. (2) just prolongs the time (1) is effective and requires you to be able to deny access to the correct DNS server. Is it really that much easier to deny access to a DNS server than it is to redirect traffic to that DNS server and supply bogus responses?


DNS cache poisoning is currently a very common sort of attack. The UDP-y nature of DNS makes it very easy. There are typically some severe limitations placed on the effectiveness of this attack by low TTLs. It does not require you to deny access to the authoritative server. This attack is also known as DNS spoofing: https://en.wikipedia.org/wiki/DNS_spoofing

Ignoring TTLs in favor of your own policy means poisoned DNS caches can persist much longer and be much more dangerous.


Right now, to keep a poisoned entry one must keep poisoning the cache.

In that world, one can still do that. One can also poison the entry once and then deny access to the real server. You seem to be arguing that this is easier than continuous poisoning. Do I understand you correctly?


You are correct in your assessment of the current dangers of DNS poisoning.

I am in no way arguing about ease of any given attack over any other. I am arguing that a proposed change results in an increased level of danger from known attacks.

I'm arguing that the proposed change at hand, keeping DNS records past their TTLs, makes DNS poisoning attacks more dangerous because access to origin servers can be denied. Right now TTLs are a real defense against DNS cache poisoning, and the idea at hand removes that in the name of user-friendliness.


The way I read your argument, it relies on denying access being cheaper or simpler than spoofing (X == spoofing, Y == denying access to the authoritative NS):

You are arguing that a kind of attack is made more dangerous because, in the world with that change, an attacker can not only (a) keep performing attack X, but can also (b) perform attack X and then keep performing Y. If Y is in no way simpler for the attacker, why would an attacker choose (b)? S/he can get the same result using (a) in that world or in our world.

Am I misreading you or missing some other important property of these two attack variants?


I believe you may have failed to consider the important role played by reliability.

X cannot always be done reliably - it usually relies on timing. Y, as we've seen, can be done with some degree of reliability. Combining them, in the wished-for world, creates a more reliable exploit environment because the spoofed records will not expire. The result is more attacks that persist longer and are more likely to reach their targets.

Such a world is certain to not be better than this one and likely to be worse.


Indeed I didn't consider that. Thanks a lot for being patient and enlightening.


[flagged]


I appreciate the support. But FWIW, I don't think Kalium was trolling. Although he (I assume, but correct me if I am wrong) and I disagree on the risk versus reward of extending the time-to-live of cached resolutions beyond the authoritative TTL, I nevertheless appreciated and enjoyed his feedback.


You assume correctly.


I'm afraid we're simply going to have to agree to disagree on this point. I do not share the opinion that this is a good idea with significant upside and virtually no downside. I also do not agree that none of the issues I have raised apply to the original suggestion - I believe they do apply, which is why I raised them.


WRT the second attack, what they're referring to is actually DNS cache poisoning - inserting a false record into the DNS pointing your name at an attacker-controlled IP address. This is a fairly common attack, but usually has an upper time limit - the TTL (which is often limited by DNS servers).

This proposal would allow an attacker to prolong the effects of cache poisoning by running a simultaneous DDoS against un-poisoned upstream DNS servers.


Not sure whether it could be used in a legitimate attack (probably), but it can definitely lead to confusing behavior in some scenarios. You switch servers, your old IP is handed to some random person, your website temporarily goes down - and now your visitors end up at some random website. Would you want that? Especially if you're a business?

Also, "commandeering" an IP at a small hosting provider might be easier than you think. It depends entirely on how they recycle addresses.


You seem to be continuing to warn against a proposal that isn't the one that was made. What specifically is dangerous about using cached records only in the case of the upstream servers failing to reply?


It doesn't take much of an imagination to attack this.

The older I get in tech the more I realize we just go in circles re-implementing every bad idea over again for the same exact reasons each "generation". Ah well.

TTL is TTL for a reason. It's simple. The publisher is in control, they set their TTL for 60 seconds so obviously they have robust DNS infrastructure they are confident in. They are also signaling with such low TTLs that they require them technically in order to do things like load balance or HA or need them for a DR plan.

Now I get a timeout. Or a negative response. What is the appropriate thing to do? Serve the last record I had? Are you sure? Maybe by doing so I'm actually redirecting traffic they are trying to drain and have now increased traffic at a specific point that is actually contributing to the problem vs. helping. How many queries do I get to serve out of my "best guess" cache before I ask again? How many minutes? Obviously a busy resolver (millions of qps at many ISPs) can't be checking every request so where do you draw the line?

It's just arrogant I suppose. The publisher of that DNS record could set a 30 day TTL if they wanted to, and completely avoid this. But they didn't, and they usually have a reason for that which should be respected. We have standards for a reason.


Assume we serve the last known record after TTL.

Here's the attack:

- Compromise IP (maybe facebook.com)

- DDoS nameservers

- facebook removes IP from rotation

- Users still connect to bad actor even though TTL expired

"We have standards for a reason" is absolutely correct, and we can't start ignoring the standards because someone can't imagine why we need them _at this moment_


Yes, but there's one piece missing.

> Here's the attack:

> - Compromise IP (maybe facebook.com)

- Attacker generates or acquires counterfeit facebook.com certificate.

> - DDoS nameservers

> - facebook removes IP from rotation

> - Users still connect to bad actor even though TTL expired

I understand what you are saying, but this attack scenario is extraordinarily difficult as a means to attack users who have opted to configure their local DNS resolver to retain a last-known-good IP resolution. It involves commandeering an IP and counterfeiting Facebook's SSL/TLS certificate. As I have said elsewhere in this thread, all sites are currently vulnerable to such an attack today for the duration of their TTL window. So if this is a plausible attack vector, we could plausibly see it used now.


You're right! Completely, absolutely, 100% right. If this was a plausible attack vector, we could see it used now. And you know what? We do!

This is why some people are concerned about technical decisions that make this vector more dangerous. Systems that attack by, say, injecting DNS responses already exist and are deployed in real life. The NSA has one - Quantum. Why make the cache poisoning worse?


Kalium, I really appreciate your responses.

If my adversary can steal an IP from Facebook, create a valid certificate for facebook.com, and provide bogus DNS resolution for facebook.com, I feel it's game over for me. My home network is forfeit to such an adversary.

But I get your point. It's about layering on mitigating factors. The lower the TTL, the lower the exposure. Still, my current calculus is that the risk of being attacked by such an adversary is fairly low (well, I sure hope so), and I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

All that said, I have to hand it to you and others like you, those who keep the needle balanced between security and convenience.


Now that I think about it more, it's even worse than that. A bogus non-DNSSEC resolution and a forged cert, both of which are real-life attacks that have actually happened, and you're done for. Compromising an IP isn't really necessary if you're going to hang on to a bad one forever, but it's a nice add-on. It removes the need to take out the DNS provider, but we can clearly see that that is possible.

Keeping the balance between security and convenience is difficult on the best of days. Today is not one of them. :/


If you can forge certs for HTTPS-protected sites, this is not what you would use them for.


It's part of what I would use them for. A big, splashy attack distracts a bunch of people while you MITM something important with a forged cert? Great way to steal a bunch of credentials with something that leaves relatively few traces while the security people are distracted.


> - Attacker generates or acquires counterfeit facebook.com certificate.

So you enabled an attack vector that has to be nullified by a deeper layer of defense? And in some cases possibly impacted by a user having to do the right thing when presented with a security warning.

Why would you willingly do that?

Also, I do find your assumption of ubiquitous TLS rather alarming - Facebook is a poor example here; there are far softer and more valuable targets for such an attack vector to succeed.

Edit: Also to keep my replies down...

> I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

You can! All these tools are open source, and there are a number of simple stub resolvers that run on Linux (I'd imagine OS X as well) which you can configure to ignore TTL. They may not be as configurable as you'd like, but again, they are open source and I'm sure would welcome a pull request :)
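
For example, if I remember the dnsmasq options correctly, something along these lines stretches very short TTLs on a local caching instance (check the man page; the exact names and limits may differ):

    # /etc/dnsmasq.conf
    min-cache-ttl=3600   # treat TTLs shorter than an hour as an hour (capped by default)
    max-cache-ttl=86400  # never cache anything longer than a day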


The policy that caused so much pain before is to take DNS records, ignore their TTLs, and apply some other arbitrarily selected policy instead. I confess, I don't understand how the proposal at hand is different in ways that prevent the previous pains from recurring.

Maybe you can enlighten me on key differences I've overlooked? How do you define "failing to reply"? Do you ever stop serving records for being stale, or do you store them indefinitely?


So if our resolver were on our laptop and had a nice UI, that would work great. Now the question is: why is the resolver not on my laptop?


It can be if you want it to be, but it's probably much less interesting than you think.

You likely underestimate the sheer number of DNS records you look up just by surfing the web, and overestimate how useful that information would be to 99.99% of users.

Basically the tools exist for you to do this yourself if you are so inclined, but they may not be that user friendly since they aren't generally useful to most.


Seconded. This is a common idea which occurs to people who haven't dealt with DNS before, and it ends with a much better understanding of how many things https is used for and going back to using openDNS as your resolver.


Proper cache invalidation is one of the 2 hard problems of computer science (the other 2 being naming things and off by 1 errors).


Yes, and there are 10 kinds of people in the world: those who understand binary and those who don't.


It'd be nice to have a "backup TTL" included, to allow sites to specify whether and how long they wanted such caching behavior.

Also, that cache would need to only kick in when the server was unreachable or produced SERVFAIL, not when it returned a negative result. Negative results returned by the authoritative server are correct, and should not result in the recursive resolver returning anything other than a negative result.


> Also, that cache would need to only kick in when the server was unreachable or produced SERVFAIL, not when it returned a negative result. Negative results returned by the authoritative server are correct, and should not result in the recursive resolver returning anything other than a negative result.

Precisely. I am not suggesting any change to how a caching resolver comprehends valid responses from the authoritative server for a domain. For example, if the authoritative server says, "No such domain," then the domain is understood to be gone. At that point, the domain being gone is in fact the last-known-good resolution.


This is why TTL is a variable. If you want to have your records last longer, you can. If you want to later shorten them, you can. People screw DNS up enough already, let's not make it worse by adding layers of TTL.


It might be a stretch to use the information, but the SOA RR does contain an EXPIRE field, defined as "A 32 bit time value that specifies the upper limit on the time interval that can elapse before the zone is no longer authoritative." It's an additional request, but the SOA RR does contain the type of information you are asking for.
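
For example, the expire value is the fourth of the five numbers at the end of the SOA record (serial, refresh, retry, expire, minimum), so something like this shows it (output here is illustrative):

    $ dig +short SOA example.com
    ns.icann.org. noc.dns.icann.org. 2016102100 7200 3600 1209600 3600

Here 1209600 seconds (14 days) would be the EXPIRE value. It is really aimed at secondaries deciding when to stop answering for a zone, but it is arguably the closest thing DNS already has to a "how long may stale data live" signal.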


I've been thinking of adding this exact feature to my DNS framework that I've been working on (if github was resolving): https://github.com/bluejekyll/trust-dns

If you have any feedback, I'd love to hear it.


Sounds promising. I'll want to take a look when GitHub is resolvable again. :)


To be perfectly honest, a "feature" like this has no business being in a safe and secure DNS server. You should fail-safe, rather than serving stale data of unknown safety.

Serving data you cannot verify is a dangerous failure state.


Perhaps. In this case the web was down. I definitely understand the point, stale data with TTLs which have expired, especially on RRSIG records is dangerous.

But I have to wonder, in situations like this where DNS has been taken down, what the greater good is. If the records can be proven to have been cached as authentic data at some point within some period of time (in this case, hours), is it for the greater good that stale but authentic records are acceptable to serve back? In this case a stale period of some number of hours would have been good.

I'm not so sure which is better in this case.


Would you believe me if I said that the DNS protocol itself has an answer for this? The answer is in the basic design of what a TTL is. It's preferable to serve nothing than to serve known bad data. Stale data is a form of bad data.

As another user put it, we have these standards for good reason.


> Is there a spectacular downside to doing so? Since the last-known-good resolution would only be used if a TTL-specified refresh failed, I don't see much downside.

Because you would keep old DNS records around forever if a server goes away for good. So you need to have a timeout for that anyway.


Yes, but:

1) Memory and disk are cheap. My caching DNS resolver can handle some stale records.

2) I suggested above that this behavior would continue until an administrator-specified and potentially quite generous maximum TTL expires. That is, I could configure my caching DNS resolver to fully purge expired records after, for example, 2 weeks.


> 1) Memory and disk are cheap. My caching DNS resolver can handle some stale records.

The problem is not that it would require storage but that stale records can be outright wrong. That timeout would require configuration and DNS does not provide that.

So sure, a new timeout could be introduced but that currently does not exist in DNS.


> The problem is not that it would require storage but that stale records can be outright wrong.

Again, the scenario is that the authoritative/upstream resolver cannot be reached in order to refresh after the authority-provided TTL expires. Are you saying that in the case of a service having been intentionally removed from the Internet (the domain is deactivated; the service is simply no more), my caching resolver will continue to resolve the domain for a time? Yes, it would. What's the downside though?

> That timeout would require configuration and DNS does not provide that.

Yes. This would be a configurable option in my caching DNS resolver, in the same vein as specifying the forwarders, roots, and so on. But to be clear, this would not be a change to the DNS protocol, merely a configuration change to control the cache expiration behavior of my resolver. I don't want to sound flippant, but I'm not sure I understand the point you're trying to make here.


>The problem is not that it would require storage but that stale records can be outright wrong.

But the tradeoff here is a wrong record vs. a complete failure to look up the record. I would rather have the wrong one.


If a server goes away for good, at some point NS records will stop pointing to it. We could serve stale records as long as all of the stale record's authority chain is either still there or unreachable.


I've had an IP address from a certain cloud provider for a month. Some abandoned domain still has its nameserver and glue records pointing to the IP, and I get DNS queries all the time.

The domain expires in January. I hope it's not set to auto-renew. :-)


Note that this is already happening. The only thing my proposal would change is that it would also affect servers that used to be authoritative for subdomains of such abandoned domains. I would expect there to be very few of them: very few domains have delegations of subdomains to a different DNS server and they are larger and thus less likely to be abandoned.


I think what the poster above you is describing is a feature of some software, not something specified in an RFC.


HTTP has a good solution/proposal for this: the server can include a stale-if-error=someTimeInSeconds directive in its Cache-Control header in addition to the TTL, and then every cache is allowed to continue serving stale data for the specified time while the origin is unreachable. It would probably be a good idea to include such a mechanism in DNS, too.

https://tools.ietf.org/html/rfc5861
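
Concretely, it's just an extra Cache-Control directive, something like:

    Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400

A DNS analogue would be an extra per-record value next to the TTL saying "if my authoritative servers are unreachable, you may keep serving this for up to N seconds."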


I can guarantee you that popular DNS resolvers (think 500b+ transactions a day) do have this feature!

Don't want to say much more due to it being my job, and I don't want to give away too much.

EDIT: https://www.google.com/patents/US8583801


What is the point of this comment?


FWIW, since the comment was a reply to my message above:

It provided value by answering my question concerning serious downsides to providing optional post-TTL last-known-good caching within a DNS resolver. The answer is implicit in that a major DNS resolver provides exactly this functionality.


Thank you :)

A little more information, considering it is public. (I had to double check if it was)

https://www.google.com/patents/US8583801


you mean opendns?


No, I mean ISPs that run their own DNS resolvers.


I seem to remember that DNS has generally been reliable (until recently, I guess), so probably nobody has ever thought that to be necessary.

You could write a cron script that generates a date-stamped hosts file based on a list of your top-used domain names, and simply use that on your machine(s) if your DNS ever goes down. That's basically a very simple local DNS cache; a sketch is below.

If you feel like living dangerously, have it update /etc/hosts directly.
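
A minimal sketch of that script, assuming a plain-text list of names in /etc/domains.txt (the list and the paths are placeholders):

    #!/bin/sh
    # Resolve a list of frequently used names and write a date-stamped hosts file.
    out=/var/backups/hosts.$(date +%Y%m%d)
    while read -r name
    do
        # take the first IPv4 address dig returns, skipping any CNAME lines
        ip=$(dig +short "$name" A | awk '/^[0-9.]+$/ { print; exit }')
        [ -n "$ip" ] && printf '%s\t%s\n' "$ip" "$name"
    done < /etc/domains.txt > "$out"
    # living dangerously: append to /etc/hosts instead
    # cat "$out" >> /etc/hosts

Run it from cron daily (or hourly) and you always have a recent snapshot to fall back on.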


> I seem to remember that DNS has generally been reliable (until recently, I guess)

Probably because people used to use long TTLs (1 hour, 4 hours, whatever) and now the default behavior in services like Amazon Route 53 is to use 5 minutes.


Try Akamai-managed CDN content: 20 seconds!


The 20 seconds with Akamai is because of their dynamic end-user IP mapping technology. Basically, they need to map in near real time based on characteristics of the end-user IP, so they can't afford a long TTL.


It's not illegal to have a TTL that short, but it certainly feels like a violation of some implicit contract between users and providers. Of course, the root cause of this is the horrendous hack of using DNS for CDN routing. It doesn't have to be that way... I wrote a recent article about this very issue here:

http://www.infoworld.com/article/3133104/mobile-technology/w...


I haven't heard of packetzoom before, I'll definitely take time over the weekend or next week to dig into your approach.

I wouldn't call DNS-based IP mapping "horrendous" simply because it doesn't work as well for mobile. I understand you have your own pitch, but let's go easy on the hyperbole :)

The fact is that it is still very effective. The major CDNs are quite aware of the mobile shortcoming of DNS based mapping and I am pretty sure it is something they are working to address.

At the end of the day location is just one component involved in accelerating content, there are plenty of other features that various CDNs use to deliver optimal performance.

Regarding the short TTLs, I get your argument; it is indeed like a user's browser is constantly chasing a moving origin. The alternative, however, is a non-optimized web, which would be orders of magnitude worse. Remember, the benefits of CDNs don't just accrue to end users but also to content providers; most origin servers can't handle even the slightest uptick in traffic.


"I wouldn't call DNS based IP mapping "horrendous" simply because it doesn't work as well for mobile"

OK I'll take back the word "horrendous" but it's a hack alright.

> The fact is that it is still very effective. The major CDNs are quite aware of the mobile shortcoming of DNS based mapping and I am pretty sure it is something they are working to address.

No, not really. They're certainly trying to patch DNS to pass enriched information in DNS requests through recursive calls... but it's such a long shot to work consistently across tens of thousands of networks around the world, and requires coordination from so many different entities, that it's clearly a desperation move more than a serious effort. Regardless, there's no real solution in sight for the web platform.

For mobile (native) apps though, the right way to discover nearby servers is to directly build in that functionality using mobile-specific techniques. There's no reason to keep limiting mobile apps to old, restrictive web technologies, considering that apps have taken over as the majority of traffic around the world. That's the root idea behind a lot of what we're doing at PacketZoom. Not just in service discovery, but also in more intelligent transport for mobile with built-in knowledge of carriers and network technologies, automatic load-balancing/failover of servers, and many other things. Here's my older article on the topic:

http://www.infoworld.com/article/3016733/application-develop...


Say I want to implement my own dynamic DNS solution on a VPS somewhere - if I set short TTLs am I causing problems for someone? How short is too short?


This company, for example, has a 99.999% uptime SLA. That's roughly 5 minutes per year.


I think a problem that you might be overlooking is that DNS lookups aren't just failing, they are also very slow when a DDoS attack is underway against the authority servers. This introduces a latency shock to the system, which causes cascading failures.


All of this will break the moment one of the websites you access makes a server-side request to another website (think logging services, server clusters, database servers, etc.; they all either have IPs or, more likely, some domains).


I'm not sure I understand what you're saying.

The scenario is that my local network's caching DNS resolver retains resolutions beyond the authority-provided TTL in the event that a TTL-specified refresh at expiration fails. Therefore, my web browser may—in the very rare situation where this arises—make an HTTP request to an IP address of a server that has been intentionally moved by a service provider (let's assume they did so expecting their authoritative TTL to have expired). Since this scenario only arises because my caching resolver wasn't able to reach the authority, I'm not seeing a downside.

But if I understand your reply correctly, you are saying that the web server I've contacted may, in turn, be using a DNS resolver that is similarly configured to provide last-known-good resolution when its upstream provider/authority cannot provide resolution. This would potentially result in that web server making an HTTPS API request to a wrong IP, again only in the rare case where we have defaulted to a last-known-good resolution. I'm not really seeing the problem here except that the HTTPS request might fail (if the expected service was moved and no longer exists at the last-known-good IP), but how is that worse than the DNS resolution having failed? In both cases, the back-end service request fails.


One possible issue is that IPs are re-used in cloud environments. Potentially, your browser could POST sensitive data to an IP address that now belongs to a totally different company.


Yeah, that is definitely possible.

I mean, hopefully it is over HTTPS so they can't do anything with it... but if it isn't then it can definitely happen. Our servers get random web traffic all of the time.


True. I am not aware of any web services POSTing sensitive data over the public Internet that don't use HTTPS. If your service is sending sensitive data over HTTP without TLS, I feel the problem is bigger than a potential long-lived DNS resolution.


Valid point! One would hope that does not happen.


I mean, hopefully it is over HTTPS so they can't do anything with it...

DV certs only rely on you being able to reply to an HTTP request, so if any CA was using such a caching DNS server, you could probably get a valid cert from them.


HTTPS does not protect you against sending data to a host owned by another company.


Yes it does, the cert presented by api.othercompany.com would not pass validation when you're trying to open a connection to api.intendedcompany.com.


Correct, but they wouldn't be able to decrypt the data.


The data doesn't even get there, the handshake kills the connection before that.


I think you understood me. Maybe I can explain more.

If you have a service `log.io` with its own DNS servers (running named or djbdns), and one day you decide to shut them down and rename the service to `loggy.io`:

What will happen is that any DNS resolver trying to query the `log.io` DNS will reach an unreachable server, which will lead to serving the last-known IP from the proposed DNS cache on your machine.

If you don't use a forever-fallback cache after the TTL expires, you will just reach an unreachable server and get back no IP address.


This is the equivalent of retiring the domain name itself. If you stop renewing it anyone can hijack it and serve whatever they like. Not to forget, they will also get email intended for that domain.

Anyone sane will keep the domain name and ns infrastructure and serve a 301 HTTP redirect.

All anyone is proposing here is to override the TTL to something longer (like 48h) if the nameserver is unreachable.

Of course the perfect solution would be to have the recursive nameserver fetch the correct record from a blockchain.


Thanks for the reply.

> If you have a service `log.io` with its own DNS servers (running named or djbdns), and one day you decide to shut them down and rename the service to `loggy.io`:
>
> What will happen is that any DNS resolver trying to query the `log.io` DNS will reach an unreachable server, which will lead to serving the last-known IP from the proposed DNS cache on your machine.

To reiterate the scenario you've put forth as I understand it: I'm a service operator and I've just renamed my company and procured a new domain. I've retired the old domain and expect to fulfill no more traffic sent to that domain. When a customer of my service attempts to resolve my old domain, their caching DNS resolver may return an IP address even though I have since shut down the authoritative DNS servers for the retired domain. They will make an HTTPS request to my servers (or potentially someone else's, if I also gave up my IP addresses), and fail the request because the certificate will be a mismatch.

The customer's application will see a failed request either by (a) DNS failing to resolve or (b) HTTPS failing to negotiate. Either way, my customer needs to fix their integration to point to my new domain.

To be clear, it is up to my customer to decide whether they want their caching DNS resolver to provide a last-known-good resolution in the event that authoritative servers are unreachable. If they prefer failure type (a) over (b), they would configure their DNS resolver to not provide last-known-good resolutions.


When your customer sees that HTTPS error, they may associate it with your company failing at security.


You can install EdgeDNS locally. It does that, among other things.

https://github.com/jedisct1/edgedns


This can be pretty bad in a world where AWS ELB IP addresses change regularly.


Why? The OP is only proposing using a cached result when there's no updated record available.


Serving wrong records is usually worse than serving no records.

EDIT: It would be fine as long as your site only served HTTPS content and HSTS was enabled for your domain, preventing any sort of MITM attack.


If your DNS server is offline, is the last record it returned when it was online really the "wrong" one? There'd be no right one in that case.


Exactly. You can't know if it is still valid, so you might send clients to an IP that's now controlled by somebody else. Worst case, they know and set up a phishing site. DNS generally has been reliable enough that the trade-off is not worth it.


> Worst case, they know and set up a phishing site.

They'd need to specifically gain access to the last known good IP address, which might be different depending on which DNS resolver you talk to (geodistribution, when the record was last updated, etc). I wouldn't really consider that a realistic attack vector.


Within a small hosting provider this might be pretty simple. An attacker might lease a bunch of new servers and get an IP that was recently released. Then they could launch a DDoS to force address resolution in their favor. It's a bit far-fetched, but a lot of very successful attacks seem that way until someone figures out a way to pull them off.


Sure, within a small host or ISP, that may be doable; even then, I'd consider it a stretch. But if we were to limit it to those constraints, when will this ever be exploited except as a PoC? No entity large enough to be worth running this exploit against has an architecture where this would be feasible (due to things like geodistribution and load balancing), nor would it be hosted on a provider so small that its IP pool could be exploited in the manner described. If I were an attacker, I'd focus on something with much bigger RoI.


How about DNS cache poisoning? Serving stale data combined with a DDoS on the root resolver makes DNS cache poisoning more dangerous by prolonging it.


I like this idea. Grab a new elastic IP on AWS. Set up an EC2 instance listening on port 80, maybe some other interesting ports, and see if anything juicy comes in. Or just respond with some canned spam or a phishing attempt. Repeat.


I guarantee you'll get lots of weird traffic. We do all the time on our ELB-fronted app servers.


Yes. You either positively know the right answer, or you return the fact that you don't know the right (currently valid per the spec) answer. The right answer in the situation you posed is "I don't know".


Relevant (or at least apropos) post by Bruce Schneier, from a month ago: "Someone Is Learning How to Take Down the Internet"

https://www.schneier.com/blog/archives/2016/09/someone_is_le...

Edit: And to be clear: I don't mean to imply there's any connection :)


Prediction: A massive, sustained attack will occur on key US Internet infra on election night in an attempt to debase the US election results.


That was exactly my thought. This may be unrelated, or it may be a test run. But a large scale attack on Election Day that crippled communications would stir up unrest for a variety of reasons. Although I think that's highly unlikely to change the outcome, unrest after such a contentious election is not good.


If this is a test run, this is an amazing early warning for Twitter and the like to immediately start working on contingency plans for election day.


What can they do? It's not Twitter themselves being DDoSed, it's a DNS provider. This propagates up the chain to impact both a Tier 1 network and cloud providers, which hits tons of stuff on top of that.


Have a failover DNS provider. Amazon uses Dyn, but also has UltraDNS as a backup, and it's obviously still up. Twitter vs Amazon:

    host -t ns twitter.com: ns3.p34.dynect.net, ns4.p34.dynect.net, ns1.p34.dynect.net, ns2.p34.dynect.net.

    host -t ns amazon.com: ns3.p31.dynect.net, ns4.p31.dynect.net, ns2.p31.dynect.net, pdns6.ultradns.co.uk, pdns1.ultradns.net, ns1.p31.dynect.net.


Thank you so much for mentioning this. This was my first thought when I heard about all the major enterprise sites affected by the DDoS:

"How do all these major players have singly-homed DNS"?


If you utilize the GeoIP routing features of one provider, it can be difficult or impossible to ensure repeatable/deterministic behavior on a second provider.


They could distribute instructions for users to follow, e.g. using some permanently working IPs or alternate DNS servers.


Gives them time to test the countermeasures before 11/8


Assuming you're referring specifically to targeting media companies reporting on the results and not the electric grid like someone else mentioned, wouldn't they have to DDoS Google itself for that to work? I don't really see a DDoS of Google being effective.


[flagged]


This comment says literally nothing.


Probably not a great idea. If the internet went down at my work, none of us would be able to do anything, so we'd probably all head out to the polls just because we have nothing better to do. Unintentionally increased turnout.


This is terrifying. Thankfully I don't think much actual voting infra is network reliant. But it could probably delay the results from being finalized for days, and allow Trump to spew further allegations of rigging.

Though if they targeted the electric grid, water, and public transport, starting early in the day and choosing the regions by their populations' political leaning, it could easily have an effect on the result itself.


You don't need to target voting infrastructure. You target media infrastructure (DNS, streaming, web media) in order to either reduce or shift voter turnout. A candidate ahead in a battleground state? You stomp on media reporting to ensure their opponent's voters aren't dissuaded from heading to the polls.

Control the message, and through that the actual votes cast.


I don't even think you necessarily need to shift voter turnout. I think you just need to sow enough confusion to cast the results into doubt.


Yeah, it just needs to be "The internet was broken so your votes were lost" and then some made-up post-hoc explanations that 90% of people don't understand, so they can't dispute them.


Given various fuckups over the years, media won't call a state until the polls are completely closed. Silencing them doesn't change this strategy.


This is not congruent with their behavior during the primaries.


I'm working at the polls in CA, and can verify this; all critical information is moved by sneakernet with a two-person rule on its handling.

Of course, I have no information on the security model of the pre-election preparations and post-election tabulation, but luckily results for each polling place are also posted for the public to inspect - media outlets and campaigns can verify the tabulation themselves with a slight delay.


Hahahaha. For sure it's not supposed to be network reliant. But from my experience working on critical infra, even things like power grids and rail systems, this is almost never the case.


That is a terrifying thought. Sounds like the plot of a potential Neal Stephenson novel.


I think you're right, not much of the voting infra is network reliant, but the more I think about it the more it seems that the "fear" factor of such outages could influence the election. Or, perhaps a curated working set of information sources, thanks to selective DDoS. Regardless, terrifying to be sure.


Is it really important who wins if there are only two candidates that share a common view on many problems? And you don't need the Internet to count votes anyway.


It's okay. James Comey, the FBI chief, said the US electoral system is such a mess, it would be too hard for an attacker to hack it or damage its integrity in any way. It's all good.

https://www.techdirt.com/articles/20160912/16553435504/fbi-d...

Of course, he said nothing about internal rigging:

https://twitter.com/TweetBrettMac/status/789372518436052992


Please keep the unfounded conspiracy theories off Hacker News. Thank you.


You have a complex situation and you can't understand it; a conspiracy theory offers a simple answer.


> The Department of Homeland Security told CNBC that it is "looking into all potential causes" of the attack.

http://www.cnbc.com/2016/10/21/major-websites-across-east-co...

Is this par for the course for all large DDoS attacks, or did something tip them off?


More like for the first time in a long time, serious negative economic impact is occurring. I sincerely wish this was a wake up call, but it won't be.


From what I know of the situation (don't trust me, I'm not going to offer citations or sources), this attack wasn't particularly large in terms of gigabits/second. It was, however, very large in terms of economic impact.

I would assume that when a large number of big enterprise-y things go down, HSI takes notice. When other providers get attacks that are 20x larger (gbit/sec), but have much less widespread impact and impact on less enterprise-y things, they don't care so much.


>We don't know who is doing this, but it feels like a large nation state. China or Russia would be my first guesses.

Why not the USA?


It doesn't make a whole lot of sense for the USA to take down the internet, as they benefit the most from it. A significant fraction of that economy is based on it, much larger than in the cases of China and Russia. It would be like the owner of a coal mine campaigning for a carbon emissions tax: maybe there's something we don't know, but from the information we have it seems unlikely.

Note that this wouldn't rule out the USA as such. First, it could be a longshot preparedness thing, with no expectation that it would ever be used. Second, they could be red-teaming the thing (looking for weaknesses so that they can arrange for them to be shored up).

In either of these scenarios, it's no less likely that the USA would be doing it than anyone else. If you assume that whoever is doing this is planning to use their knowledge, however, the economic argument makes the USA less likely to be involved.


There are many types of actors even within the USA nation-state / government.

For example, if a particular part of the government got wind of a data dump about to be released by another nation-state or independent actor (for example, a leak of some kind), I think the parts of the US government that possess the ability to do so wouldn't hesitate to take down DNS for the entire internet to avoid another data leak similar to the Snowden dump.

Be really wary of attributing intent: you do not know who will benefit the most from taking down certain services. To claim that the US benefits from the internet so much that it wouldn't do certain actions to protect itself from certain types of harm is shortsighted.

Even my example could be really wrong, but the idea is that nobody really can say - "oh the internet is too important to xyz, they'll never do anything!"


> wouldn't hesitate to take down DNS for the entire internet to avoid another data leak similar to the Snowden dump.

I don't understand how this would change anything unless you're assuming they would take down the Internet permanently


I'm probably wrong, but this is how I see it (not sure about the OP).

News cycles happen fairly rapidly, so if you could take down a number of sites that might be friendly to the dissemination of potentially damaging information just long enough such that it's forgotten about, or the attack is so large the media talks about the attack instead, then you might be able to successfully avoid widespread public knowledge of such information. Though, this would be best aided with collusion or cooperation (intentional or otherwise) from the media. Toss in a few unrelated services as a bonus for collateral damage, and you might be able to avoid scrutiny or, at the very least, shift the blame to an unrelated state actor. It won't prevent the release of information, but that's not the point--you want to prevent the dissemination and analysis of that information by the public at large.

This is all hypothetical, of course, and not likely to work. It also comes with the associated risk that if you were discovered or implicated, public outrage might be even worse than if you allowed the release of the information you hoped to distract from in the first place! As such, I can't imagine anyone would be stupid enough to try.

I'll take my tinfoil hat off now.


If you take them offline before they've managed to disseminate the info, then it can't be forgotten because nobody knew about it in the first place. Which means when the sites come back on, the info is still newsworthy.


What kind of information would be so sensitive as to risk crashing the economy over, yet so trivial that people would forget about it because they couldn't access Twitter for a day? I get that it's more nuanced than that, but I'm really struggling with this scenario; sensitive information tends to get out if it's important enough, even if you're willing to kill a bunch of people.


It renders the server that's hosting a leak unable to broadcast the leak temporarily, while they arrange more conventional measures to seize it. It's a more rapid response than getting a warrant and a police team on location. The broad nature of the attack also avoids tipping off the server owners.

I still give it less than a 5% probability, though.


> the server that's hosting a leak

Honestly, that's fairly thin. WL uses torrents and other means of disseminating data that don't rely on central control structures. Plus, presumably, WL has the ability to quickly shift data into secure hands who are willing to release it when things quiet down.

So, sure, the USA could go send someone to seize the hard drives of someone who has confidential information. But I have to imagine one of the first steps when getting that kind of information is to disseminate it to others (at least some of whom are unknown to the state). If they were hit, these people would very quickly take that as a signal to indiscriminately release all the information.


Or they could take it down for enough time for other parts of the government and/or international diplomatic system to do their work.

Remember, a data leak is not just a technical issue. They can resolve it in any number of ways - get a small team incursion into another state's territory for extraction, etc. All the outage needs to do is to hold open that window for enough time for all the different parts of the entire threat response chain to do each part's job.

A lot of technical people think tech is the whole story, but it isn't - if you get a small team to knock on the person's door, get your internet response team to shut down DNS, or get someone on site at the telco to perform certain actions at the router/switch level, all of those portions working together is a powerful way to resolve or accomplish certain goals.

Think bigger, especially with state actors - the resources are there, and this line of thought is probably really basic stuff that people came up with in the 1960s or 70s (even when the ARPANET was being created, there was probably already a team tasked with taking such actions - it only makes sense to have two teams working on such goals in tandem: one to create the network, the other to take it down).


For either of your cases, I can't imagine the value of having it last this long, or be this severe. The impact on the economy and on trust in an infrastructure company is too high.

I'd be more willing to put my money on someone attacking an entity downstream who is normally immune to DDOS attacks of this size.


> It would be like the owner of a coal mine campaigning for a carbon emissions tax

Not necessarily so strange. See https://en.wikipedia.org/wiki/Bootleggers_and_Baptists for instance.


Total and absolute speculation follows: If the US wanted reliable take-down capability, they might want to test it first, and it would be least provocative if they tested here in the US.

As for the length of the "test," they might want to see how the US would react to such attacks in the future, and shake out anything critical. "Oh, these two agencies can't talk to each other. Good to know."

I hate the way modern times makes me look.


The usual thinking goes something like: well, the US created the internet, so why would they want to take it down? Yes, the NSA spies and all that, but they need the internet up to do that, and as bad as the NSA is, it's nowhere near as bad as China or Russia where they ... (ranges from censorship to eating babies alive)


> The usual thinking goes something like; well, the US created the internet so why would they want to take it down?

To pin it on someone else?

"17 Intelligence agencies told me Russia hacked our DNC thing" (Clinton).

So maybe it is now "Oh look they took down the whole internet as well".


>To pin it on someone else?

That's... kind of conspiratorial thinking? Would you cut off your own hand so you could blame it on someone else?


As someone else pointed out, this is a classic false flag operation. Look up the Gleiwitz incident and Operation Northwoods. It can be very effective. With good opsec and anonymity online it can be even easier.

> kind of conspiratorial thinking?

You mean like lizard aliens infiltrating our planet? -No. But in the realm of "shooting down of passenger and military planes, sinking a U.S. ship in the vicinity of Cuba, burning crops, sinking a boat filled with Cuban refugees, attacks by alleged Cuban infiltrators inside the United States, and harassment of U.S. aircraft and shipping and the destruction of aerial drones by aircraft disguised as Cuban MiGs", yes.

It was mostly a reply to "US would have absolutely no reason for doing this", and the reply is that there could be a plausible reason.


It's called a false flag operation, and does happen.


And the ratio of false flag operations to false accusations of false flag operations is about 1:99. Lots of unlikely things "do happen," but if that's the immediate explanation you reach for you're going to be wrong most of the time.


> Lots of unlikely things "do happen," but if that's the immediate explanation

Where did I say it was my immediate explanation and this is _likely_ what is happening?


You might want to take this up with that "rdtsc" guy who wrote "As someone else pointed out this is a classic false flag operation." He seems to disagree with you on those points.


Oh right, I know him! He is a decent fella. I think he was saying that if the US is attacking its own infrastructure, then a false flag operation is a plausible explanation. Talking to him a bit revealed he didn't say the US is attacking itself.


Not suggesting this is a false flag operation, but pointing out the concept as some commenters seemed unfamiliar with it.


Of course. I just tried to explain the thinking behind the sentiment that it obviously isn't the U.S.; I'm not saying I agree with it.


Russia retaliating for the US taking out the ESA Mars Drone.


I thought, "first actual space battle! Neat!"

Then I felt horrible.


What's this about? Confusing premise. Got any links?


Not a joke, just conjecture.

Some NPR story about US Cybercommand responding to Russian cyber attacks, 'at place and time of our choosing.'

'Some you might hear about. Some you might not.'

FFWD to a couple days ago, NPR story about a botched European and Russian lander.

Today, US Eastern Seaboard is seeing connectivity disruption due to DDoS attacks.

*

Unwinding the stack, the latest news is these DDoS attacks are not likely state sponsored.

Russia pulling off a coordinated attack that soon after and in response to my theorized US retaliation seems unlikely.

US attacking a joint partnership between Russia and Europe civilian space program seems unlikely.


I assume it was just a joke...


From the context of the parent: because the USA could just send a three-letter-agency agent of some sort to Dyn, a US-based company, and ask what their infrastructure looks like? (Presuming of course some weird scenario where they weren't already tracking it, which seems unlikely.)


Why would they attack themselves?


Because we are not indiscriminate like that.


Who is "we"? The u.s. government is a conglomerate of interests, organizations, individuals... Many of whom are quite indiscriminate. I'm not at all proposing that the U.S. was involved here. I'm questioning the simple identification of the U.S. government with the word "we", and the corresponding assumption that this institution is integrated in a carefully discriminating way...


Why the USA?


With the elections coming, USA would be on the top of my list.


why?


Because America is an Orwellian hellhole, gawd, haven't you read animal house 84?!


I did; as far as I remember those books are all about the USSR ;)


the number of people missing the sarcasm in your post is worrying


Because it's being down voted? I think any down votes are more likely because the comment doesn't add substantively to the conversation and is distracting given other recent threads.


um, if you are basing your logic on Orwellian-ness then England should be your top candidate: massive monitoring of the populace, severe restrictions on the ability to carry anything that is vaguely pointy or goes bang, poor freedom of speech rules (relative to the US)... I love England, but as far as Orwellian societies go, you can do notably worse than the US.


As someone living in England, I can confirm that a) we are an extreme surveillance society, which the general population neither really understands, nor cares about b) the vast majority of us are very grateful we don't have (legalised) guns on the streets, and we suffer from a much lower homicide rate as a result


Schneier: "I have received a gazillion press requests, but I am traveling in Australia and Asia and have had to decline most of them. That's okay, really, because we don't know anything much of anything about the attacks.

If I had to guess, though, I don't think it's China. I think it's more likely related to the DDoS attacks against Brian Krebs than the probing attacks against the Internet infrastructure, despite how prescient that essay seems right now. And, no, I don't think China is going to launch a preemptive attack on the Internet."

[1] https://www.schneier.com/blog/archives/2016/10/ddos_attacks_...


If this is supposed to be "taking down the internet", then I'm not impressed. Using cached DNS still gives access to any service. I'm even typing here on HackerNews.

If this is another practice run, then I'm still not impressed. Taking down one provider is not that hard. Good luck finding the resources to do this DDoS to ALL large DNS providers out there.

Maybe it's not really fair to link to that post every time a DDoS with more than average payload happens. Especially since the post doesn't mention any specifics, because well, "protect my sources". It's like the "buy gold now" guy starring in the 2 AM infomercial predicting an economic recession within the next 5 years, without adding what the exact cause is going to be. He is probably going to be right, but that doesn't make him a visionary.


I work as a freelancer and today I didn't get paid and that's just me. Companies probably lost millions today. By just one DDoS to one DNS provider.

Yeah it's not the whole Internet, but how do you define "taking down the Internet" anyway. Is it every connected computer or just a huge amount of interconnected big websites? Because the latter is happening right now.


Let's try to put this DDoS attack in some context aside from the technical part.

As @scrollaway mentioned, 6 weeks ago, Bruce Schneier posted that several companies told him that they're detecting attempts to probe their networks and find ways to bring it down https://www.schneier.com/blog/archives/2016/09/someone_is_le...

Now let's look at the progress of events:

- Hillary Clinton's personal email server was hacked a while ago.

- A lone hacker published a document obtained by hacking the DNC servers. The document includes opposition research on Donald Trump and how Hillary can attack him in the election.

- Wikileaks published emails obtained by hacking the DNC

- US intelligence agencies confirmed that Russia was behind the DNC hack

- It was reported that the CIA is starting a cyber attack against Russian targets. http://www.nbcnews.com/news/us-news/cia-prepping-possible-cy...

- This is happening while the war in Syria and Iraq is growing. The Russians are there to "fight ISIS" but they have deployed an air defense system even though ISIS doesn't have any air force.

- Russia's only aircraft carrier is trespassing through UK waters to get to Syria in a show of force that doesn't really add anything to their military capabilities there.

https://www.theguardian.com/world/2016/oct/20/russian-fleet-...

https://www.theguardian.com/world/2016/oct/19/convoy-of-russ...

- Finland (yes, Finland) is increasingly worried about Russia. Russia has violated Finnish airspace and is questioning Finland's independence. Finland shares a long border with Russia.

http://www.businessinsider.com/r-finland-sees-propaganda-att...

- US ships were attacked near Yemen after they bombed some targets that belong to the rebels. https://www.theguardian.com/us-news/2016/oct/13/us-enters-ye...

- US election is in 3 weeks and Donald Trump is openly in love with Putin. Trump questioned the benefit of NATO, which is the basis for European stability after the Second World War.

Say Hello to World War III, everybody!


> US election is in 3 weeks and Donald Trump is openly in love with Putin.

He states that he's never met Putin nor has any holdings in Russia. He has stated that he is open to positive relationships with the Russian government.

> Trump questioned the benefit of NATO, which is the basis for European stability after the Second World War.

I believe he stated that he wants NATO members to "pay their fair share" of the costs of maintaining the organization.

I'm not a Trump supporter but we shouldn't believe everything we read.


Trump says he never met Putin, now. In the past, he said he did. I just did a search for "trump met putin" and found a bunch of news sites reporting that in a GOP debate a while ago Trump said

“I got to know him very well because we were both on ‘60 Minutes,’ we were stablemates, and we did very well that night.”


Trump was boasting in that debate about nothing, they were on the same episode of 60 minutes but they were not even on the same continent for that episode.

http://time.com/4108198/donald-trump-60-minutes-putin/


That's not really the point. The point is Trump is now saying he hasn't, but in the past he said he has. Not only is he contradicting his earlier statement, but it also makes him not trustworthy. And of course, if he was boasting about having supposedly met Putin in the past, that means he thought it was a good thing to boast about, which suggests that he is sympathetic to Putin and to Russian interests.


> - Finland (yes, Finland) is increasingly worried about Russia. Russia has violated Finnish airspace and is questioning Finland's independence. Finland shares a long border with Russia.

The Finns actually still have a quite good relationship with Russia (better than other neighbors), and nobody's actually questioning Finland's independence. The Baltic countries are a different story.

Source: A Finn here.


It's good to know things are calm, I was just quoting the article. Not sure where they got that from.

> Finland is becoming increasingly worried about what it sees as Russian propaganda against it, including Russian questioning about the legality of its 1917 independence


Lithuanian here, can confirm. Quite scared of Russia, for (what I hope are) understandable reasons.


> - Russia's only aircraft carrier is trespassing through UK waters to get to Syria in a show of force that doesn't really add anything to their military capabilities there.

If they were really gearing up for war, why would they move their only carrier away from the motherland? Your article even says it is more of a "show of force" than a start of war.

So how did you jump to WW3?


I'm sure that carrier is being followed by multiple NATO submarines as well.

It seems incredibly unlikely that a global war would start over Syria when we've had 60 years of proxy conflict instead. Russia or NATO have absolutely nothing to gain from an open military conflict.


> - Finland (yes, Finland) is increasingly worried about Russia. Russia has violated Finnish airspace and is questioning Finland's independence. Finland shares a long border with Russia.

Finland isn't worried, they have had stable relations for half a century as both sides agreed to not mess with each other. They have even refused to join NATO because it is actually safer for Finland and vice versa.

Cowboys with missiles stationed on Russia's border making hyperbolic statements (like you do) - now that would be a real threat. (The same was also true the other way around with the Soviets stationing missiles in Cuba.)


> Say Hello to World War III, everybody!

Is this sabre rattling or the prelude to a global conflict? Surely at worst it will (continue to) be a proxy war between NATO and Russia in Syria and nothing more? What motive is there for Russia or NATO to engage in open warfare? I'm not sure that a slow and prolonged lead up to an open war would even be effective in this situation.

Perhaps it should be "Say hello to Cold War v2.2017"?


Hopefully they stick to semver


Your threat can't be parsed due to deprecation of the old rhetoric API. Please upgrade your rhetoric to 2.3.x.


I like this version number better :)


I just want to sell my software, why does everyone have to fight?!

Thank you for these links. I'm trying not to get wrapped up in conspiracies but am increasingly worried by the mounting conflict. I'd love to hear a calm, reasoned response from someone more knowledgable than me on these topics.


The global powers are negotiating with themselves, and consolidating state power at home. Proxy wars for the former, and fear mongering at home for the latter.

The players are the 0.001% who control these states and the rest (we) are the captive (and propagandized) audience. They are being super kind as to at least make it entertaining for us.


Yeah I've gotten a little wrapped up as well looking into all of this mounting tension between the US and Russia. Protip, stay away from /r/the_donald.

My personal opinion is that it is mostly political and I think (hope) that what is happening in Syria won't escalate to direct conflict between the US and Russia.

I stumbled across this little blog article the other day and it helped relieve some of my anxieties.

https://cluborlov.blogspot.com/2016/10/oopsa-world-war.html


I don't actually think this will lead to open conflict. My comment at the end was just saying that this is what a world war would probably look like now, and that this back and forth might continue for a while. I'd like to hear from an expert as well, rather than rely on piecing news items together.


I'm not worried about open conflict, I'm worried that the internet will become a battlefield and our businesses will be caught in the crossfire.


> - Russia's only aircraft carrier is trespassing through UK waters to get to Syria in a show of force that doesn't really add anything to their military capabilities there.

I read they were passing in international waters. Is that not the case? It's clearly a show of force, but no need for the hyperbole if it is not true.


Foreign vessels are allowed to transit through another nation's waters. This happens regularly and is not in any way noteworthy.


Thanks for the info. What was noteworthy to me is that their aircraft carrier is running on diesel and is clearly something from the 80s.


prediction: in some time they will probe Google. That will be fascinating.


I think it's safe to assume they've been probing Google for 15+ years. One documented example: https://en.wikipedia.org/wiki/Operation_Aurora


Really? It's unclear there's any connection between that vague article and Dyn's problems today.



Let's hope they update that page after today's incident.


Maybe they are also using deep learning.


Maybe some Deep Learning experiment actually created Skynet and it is probing us.


I wanted to provide an update on the PagerDuty service. At this time we have been able to restore the service by migrating to our secondary DNS provider. If you are still experiencing issues reaching any pagerduty.com addresses, please flush your DNS cache. This should restore your access to the service. We are actively monitoring our service and are working to resolve any outstanding issues. We sincerely apologize for the inconvenience and thank our customers for their support and patience. Real-time updates on all incidents can be found on our status page and on Twitter at @pagerdutyops and @pagerduty. In case of outages with our regular communications channels, we will update you via email directly.

In addition you can reach out to our customer support team at support@pagerduty.com or +1 (844) 700-3889.

Tim Armandpour, SVP of Product Development, PagerDuty


I had the privilege of being on-call during this entire fiasco today and I have to say I was really, really disappointed. It's surprising how broken your entire service was when DNS went down. I couldn't acknowledge anything, and my secondary on-call was getting paged because it looked like I wasn't trying to respond. I was getting phone calls for alerts that weren't even showing up in the web client, etc. Overall, it caused chaos and I was really disappointed.


"It's surprising how broken your entire service was when DNS went down." lol


How does the service you're responsible for work when DNS stops functioning?


Hopefully you have a nice SLA with them.


I appreciate the update, but your service has been unavailable for hours already. This is unacceptable for a service whose core value is to ensure that we know about any incidents.


Given that a large swath of SaaS services, infrastructure providers, and major sites across the internet are impacted, this seems harsh. Are you unhappy with PagerDuty's choice of DNS provider, or something else they have control over? I don't think anyone saw this particular problem coming.


A company that bills themselves as a reliable, highly available disaster handling tool ought to know better than to have a single point of failure anywhere in its infrastructure.

Specifically, they shouldn't have all of their DNS hosted with one company. That is a major design flaw for a disaster-handling tool.


I'm not using the service, but I'm curious what an acceptable threshold for this company is. Like, if half the DNS servers are attacked? If hostile actors sever fiber optic lines in the Pacific?

I ask because my secondary question, as a network noob, is was anybody prepared / preparing for a DDOS on a DNS like this? Were people talking about this before? I live in Mountain View so I've been thinking today about the steps I and my company could take in case something horrifying happens - I remember reading on reddit years ago about local internets, wifi nets, etc, and would love to start building some fail safes with this in mind.

Two pronged comment, sorry.


I'm not using the service either, but I noticed this comment [1]. It's not the first time that a DNS server has been DDoS-ed, so it has been discussed before (e.g. [2]). At minimum, I would expect a company that exists for scenarios like this to have more than one DNS server. Staying up when half of existing DNS servers are down is a new problem that no one has faced yet, but this is an old, solved one.

[1] https://news.ycombinator.com/item?id=12759653

[2] https://www.tune.com/blog/importance-dns-redundancy/


Re question #2, Amazon uses UltraDNS as a backup and seemed to be relatively unaffected by today's attack.

Re question #1, check out PagerDuty's reliability page here: https://www.pagerduty.com/features/always-on-reliability/

Namely "Uninterrupted Service at Scale - Our service is distributed across multiple data centers and hosting providers, so that if one goes down, we stay available."

It seems fair to expect them to have a backup dns too, but I am not an expert.


> is was anybody prepared / preparing for a DDOS on a DNS like this

Yes.

I have, personally, been under attacks as large as or larger than today's against my DNS infrastructure and survived.


This is exactly my point.


From the perspective of my service being down, my customers being pissed, and me not being notified.. yes, maybe PD should be held to a higher standard of uptime. Seems core to their value prop.


> I don't think anyone saw this particular problem coming.

Knocking half of the web off the grid because their DNS provider is under attack? It happened recently to DNSimple.

https://blog.dnsimple.com/2014/12/incident-report-ddos/

The irony is that I noticed it when dotnetrocks.com went offline, at that time dotnetrocks was sponsored by dnsimple...


Flush your DNS like the parent said.


Flushing DNS won't do shit


pagerduty.com moved to Route53, but the TTL on NS records can be very long. Flushing (or restarting) whatever caches DNS records in your infra will help it pick up the new nameservers quickly.
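For reference, a rough sketch of how you might check how long the stale NS records will stick around and then flush a few common caches; which of these applies is an assumption about what resolver/OS you actually run:

    # remaining TTL (second column) on the cached NS records
    dig +noall +answer NS pagerduty.com

    # flush, depending on what is doing the caching
    sudo killall -HUP mDNSResponder                  # macOS
    sudo systemd-resolve --flush-caches              # Linux with systemd-resolved
    sudo rndc flush                                  # BIND
    sudo unbound-control flush_zone pagerduty.com    # Unbound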


Not on your laptop. On your local DNS resolver.


[flagged]


> Running a redundant DNS provider is expensive as all hell.

While 'expensive' is a relative term, I disagree that it's cost-prohibitive for most firms, as I looked into this specifically (ironically considered using Dyn as our secondary). The challenge isn't coming up with the funds, it's if you happen to use 'intelligent DNS' features; these are proprietary (by nature) and thus they don't translate 1:1 between providers.

In addition to having to bridge the divide yourself, by analyzing the intelligent DNS features and using the API from each provider to simultaneously push changes to both providers, you have to write and maintain automation/tooling that ensures your records are the same (or as close as possible) between the providers. If you don't do this right, you'll get different / less predictable results between the providers, making troubleshooting something of a headache.

Thus in that case the 'cost' in man effort (and risk, given that APIs change and tooling can go wrong) in addition to the monthly fee.

If all you're doing is simple, standard DNS (no intelligent DNS features), it's not as hard, and it's just another monthly cost. Since you typically get charged by queries/month, if you run a popular service you're probably well able to afford the redundancy of a second provider.
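As a very rough sketch of that dual-push approach for plain records (no intelligent-DNS features): the hosted zone ID, record values, and nameserver hostnames below are placeholders, and the second provider's call is left as a comment since its API varies by vendor.

    # apply the change to Route53 via the AWS CLI
    aws route53 change-resource-record-sets \
      --hosted-zone-id ZEXAMPLE123 \
      --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{
        "Name":"www.example.com","Type":"A","TTL":300,
        "ResourceRecords":[{"Value":"192.0.2.10"}]}}]}'

    # ...apply the equivalent change through the second provider's API...

    # then verify both providers hand out the same answer
    dig +short www.example.com @ns1.provider-a.example
    dig +short www.example.com @ns1.provider-b.example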


Ah so make everything redundant. Double my costs in man hours and in monetary cost. Brilliant!


> Ah so make everything redundant. Double my costs in man hours and in monetary cost. Brilliant!

The sarcasm is curious. It's a business decision. Either your revenue is high enough that the monetary loss from a several-hour intra-day outage is potentially worse than the cost of said redundancy, or you don't care enough to invest in that direction (it's expensive).

Making things redundant is exactly a core piece of what infrastructure engineering is. I guess with the world of VPSes and cloud services, that aspect is being forgotten? And yes, redundancy / uptime costs money!


When your service literally says it exists to help provide uptime, redundancy makes sense.


Your automation should be handling creating/modifying records in both providers. Also, if you're utilizing multiple providers you don't need to pay for 100% of your QPS (or whatever metric is used for billing) on every provider, only 50% for two or 33% for three. You can just pay for overages when you need to send a higher percentage of your traffic to a single provider.


A lot of providers do have 'fixed' portions of costs, so, it won't be quite 1/2 or 1/3rd.

It may, at scale, be like 100% (one provider), 55%+55% (two) and 40%+40%+40% (three). Still eminently affordable.


Really?

Route53 on AWS is $0.50/zone and $0.40/million queries. API integration is also very easy.

Using something like Route53 as a backup is significantly cheaper than suffering from the current Dyn outage.


That is not helpful if you want vanity name servers



I assume your clients would prefer working nameservers over vanity ones. Especially if you are in a critical business like PagerDuty.


Latest github NS moved to awsdns

        $ dig -tNS github.com @8.8.8.8

        ; <<>> DiG 9.8.3-P1 <<>> -tNS github.com @8.8.8.8
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15616
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0

        ;; QUESTION SECTION:
        ;github.com.                    IN      NS

        ;; ANSWER SECTION:
        github.com.             899     IN      NS      ns-1283.awsdns-32.org.
        github.com.             899     IN      NS      ns-1707.awsdns-21.co.uk.
        github.com.             899     IN      NS      ns-421.awsdns-52.com.
        github.com.             899     IN      NS      ns-520.awsdns-01.net.
        github.com.             899     IN      NS      ns1.p16.dynect.net.
        github.com.             899     IN      NS      ns2.p16.dynect.net.
        github.com.             899     IN      NS      ns3.p16.dynect.net.
        github.com.             899     IN      NS      ns4.p16.dynect.net.

        ;; Query time: 32 msec
        ;; SERVER: 8.8.8.8#53(8.8.8.8)
        ;; WHEN: Fri Oct 21 13:01:48 2016
        ;; MSG SIZE  rcvd: 248
But my local copy is still on dynect

        $ dig -tNS twitter.com

        ; <<>> DiG 9.8.3-P1 <<>> -tNS twitter.com
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62729
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 4

        ;; QUESTION SECTION:
        ;twitter.com.                   IN      NS

        ;; ANSWER SECTION:
        twitter.com.            75575   IN      NS      ns3.p34.dynect.net.
        twitter.com.            75575   IN      NS      ns4.p34.dynect.net.
        twitter.com.            75575   IN      NS      ns1.p34.dynect.net.
        twitter.com.            75575   IN      NS      ns2.p34.dynect.net.

        ;; ADDITIONAL SECTION:
        ns3.p34.dynect.net.     54698   IN      A       208.78.71.34
        ns4.p34.dynect.net.     81779   IN      A       204.13.251.34
        ns1.p34.dynect.net.     8544    IN      A       208.78.70.34
        ns2.p34.dynect.net.     54775   IN      A       204.13.250.34

        ;; Query time: 0 msec
        ;; SERVER: <local>
        ;; WHEN: Fri Oct 21 13:02:14 2016
        ;; MSG SIZE  rcvd: 179


Your local copy is also twitter, instead of github :)


I believe you don't understand DNS. It's probably the most resilient service out there (granted it's used correctly). There's nothing inherent in the protocol that would prevent them from using multiple DNS providers.

> Running a redundant DNS provider is expensive as all hell.

What makes you think that?


Sorry if this sounds dickish, but renting 3 servers @ $75 apiece from 3 different dedicated server companies in the USA, putting TinyDNS on them, and using them as backup servers, would have solved your problems hours ago.

Even a single quad-core server with 4GB RAM running TinyDNS could serve 10K queries per second, based on extrapolation and assumed improvements since this 2001 test, which showed nearly 4K/second performance on 700MHz PIII CPUs: https://lists.isc.org/pipermail/bind-users/2001-June/029457....

EDIT to add: and lengthening TTLs temporarily would mean that those 10K queries/second would quickly lessen the outage, since each answer might be cached for 12 hours; and large ISPs like Comcast would cache the answers for all their customers, so a single successful query delivered to Comcast would have (some amount of) multiplier effect.
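For what it's worth, a minimal tinydns data file for such a backup might look like the following (all names and IPs are placeholders; the TTLs are deliberately long, per the edit above):

    # /service/tinydns/root/data -- run `make` in this directory to rebuild data.cdb
    # NS record for the zone plus an A record for the nameserver itself, 1-day TTL
    .example.com:198.51.100.1:a:86400
    # A records with 12-hour TTLs to ride out an upstream outage
    +example.com:203.0.113.10:43200
    +www.example.com:203.0.113.10:43200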


That's not how that should be done. Just use a mix of two providers. Using your own servers and TinyDNS is silly for million/billion dollar companies.

See MaxCDN for example who uses a mix of dns providers (AWS Route53 and NS1):

    ns-5.awsdns-00.com.   ['205.251.192.5']   [TTL=172800] 
    ns-926.awsdns-51.net.   ['205.251.195.158']   [TTL=172800] 
    ns-1762.awsdns-28.co.uk.   ['205.251.198.226'] (NO GLUE)   [TTL=172800] 
    ns-1295.awsdns-33.org.   ['205.251.197.15'] (NO GLUE)   [TTL=172800] 
    dns1.p03.nsone.net.   ['198.51.44.3']   [TTL=172800] 
    dns2.p03.nsone.net.   ['198.51.45.3']   [TTL=172800] 
    dns3.p03.nsone.net.   ['198.51.44.67']   [TTL=172800] 
    dns4.p03.nsone.net.   ['198.51.45.67']   [TTL=172800]
Curious, are you the kind of person that runs their own smtp email server and complains about GitHub pricing being too expensive?


No tool is silly as long as it does the job adequately. Are paperclips silly for a billion-dollar company?

If both Dyn and R53 go down, it's exactly when you want a service like PagerDuty work without a hitch.


You're asserting that your (or their) homegrown DNS service will have better reliability than Dyn and Route53 combined. That assertion gets even worse when it's a backup because people never, ever test backups. And "ready to go" means an extremely low TTL on NS records if you need to change them (which, for a hidden backup, you will), and many resolvers ignore that when it suits them, so have fun getting back to 100% of traffic.

Spoiler: I'd bet my complete net worth against your assertion and give you incredible odds.

Golden rule: fixing a DNS outage with actions that require DNS propagation = game over. You might as well hop in the car and start driving your content to people's homes.


Idea: Chaos Monkey for DNS outages


I don't know how big PagerDuty is; IIRC over 200 employees, so, a decent size.

I was giving a bare-minimum example of how this or (some other backup solution) should have already been setup and ready to be switched over.

DNS is bog-simple to serve and secure (provided you don't try to do the fancier stuff and just serve DNS records): it is basically like serving static HTML in terms of difficulty.

That a company would have a backup of all important sites/IP addresses locally available and ready to deploy on some other service, or even be built by hand via some quickly rented servers, is I think quite a reasonable thing to have. I guess it would also be simple to run on GCE and Azure as well, if you don't like the idea of dedicated servers.


Not necessarily. Granted, this is how I would configure a system (two providers), but it is just as sensible to use one major provider which falls back to company servers in the event of an attack like this. It is all sysadmin preference: while it is smart to relegate low-level tasks to managed providers, it is also smart to have a backup solution that is under full control, just in case that control needs to be taken at some point in time.


That would be a quick fix similar to adding another NS provider. Of course if dyn is out completely they might not have their master zone anywhere else. Then it's similar to any service rebuilding without a backup.


+1 for using a mix of two providers. That's what we do at my startup. Never had a problem since (knock on wood).


+1 for TinyDNS.

I just wish it scaled to multiple cores :(


[flagged]


[flagged]


[flagged]


You can't comment like this on Hacker News. Please read the guidelines:

https://news.ycombinator.com/newsguidelines.html


"Challenges" is exactly the sort of Dilbertesque euphemism that you should never say in a situation like this.

Calling it a "challenge" implies that there is some difficult, but possible, action that the customer could take to resolve the issue. Since that is not the case, this means either you don't understand what's going on, or you're subtly mocking your customers inadvertently.

Try less to make things sound nice and MBAish, and try more to just communicate honestly and directly using simple language.


Running multiple DNS providers is not actually that difficult and certainly not cost prohibitive. I am sure after this, we will see lots of companies adding multiple DNS providers and switching to AWS Route53 (which has always been solid for me).


How am i meant to see twitter status updates when twitter is down?


Please check our status page as an alternative method for updates. Unfortunately, it's also been encountering the same issue so we're sending out an email with the latest updates.


I didnae get an email


It's still a work in progress. If you have any immediate issues please contact us at support@pagerduty.com or (844) 700-3889.


PagerDuty outage is the real low point of this whole situation. Email alerts from PagerDuty that should have alerted of the outage in the first place, only got delivered hours later after the whole mess cleared out.


The outage started more than eight hours before you posted this message..


To be fair, Hacker News isn't exactly the first place where companies post status messages. I actually applaud him for posting his message here.


I'm a GitHub employee and want to let everyone know we're aware of the problems this incident is causing and are actively working to mitigate the impact.

"A global event is affecting an upstream DNS provider. GitHub services may be intermittently available at this time." is the content from our latest status update on Twitter (https://twitter.com/githubstatus/status/789452827269664769). Reposted here since some people are having problems resolving Twitter domains as well.


I'm curious why you don't host your status page on a different domain/provider? When checking this AM why GitHub was down, I also couldn't reach the status page.


+1

The only way that I could check to see if Github knew they were having problems was by searching Google for "github status", and then seeing from the embedded Twitter section in the results page that there was a tweet about having problems. Twitter also being down for me didn't help the situation either.


The attack is on the DNS servers, which take names like www.github.com and resolve them to IP addresses (e.g. 192.30.253.112 for me). Their status page is status.github.com - it is on the same domain name (github.com) as the rest of the site. Normally this isn't a problem because availability is usually about something going on with a server, not DNS.

In this case, the servers at Dyn that know how to turn both www.github.com and status.github.com into an IP address were under attack and couldn't respond to queries. The only way to mitigate this would be to have a completely different domain (e.g. githubstatus.com) and host its DNS with a different company (i.e. not Dyn).
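You can see the shared dependency for yourself: both names hand you off to the same set of authoritative servers (the output will reflect whatever the delegation happens to be when you run it):

    # the NS set for github.com covers status.github.com as well
    dig +short NS github.com

    # walk the delegation from the root to see where the query ultimately lands
    dig +trace status.github.com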


Right, this was my point. Hosting "status.domain.com" doesn't help much when it's "domain.com" that's having the problem. I think today's event will make a lot of companies consider this a bit more.


Hiiiinnnnndsiighhhttttt!!!!! Yeaaaahhhhyeahh!

Anyway, for them to take the github.com nameservers out of the mix they would need a completely separate domain name; would you know to look there?

You can delegate subdomains to other providers, but the NS records are still served by the nameservers listed at the registrar. So you'd already need multiple DNS providers... and then you wouldn't have been down. Just sayin'. I'm not sure anyone rated a DNS provider of this stature getting hit this hard, or this completely, as a high enough risk to go through the trouble.

It's easy enough to look at a system and point out all the things you depend on as being a risk. The harder part is deciding which risks are high enough priority to address instead of all the other work to be done.


I mean, some organizations do take precautions against this point of failure and use a separate status domain. Most don't.

https://www.dynstatus.com/ (using Route 53, at least today)

https://www.cloudflarestatus.com/ (using Dyn, ironically)


If it helps any, this link seems to work for me to reach the github status page (requires https certificate override, of course):

https://107.22.212.99/


Lots of companies use Twitter for that sort of real-time status reporting, whose own up/down status one would think is sufficiently uncorrelated... unfortunately the internet is complicated.


+1 Logical question!


This is what you can do to restore your GitHub access:

    grep github ~/.ssh/known_hosts     # find an IP you've previously connected to GitHub on
    sudo vim /etc/hosts                # add an entry for github.com
    sudo killall -HUP mDNSResponder    # flush the macOS DNS cache
    ping github.com                    # confirm the name now resolves


I added

192.30.253.112 github.com

but https://assets-cdn.github.com is failing

EDIT: Use

    192.30.253.112   github.com
    151.101.24.133   assets-cdn.github.com

or try 8.8.8.8 DNS


Why am I being downvoted for providing useful information? I don't understand HN...


Probably because you say to edit /etc/hosts but not what the content should be.


Is it hard to guess? The output of grep isn't a hint?


…except they did though, at least if you've sshed into github at some point (which I think nearly everyone has).


If you're attempting to understand the behavior of individual users of HN as a collective, I can assure you that your initial principles are hampering you greatly.


Not sure if people aren't OK with the content but you've posted it twice, which is not really cool with most people or the guidelines.

Also probably the "hijacking top comment" part.

The other occurrence being here: https://news.ycombinator.com/item?id=12760156


May not be HN doing the downvotes my friend.


seems like the right thing to do. however the ip address itself won't respond either


Just being curious, why don't you use different DNS servers?


(I'm not Github, but I work for a Dyn customer) Using multiple DNS providers has technical and organizational issues.

From a technical perspective, if you're doing fancy DNS things like geo targeting, round-robin through more A records than you'll return to a query, or health checks to fail IPs out of your rotations, using multiple providers means they're likely to be out of sync, especially if the providers' capabilities don't match. That may not be terrible, because some resolvers are going to cache DNS answers for way longer than the TTL and you have to deal with that anyway. You'll also have to think about what to do when an update applied successfully to one provider, but the second provider failed to apply it.

From an organizational perspective, most enterprise DNS costs a bunch of money, with volume discounts, so paying for two services, each at half the volume, is going to be significantly more expensive than just one. And you have to deal with two enterprise sales teams bugging you to try their other products, asking for testimonials, etc, bleh.

Also, the enterprise DNS I shopped with all claimed they ran multiple distinct clusters, so they should be covered for software risks that come from shipping the same broken software to all servers and having them all fall over at the same time.


Most services, even if they aren't the size of Github, can't change their DNS provider on a dime.


It's not a question of switching; you can host your DNS records at multiple providers.


yup, that's what I meant. they can use different DNS providers, e.g. route53 AND dyn


Route53 doesn't allow using it as slave DNS. https://forums.aws.amazon.com/thread.jspa?threadID=56011


more accurately, they don't support the common standard methodologies for transferring zone data between primary and secondary name servers (like NOTIFY, AXFR, etc).

there is nothing stopping you from having Route53 and $others as NS records for your domains. You just have to make sure they stay consistent. Apparently from the linked discussion, there are people offering scripts and services to do just that.
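As a sketch of what those scripts boil down to (the nameserver hostnames here are placeholders, and this assumes the primary permits zone transfers from your address):

    # pull the full zone from the primary; Route53 won't accept AXFR itself,
    # so a script would replay these records through the Route53 API instead
    dig AXFR example.com @ns1.primary-dns.example

    # quick drift check: compare SOA serials across the two providers
    dig +short SOA example.com @ns1.primary-dns.example
    dig +short SOA example.com @ns1.secondary-provider.example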


Keeping Serial numbers in sync can be basically impossible.


Serial numbers don't matter if you're not using NOTIFY/AXFR.


That's why you should have a different domain name:

githubstatus.com instead of status.github.com

You could even throw the domain on a free DNS service.


Maybe not, but you can store your records in a local place and push to both.

That's one of the reasons I setup a git -> Route53 setup at https://dns-api.com/


If this is consistently a problem why doesn't Github have fallback TLDs that use different DNS providers? Or even just code the site to work with static IPs. I tried the Github IP and it didn't load, but that could be for an unrelated issue.


> If this is consistently a problem why doesn't Github have fallback TLDs

I don't believe this has been consistently a problem in the past. But after today, big services probably will have fallback TLDs.


Another status update from GitHub: "We have migrated to an unaffected DNS provider. Some users may experience problems with cached results as the change propagates."

We're maintaining yellow status for the foreseeable future while the changes to our NS records propagate. If you have the ability to flush caches for your resolver, this may help restore access.

Latest status message: https://twitter.com/githubstatus/status/789565863649304576


I love how the White House & GH posted a statement on Twitter.. that we can't access since its down.


Twitter's working fine for me. This attack is affecting different people differently; as a DDOS, attacking a distributed system (DNS) with a lot of redundancy, it's possible for some people to be affected badly while others not affected at all.

I briefly lost access to GitHub, but Twitter has been working fine every time I've checked. Posting status messages in multiple venues helps to ensure that even if one channel is down, people might be able to get status from another channel.


I wish you guys used statuspage or at least allowed email updates for the status of GitHub services.


To get on github you can add to your /etc/hosts:

    192.30.253.113  github.com
    151.101.32.133  assets-cdn.github.com
And it seems faster than normal right now (fewer users).

Edit; for profile pics include:

    151.101.32.133  avatars0.githubusercontent.com
    151.101.32.133  avatars1.githubusercontent.com
    151.101.32.133  avatars2.githubusercontent.com
    151.101.32.133  avatars3.githubusercontent.com
    151.101.32.133  avatars4.githubusercontent.com
    151.101.32.133  avatars5.githubusercontent.com


How about *.github.io?

Edit: saw your other reply and looked it up myself, it's 23.235.33.133


I don't think /etc/hosts will work with wildcard subdomains.
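If you run dnsmasq locally, its address option does handle wildcards; a one-line sketch (the IP is the one mentioned above and may well change, so treat this as a temporary hack):

    # /etc/dnsmasq.conf -- answer github.io and all of its subdomains with this IP
    address=/github.io/23.235.33.133

Restart dnsmasq afterwards so it rereads the config.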


how about npm?


I was able to access everything by changing DNS as mentioned in the other posts [1].

[1] https://news.ycombinator.com/item?id=12762841

edit Of course this is if your local policy allows you to change this!


you can get the ip from a different location using this: https://www.whatsmydns.net/


So who was prepared for this? Pornhub:

pornhub.com:

    Name Server: ns1.p44.dynect.net
    Name Server: ns2.p44.dynect.net
    Name Server: ns3.p44.dynect.net
    Name Server: ns4.p44.dynect.net
    Name Server: sdns3.ultradns.biz
    Name Server: sdns3.ultradns.com
    Name Server: sdns3.ultradns.net
    Name Server: sdns3.ultradns.org
ultradns.biz:

    Name Server: PDNS196.ULTRADNS.ORG
    Name Server: ARI.ALPHA.ARIDNS.NET.AU
    Name Server: ARI.BETA.ARIDNS.NET.AU
    Name Server: ARI.GAMMA.ARIDNS.NET.AU
    Name Server: ARI.DELTA.ARIDNS.NET.AU
    Name Server: PDNS196.ULTRADNS.NET
    Name Server: PDNS196.ULTRADNS.COM
    Name Server: PDNS196.ULTRADNS.BIZ
    Name Server: PDNS196.ULTRADNS.INFO
    Name Server: PDNS196.ULTRADNS.CO.UK


Looks like Pagerduty just dumped Dyn:

pagerduty.com:

    Name Server: NS-219.AWSDNS-27.COM
    Name Server: NS-1198.AWSDNS-21.ORG
    Name Server: NS-1569.AWSDNS-04.CO.UK
    Name Server: NS-739.AWSDNS-28.NET
Pagerduty annoucement: "If you are having issues reaching any pagerduty.com address please flush your DNS cache to resolve the issue."


Github just added AWS DNS:

github.com:

    Name Server: ns2.p16.dynect.net
    Name Server: ns-1283.awsdns-32.org.
    Name Server: ns-1707.awsdns-21.co.uk.
    Name Server: ns-421.awsdns-52.com.
    Name Server: ns1.p16.dynect.net
    Name Server: ns4.p16.dynect.net
    Name Server: ns3.p16.dynect.net
    Name Server: ns-520.awsdns-01.net.


When your business depends on your infrastructure being up and running you try to prepare for anything. Then again Twitter is down so...


Twitter is still 100% on Dyn.

twitter.com:

    Name Server: NS1.P34.DYNECT.NET
    Name Server: NS4.P34.DYNECT.NET
    Name Server: NS2.P34.DYNECT.NET
    Name Server: NS3.P34.DYNECT.NET
They're probably using some geographically based DNS distribution scheme which they can't quickly move to other DNS servers.


Twilio just dumped Dyn, and is now available again.

twilio.com:

    Name Server: ns3.dnsmadeeasy.com
    Name Server: ns2.dnsmadeeasy.com
    Name Server: ns4.dnsmadeeasy.com
    Name Server: ns1.dnsmadeeasy.com
    Name Server: ns0.dnsmadeeasy.com


Amazon was using Dyn; now they've added UltraDNS too:

  Name Server: pdns1.ultradns.net 
  Name Server: pdns6.ultradns.co.uk 
  Name Server: ns3.p31.dynect.net 
  Name Server: ns1.p31.dynect.net 
  Name Server: ns4.p31.dynect.net 
  Name Server: ns2.p31.dynect.net


Digikey just dumped Dyn, and is now back up.

digikey.com:

    Name Server: cbru.br.ns.els-gms.att.net
    Name Server: ns2.digikey.com
    Name Server: cmtu.mt.ns.els-gms.att.net
    Name Server: ns1.digikey.com


ultradns.biz has been down as well.


"ultradns.biz" is not responding to pings, but their DNS servers are responding to DNS queries properly:

    nslookup
    > server pdns196.ultradns.biz
    Default server: pdns196.ultradns.biz
    Address: 156.154.66.196#53
    Default server: pdns196.ultradns.biz
    Address: 2610:a1:1015::e8#53
    > pornhub.com
    Server:		pdns196.ultradns.biz
    Address:	156.154.66.196#53
    Name:	pornhub.com
    Address: 31.192.120.36
Right now, if your site is in trouble, I'd suggest getting UltraDNS service and AWS DNS service, and some obscure service as well, and putting them all in your domain registration. DNS service is cheap. Get some redundancy going. We have no idea how long this DDoS attack will last. It's not costing the attackers anything. They might leave it running for days.
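If you go that route, it's worth double-checking what the registry is actually publishing for your domain once the change goes in (example.com is a placeholder):

    # nameservers as recorded at the registrar
    whois example.com | grep -i 'name server'

    # nameservers the .com zone itself hands out (shown in the authority section)
    dig +noall +authority NS example.com @a.gtld-servers.net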


I was not aware of the attacks going on until this happened:

1. Tried to download "Unknown Horizons" (game featured recently on Hacker News) binary, github-link doesn't work.

2. Think "Ok, might be an old link", google their github-repository, github appears down.

3. Try accessing github status website, is down.

4. Interested, try to visit github status twitter account, twitter is down.

Really weird experience, normally at least the second source of news on a downed website I try during an attack works.


Had a similar experience. When I went to confirm on twitter, that was down too. I was able to acces Twitter from my phone though, where I found a ton of tweets saying "Twitter down!". Strange.


Funny how we're able to tell that twitter's down on twitter, but here in Brazil, when whatsapp was down, nobody could use their own whatsapp to ask or tell if whatsapp was down.


According to Fortune, Hacker News "reported" on the incident. Are we journalists now?

"Popular tech site Hacker News reported many other sites were affected including Etsy, Spotify, Github, Soundcloud, and Heroku." -- http://fortune.com/2016/10/21/internet-outages/


No, not necessarily journalists; rather, an information source... Fortune - a site/company known for journalism/reporting - just gave Hacker News more legitimacy as an official information source... Now, with this power, please use it responsibly. ;-)


Too bad that the majority of the readers will think that HackerNews is somehow related to the "Hackers" that took down the internet.


Eh the name implies (at least in that context) this website would be used to keep track of the hackers.


No but we are a hivemind of smart individuals that correctly upvote important information and downvote irrelevant information. Most of the time, you can be sure that the top HN listings are going to be relevant.


If a headline says, "Users reported having bouts of explosive diarrhea", it doesn't make them medical journalists.


"Reported" is a generic word; it has a journalism-specific usage but also a general usage.


Perhaps they're confusing Hacker News with thehackernews.com.


I wonder if they used "Hacker News" as the source on that because it contains the word "hacker" and they wanted to say "oh, hackers read this site"... as in "the blackhat, break into stuff, steal your money, deface your website" folks.


Very funny guys, can you stop now? We have a demo in 4 minutes.


Maybe run in localhost bro


Or bra


How did the demo go?


Well, the parts that relied on outside services hooked up via SSO were not demoed, but the majority of it worked fine because the demo server was misconfigured to not actually rely on the external services. It is pretty funny.


It's Friday. Story/short write up appreciated


I am a bit paranoid about disclosing details, but basically our SAML IdP was down, so the salesperson couldn't log in at all. I was messing with the demo server to convince myself that it was 100% the IdP's fault and we couldn't do anything about it, and discovered to my surprise that form-based authentication was not disabled on it (normally our servers are in one mode or the other, but not both, even though this is an artificial separation). So I gave them the direct link to the form-based entry point and most of the demo could be done.


Like all live demos.


my condolences


We had a demo at the exact same time. (Internal weekly product demo, not that critical.) We did it on localhost, the only host that's reliable 100% of the time.


I find having a demo video backup is always a good idea.


I had a client ready to pay. Freshbooks was down.


I can't currently get resolution on www.paypal.com.

  $ dig @8.8.8.8 www.paypal.com

  ; <<>> DiG 9.8.1-P1 <<>> @8.8.8.8 www.paypal.com
  ; (1 server found)
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 17925
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;www.paypal.com.      IN  A

  ;; Query time: 29 msec
  ;; SERVER: 8.8.8.8#53(8.8.8.8)
  ;; WHEN: Fri Oct 21 12:35:33 2016
  ;; MSG SIZE  rcvd: 32


Chiming in as a TWC user on the West Coast:

Github out, Etsy out, Paypal out, Twitter out, Soundcloud out, Crunchbase out, Heroku out, Spotify intermittent, Netflix only loads a white page with plaintext "who's watching" list and no functionality.


And it's back again. I'm on AT&T in Atlanta.

  $ dig @8.8.8.8 www.paypal.com

  ; <<>> DiG 9.8.1-P1 <<>> @8.8.8.8 www.paypal.com
  ; (1 server found)
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40999
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;www.paypal.com.      IN  A

  ;; ANSWER SECTION:
  www.paypal.com.    266  IN  CNAME  www.paypal.com.akadns.net.
  www.paypal.com.akadns.net. 29  IN  CNAME  ppdirect.paypal.com.akadns.net.
  ppdirect.paypal.com.akadns.net.  299 IN  CNAME  wlb.paypal.com.akadns.net.
  wlb.paypal.com.akadns.net. 29  IN  CNAME  www.paypal.com.edgekey.net.
  www.paypal.com.edgekey.net. 20  IN  CNAME  e3694.a.akamaiedge.net.
  e3694.a.akamaiedge.net.  19  IN  A  23.73.8.114

  ;; Query time: 146 msec
  ;; SERVER: 8.8.8.8#53(8.8.8.8)
  ;; WHEN: Fri Oct 21 13:05:48 2016
  ;; MSG SIZE  rcvd: 198


Paypal and others still down for me at University of California (I think we're our own ISP?)


I'm in New-York too and can't resolve Paypal, Etsy, Soundcloud, Github, Netflix, Heroku or Twitter


I'm in NYC too. Github.com is resolving/working fine. Netflix.com is resolved but all assets (probably) weren't loading. Additionally Zendesk is also affected.


NYC, fios: github, twitter, soundcloud, heroku back up for me. Tunneling through an ec2 instance on us-east-1d gives the same results - can't find anything that is unreachable now.


Github, Twitter, Quora down in Williamsburg. But Gitlab and Stackoverflow are not.


github is down for me in LES


I'm in midtown on TWC. All the things mentioned are down.


Quote from the status page:

> This attack is mainly impacting US East and is impacting Managed DNS customers in this region.

I'm in Italy, using my provider's default DNSs (not Google) and I can't reach paypal.com, thenextweb, twitter, spotify etc either.


Interesting. 8.8.8.8 is not able to provide me with a record. However, ns1.p57.dynect.net and ns3.p57.dynect.net give an answer, whereas ns2.p57.dynect.net and ns4.p57.dynect.net hang.

Shouldn't 8.8.8.8 query another name server if one fails to respond or takes too long?

EDIT: Is there an inherent flaw in how secondary records are queried? And as bhauer mentioned, is there a possibility to fall back to last known record?
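
One quick way to check this yourself (a rough sketch using the name servers mentioned above; +time/+tries just keep dig from hanging too long) is to ask each authoritative server directly and see which ones answer:

  for ns in ns1 ns2 ns3 ns4; do
    echo "== ${ns}.p57.dynect.net =="
    dig @"${ns}.p57.dynect.net" www.paypal.com A +short +time=3 +tries=1 || echo "(no response)"
  done

Since these servers are anycast, which instances answer can differ depending on where you query from.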


Exactly the same right now from Spain. It seems the attackers are now targeting Dyn's European servers.


Me either. I'm in NYC on TWC. I can't resolve PayPal with Google's DNS or with TWC dns servers.

Also, I'm unable to resolve Twitter on TWC dns.



I am confused. Are so many big websites using Dyn, or does Dyn have some special role in the DNS chain in the US?


They sell premium services, have a large sales team, and are very aggressive. I get emails from them weekly discussing millisecond savings of their DNS solutions and the value increase in customers and sales.

Squeaky wheels get grease and their sales team squeaks a lot.


Realistically they compete with Neustar, which is shockingly expensive, has fewer features, and is harder to use.

I chose Dyn over Neustar (UltraDNS) when it was time to renew contracts because it was 60% cheaper, had better latency, their support was great, and the interfaces were clear.

Not a fanboy or anything, I really don't like how aggressively they hound me now (even though I have nothing to do with DNS for my current employer), but it's cheap and effective so it's not surprising people use them.


It seems that many people who have dealt with them have a story about their overly aggressive nature, me included. It's really a turn off.


they are cheap compared to Neustar. And Neustar is priced like a Bugatti. Dyn is more Porsche pricing.


Way, way back in time, they offered lifetime DNS hosting for a relatively low price.

I bought that, and they've honored the deal. Admittedly it comes with limits that would make it useless for any large site, but it's just great for individuals.


I have a similar deal with UltraDNS. Nice to have "enterprise" DNS for my little personal sites.


Check out NS1.


I had a NS1 demo account. And then they stopped doing that, but it still worked. And then I lost the credentials, and now my account is invalid for a password reset :(


[flagged]


And yet they are not as widely used. I wonder why, or am I missing something important?


Boy, you aren't kidding about an aggressive sales team. They are relentless.

Ironically, a quick search of my Gmail mailbox came up with this gem in the subject line from Dyn.

"Did you know the average cost of a single DDoS outage is $882K?"


They're widely used because they were one of the few providers of geo-aware DNS service for a long time. (These days there are other, cheaper options, including Amazon Route53.)


It's pretty wild that AWS is being impacted by the Dyn outage given the shoutout to Route53. Wonder what they know that we don't?


Not sure if people are confusing Amazon with AWS in this case.

  $ dig +short ns amazon.com
  ns1.p31.dynect.net.
  pdns1.ultradns.net.
  ns4.p31.dynect.net.
  pdns6.ultradns.co.uk.
  ns3.p31.dynect.net.
  ns2.p31.dynect.net.


same seems to be the case for parts of AWS, comment below: https://news.ycombinator.com/item?id=12760165


Other option would have been Akamai's geo-aware DNS that has been available for a long time.


If anyone else is like me they assume Akamai = Expensive.


and they would be right.


Dyn has no special role, they just provide DNS for a large number of high-traffic sites and services.


They offer Anycast and have POPs around the globe. They also have some other nice features such as intelligent failover and extra GEO IP features. Things that you would otherwise have to build yourself.

They have been around a long time. For years they had a free product called DynDNS that would allow you to get an A record for your dynamic IP at home.


I'm updating a list of confirmed outages as I see them here https://news.ycombinator.com/item?id=12759520

So far twitter, etsy, soundcloud, spotify, github, pagerduty...crazy that this can even happen


I'm surprised; I would have thought such large sites would use more than one DNS provider? I mean:

    $ host -t NS twitter.com
    twitter.com name server ns4.p34.dynect.net.
    twitter.com name server ns3.p34.dynect.net.
    twitter.com name server ns2.p34.dynect.net.
    twitter.com name server ns1.p34.dynect.net.
I would have expected at least one of those to be somewhere else. What is the reason they would not have a backup provider?
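
For comparison, a split delegation is just extra NS records at the registrar pointing at a second provider. Hypothetically it could look something like this (the ultradns names are borrowed from the amazonaws.com zones shown elsewhere in this thread; Twitter does not actually delegate to them):

    twitter.com name server ns1.p34.dynect.net.
    twitter.com name server ns2.p34.dynect.net.
    twitter.com name server pdns1.ultradns.net.
    twitter.com name server pdns3.ultradns.org.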


I know a lot about some things, but almost nothing about networking, so excuse me if this is a really dumb question but - would your physical location determine what hosts you returned from that query? Like if you were in Asia would you get different ones back?


The network definitely can (and often does) do load balancing by offering different DNS results for different regions.


Yes, because DNS typically uses anycast networking. The DNS request routes to the nearest location.


I'd guess the reasoning is that DNS providers these days are all anycast-style DNS. A DDoS would usually just be a blip on a few servers around the world depending on where the attacks originate.

I'm not saying it's a good reason but it's a reason.


The other reason of course to use two different providers is to mitigate the automatic error propagation issue: https://twitter.com/devops_borat/status/41587168870797312


All of these work for me from Germany, and querying their authoritative nameservers works just fine (so definitely no caching effect). Anycast for the win!



They don't work from Denmark, at 16:17 UTC.


Yep! Intercom is down as well


Isn't a major feature of DNS that it can be cached? Why aren't records being returned by ISP/Google DNS/OpenDNS servers? Is their TTL set that low?


They all work fine for me.


Me too. I'm in Spain - definitely seems to be geographical.


Caching is a magical thing :)


If anycast routing is in play, which is not unlikely with a DNS service like that, then it may also be that specific servers are being attacked so the outages don't affect users in all locations as some will be routed to infrastructure that is not affected.


Journalist and security researcher Brian Krebs believes this is someone doing a DDoS as payback for research into questionable "DDoS mitigation services" that he and Dyn's Doug Madory did. Doug just presented his results yesterday at NANOG and Krebs believes this is payback. Read more: https://krebsonsecurity.com/2016/10/ddos-on-dyn-impacts-twit...


If so that's a quick turnaround.


Well, Krebs sees this as an extension of the attacks that took down his site a few weeks ago after he wrote about this research. So he wrote about it, attackers take down his site. His co-author Doug Madory speaks about it, attackers take down Madory's employer's site.

Krebs indicates in an update at the end that a source had heard rumors in criminal channels that an attack against Dyn was being planned.

Doug Madory's presentation was on the agenda for NANOG and so attackers would have had plenty of time to know about it.


I'm wondering, from a regulatory perspective, what might be done to mitigate DDoS attacks in the future?

From comments made on this and other similar posts in the past, I've gathered the following:

1) Malicious traffic often uses a spoofed IP address, which is detectable by ISPs. What if ISPs were not allowed to forward such traffic?

2) There is no way for a service to exert back pressure. What if there was? e.g. send a response indicating the request was malicious (or simply unwanted due to current traffic levels), and a router along the way would refuse to send follow up requests for some time. There is HTTP status code 429, but that is entirely dependent on a well-behaved client. I'm talking about something at the packet level, enforced by every hop along the way.

3) I believe it is suspected that a substantial portion of the traffic is from compromised IoT devices. What if IoT devices were required to continually pass some sort of a health check to make other HTTP requests? This could be enforced at the hardware/firmware level (much harder to change with malware), and, say, send a signature of the currently running binary (or binaries) to a remote server which gave the thumbs up/down.


One thing that occured to me regulations wise is to require IoT devices to have some minimum level of security such as a unique hard password rather than it just being "admin" or some such. You could enforce it for items sold in the US or EU and the Chinese manufacturers would probably follow so their goods could be sold easily.


I've noticed a lot of wifi routers are doing this now, which is pretty great. Each router appears to ship with its own unique password.


> What if IoT devices were required to continually pass some sort of a health check to make other HTTP requests?

Even better, what if IoT devices were required to pass some health check to operate at all. This could be as simple as a verified boot plus a forcible reboot every now and then.


Today the peering agreements are made so that ISPs get paid for whatever traffic they pass through. They have no financial motivation to change anything. And as the Internet is decentralized, you cannot order them to do anything. So everyone has to protect themselves from DDoS on their own.


> Today the peering agreements are made so that ISP's get paid for whatever traffic they pass through. They have no financial motivation to change anything.

That seems believable.

> And as the Internet is decentralized you cannot order them to do anything.

...that doesn't. Being decentralized doesn't render them immune to regulation. If all major networks responsible for large scale peering were required not to pass on a certain type of traffic, it would be quite difficult to route around that. Yes, if only some did, this would be routed around.


Let ISPs shut down the traffic of customers with compromised devices.


Let's avoid giving ISPs more power than they already have. The next thing we will see is "oh, we thought that person was using a compromised device" for any disagreement.


Regarding point 2, I can think of a few ways to utilize that mechanism itself as a way to DDoS something. Sometimes the security mechanisms themselves are the attack vectors.


Can you explain how? Not asking in a challenging way, I'd like to learn for my own edification.


Well, it's similar to when a company tries to stop brute force by blindly blocking people who try 10 invalid passwords, but has a CSRF vulnerability on the login page (cross-site request forgery). The problem is that I can craft a page that makes POSTs to their login page with invalid passwords repeatedly via ajax, and lock out legitimate users by running a spam campaign that points their user base at my page. It seems far-fetched until you consider something global like the internet. There are two ways I could see this failing on a global scale:

- Attackers figure out something similar to the attack described above and can entice large amounts of users to visit a page that repeatedly fires at something like s3.aws.com or w/e, the user is unaware but they're essentially DDoSing s3.aws.com via attacker.com's webpage, and in point 2 they would be banned.

- DRDoS is similar to what's described above, but in point 2 it kind of stands alone as the biggest issue. It can be mitigated to a certain level by ISPs, but not entirely. (https://en.wikipedia.org/wiki/Denial-of-service_attack#Refle...) Point 2 would actually help attackers poison DNS.


Analysis of the Mirai botnet: [1]

This is worth reading. It has links to copies of the code and names the known control servers. Quite a bit is known now about how this thing works.

The bots talk to control servers and report servers. The attacker appears to communicate with the report servers over Tor.

[1] http://blog.level3.com/security/grinch-stole-iot/


Although I don't like to recommend Google products, they provide a public DNS-over-HTTPS interface that should be useful for people who want to add specific entries to their /etc/hosts files: https://dns.google.com/query?name=github.com&type=A&dnssec=t...


"digikey.com", the big electronic part distributor, is currently inaccessible. DNS lookups are failing with SERVFAIL. Even the Google DNS server (8.8.8.8) can't resolve that domain. Their DNS servers are "ns1.p10.dynect.net" through "ns4.p10.dynect.net", so it's a Dyn problem.

This will cause supply-chain disruption for manufacturers using DigiKey for just-in-time supply.

(justdownforme.com says the site is down, but downforeveryoneorjustme.com says it's up. They're probably caching DNS locally.)


Switch to OpenDNS servers - 208.67.222.222 and 208.67.220.220. Even google NS are down it seems. Heroku works after switching to opendns.


Google's DNS has been working all day here. The problem is that Dyn's DNS servers are being DDoS'd; if the authoritative server for a record is hosted by Dyn, then when you query Google's DNS for that record, Google's server needs to query Dyn, which is down, and so your query fails. But queries to Google for non-Dyn domains continue to work just fine.

OpenDNS works because, as another poster notes, that, for better or worse, they don't strictly obey TTLs: https://news.ycombinator.com/item?id=12762429


Sorry for the confusion about saying Google NS is down :-) I meant that dig heroku.com @8.8.8.8 does not work (as pointed out by some other poster, it is because Google's NS honors TTLs but OpenDNS does not).


Google's DNS servers are not down; they are recursive resolvers, and they are not getting data from Dyn's authoritative servers. OpenDNS caches DNS data for longer than other recursive DNS providers (they call it SmartCache, IIRC), which is why they are still working.


Here's the link that verifies OpenDNS's addresses (if you're wary of trusting a single HN comment): https://use.opendns.com/


If you're having issues with people accessing your running Heroku apps, it's likely because you're running your DNS through herokussl.com (with their SSL endpoint product) which is hosted on Dyn.

If you can update your DNS to CNAME directly to the ELB behind it, it should at least make your site accessible.


Nice, this is working well for us too. We were able to get the CNAME of the ELB by doing a `dig whatever.ourdomain.com` in an EC2 instance we launched in São Paulo (which presumably worked since Dyn's outage is primarily affecting their east coast PoPs.)


Another option may be to switch from the SSL endpoint add-on to the new, free SNI-based SSL termination feature, which will mean CNAMEing to your-domain.herokudns.com. , which seems not to be affected by today's issues.


thanks for the tip! how did you determine the ELB address behind the ssl endpoint?

edit: figured it out. What I did was run:

nslookup your-SSL-endpoint.herokussl.com

then you'll see the elb address.

Switch to the openDNS servers helpfully pointed out by someone above first...


Presumably with something like `dig @208.67.220.220 -t CNAME <your site>.herokussl.com`. This uses the OpenDNS nameservers, that people have been reporting as working. Haven't tested it as I am on the go.


thanks! figured it out but appreciate the help!!


I'm seeing "connection timed out; no servers could be reached". Anyone else seeing that when trying to run the above command?


did you switch your computer's dns servers to openDNS?

208.67.222.222 208.67.220.220

(or specify dns server in the command)


Yes we did dig @208.67.220.220 -t CNAME <ssl-endpoint>.herokussl.com. And we got the following SERVFAIL error:

  ; (1 server found)
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: <id>
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;<end-point>.herokussl.com.    IN  CNAME

  ;; Query time: 1226 msec
  ;; SERVER: <server>#53(<server>)
  ;; WHEN: Fri Oct 21 12:27:55 2016
  ;; MSG SIZE  rcvd: 44


try nslookup your-SSL-endpoint.herokussl.com

the dig command does not work for me either...

=================================

  nslookup iwate-2009.herokussl.com
  Server:    208.67.222.222
  Address:   208.67.222.222#53

  Non-authoritative answer:
  iwate-2009.herokussl.com  canonical name = elb030330-152447250.us-east-1.elb.amazonaws.com.
  Name:     elb030330-152447250.us-east-1.elb.amazonaws.com
  Address:  54.225.242.254
  Name:     elb030330-152447250.us-east-1.elb.amazonaws.com
  Address:  54.225.217.226
  Name:     elb030330-152447250.us-east-1.elb.amazonaws.com
  Address:  54.235.181.244


In that case I get:

  ;; Got SERVFAIL reply from 10.17.100.2, trying next server
  Server:   10.17.100.2
  Address:  10.17.100.2#53

  server can't find <sslendpoint>.herokussl.com: NXDOMAIN


maybe try this and replace it with your SSL endpoint to see if that works? This works for me.

http://network-tools.com/nslook/Default.asp?domain=iwate-200...


Ok figured it out. It was actually an issue with cloudflare being affected by the Dyn issue, rather than a heroku-ssl problem. Thanks for the help.


Just to be clear, this is a DDoS against Dynect's NS hosts, right?

I'm confused because of the use of "dyn dns", which to me means dns for hosts that don't have static ip addresses.

I'm actually surprised so many big-name sites rely on Dynect, which I hadn't heard of, but more importantly don't seem to use someone else's NS hosts as 2nd or 4th entries.


The company is just called "Dyn", DynECT is a product name, but yes.


Thanks, I was really confused when I read the title "Massive Dyn DNS outage" and how that affects Twitter or Github.


Twitter and Github are still down here in LA (and confirmed on isup.me)


They were up most of the day in here (Prague, Czech Republic, Europe), but it's down now (started about 20 minutes ago). It seems to be another wave of the attack.


Can confirm, europe was up until then


Both are down here in Utah too.



Same from Ireland.


and mixpanel and zendesk


OpenDNS servers seem the only ones that still work. Kudos.

It may not be the proper action but this kind of soft-fail scenario (use the old DNS until you can contact the DNS servers and get new ones) is much better.

  echo "nameserver 208.67.222.222" | sudo tee -a /etc/resolv.conf


AWS says "We are investigating elevated errors resolving the DNS hostnames used to access some AWS services in the US-EAST-1 Region." Is that coincidental, or are they being DDoSed also?


Apparently us-east-1 is backed by Dyn (and only Dyn) as well?

    $ host -t NS us-east-1.amazonaws.com
    us-east-1.amazonaws.com name server ns3.p31.dynect.net.
    us-east-1.amazonaws.com name server ns1.p31.dynect.net.
    us-east-1.amazonaws.com name server ns2.p31.dynect.net.
    us-east-1.amazonaws.com name server ns4.p31.dynect.net.
That's… utterly bizarre to me. us-east-2 has a more diverse selection:

    $ host -t NS us-east-2.amazonaws.com
    us-east-2.amazonaws.com name server u4.amazonaws.com.
    us-east-2.amazonaws.com name server u6.amazonaws.com.
    us-east-2.amazonaws.com name server u3.amazonaws.com.
    us-east-2.amazonaws.com name server u2.amazonaws.com.
    us-east-2.amazonaws.com name server u1.amazonaws.com.
    us-east-2.amazonaws.com name server u5.amazonaws.com.
    us-east-2.amazonaws.com name server ns2.p31.dynect.net.
    us-east-2.amazonaws.com name server ns1.p31.dynect.net.
    us-east-2.amazonaws.com name server pdns1.ultradns.net.
    us-east-2.amazonaws.com name server pdns5.ultradns.info.
    us-east-2.amazonaws.com name server ns3.p31.dynect.net.
    us-east-2.amazonaws.com name server ns4.p31.dynect.net.
    us-east-2.amazonaws.com name server pdns3.ultradns.org.
Not that anyone should be running a service whose availability they care about solely in us-east-1 anyway…


AWS may have updated this, I now see

    $ host -t NS us-east-1.amazonaws.com
    us-east-1.amazonaws.com name server pdns5.ultradns.info.
    us-east-1.amazonaws.com name server ns3.p31.dynect.net.
    us-east-1.amazonaws.com name server pdns1.ultradns.net.
    us-east-1.amazonaws.com name server pdns3.ultradns.org.
    us-east-1.amazonaws.com name server ns4.p31.dynect.net.
    us-east-1.amazonaws.com name server ns1.p31.dynect.net.
    us-east-1.amazonaws.com name server ns2.p31.dynect.net.
    us-east-1.amazonaws.com name server u1.amazonaws.com.
    us-east-1.amazonaws.com name server u2.amazonaws.com.
    us-east-1.amazonaws.com name server u3.amazonaws.com.
    us-east-1.amazonaws.com name server u4.amazonaws.com.
    us-east-1.amazonaws.com name server u5.amazonaws.com.
    us-east-1.amazonaws.com name server u6.amazonaws.com.


Me too. Someone realized their oopsie :)


Ex-Amazonian here. I worked on the EC2 API for just over a year, and this could simply be a legacy thing. See https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service....

us-east-1 is the oldest region and predates Route 53. Not adding extra DNS providers to the older regions is probably an oversight.

(The EC2 API team requests load balancers from a separate load balancer team. The load balancer team probably didn't exist as a separate team when some of these regions were created.)


Same exact thing with EU-WEST-1/2.

  $ dig ns eu-west-1.amazonaws.com +short
  ns3.p31.dynect.net.
  ns1.p31.dynect.net.
  ns4.p31.dynect.net.
  ns2.p31.dynect.net.
  
  $ dig ns eu-west-2.amazonaws.com +short
  u6.amazonaws.com.
  u5.amazonaws.com.
  u2.amazonaws.com.
  u4.amazonaws.com.
  u1.amazonaws.com.
  u3.amazonaws.com.
  pdns1.ultradns.net.
  pdns3.ultradns.org.
  pdns5.ultradns.info.
  ns2.p31.dynect.net.
  ns1.p31.dynect.net.
  ns4.p31.dynect.net.
  ns3.p31.dynect.net.
I wonder why this is, considering the more extended usage that -1 on each region usually gets. :S


> Not that anyone should be running a service whose availability they care about solely in us-east-1 anyway

Don't confuse regions with availability zones. (Though in this case, the availability zones don't help...)


Oh good point, my bad.


If that were the reason I wouldn't expect this update:

6:36 AM PDT [RESOLVED] Between 4:31 AM and 6:10 AM PDT, we experienced errors resolving the DNS hostnames used to access some AWS services in the US-EAST-1 Region. During the issue, customers may have experienced failures indicating "hostname unknown" or "unknown host exception" when attempting to resolve the hostnames for AWS services and EC2 instances. This issue has been resolved and the service is operating normally.


That might explain why we are down - most of our EC2 instances are in us-east-1. Looks like Amazon SQS is impacted too. We are getting a stream of undeliverable messages, and our 'dead letter' queue is filling up!


We have some errors logged on our side from problems resolving the DNS hostname for SES, email-smtp.us-east-1.amazonaws.com.


Anyone else spend the morning thinking the problem was their setup? I've been flushing my system DNS cache, Chrome's DNS cache, changing DNS servers, rebooting my router, turning VPN on/off, etc.


Yeah. :/

It happened to be at the same time I was getting things configured to connect for the first time to a new VPN. Until about 7am today my home network was a 10.0.0.0/8 network. The VPN kept bombing in the last phase of connecting and I couldn't figure out why, so I thought it was an IP conflict with my internal network range.

So naturally, I then went into my router and changed my subnet for my entire home network to the more common 192.168.1.0/24 range to see if it'd help. It didn't. Until suddenly VPN "just worked" -- which makes me wonder if I needed to change my network at all to begin with.

Then I started experiencing all sorts of weird issues where the Internet seemed to disappear from one minute until the next.

Then I hit IRC when things finally stabilized and see "Did you hear about Dyn?".

My reaction: wut.

TL;DR: I rearchitected my home network at 7am for no reason.


Yep, I did the same.


I've been singing the praises of AWS Route53 for a long time; they're up and running. I can't believe major multi-million dollar companies (Twitter, GitHub, Soundcloud, Pagerduty) would not run a mix of multiple DNS providers.

Also, what is happening is a cascade effect, where a 3rd party being down affects others.


> I've been singing the praises of AWS Route53 for a long time; they're up and running.

I'm a fan of Route53, too.

But can we say that it weathered the attack? Or was it just lucky that its systems weren't targeted?


One of the reasons why Route53 is good is because they give different nameservers to each hosted zone - unless you choose to use a branded record-set.

I've seen them get hit by DDoS attacks in the past, but never with any significant impact.

(I wrap Route53 and handle storing DNS records in a git repository over at https://dns-api.com/ Adding support for other backends is my current priority to allow more redundancy.)


> Or was it just lucky that its systems weren't targeted?

I was wondering that too


OpenDNS DNS Servers (208.67.222.222 and 208.67.220.220) are still resolving websites while my typical fallback to 8.8.8.8 is not.


I noticed the same pattern.


Twitter, Reddit, wow. I was so confused for a moment. Thankfully HN is here to explain.


I had several sporadic 'secure connection could not be established' errors yesterday while trying to open HN, amongst others. Painfully slow page load times across the board, too (Craigslist, Monoprice, weather.gov, etc). Still may be my buggy phone SIM...


wait, buggy phone sims is a thing?


Sorta. When I changed phones I cut my micro SIM down to nano size. Cut a wee bit too much off and it now can slide off contacts if jarred... gotta get a new SIM.


Seems to be impacting PoPs in US East most severely. We use RIPE Atlas to assess the impact of DNS outages, and in the past hour have measured about 50-60% recursive query failure from a few hundred probes in that region: https://cloudharmony.com/status-for-dyn


Now impacting multiple regions, including both US East and US West, with very high query failure ratios - 50-70%


Is it time for everyone to actually start using secondary name servers/DNS resolvers too from a different provider from primary? DNS _is_ built for this, for the very purpose of handling failure of the primary resolver, isn't it? Just most people don't seem to do it -- including major players?

Or would that not actually solve this particular scenario?


Yes, I think this attack has brought to everyone's attention that many companies have gone away from what used to be the extremely common practice of having your authoritative DNS serving shared across multiple DNS hosting providers. This would have addressed the issue... and we're seeing that by the end of the day many of these sites have gone to having multiple providers.


The attack is on the authoritative name servers, not a DNS resolver. A public DNS resolver will query the authoritative name server for a record if it doesn't exist in its cache.


Agreed, but there is nothing stopping you from having the authoritative name servers for a domain with different providers. As someone previously said, DNS was designed for this.


It used to be common for universities to do this, mine still does:

  ic.ac.uk.		45665	IN	NS	ns1.ic.ac.uk.
  ic.ac.uk.		45665	IN	NS	ns2.ic.ac.uk.
  ic.ac.uk.		45665	IN	NS	ns0.ic.ac.uk.
  ic.ac.uk.		45665	IN	NS	authdns1.csx.cam.ac.uk.
(and Cambridge use Imperial College as a secondary) but the best-known American universities are on cloud providers now.


Can you have secondary name servers too though? And would it have worked to avoid outage for domains doing such in this case?


Heroku also seems to be affected. I'm getting this when I run 'heroku status':

>> We are seeing a widespread DNS issue affecting connections to our services both internally and externally.


For me redirecting my DNS to Google public DNS 8.8.8.8 and 8.8.4.4 did the trick.


I added the following to my hosts file for today:

  #8:07 AM 10/21/2016
  199.16.156.70 twitter.com
  104.244.43.231 abs.twimg.com
  104.244.43.231 pbs.twimg.com
  192.30.253.113 github.com
  151.101.24.133 assets-cdn.github.com
after giving up on modifying DNS timeouts. https://blogs.technet.microsoft.com/stdqry/2011/12/14/dns-cl...


that's not going to help much if the authoritative name servers (which is what dyn is, btw) go down for more than a day.

Max record cache time is 86400s (24h), so if the attackers can keep it down for 24h then google will have to have custom instructions in place (or cache more aggressively than the RFC allows)
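
As a rough sanity check, you can ask a resolver directly and watch the TTL column in the answer count down between repeated queries; that is how much cached time it has left for the record:

  dig @8.8.8.8 twitter.com A +noall +answer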


Is there any reason why Dyn has to be "down" from Google's perspective? Is it possible that the large DNS providers maintain private network between each other, such that DDoS attacks that are effective against the public are ineffective against the private network?


Since the attacked Dyn DNS servers are evidently anycast, the Google resolver you are reaching might connect to a different Dyn server than you do. If Google is lucky enough to reach a less overloaded server, it might get an answer where you get none.


Side note:

In addition, Google Public DNS engineers have proposed a technical solution called EDNS Client Subnet. This proposal allows resolvers to pass in part of the client's IP address (the first 24/64 bits or less for IPv4/IPv6 respectively) as the source IP in the DNS message, so that name servers can return optimized results based on the user's location rather than that of the resolver. To date, we have deployed an implementation of the proposal for many large CDNs (including Akamai) and Google properties. The majority of geo-sensitive domain names are already covered.

from https://developers.google.com/speed/public-dns/faq
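
As an aside, a recent dig (BIND 9.10+) can attach an EDNS Client Subnet option itself, which is handy for poking at how geo-aware answers change with the claimed client prefix. A sketch, using the RFC 5737 documentation range and a placeholder name:

  dig @8.8.8.8 www.example.com A +subnet=198.51.100.0/24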


I was on Google dns until I joined my VPN a bit ago. The public DNS was failing to hit lots of things. (NY, USA)


Not only does it work (for now), many of the sites that are "down" are noticeably faster. :-)


less users connected


Same story for me here today (reporting from Cork, Ireland)


I'm a Verizon FIOS customer in NYC and was unable to reach nytimes.com and several other sites this morning. Switching my DNS to Google's (8.8.8.8 and 8.8.4.4) seemed to fix the problem, but I don't understand why yet.


There's a bit of exquisite irony in the fact that just yesterday an article on the Dyn blog was:

Recent IoT-based Attacks: What Is the Impact On Managed DNS Operators? - http://hub.dyn.com/traffic-management/recent-iot-based-attac...

It's a good piece about how IoT-based DDoS attacks are carried out. And now Dyn has the answer...

HN thread about that article at: https://news.ycombinator.com/item?id=12764650


Is Zendesk being affected? Their status page is reporting that their external DNS provider is having a DNS issue [1] and most of their sites are being affected.

[1] https://status.zendesk.com/


Yes, they were affected.


Microsoft's visualstudio.com's build servers fail to resolve Github and New Relic. So much for my Friday night deploy to staging.


Is it really an internet wide outage?

Only 2 of the points in the US are affected on https://www.whatsmydns.net/ for the domains we've got on Dyn - same for Twitter etc


If it's under a denial-of-service it's possible that it may respond correctly part of the time.


Since many (all?) of Dyn's authoritative server IPs are anycast, attack traffic is probably not well distributed either. If you're routed to a server that's getting a lot of attack traffic, you're likely to have problems, but a server without much attack traffic will work fine.
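
If you want to see which anycast instance you're actually hitting, a CHAOS-class TXT query for hostname.bind (or id.server, per RFC 4892) often works, though plenty of operators disable it, so treat a timeout as "unsupported" rather than "down":

  dig @ns1.p34.dynect.net hostname.bind CH TXT +short
  dig @ns1.p34.dynect.net id.server CH TXT +short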


"Widespread" might be a better term.



Any quick script to see if a given domain ultimately resolves to them? My SaaS company has a lot of custom domains from whatever DNS servers pointed at us and I'd like to be able to tell people whether it's our fault or not.


`dig NS $domain`

Query for the root domain, without any subdomains like www. That is, you need to check the "zone apex," the shortest name purchased from a registrar and potentially delegated to Dyn. Look for dynect.net in the list of authoritative name servers.
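
If you have a list of customer apex domains, a rough loop like this will flag the ones delegated to Dyn (custom-domains.txt is a stand-in for wherever you keep that list; CNAME chains through a CDN can still pull Dyn in indirectly, so treat it as a first pass):

  while read -r domain; do
    if dig +short NS "$domain" | grep -qi 'dynect\.net'; then
      echo "$domain: delegated to Dyn"
    else
      echo "$domain: not delegated to Dyn (check CDN/intermediate CNAMEs too)"
    fi
  done < custom-domains.txt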


Yeah, I tried that. I don't see dynect in a lot of domains that are failing, and it's clearly related somehow; they didn't all break at the same time by coincidence.


`whois $domain`.

But it should be "obvious" if your users report "Server not found" vs. "Cannot connect" or "Page not found" style errors.


Let's assume that foreign countries such as Russia or China were trying to sabotage our elections on the night of Nov 8th. What severe economic and political backlash would we have to deal with if we cut off the traffic coming in from those regions (not in a "we control the internet" kinda way)? I am sure they already have nodes operating within the USA. A lot of major tech companies use CDNs that can still serve traffic globally to the consumers in those countries. Even better, how about we regulate and slow down all incoming traffic for, say, half a day on election day? Is it even possible?


But then why would they be doing it right now? I'm sure they already know if they can do it or not. I don't think they need to do a large scale test run that would put people on high alert. They'd keep their head down until election day.

But then, what is China or Russia going to get out of doing something like this? It isn't going to change anything. Hillary is the next president regardless. Hell, even if no votes could be counted I am sure the Supreme Court wouldn't have a problem calling it for her.

So to me the idea that China and Russia is doing this for political reasons doesn't make any sense.


Almost every website I visit except HN seems to be down...


Same here, that's odd isn't it ;)


well, first thing one does to use CloudFlare is migrate to their DNS. A CF-hosted site isn't going to be using Dyn...


Dyn reporting another attack started at 15:52 UTC.



DNS was designed so that you can have multiple operators for your authoritative name servers.

Who would have thought adding a spof to your infrastructure would ever be a problem?


Is it just me or are these kind of attacks becoming way more frequent recently? This kind of widespread outage seems so new, but again, that might just be me.


Damn, I've spent the past 30 minutes trying to update my DNS and playing with my router config! :)

No GitHub, well, it's gonna be a fun Friday...


putting something like

    192.30.253.113 github.com
into your /etc/hosts (or other appropriate location for your OS) should get you going again.


Careful...some people may be trying to get out of work today.


Just remember to take it out when this is resolved. IPs shuffle around all the time, and that IP might not always work for the main github.com domain.


Isn't the whole idea of git to make users independent from places like GitHub? After all, if you have the repository on your local machine you can continue working as if nothing happened, right?


Isn't the whole idea of git to make users independent from places like GitHub?

Yup. But the whole point of GitHub is to make you dependent on GitHub.

They've been slightly successful.


Build server (e.g. CircleCI, Travis, etc), reviewing/merging pull requests, accessing 3rd party library READMEs. You know, doing software stuff.


Most of this stuff you should have locally, and not depend on 3rd party websites.


I'm so damn tired of the "host it locally" mentality. Not everyone has the resources to host all of that locally.

For example, most open source projects.

But even outside of that, we use github for issue tracking, the new project management kanban stuff, a CI server, reading documentation (which is offline, but the online versions are nicer on the eyes), and a ton more. Not to mention that StackOverflow and other discussion forums tend to be used by many.

Yes you can host all of that locally, but we don't have a few hundred thousand a year to spend on some sysadmins to maintain all of that, and we don't have the time or money to run the machines, vet the software, and keep it up more reliably than github does for next to nothing.


And I'm so damn tired of people complaining about the cost of running stuff locally. The true cost of not having some basic stuff set up locally, even just for backup purposes, shows up when a situation like this happens. It does not take a long time or many resources to download all of the libraries, with corresponding docs, to a local server, or even your laptop. It is not complicated to have all new issues sent to an email address so that a copy of them is available at all times. And you don't need a sysadmin to administer all of that.


>And you don't need a sysadmin to administer all of that.

I disagree with that. If you have a server, you need a sysadmin. End of story.

Who is going to secure the system and setup ssh keys? who is going to run updates? who is going to monitor for security issues? who is going to run backups? who is going to secure those backups? who is going to oversee the installation of the network, the battery backups, the racks, the server hardware, etc... Who will swap out bad disks? Who will recover the system when it goes down? Who is going to double the hardware and setup high availability (remember, you are competing with github for uptime here)? And god help you if you have one guy that does all of this. What happens if he gets hit by a bus?

An on-prem server isn't a "backup", it's a liability. And without the resources to maintain it, it's going to become a nightmare. I've been there, and I won't ever do it again.

I'm either going to pay to do it right, or give it to someone who will. And if that means a few hours of downtime every year or so, then that's a wonderful tradeoff for me.

> It does not take long time or resources to download all of the libraries, with corresponding docs to a local server, or even your laptop. It is not complicated to have all of the new issues sent to an email to have a version of them available at all times.

Luckily github (and alternatives) provide all of that. It sends us emails and slack messages on everything, so if it's down, we can still read, and we all have our local repos. But reading is different than working.


If you have any substantial business, you already have a sysadmin on your team. He's not doing his job if he has no local versions of almost everything that is online. He should be staging everything locally, before deploying to the cloud. The currently very popular way of deploying everything live, without any testing, or staging is one of the reasons behind current crappy state of the internet.


I disagree, with very large companies, you have no "local" sysadmin, and no local versions of anything. Especially if your IT department is actually its own company.


Source/proof ?


This is a DNS outage.

If self-hosted, somewhere, you could still be screwed by having Dyn as your DNS provider.

If dev-machine-hosted, then uh, your issue tracker is no longer an issue tracker. Your build server is not a build server. All the services besides Git are not meant to operate offline in a decentralized/distributed fashion.

Library documentation, sure, that could be local. Otherwise, your assertion that all of this tools infrastructure can somehow be replicated, easily, in a way that makes the difference between working online or offline effectively zero, is nonsense.


First of all I'm not saying that the whole infrastructure could be replicated, only critical parts, and parts that can be easily hosted locally.

Second, pointing to a new machine is as simple as updating IP in your hosts file, or dns server.

Third, you can use vmware or any other virtualization stack to replicate your infrastructure locally. In fact that's the best way to build things - create virtual network, use it for testing, troubleshooting and debugging, and deploy only when everything is working.

All I'm saying is that if your company is making any kind of money, and your development environment depends 100% on online services, you're doing it wrong.


Not that I'm saying that you're completely wrong, but you're oversimplifying the problem and the solution to it. Your last statement is not necessarily true; you have to balance the cost and how much of a PITA it is to set up and maintain vs how much money would be lost by disgruntled customers for a single rare outage. Granted, it depends on the kind of business you are running, but not every business is so fragile. In fact, I'd say that most (as in >50%) are not that fragile. Even better when you can simply put the blame on someone else, which everyone can do in our case right now.


I'm under the impression that you're saying self-hosted ~= dev-machine-hosted?

If that's the case, I think you're misguided: imho, the internet as it was designed was conceived so that everyone has their own little self-hosted thing, with dev machines just for the purpose of, well, test and dev, the eventual goal being to self-host.

Just look at how email is technically designed and how it was meant to work, versus how we use it now, relying mostly on Gmail or Outlook, or worse, using Facebook for email: we put all our eggs in the same basket.


If I were running a business, here are the options as I'd see them:

Option A) Spend no money and experience an outage maybe once a year, if that. And the problem works itself out.

Option B) Spend money and gain technical debt to avoid a problem that happens maybe once a year, if that.

Which one would you pick? I mean, maybe if everything you have is closed-source or you are guaranteeing 99.9% uptime to your customers, perhaps option B makes sense. Otherwise, the choice seems fairly obvious to me.


What's your source that you "experience an outage maybe once a year" ??

I work in IT infrastructure and I see attacks literally every day. Moreover, most people just set up a quick LAMP or MEAN stack to prove their concept and then leave it like that, so most of the time, no, the problem doesn't just "work itself out".


They should do it once a year and call it Friday without Internet Day.


Better yet, have one day a year that is "Red Team Day" where people hunt for vulnerabilities so that assessments can be done, and companies can later fix any issues noted. Like how earlier this week there was a statewide earthquake drill in California, local emergency sirens were sounded, schoolkids practiced hiding under desks, etc. The Internet needs periodic tests like that too.


I absolutely love this idea! Would be a bit tricky to implement, but would definitely improve security in the long run.


In (well, after) attacks like this, and really any other massive DDOS, shouldn't it be possible to identify potential botnets and try to take them out (notify their owners that they're being used, notify their hosting providers, etc) so that they can't be used again in the future?


Quick question for you all. Just two days ago I registered two domain names at dynu (not dyn). Early this morning I got a cold call from a company in India who knew the domain names and my phone number and was calling to ask if I wanted them to help me manage my website cheaply. Also, this morning I got a spam text from someone who claimed to be GoDaddy offering the same thing. Now, I protect my number really well, so this is the first time in 5+ years that I've ever gotten spam texts or calls to my number. Do you think Dynu was also hacked?! Or maybe Dynu sells client numbers (which is how the guy in India claimed to get my number) and it was just by random chance that this happened at the same time as the Dyn hack.


Agree with shortstuffsushi that this is just someone getting your domain name info and spamming you. It sadly happens all the time.

Go to http://whois.icann.org/en and enter your domain name and see what info is public about you. If all your info is public, you may want to see if your registrar offers "private" registration where your info does not appear in WHOIS.


Fwiw, this isn't a hack, this is a DDoS (denial of service). It seems almost certain that your information was either given out by dynu, or your WHOIS record isn't protected. Check your domains out with your favorite WHOIS tool first. Otherwise... time for an awkward conversation with dynu.


Right, hack wasn't the right word. Anyway, thanks for the info.


I've been having the same problem accessing github in particular. Just for fun, I opened the Opera browser and activated the built-in VPN. That got everything going again. At least for browsing, not so useful for my git pulls and pushes.


Can someone explain why this is so bad? I think the internet handled the downtime of Dyn pretty well; not reaching GitHub wasn't exactly pleasing, but I added the IP temporarily to /etc/hosts and the problem was solved. Isn't the best strategy to accept that attacks will continue and systems may go down, and to design for resilience? If so, this attack can serve as a warning and as a check that we can handle these types of attacks. I am exaggerating a bit, but I would imagine that constant attacks keep the internet resilient and healthy. An unchallenged internet may be the greater risk.


You're assuming we'll build immunity fast enough. What if we don't?

Attacks at this scale can bring a significant part of the Internet down. The economic effect can be just as bad as a war.


But just pretending no one will try bad things won't help either. Attacks like this may speed up progress, help prevent a significant part of the internet from being brought down in the future, and remind everybody that these things can (and probably will) happen.


We were affected @WSJ as well.


The DDoS problems, at least those not related to spoofing IPs, could be curtailed if we provide a strong incentive to the ISPs to work on it.

Let's hold the ISPs financially liable for the harmful traffic that comes from their network. If a client reports a harmful IP to the ISP, every bit of subsequent traffic sent from that IP to this client carries a penalty.

Yeah, I know, routing tables are small, yada yada. If we put thumbscrews to the ISPs they will find a way to block a few thousand IPs of the typical botnet, even if it requires buying new switches from Cisco & co.

Incentives drive behavior.


Put the thumbscrews on the IoT manufacturers instead, so they don't release widgets with bad security, so the problem is eliminated at its root.

You wouldn't allow car manufacturers to sell cars with faulty airbags, why do we allow device manufacturers to provide plentiful firepower for bad actors?


With ISPs it's a lot easier - you know who your ISP is, so either they respect your blacklist or they automatically owe you money.

How would you even start chasing a manufacturer of a cheap IP cam from China? How many of them can you chase at once?


Semi related: I noticed this incident right when it began, but not because I was trying to access a website. This started happening to me: http://imgur.com/PPlaY5o

Then when I went to push to github out of fear my computer was about to soil itself, that failed too, and I noticed the outage.

Does anyone know if the above errors could be related to the outage? I'm using vim inside tmux with zsh as my shell. Maybe zsh does some kind of communication with gh while running?

I restarted my computer and it's still happening


The zsh default git plugin definitely doesn't touch github, or the network in general.

Are you using some oh-my-zsh github plugin by any chance?


plugins=(git rbenv nvm gitfast zsh-autosuggestions github)


So, yes.

https://github.com/robbyrussell/oh-my-zsh/wiki/Plugins#githu...

oh-my-zsh is, imho, way overengineered and bloated. Leads to all sorts of issues like the one you're encountering there.

I would recommend sticking to a plain zshrc file that you can read, edit and fully understand.

The one I wrote and am using day to day is available here, with documentation: https://github.com/jleclanche/dotfiles


Removed the github plugin, reloaded zsh, and it happened again 5 min later. I believe it has to do with Slack, because the issue resolved itself after closing it. Maybe Slack got pwned and all Slack users are being used as part of the botnet lol


Anyone know any details of what the attack looks like? I had a quick look in my (albeit small) network for odd flows going to their AS (AS33517), but didn't see much that looked odd at first glance...


I've managed to (seemingly) save my browsing with Yandex DNS:

    77.88.8.8
    77.88.8.1
https://dns.yandex.ru


I'm sure Yandex is safe, but I'm wary of using anything dns.*.ru that could route my traffic to potential phishing versions of sites.


If it makes you more at ease, use this one :)

https://dns.yandex.com


Need to get into dyn.com to download your zone files? Add this to your hosts file:

  204.13.248.106 www.dyn.com
  204.13.248.106 dyn.com
  216.146.41.66 manage.dynect.net
  151.101.33.7 static.dyn.com


While my app isn't resolved using Dyn, we are relying on APIs on our EC2 backend that use their DNS. Is there a Linux DNS caching server that will serve from a local cache primarily, and do lookups in the background to update the local cache? During the period Dyn was down, it would've continued serving from the local cache and retried the background lookups, keeping my app up. I can also see it improving performance, as my servers currently do lookups to the EC2 DNS on each HTTP request...
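
Not a complete answer, but unbound gets part of the way there with a couple of stock options; a minimal unbound.conf fragment might look like this (values are illustrative):

  server:
      prefetch: yes          # re-resolve frequently used records shortly before their TTL expires
      cache-min-ttl: 300     # don't honour upstream TTLs shorter than 5 minutes
      cache-max-ttl: 86400   # cap on how long anything is cached

It still won't serve a record past expiry, so it softens an authoritative outage rather than riding it out entirely.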


If you're in us-east-1 then you potentially do actually rely on Dyn even for the amazonaws.com instance hostnames.

https://gist.github.com/agh/4e20df0d2d3bfa189477569b77f72e24


Seems then that ELB has a local cache because http requests were reaching my app servers throughout the outage.


It is spreading to other DNS providers, too: https://status.fastly.com/

www.ft.com is unreachable for example.


Fastly is simply putting up a status page so they aren't contacted about issues, letting customers know it's about Dyn. And they are having internal issues with communication tools like Zendesk.


Third attack underway: https://twitter.com/AlexJamesFitz (as of 10 mins ago)


No idea if this would work, but could people theoretically just ping flood the IoT devices involved to mitigate the attack?

They run some sort of web server, since most devices provide some web interface, so clearly there's a port open which could be hit if the IP is known, and with the shoddy security in these devices I'd wonder if their local (likely low-performance) hardware would be susceptible to something as simple as a ping flood attack.


Boulder here. Can't resolve Wufoo or PayPal using 8.8.8.8


I thought DNS (particularly public) was basically immune to DDoS?

If one DNS server is down, use the cached result or another server.

DNS is some of the most distributable, cachable data I can imagine.


Depends on how many PoPs they have. Looks like they have 4 in the eastern US. [0] If they are seeing attacks as large as the ones Krebs saw a few weeks ago, that could certainly be enough to take down one or two, and then the redirected traffic could take down the other two.

I used to work for a DNS/DDoS provider, and this was a very real problem. Leave the affected PoPs out of rotation, or risk overloading the other PoPs with the redirected real traffic.

Before moving the other traffic, you also have to worry about blocking the DDoS traffic, otherwise you're just redirecting it to the other PoPs. Mitigating DDoS attacks is not fun, and they are hard to block.

[0]http://dyn.com/dns/network-map/


Some sites intentionally disable that, however, by setting a short TTL on replies. The idea is usually that it allows them to very quickly adjust to hardware failures or load across datacenters but it has the consequence of making your infrastructure comparatively brittle.


Surprised to see so many big names relying on a single provider. DNS is designed to be distributed, it should be possible to avoid a single point of failure.


This question came to my mind when I saw this post. One possibility might be the management cost. Synchronizing between different providers can go wrong and might be hard to debug when end users get different replies.


For people in need of the IPs for their respective services. You can find them here: ipaddress.com or any of the other similar services


How can I, a proficient web developer but one with little experience working directly with its underlying infrastructure, help in whatever effort is being made to thwart this and related attacks? I feel a moral obligation to help, as these attacks seem a grave threat to our economy and could cause unrest given the current political climate. Thanks.


Read all the analysis you can to form a better understanding of how this all works. Use that information to design and run more resilient services in the future. Teach what you have learned to others.


https://cloudharmony.com/status-for-dyn is now (12:43pm EDT) showing Dyn's "US East" and "US West" centers as being down. Anyone know anything about this Cloudharmony service? How often does it update? and what is it monitoring?


At work earlier we were seeing hostname resolution errors with applications trying to contact Amazon S3 from on-premises infrastructure.

This was in eu-west-1, but it coincided with a bunch of other systems in the organisation having problems at the same time.

Additionally CloudWatch logs seemed to be completely broken for about 30 minutes on the Amazon Console.


Here's how to add static mappings temporarily to survive through the outage:

https://www.reddit.com/r/sysadmin/comments/58o5mp/dyn_dns_dd...


And there is no twitter to tweet about it!!!


Currently I am able to get into every site on the web, including GitHub, by using a VPN service based in Hong Kong.


Those distributed alternatives look better every day... if only there was a working group and a transitional path.


Hmm... Seems to be quite widespread. Some of our Amazon AWS services (located in the US) that rely on SQS are reporting critical errors. Intercom.io is also down at present, which we use for support for our web apps. Not looking very good from here (in Australia).


I'm getting DNS errors on my PS4 when trying to download stuff, I guess it's related!


Switching to Google's public DNS seems to have fixed it!


So I had hardcoded my DNS server to googles, aka:

    dig @8.8.4.4 github.com +short
I was not getting an answer.

However using my routers/dhcp/ISP to set my DNS server, I am able to get answers:

    dig github.com +short
    192.30.253.112


Cached locally?

dig +trace github.com


I'm curious: what kind of infrastructure do you need to mount an attack this massive?


via [1]: "Dyn says today's DDoS are in part being caused by Mirai botnet, which recently caused record-sized attack [2]"

[1] https://twitter.com/AlexJamesFitz/status/789562789920636928

[2] https://krebsonsecurity.com/2016/10/source-code-for-iot-botn...

tl;dr: a substantial part of this attack is a botnet of IoT devices.


This may be dumb, but someone enlighten me:

If this kind of attacking does escalate, wouldn't it be possible to simply cut off requests from outside the United States at the points of entry? Basically, turning the US into an intranet?


We don't know yet, but the attack very easily could have been coming from a botnet of devices entirely inside the US. Geographic borders don't matter much at all for the Internet.


But even if it were, the creator of the botnet would first have to gain control and then issue a command, right? How would either of those things be possible from outside if there was no connection into the US?


Well, there's pretty much no way to impose the geographic borders of the US onto the Internet. Our networks here in the US are all global and integrated with other networks all around the world. The only places where this kind of geographic control is possible are countries like Iraq, Iran, China and others where the government controls all the ISPs. Countries with more freedom have a free flow of information and packets - and to me that is a very GOOD thing.


What this event shows is that using DNS as a load routing/balancing mechanism is a bad idea (that's why folks run low TTLs and end up unable to specify truly redundant secondaries).


Not sure if related, but circleci.com is down for us due to a "DNS issue"!


Definitely related. Can confirm.


Interesting. Lots of sites have been down for me here in Mexico City: Twitter, Github, loads of other random sites. When I turned on my US-based VPN, it all started working again.


Why is there even a concept of managed DNS? Aren't we already paying >$1M/yr so that we can get a 32-bit integer from a string? This does not make sense.


How come you can access these sites from some countries? I imagine there are lots of name servers and that the attackers are specifically targeting the servers for the US?


It's a strange coincidence that Hover DNS was down for the same reason a week ago.

http://hoverstatus.com


Looks like GitHub and Braintree both got AWS DNS servers mixed in at about the same time. Did they both switch over, or is Dyn working with AWS on this?


How many DNS services like Dyn exist? Is it not still massively significant that a successful attack can be launched on even one of these?


Twitter and GitHub is down on Scaleway (AS12876) and Tiktalik (Warsaw, Poland, Europe, AS198717) network too (no response from dynect.net).


Highrise seems to be having problems, as seen by email errors when we forward email to Highrise dropboxes.


Heroku is still having problems as well


Here in Brazil things are pretty slow.

"Oh, maybe its our shitty ISP screwing up everything again."

No, it's in a bigger scale.


GitHub does not work 100% of the time.


Weird, works for me - from Italy (not sure if there isn't just some caching going on somewhere down the line and I can see it because of that). Edit: never mind, it's almost certain I've got it cached.


Definitely a DNS cache on your computer (Or even in Chrome)


I can query the authoritative ns*.p16.dynect.com DNS servers from Europe (Germany in my case), and the traceroute looks like it's near Frankfurt. So the anycasted copies here seem fine.
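If you want to check this from your own vantage point, a quick sketch; ns1.p16.dynect.net is just one example of a name the NS lookup returns, so substitute whatever comes back for you:

    # find the zone's authoritative nameservers
    dig NS github.com +short
    # query one of the returned servers directly, bypassing recursive resolvers
    dig @ns1.p16.dynect.net github.com +short
    # see which anycast instance you actually reach
    traceroute ns1.p16.dynect.net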


and now these seem down as well. EDIT: and up again half an hour later


It could also be geographic. Their update did say "some customers". I'm in New York and I'm also seeing the outage.


GitHub is working for me from NYC.


It's down in NC.


Github is currently inaccessible. Can you still compile Rust programs that depend on Github files?


First of all, it would only matter at the moment if you're modifying your dependencies. If you've previously built the project, and don't touch your deps, GitHub won't be hit.

Second, Cargo only depends on GitHub for the index. For more: http://integer32.com/2016/10/08/bare-minimum-crates-io-mirro...

That includes a link to a mirror run by integer32.
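If you do need to fetch the index while GitHub is unreachable, Cargo's source replacement can point crates.io at a mirror. A minimal sketch that writes the config from the shell; the mirror URL is a placeholder, the real one is in the linked post:

    # append a source-replacement config (mirror URL is a placeholder)
    mkdir -p .cargo
    printf '%s\n' \
      '[source.crates-io]' \
      'replace-with = "mirror"' \
      '[source.mirror]' \
      'registry = "https://crates-mirror.example.com/crates.io-index"' \
      >> .cargo/config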


If you need to, you can edit your hosts file to include github's IP addresses (mentioned elsewhere in this thread):

192.30.253.113 github.com

151.101.4.133 assets-cdn.github.com
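After editing the hosts file, it can help to flush any local DNS caches so the new entries take effect immediately. A sketch; the exact commands depend on the OS, and the Linux line assumes your distro runs systemd-resolved:

    # macOS
    sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
    # Linux with systemd-resolved
    sudo systemd-resolve --flush-caches
    # verify what the system now resolves (Linux)
    getent hosts github.com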


Explains why the Heroku API is down.


Don't be a dick. I'm sure their staff has a giant collective migraine right now.


What other providers would you recommend than Dyn? Route53? Cloudflare? Something else?


Reposting imglorp's comment at the root of the comment tree, as it's buried currently. This should restore service for those desperately needing to access GitHub etc ;)

> ....point your machine or router's DNS to use opendns resolvers instead of your regular ones: 208.67.222.222 and 208.67.220.220
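A rough sketch of what that looks like on a single machine; the macOS network service name "Wi-Fi" is an assumption, and on Linux /etc/resolv.conf may be rewritten by DHCP or NetworkManager:

    # macOS
    networksetup -setdnsservers Wi-Fi 208.67.222.222 208.67.220.220
    # Linux (may be reverted by DHCP/NetworkManager)
    printf 'nameserver 208.67.222.222\nnameserver 208.67.220.220\n' | sudo tee /etc/resolv.conf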


I am very surprised this is not getting that much attention on national news.


Fascinating weak spot!


Looks like at least some of it is resolved. Spotify is back.


You can add Netflix to the list.

    GET https://art-s.nflximg.net net::ERR_NAME_RESOLUTION_FAILED

    GET https://assets.nflxext.com net::ERR_NAME_RESOLUTION_FAILED


Anyone having any issues with WhatsApp? Mobile text seems to work fine, but all images fail; desktop & web browser aren't connecting at the moment (west coast).


Using Google Public DNS fixed things for me.


I'm using Google Public DNS too. I don't really know if it's related to DynDNS, but I'm still experiencing issues on GitHub and Twitter, like partial loading of images.


Perhaps Google had old (but valid) records still in their cache for a while. Google DNS was working for me for a while, and then stopped. Apparently Dyn has the problem fixed, but maybe there are still some TTL-based propagation delays. I updated my internal network to use Dyn's Internet Guide public DNS and the problem is fixed.

Maybe this is their strategy: we break it, you buy it ;)

https://help.dyn.com/internet-guide-setup/

If you can't load that page, the public DNS servers are: 216.146.35.35, 216.146.36.36
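A quick sanity check of those resolvers before switching over (any affected domain will do; github.com is just the one at hand in this thread):

    dig @216.146.35.35 github.com +short
    dig @216.146.36.36 twitter.com +short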


GitHub isn't working for me again :(


Why not use:

OpenDNS - recursive DNS

Cloudflare (DNS only) - authoritative DNS

Both services are free and distributed across the world.


Dyn was supposed to be distributed too.

The takeaway here should be to use multiple authoritative DNS providers, not a single (even if better) one.
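One way to see whether a zone already does this is to look at its NS set. The output below is an illustrative sketch with hypothetical provider names, showing a delegation split across two independent providers:

    dig NS example.com +short
    ns1.provider-a.net.
    ns2.provider-a.net.
    ns1.provider-b.org.
    ns2.provider-b.org.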


PayPal, Braintree, Spreedly down. Some companies are going to lose money today...


and it's down again


and the attackers are back. DDoS v2 is here


github.com seems to be down because of this.


Shopify is down


Oo oo, I know! Iran did it!


CNN.com is knocked out by this attack as well. I could see that as a useful target.


Must be trying to stop the latest Julian Assange leak.


The Wikileaks twitter feed is putting out some really weird stuff lately. They are claiming their "supporters" are behind the attacks, which makes little sense.


The Internet is so resilient. LOLz.


The internet is resilient against being completely taken down (as demonstrated by everything working just fine for me from Germany). It's explicitly not resilient against taking parts of it down. You can take some continents off the internet entirely by cutting half a dozen cables, but it's extremely hard to make the internet unusable for everyone.


I'd like to see proof of this attack from an outside network observer.

Is it possible the government could force a DNS provider to pretend to fall victim to a DDoS attack, as a form of a false flag cyber attack?


Why does it always have to be a "Nation State"? I've been hanging out with 17-year-olds who knew far more about DNS configs than a room of "Cyber Security Professionals"; the professionals were clueless, and these kids could run circles around them.

Kids.


USA cyber defenses are NOT up to the task of defending our critical electronic infrastructure. Letting every company that runs critical services decide its own security posture is not scalable and has left us vulnerable. While no one is getting hurt, we are taking cyber missile hits from our enemies, and eventually the damage will be worse. Other countries with more central controls will be less vulnerable than we are to crippling infrastructure takedowns.


Who the hell uses the word "cyber" and especially "cyber missiles" non-ironically these days? (Government people, for some reason, but yeah.)

Critical infrastructure MUST NOT rely on PUBLIC networks like the Internet. It's a TERRIBLE idea. If you're working on anything actually critical, build your own fucking ISOLATED network with your own fucking cables.


No. What we need are new techniques for creating back-pressure to all the routers which are forwarding on this type of attack. The issue is that our Routing technology does not give downstream nodes any way to push back on the flood of packets.

Cisco could step up to the plate here. And no, I'm not talking about firewalls. We need newer ICMP type packets to create this back-pressure, so that we can stop floods like this.


>"What we need are new techniques for creating back-pressure to all the routers which are forwarding on this type of attack"

What is this type of attack? TCP/UDP/ICMP has no notion of a compromised host, or even that a packet was crafted.

Back pressure already exists in TCP; see slow start and window sizes. Flow control is the "control" part of Transmission Control Protocol. When a router's port buffers are full, the router drops the packets on the floor. It does not do further processing of those packets.

What would ICMP do here? If I have a million compromised hosts and each sends a single SYN packet towards a destination host, how would ICMP help?

">Cisco could step up to the plate here" What would Cisco do? Cisco doesn't control the ICMP protocol.

I think you are not understanding ICMP. The job of ICMP is to report error conditions; ICMP serves as a helper to IP, which is itself unreliable and has no form of error control or checking. A router or host being overrun is not a network error condition, it is a resource condition. No ICMP type is ever going to be able to stop a host from originating UDP/TCP/ICMP towards a destination. Even if it could, you would just overwhelm it in the outbound direction by replying to potentially millions of hosts.


It looks like we used to have something similar but it got deprecated. https://tools.ietf.org/html/rfc6633


I was a big fan of ICMP Source Quench in the early 1980s, but it wouldn't help now. It doesn't have authentication.


> we are taking cyber missile hits

A better analogy would be 'mocked cyber mass protests' seeing as how no infrastructure will need to be rebuilt after this passes.


Thankfully, we don't live in the Star Trek universe, where hacking a computer causes it to explode in a shower of white-hot shrapnel.


Zzzz.

The USG is not responsible for a corporation that runs DNS services.



