Load Balancers need static IPs (pagerduty.com)
71 points by bpuvanathasan on Sept 1, 2010 | 25 comments



Here's how we solve this.

We host our DNS with DNS Made Easy, but you could also run your own DNS servers. (We actually have a pair of EC2 instances that we configured as DNS servers and then shut down, so we're paying a minimal amount for them but can spin them up very quickly the next time DME is hit with a DDoS.)

We query the ELB CNAME periodically and check its IP. If the IP changes, we update our corresponding A records with the new IP. It's a small amount of code and a cronjob that runs every 5 minutes.

Elegant? No. Get the job done? Absolutely.
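For anyone who wants to try the same approach, here is a minimal sketch of what such a cron job might look like. It is not the code described above: `update_a_records` is a hypothetical callback standing in for whatever API your DNS provider exposes, and the `resolver` parameter exists only so the logic can be exercised without touching the network.

```python
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses that hostname currently resolves to."""
    infos = resolver(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def sync_a_records(elb_hostname, current_ips, update_a_records,
                   resolver=socket.getaddrinfo):
    """Update our A records if the ELB's addresses have changed.

    update_a_records is a hypothetical callback wrapping the DNS
    provider's API. Returns the IP list that should now be published.
    """
    new_ips = resolve_ips(elb_hostname, resolver)
    # Never publish an empty record set if the lookup returned nothing.
    if new_ips and new_ips != sorted(current_ips):
        update_a_records(new_ips)
        return new_ips
    return sorted(current_ips)
```

A cron entry would call `sync_a_records` with the ELB hostname and the last published IPs, then persist the returned list for the next run.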


Great, except for when your users' DNS resolvers cache the DNS entry longer than they're supposed to (many resolvers ignore TTL), and are unable to reach your site.


No. Amazon gives you an A record and asks you to make a CNAME for it. So when they change an ELB's IP, they update their A record. So in our case, we just make sure our A records follow along. Either way, a DNS cache holding onto a stale record too long would cause a problem.

Which is why Amazon keeps the ELB active on both the old and new IPs for a period of time.


If you could use a CNAME for your MX record, what are the benefits of using ELB over what's already in place (weighted MX priorities, with the ability to relay mail away automatically on failure from AWS)?

Also there's additional cost involved in using ELB.

I would prefer having a good dns provider, low ttl, EIP, nginx w/haproxy.


(Blog author here) I totally agree -- I don't think there's ever a good reason to use a LB for mail, for exactly the reasons you mention.

However, the problem with using CNAMEs for LBs is that you can't both host mail and a site at the same subdomain. Ideally, what I would want to do is set up redundant MX records for *.pagerduty.com to our mail servers, and also set a CNAME from *.pagerduty.com to the ELB to handle the web traffic. The DNS spec doesn't allow this though (it would be ambiguous).

I've thought about using round robin DNS with a low TTL instead of an ELB. Problem there is you don't get all the fancy auto-scaling stuff. I've also heard rumors that some ISPs have their DNS servers configured to put a floor on the retrieved TTL values...


I think you are blaming Amazon for a problem that is inherent in DNS, or possibly your approach to handling your DNS based features.

Unless I've drunk too much AWS Koolaid, I'm fairly certain you can run wildcard DNS for just the MX records for your domain. That entry would look something like:

*.example.com. 3600 IN MX 10 mailserver.example.com.

Your mail servers can take the DNS synthesized domain from there I think.

BTW, I agree serving naked domains is a bit of a PITA (appengine problem too!), but you can solve that by assigning a few elastic IPs to a few web heads and using RR DNS for them, with some code to take them out if one fails. Zerigo, for one, supports doing something like this IIRC.
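The "some code to take them out if one fails" part could be sketched roughly like this. It is only an illustration: the plain TCP connect check, the default port, and the fall-back-to-everything policy are all assumptions, and pushing the resulting list to a DNS provider is left out entirely.

```python
import socket


def is_healthy(ip, port=80, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def healthy_pool(ips, check=is_healthy):
    """Return the IPs that pass the health check, preserving order.

    Assumed policy: if every check fails, keep the full list rather
    than publish an empty round-robin set, since the monitor itself
    may be the thing that is broken.
    """
    alive = [ip for ip in ips if check(ip)]
    return alive or list(ips)
```

The list returned by `healthy_pool` is what you would publish as the round-robin A records for the naked domain.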

302 anyone using the naked domain to the www. I doubt it matters much load-wise, as it sounds as if you're running subdomains for your app like we do at Loggly.


I just realized you are probably running CNAME wildcards for the subdomains as well, which would conflict with the MX one. How about running separate records for each subdomain?


Even with separate records for each subdomain, I think you'd still have the problem.

You'd need to do:

acme MX (mail_server_ip)
acme CNAME (ELB hostname)

... but that isn't allowed. The problem is with the records conflicting, not with the wildcard.


Doesn't that give you trouble with the reverse lookups though?


Reverse lookups on Amazon are trouble either way. You can't control them.


That sucks for email servers then. After all, you can put MX records up until the cows come home, but plenty of services will not accept mail from or deliver to your server if the reverse lookup is not working properly.


I'd rather say "load balancing needs SRV records" http://www.anta.net/nic/draft-andrews-http-srv-01.shtml

Is it widely implemented yet? Why not?


There's no need to use a load balancer for MX. The protocol itself handles the problem of machines being offline. I agree with your point about not being able to point the root of your domain at an ELB though... I wasn't aware that you can't use a CNAME at the root.


The problem though is that you can't put MX and CNAME records at the same point in the DNS hierarchy. So if you want to host a site on an ELB at acme.pagerduty.com, you can't then put in MX records for acme.pagerduty.com, because they'll conflict with the CNAME you need to put there.

This all happens because it creates ambiguity. If you want to look up the MX record of a name that has both MX and CNAME entries (say acme.pagerduty.com), should the name resolver:

a) Grab the MX record at acme.pagerduty.com; OR

b) Do what is usually implied by a CNAME record, and pull the target of acme.pagerduty.com, and search for an MX record at that name?

Because of the potential for conflict, the DNS spec simply forbids CNAME records from existing alongside most other records.
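To make the rule concrete, here is a rough sketch of a check you could run over a zone before publishing it. The `(name, type)` tuple format is an assumption for illustration, not any real zone-file parser, and the only exceptions it allows are the DNSSEC RRSIG and NSEC records, which are permitted to sit next to a CNAME.

```python
from collections import defaultdict

# Record types allowed to coexist with a CNAME at the same name.
ALLOWED_WITH_CNAME = {"CNAME", "RRSIG", "NSEC"}


def cname_conflicts(records):
    """Return the names that hold a CNAME alongside a disallowed record type.

    records is an iterable of (name, rtype) pairs, e.g.
    ("acme.pagerduty.com", "MX"). Names are compared case-insensitively
    and with any trailing dot removed.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and types - ALLOWED_WITH_CNAME
    )
```

Feeding it a zone that has both an MX and a CNAME at acme.pagerduty.com would report that name, while MX and A records together at the bare pagerduty.com would pass.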


Of course. That's a pretty poor oversight on Amazon's side.


I am not sure I understand the issue here. Sure, the AWS ELB offering is lacking and the issues re DNS are known. But just like any of their other offerings, why try to shoehorn the product into your requirements?

Our service required a load balanced service and just like you we identified the issues and decided we could not live with the limitations.

So we brought up a small instance and run HAProxy on it to do all our load balancing. We assign an Elastic IP and we get to retain control, avoid the DNS issues, do https etc.

Essentially avoiding all the limitations of the Amazon offering.

AWS themselves use HAProxy for Elastic Load Balancing and it's solid.


Does the IP of the AWS load balancers ever change over time? Does anyone know the technical reasoning behind the requirement?


Yup -- they most certainly do. I think they switch around the name -> IP mapping in response to traffic rates -- if you suddenly get a surge of traffic, they'll move your virtual ELB instance to a physical load balancer that is currently lightly loaded.

Basically, they are doing load balancing for their load balancers. :)


Does this actually cause a disruption of service though? It seems like it would be trivial to just keep the configuration in place for 72 hours even after changing the DNS entry.


They actually do that -- they keep the old IP live for a while to prevent problems with clients that might have cached the IP.


So could we not update the A record's IP address every couple of hours and still get, roughly speaking, the benefit of load-balanced load balancers?


Yes, Amazon have been ignoring this problem for a while now. Here's a post I wrote on their forum over a year ago: http://developer.amazonwebservices.com/connect/thread.jspa?t...


Is this a joke or irony? (The blog seems to be down right now).


The title may be a little broad (his concerns are mostly in certain use cases, not in general) but he definitely raises some real concerns.


Let me be clearer - the link does not load for me. http://blog.pagerduty.com is dead here. http://pagerduty.com loads fine though.



