How To: Hosting with Amazon S3, CloudFront and Route 53 (paulstamatiou.com)
154 points by PStamatiou on Jan 13, 2014 | 67 comments



Nice write-up. We have a very similar setup (Jekyll-generated static site + S3) for our website[1], and reading through this brought back memories of setting it up (and is a friendly reminder to go back and gzip some of our CSS files).
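For anyone wanting to do the same, the gzip step amounts to uploading pre-compressed files with a Content-Encoding header. A minimal sketch with boto (the bucket name and file path are made up):

  import gzip
  import io

  import boto
  from boto.s3.key import Key

  conn = boto.connect_s3()  # credentials from env/boto config
  bucket = conn.get_bucket('www.example.com')  # hypothetical bucket

  # Compress the stylesheet in memory
  buf = io.BytesIO()
  with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
      with open('css/site.css', 'rb') as f:
          gz.write(f.read())

  # Content-Encoding: gzip tells browsers to decompress transparently
  key = Key(bucket, 'css/site.css')
  key.set_contents_from_string(buf.getvalue(), headers={
      'Content-Type': 'text/css',
      'Content-Encoding': 'gzip',
  })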

The biggest plus of this setup is that once it's deployed, you don't think about it. It just works, and you never think about scaling. Oh, and it's cheap (seriously, it's peanuts a month, as all you pay for is bandwidth at $0.10/GB).

The biggest negative is getting SSL. CloudFront supports it, but it's expensive ($600/mo; see [2]). Compare that to the pennies it costs to host the non-HTTPS site on S3. In our case, our cloud app is on a completely separate domain (SSL-only) and our public site is informational only, so the trade-off works. The only SSL-enabled link on our public site is for our contact GPG key, and it's linked directly to the HTTPS S3 URL.

[1]: http://www.jackdb.com/

[2]: http://aws.amazon.com/cloudfront/pricing/


Why put the GPG key on an HTTPS page linked from an HTTP page? If the HTTP site is compromised through a MITM, the attacker can easily change the link to a bucket he controls that is also HTTPS (e.g. https://s3.amazonaws.com/secure.jackdb.com/pgp/security_at_j...).

I don't think it adds anything to security; it actually provides a false sense of safety.


You're absolutely right about being able to MITM the HTTP piece and replace the content; that's true for any mixed-content site. In this case, though, I disagree that the HTTPS link to S3 is entirely useless. It's used specifically as an SSL link for downloading our GPG key, which is additionally available on a number of key servers and indexed by search engines[1]. In that usage it's one of many ways of getting the key, and, like all GPG keys, it should really be verified before use anyway. For just about anything else, though, I agree that mixed content is a very bad idea.

[1]: https://www.google.com/search?q=jackdb+gpg


Alright, I thought I was missing something :)


I wish I had read the wisdom of this comment a couple of months ago. I was having problems scaling on EC2 for my business and ended up hacking together this setup: host the static website on S3, and link to an SSL-only domain for the secure stuff. It's incredibly frustrating when Amazon does nothing to disseminate information about how to more effectively use their cloud infrastructure.


> It's incredibly frustrating when Amazon does nothing to disseminate information about how to more effectively use their cloud infrastructure.

Hmmm. We have thousands of pages of reference documentation and tutorials. We have forums. We have a team of Solution Architects ready, willing, and able to help.


Though I agree that the documentation is comprehensive, that itself can be a barrier. It all seems very deeply nested and enterprisey. I'm often left wanting a layer of noob-friendly UI or documentation, or "You probably want X"-style guidelines that differentiate between typical use cases and non-typical ones.

Compare, for example, the Digital Ocean UI for creating new VMs. It's a much more pleasant experience than the AWS Console for the new user with a small scale use case.


You're absolutely correct that the documentation and tutorials are thorough and the support is outstanding (I've been helped before and am convinced of the quality of the people doing support for AWS). I made that comment from the viewpoint of a programmer and business owner with only surface-level sysadmin knowledge, who doesn't have the time to attend AWS conferences or read through hundreds of pages of documentation.

I'm not asking Amazon to "dumb down" their documentation or tutorials; I just wish there were somewhere Amazon succinctly stated the best way of doing things. This could take the form of a novice guide, as the other commenter suggested, or links from the documentation to articles in the blogosphere that detail practical ways to get set up with a given AWS technology.

Also, with respect to this specific issue, the knee-jerk reaction people on the AWS forums have to scaling questions is to set up auto-scaling policies. Simple and intuitive ideas like hosting static websites on S3, hosting secure content on EC2, and combining the two have very poor coverage in both the AWS documentation and on the Internet in general.


SSL certificates for CDNs are just a tough problem because there's no cheap/easy way to have one certificate that corresponds to a bunch of different IP addresses (that works on Windows XP).


It's not an issue with multiple IP addresses, but rather with being able to host multiple SSL-enabled domains on a single IP (which is what CDNs need to do). That is where SNI[1] comes in, along with all of its compatibility issues with older network stacks.

[1] http://en.wikipedia.org/wiki/Server_Name_Indication
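If you want to see SNI in action, Python's ssl module exposes it directly; server_hostname is the piece that puts the requested domain into the TLS handshake (the hostname below is just an example):

  import socket
  import ssl

  hostname = 'www.example.com'  # any SNI-enabled site

  ctx = ssl.create_default_context()
  sock = socket.create_connection((hostname, 443))
  # server_hostname sends the SNI extension, telling a single IP
  # which domain's certificate to present
  tls = ctx.wrap_socket(sock, server_hostname=hostname)
  print(tls.getpeercert()['subject'])
  tls.close()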


So IPv6 would not be an issue then, I guess; every client could have an IP on each edge node...


Yes, that is a possibility. SNI will also become a more realistic option as older devices get retired. I've implemented several sites with SNI, and as long as you're aware of what devices/browsers/OSes will be connecting to your site (and their SNI support), things work great.


So why is it so expensive?


Why is it so expensive to add SSL with a custom domain to Amazon CloudFront? Because Amazon actually needs to acquire hundreds of individual certificates on your behalf and distribute them to its edge servers.


No, they don't: you provide them with a single cert and they use that on every edge node.


SSL requires additional CPU resources, and is often coupled with hardware acceleration. All of this equates to higher hosting costs.


What is it that's so expensive?


SSL comes with the $20/month Cloudflare plan, and you may not even need it on your webhost then. They're also planning on introducing it to the free plan this year.


Why use SSL for a fully static site?


The same reason you'd use it for a non-static site: to ensure that visitors to your site are getting your actual site content and not something else (i.e. to avoid a MITM).

If you had a purely informational site and it listed a phone number, an address, or heck, even a bitcoin address, wouldn't you want to make sure that your visitors got the actual site and not something malicious?

There was a story last week about a guy whose ISP was inserting content into web pages. I can't find the link; I think it got lost as a result of the HN server crash. SSL prevents crap like that.


Are you referring to “I fought my ISP's bad behavior and won”[1]?

If so, in that case the ISP altered DNS results to point to its own HTTP server and redirect to the real one with a modified URL. SSL would have helped a bit, but the first problem is that DNSSEC isn't more widespread.

[1]: https://news.ycombinator.com/item?id=6992897


I wrote an npm module to automate this workflow. You can read about it at http://caisson.co/.

Simplifies the process to a couple commands:

  $ caisson init yoursite.com
  $ caisson push


This is so awesome! You've taken a 15-minute process and condensed it to 15 seconds.


Hi! I hope somebody can answer this for me.

Why do you need this DNS routing? I tried Googling and found a large number of "hosted DNS" services on offer, but I don't understand something:

I have a small site. It runs over at Digital Ocean. I point the DNS records of the domain name to the Digital Ocean server by putting them into the text boxes at my domain-name reseller.

Where in all this would I require a more advanced solution?


CloudFront has different IPs for different regions and manages this via DNS. So, if you're on an ISP peered directly with CloudFront and make a request using your ISP's DNS servers, CloudFront's DNS might give you back the IP of a server directly on that peering circuit. These kinds of relationships and new servers are added/removed all the time, so the IPs change regularly. A hard-coded IP in an A record won't work because of this. It works for your DO VPS because the IP is static and there is only one server.
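You can watch this happen yourself: resolve a distribution hostname, and the edge IPs you get back depend on where and when you ask. A quick sketch (the hostname is the placeholder used in the AWS docs):

  import socket

  # Resolve a CloudFront distribution hostname; run this from different
  # networks and you'll get different edge IPs, which is why hard-coding
  # any one of them into an A record breaks.
  host = 'd111111abcdef8.cloudfront.net'
  for family, _, _, _, sockaddr in socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP):
      print(sockaddr[0])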


There's also the IP anycast[1] option. That's what you get with OVH CDN[2], for example, but not with CloudFront. Anycast routing is totally independent of DNS, so it works fine with an A record and also makes it easy to install SSL certificates.

[1] http://en.wikipedia.org/wiki/Anycast

[2] https://www.ovh.co.uk/cdn/


Anycast just refers to BGP-announcing an IP address/range out of multiple locations. Most commonly it's done with DNS, because UDP is connectionless. It's becoming increasingly acceptable to do full anycast, i.e. have actual TCP/HTTP(S) on an anycast IP, but there are risks to that approach. If routing changes and you end up at a different location during a TCP session, the new location won't have the state information needed, so the connection drops. It's theoretically possible to keep such state synchronized, but it's sufficiently complex that nobody is doing so. This is why most of the larger, more established CDNs anycast DNS only and don't do full anycast.


Correct, though actually isn't CloudFront just a CNAME that could work with any provider?

One reason you might want something more advanced like Route 53 is if you wanted to point the "naked" domain (example.com vs www.example.com) to CloudFront. You can't use a CNAME on an apex record; you need to use something like Route 53's "Alias" records.
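With boto, creating such an alias at the apex looks roughly like this (the hosted zone ID and distribution hostname below are placeholders; Z2FDTNDATAQYW2 is the fixed hosted-zone ID that CloudFront aliases point through):

  import boto
  from boto.route53.record import ResourceRecordSets

  conn = boto.connect_route53()
  changes = ResourceRecordSets(conn, 'ZEXAMPLE12345')  # your hosted zone ID
  changes.add_change(
      'CREATE', 'example.com.', 'A',
      alias_hosted_zone_id='Z2FDTNDATAQYW2',            # CloudFront's zone
      alias_dns_name='d111111abcdef8.cloudfront.net.')  # your distribution
  changes.commit()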


One other (minor?) reason is performance. Route 53's ALIAS records resolve in a single DNS lookup, whereas CNAMEs incur a second lookup.


One use case: at my last gig, we assigned each customer a new subdomain under our main one. Route 53 worked great for this. We used boto (the Python API for AWS) to create EC2 instances, get a static IP, and then create a new DNS entry in Route 53 mapping to that IP, all in about 5 minutes.

Prior to Route 53, we used to do the mapping manually, as our primary DNS provider did not have an API.
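For anyone wanting to replicate that flow, it's only a handful of boto calls. A sketch, with the region, AMI, and zone ID made up:

  import time

  import boto
  import boto.ec2
  from boto.route53.record import ResourceRecordSets

  ec2 = boto.ec2.connect_to_region('us-east-1')

  # Launch the customer's instance and wait for it to come up
  instance = ec2.run_instances('ami-12345678', instance_type='m1.small').instances[0]
  while instance.update() != 'running':
      time.sleep(5)

  # Grab a static (Elastic) IP and attach it
  address = ec2.allocate_address()
  ec2.associate_address(instance_id=instance.id, public_ip=address.public_ip)

  # Point customer1.example.com at the new IP
  r53 = boto.connect_route53()
  changes = ResourceRecordSets(r53, 'ZEXAMPLE12345')  # hosted zone ID
  change = changes.add_change('CREATE', 'customer1.example.com.', 'A', ttl=300)
  change.add_value(address.public_ip)
  changes.commit()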


One big warning here.

If you want to serve static content for multiple domains (e.g. somefont.ttf for foo.example.com, bar.example.com, and baz.example.com from a single CloudFront distribution), CloudFront is not your solution, because CloudFront does not vary its cache on the Origin header. If your first visitor loads foo.example.com/static/fonts/somefont.ttf, the Access-Control-Allow-Origin header for somefont.ttf will be set to "foo.example.com", and subsequent requests for that file from (bar|baz).example.com will fail with a CORS error.

It was a pretty shocking thing to find out. We've concluded AWS/CloudFront isn't a viable CDN until this is fixed. Based on the following thread, it isn't clear when or if it will be fixed: https://forums.aws.amazon.com/thread.jspa?threadID=114646#
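For context, the S3 half of CORS is easy to set up; it's only CloudFront caching the first response (ignoring the Origin header) that breaks it. A boto sketch, with hypothetical bucket and domains:

  import boto
  from boto.s3.cors import CORSConfiguration

  conn = boto.connect_s3()
  bucket = conn.get_bucket('static.example.com')  # hypothetical bucket

  # S3 echoes back whichever allowed origin matched the request's Origin
  # header. CloudFront then caches that response and serves it to every
  # other origin, which is exactly the failure described above.
  cors = CORSConfiguration()
  cors.add_rule(['GET'], ['http://foo.example.com',
                          'http://bar.example.com',
                          'http://baz.example.com'],
                allowed_header=['*'])
  bucket.set_cors(cors)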


Any reason to serve this from a single distribution? Their UI basically tells you in no uncertain terms that subdomain == distribution.


Where does it say that? I'm not talking about assigning foo.example.com as the domain name for the distribution.


Right, this is still the scenario where you're using the distribution hostname they give you, e.g. abcdef1234.cloudfront.net

http://docs.aws.amazon.com/AmazonCloudFront/latest/Developer...

"Changing the origin does not require CloudFront to repopulate edge caches with objects from the new origin. As long as the viewer requests in your application have not changed, CloudFront will continue to serve objects that are already in an edge cache until the TTL on each object expires or until seldom-requested objects are evicted."


If you need an S3 deployment library for stuff like this, I'm planning a major merge on mine (s3tup) later tonight or tomorrow. It uses YAML files to declaratively control the configuration of buckets and keys, and makes it nicer to do more complex things like setting appropriate headers based on pattern rules. Check it out here:

http://github.com/heyimalex/s3tup


I built BitBalloon (https://www.bitballoon.com) to simplify all of this, while bringing benefits such as atomic deploys, built-in form processing, automatic gzipping, bundling and minification of your assets and perfect cache headers.

We have a comparison with S3 here: https://www.bitballoon.com/blog/2013/12/03/bitballoon-amazon...


I use Namecheap DNS (free, as they are my registrar). I can control everything, including the apex. It has never failed me.

And for hosting, Github pages (Jekyll rocks!) do a great job. I think you are still paying too much, Paul.


> And for hosting, Github pages (Jekyll rocks!) do a great job.

Using GitHub Pages to auto-build your Jekyll site is great if and only if you have no need to customize your Jekyll build or website environment:

- You must use the versions deployed by GitHub; most of the time they're up to date, but if there's a bug fix you're relying on in any of the libraries, you're out of luck. I ran into this on my own site: Redcarpet had a Markdown parsing bug that was fixed in version 3, but the Jekyll that GitHub Pages uses depends on version 2. Jekyll loosened this dependency weeks ago, but it's not in a full release yet.

- You can't use any Jekyll plugins, even the ones marked safe.

You can avoid the above by building your site locally and uploading the output, but there are still a few other caveats:

- Like timrivera mentioned, there is no support for server-side redirects outside the baked-in ones (i.e. www to or from naked domain; name.github.io to domain). S3 has these out of the box.

- You can only use one domain per GitHub Pages repository.

- GitHub allows Googlebot to index your repository's master branch. If you don't want that, you're out of luck: it's hardcoded into github.com's robots.txt, so you'd need to use a different name for your mainline branch.

- You cannot use a private repo for GitHub Pages, and GitHub's terms of service require you to allow other people to fork your public repositories, regardless of license.

- There is no support for SSL.

If you're okay with all of that, GitHub Pages is totally fine. But if you aren't, non-VPS alternatives like S3 are pretty attractive.


Re: Google: if you use a CNAME for your site on GitHub Pages, will your site also show up as name.github.io in Google search results?


It shouldn't: name.github.io → example.com is a 301 redirect. However, https://github.com/name/name.github.io/blob/master/* and https://github.com/name/name.github.io/tree/master/* will show up in Google searches.

Here's an example, using the atmos.org example the GitHub Pages documentation uses: https://www.google.com/search?q=Saying+how+it+was+%E2%80%9Cs...

One result for atmos.org, and then a duplicate result for https://github.com/atmos/atmos.github.io/blob/master. Here's a screenshot in case you see something different: http://i.imgur.com/TavoyuW.png

The only way to prevent that from happening is to avoid using a branch named "master" in your repository.


That's what I immediately thought when I first read it, especially now that GitHub Pages are served over Fastly's CDN[1]. Then I realized that GitHub Pages does not support server-side (301) redirects at all. Quite a big turn-off when you move an existing website over to a new platform.

[1] https://news.ycombinator.com/item?id=7019148, https://news.ycombinator.com/item?id=6975830


I'm already hosting a website with this setup. Works perfectly and is blazing fast. I recommend it for everyone.

The website is a static marketing front for a web app that is served from an SSL subdomain on another cluster. The only thing I'm unsure about is that I want to offer a one-field e-mail signup on the front page, which, of course, will be without SSL in this setup. What would you do? Skip the quick signup and put the whole signup on the subdomain, or keep the signup form with a POST to the SSL page (less secure)?


I'd probably resort to an iframe on the SSL site.


Isn't this just as (un)safe as posting to an SSL page? If there's a MITM attack, they can simply replace the contents of the iframe with another page.


I built my first mobile-first layout using his writeup(s), and will likely move from Heroku to S3 using this one. High five.


thanks for reading!


Stammy! Nice writeup! There's a really simple way of redirecting your naked domain to the www bucket at S3: just point the naked domain to IP 174.129.25.170 and it will redirect automatically. Just be warned it's a free service.


If you have Gmail set up on the domain, Google supports a naked-domain forward: https://support.google.com/a/answer/2518373


Yeah, I came across that a few times, but the free-service aspect scared me away, to be honest. I'd rather pay and be confident it's always working.


Is there documentation of this anywhere?



Great writeup. At the end he links to the AWS docs for this whole process, which I found equally if not more helpful: http://docs.aws.amazon.com/gettingstarted/latest/swh/website... But the OP's tutorial definitely has some extra informative tips.


Yeah, the key point is that if you know you are going to use CloudFront, the AWS docs make you do extra steps: they have you set up DNS for S3, then change it for CloudFront.


Paul's general workflow also works with just Grunt and one of its many S3 plugins. For example, you can clone the Bootstrap github repo (which comes with a nice Grunt build config), npm install an S3 plugin, add S3 deployment tasks to Gruntfile.js, and boom - static site generator and deployer.


I feel like SSL/TLS is a requirement for websites in 2014.

Do Amazon S3 and CloudFront support HTTPS?


Even worse than the $600 a month: there's no way to disable SSL on your site if you're serving off CloudFront, so if one of your users makes an HTTPS request and you're not shelling out the cash for custom certs, they'll be greeted by a big red warning page in Chrome (the returned cert is for *.cloudfront.net).


This is actually a great point against using custom domains (CNAMEs) with CloudFront, at least if you can't afford the custom SSL certificate option.

CloudFlare somehow got this right: they serve non-HTTPS-enabled web sites from different IP addresses, so you can never reach them over HTTPS (which could be better: "This webpage is not available" vs. the scary red "This is probably not the site you are looking for!" message in Chrome). Plus, they have a great free anycast DNS network comparable to Route 53. And best of all, you never pay for bandwidth.


S3 and CloudFront both support HTTPS, and CloudFront supports custom SSL certificates so you can deliver HTTPS via your own domain name and certificate. [1]

[1] https://aws.amazon.com/cloudfront/custom-ssl-domains/


$600 per month seems expensive for a static site.


These are personal opinions, not fully factual; I've only worked with two cloud-based startups (neither full time).

Cloud services are barely a convenience to the customers/businesses that run on them. For a startup, buying $50-100k in servers up front is shocking, but in most cases high-usage cloud computing for hosting/databases will add up to that quickly.

The only thing "cloud" actually does for its customers is save them from buying computers and renting rack space, which isn't that expensive ($20k or so for a baseline server, plus $150 a month in rack rent).

Cloud lowers the barrier to entry, but once you've entered, staying with cloud isn't optimal.


I think it's because it's a wildcard certificate?


It's expensive since Amazon's CDN is DNS-based: each PoP requires them to assign you a dedicated IP. They don't do shared certs like other CDN providers. One nice thing is that you can use an EV cert.



No, but if you think it's a necessity, you can use S3 with CloudFlare in front to force SSL. I believe Google PageSpeed Service allows a custom SSL cert too.


Nice! We actually built a service to do this: https://getforge.com/, including a few other nice static-hosting tweaks. It takes the hassle out of dealing with Amazon.


A cost-effective alternative to this (I'm open to being corrected) is to use a cheap server (say, a $5/month box from Digital Ocean) and CloudFlare (which is free).


FWIW, after evaluating many solutions, I'm switching my DNS from Zerigo to DNS.he.net: free, featureful, and backed by a company I believe will be around.



