"As we all know, there were a few very public and very unfortunate outages on the Amazon side, but being 100% honest with ourselves, as a brand new company we would have probably caused more downtime ourselves if we were running the conversion and storage for the app."
I rarely hear this type of thinking when discussing AWS. More people need to realize and accept this.
Another argument we don't hear often is that the EC2 model encourages (and almost insists on) building in failover from the start of a project.
S3, SQS, and SDB provide the same low barrier to entry as EC2 for bootstrapped startups. What doesn't get mentioned often is that, as your application matures, you can replace each of these components with custom, vendor, or open source products running on EC2. You can do this one by one, and eventually you have a platform that could run independently of AWS altogether if and when the need arises.
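To make the "replace them one by one" idea concrete, here is a minimal sketch, assuming the application talks to storage only through a small interface of its own. The names `BlobStore`, `InMemoryStore`, `LocalDiskStore`, and `save_upload` are hypothetical, invented for illustration; the in-memory class just stands in for a hosted service like S3 and makes no network calls.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical interface the app codes against instead of a vendor API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BlobStore):
    """Stand-in for a hosted service such as S3 (assumption: no real calls)."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDiskStore(BlobStore):
    """Self-run replacement; keys are assumed to be flat file names."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_upload(store: BlobStore, name: str, payload: bytes) -> bytes:
    """Application code: it never learns which backend it is using."""
    store.put(name, payload)
    return store.get(name)
```

Swapping backends then means changing the one place where the store is constructed, which is roughly what makes a component-by-component migration practical.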
It really depends on your skill set and experience with running redundant, highly available (HA) systems and services, and on whether you have the time to deal with it yourself.
There's no reason you can't provide better uptime than Amazon has thus far, if you have the right folks and the right budget (multiple boxes at each tier). If you don't have the experience in-house, and don't want to manage your own servers, that's 100% legitimate, but don't assume beating Amazon's uptime would be hard.
> If you don't have the experience in-house, and don't want to manage your own servers, that's 100% legitimate, but don't assume beating Amazon's uptime would be hard.
The relevant questions aren't whether you can beat Amazon's uptime but:
(1) How much is it worth to beat Amazon's uptime by the amount that you're likely to beat it?
(2) Could you produce more value by spending the time/money that it would take you to beat Amazon on something else?
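Those two questions can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only; every figure in it (the availability percentages, revenue per hour, and extra yearly cost) is a made-up assumption, not a measurement of Amazon or anyone else, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def net_benefit_of_self_hosting(
    aws_availability: float,
    own_availability: float,
    revenue_per_hour: float,
    extra_annual_cost: float,
) -> float:
    """Revenue recovered by the better uptime, minus what that uptime costs."""
    hours_saved = downtime_hours(aws_availability) - downtime_hours(own_availability)
    return hours_saved * revenue_per_hour - extra_annual_cost


if __name__ == "__main__":
    # All inputs are hypothetical: 99.9% vs 99.95% availability,
    # $500 of revenue per hour, $60,000/year extra for staff and hardware.
    benefit = net_benefit_of_self_hosting(0.999, 0.9995, 500.0, 60000.0)
    print(f"net yearly benefit: {benefit:,.0f}")
```

With these made-up inputs the better uptime saves roughly 4.4 hours a year, which is nowhere near enough to pay for itself. Change the revenue-per-hour figure and the answer can flip, which is the point of asking question (1) before question (2).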
One other point to consider is that while AWS provides nice scaling and availability features, it still has some latency issues to work out. Latency between two EC2 instances will be higher than between two systems you run on the same LAN, so if you need high-throughput, low-latency communication between nodes, you should probably run your own server farm.
I don't know much about this topic, but wouldn't it be prudent to be signed up for another service as a hedge and have a way to switch/share as a failsafe against outages?
Or should there be a company that provides this service?
This depends a lot on your budget. If you are running on a shoestring, you should definitely go 100% cloud: you will not have the expertise or capital to get close to the availability and scaling that cloud services offer. If your budget is somewhat larger, you can use the cloud for your primary servers and keep a small colo footprint to cover for the times when the cloud is unavailable, or for staging/testing before you update your cloud services (take a look at the Eucalyptus project, an open-source EC2/AWS clone). If your budget is large enough to provide high-scale, high-availability services yourself, you might still want to consider the cloud for handling surges and as a failover system, which keeps your capital budget lean.