Would really like to see some massive reductions in the operation costs and, most importantly, the bandwidth costs.
The bandwidth costs are so far out of line with what the network transfer actually costs that it feels like price fixing between the major cloud players: nobody is drastically reducing those prices, only storage prices.
Charging 5 cents per gigabyte (at their maximum published discount level) is equivalent to paying $16,000 per month for a 1 gigabit line. This does not count any operation costs either, which could add thousands in cost as well, depending on how you are using S3.
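For reference, a rough sketch of the arithmetic behind that $16,000 figure (Python, assuming a fully saturated line and S3's best published egress tier):

    gbps = 1                                    # line speed in gigabits per second
    seconds_per_month = 30 * 24 * 3600          # ~2.6 million seconds
    gb_out = gbps * seconds_per_month / 8.0     # gigabits -> gigabytes, ~324,000 GB
    print(gb_out * 0.05)                        # ~$16,200/month at $0.05 per GB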
There are several providers that offer an unmetered 1Gbps line PLUS a dedicated server for ~$600-750/mo.
Providers like OVH offer the bandwidth for as little as $100/month. ( https://www.ovh.com/us/dedicated-servers/bandwidth-upgrade.x... ) I am just not sure how Amazon can justify a 160x markup over OVH, or a 30x markup over a dedicated server + transfer.
For the time being, the best bet is to use S3 for your storage and then put a heavily caching non-Amazon CDN on top of it (like Cloudflare) to avoid the ridiculous bandwidth costs.
A consulting customer came to me a year ago with data production growing from 200TB/year to over 6PB/year, and their budget couldn't sustain that jump (or anywhere close to it).
Having come from the mass-facilities and data center space with MagicJack, I knew the wholesale costs of bandwidth, power, and drives were continuously falling.
There are certain clients and use cases that need access to their data all of the time and whose very bones are built on collaboration (genomics).
For example, this client is now storing 6PB of data with us, with 3 copies in separate data centers. We are half the price of S3, and we include all the bandwidth for free, limited to 10GigE per PB stored. This has worked out extremely well - we were about 20% (!!!!) the price of Amazon after you factor in bandwidth.
There are lots of challenges we faced, like overzealous neighbors in the environment, storing lots of small objects, and heavy usage of ancillary features like metadata - and these apply to customers of any size. By putting the "tax" on bandwidth, a lot of these business cases are solved; I see why Amazon does that.
AWS is truly great, but as you get into very high scale (specifically in storage - 2PB+), it becomes extremely cost prohibitive.
It makes a lot of sense to be able to run loss-making products. Otherwise everyone would use S3 together with Google Compute Engine and Azure databases (let's assume they'd each be cheapest). In this scenario all providers would lose out.
In the current world, they can keep prices for some products below cost but make their money on bandwidth and the other services people are forced to use to avoid egress traffic.
"In the current world, they can keep prices for some products below costs but make their money with bandwidth and the other services people are forced to use to avoid egress traffic."
Which AWS products are loss leaders?
S3 storage pricing is not exactly cheap. Neither is EC2 instance pricing.
"Otherwise everyone would use S3 together with Google compute engine and Azure databases (let's assume they'd be cheapest). In this scenario all providers would lose out."
No, S3 would do well, GCE would do well, Azure would do well. Providers only lose out to the extent their products no longer compete on merit alone.
I can imagine that this is a good reason. Otherwise they could make bandwidth cheaper so that people who cannot move everything can at least move part of their applications.
I think the three providers are smart enough to know why they charge that much for bandwidth, and this is the only reason I could think of for why all 3 of them charge that much. And I'm pretty sure that some products run at a loss; they do at nearly every company. But AWS won't tell us which ones.
It's reasonable to think that S3 is loss making or about breakeven on its own but recoups costs due to bandwidth charges.
I guess the latency between AWS Frankfurt and GC Belgium should be low enough (5-10ms) to use them together for most applications, e.g. storing large amounts of data at one provider and renting compute instances for processing at the other. The latency shouldn't be an issue there, as long as the throughput is high enough.
Can confirm: storage for a lot of stuff is in S3 and compute is GCP preemptibles. Works if you have a small dataset which requires a large volume of compute.
Sorry for the delay - yes, it's ploid.io - nothing up there yet. We've been in stealth mode while we've been building the system for our first client, the HudsonAlpha Institute for Biotechnology.
Feel free to ping me at brandon at ploid.io - happy to share any insight we've gained!
> The bandwidth costs are so far out of line with what the network transfer actually costs that it feels like price fixing between the major cloud players: nobody is drastically reducing those prices, only storage prices.
Yes, it makes me wonder. For many applications, the ridiculous bandwidth costs will be substantially higher than compute costs, so it makes no sense that nobody in this supposedly competitive market wants to compete in this area.
> There are several providers that offer an unmetered 1Gbps line PLUS a dedicated server for ~$600-750/mo. Providers like OVH offer the bandwidth for as little as $100/month. ( https://www.ovh.com/us/dedicated-servers/bandwidth-upgrade.x.... ) I am just not sure how Amazon can justify a 160x markup over OVH, or a 30x markup over a dedicated server + transfer.
OVH and other large volume providers are probably relying on the fact that many customers won't use 100% of their purchased capacity, but even taking that into consideration, AWS/GCE/Azure bandwidth pricing is insane.
It is indeed curiously unspoken among the major providers. (SoftLayer too raised all their bandwidth rates when IBM bought them.)
Part of building a moat? On one side we have a big stack of custom APIs. On the other - once you have a few hundred terabytes of data in their systems - it will be very costly to migrate out.
I knew as soon as I saw the headline that they would be reducing storage costs (who cares????) and not the bandwidth costs, which have been stuck for years, are price-fixed with the other providers, and account for the vast majority of our S3 bill.
Isn't bandwidth in free? Are you referring to bandwidth-out costs? I don't work for them, but I imagine they model bandwidth out as disk reads for accessing and reading your data. Think about how much it would cost to read from disk at the rate of a 1 gigabit line continuously for a month. Put a CDN in front of S3 if you have high bandwidth costs and low diversity of files.
In addition to that, the pricing for "Data Transfer OUT From Amazon S3 To Internet" is exactly the same as for "Data Transfer OUT From Amazon EC2 To Internet", so it seems this is not specific to S3 but applies to EC2 and AWS as a whole.
AWS appears to have really expensive egress costs (or really profitable egress margins) compared to OVH and Hetzner. If so, then something is stuck: either the costs are not being addressed or the margins are not being passed on.
Due to the high traffic volume, they peer with most large providers. Of course they need to pay for the fiber they lease between DCs and peering points, but at this level the cost should be rather low. And a lot of the traffic will be free, especially in Europe, where you can peer with many providers at exchanges.
CloudFront costs are super expensive outside of the US and Europe. We are a small company, and just in South America we are paying $1,000 per month for 4TB of traffic ($0.25 per GB). Based on traffic alone we are losing money with some customers.
Seems like you should use CloudFront for things that are both small and need to be fast (HTML/JS/CSS), and then use a separate service for hosting your fat media files. Heck, rent a pair of servers with unlimited bandwidth somewhere for $100 a month. Yes, the bandwidth may be a bit oversold and spiky, but if a giant file download slows from 10 MB/sec to 5 MB/sec for a bit, I doubt it matters THAT much.
For many hosts, bandwidth is what they are really selling. You may rent a server from them and they may make some money from that over the capital costs, but the amount they are charging for bandwidth in comparison to what they pay for it is where the real money is made.
I use S3 as a webserver for static pages. Total bandwidth costs: 17 cents a month, 14 or so cents now. And I have some tens of thousands of visits per month. The only thought I've had to put into it since setup was questioning one aspect of a bill (it amounted to pennies, but I couldn't figure it out, and maybe they were overbilling bigger entities), and they gave me a month free (including DNS costs).
I wish the Route53 costs would go down, though... Hard to beat, even if I were running job-replacing amounts of income.
Even at 100KB per page (assuming it's not only text), 1 million pageviews per month across all sites will bring you to 100GB, a point where a small VPS at DO/Vultr/Hetzner/OVH would be cheaper than AWS. And since he was talking about several pages, 1 million pageviews isn't even a very popular site.
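The back-of-the-envelope math (Python, assuming S3's first-tier egress price of $0.09/GB):

    pageviews = 1000000
    page_kb = 100
    gb_out = pageviews * page_kb / 1000000.0    # ~100 GB of egress per month
    print(gb_out * 0.09)                        # ~$9/month, already more than a ~$5 VPS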
I think it is possible to draw some conclusions from the volume discounts AWS is giving on data transfer. From 10TB/month to 150TB/month, the price drops from $0.09 to $0.05. I don't see how this kind of change in an individual customer's volume would make any difference to how much Amazon pays for the traffic. So I assume it is quite profitable for them to be selling at $0.05 (and lower), and the higher prices are just because smaller customers accept them.
The bandwidth pricing seems to be used to lock users in. You don't get charged when you're uploading data to AWS, but downloading it back out is quite expensive.
Well, the costs are nicer, but mostly, Glacier goes from an unusable pricing model to a usable one. I was terrified to use Glacier. Under the previous model, if you made requests too rapidly, you might be hit with thousands of dollars in bills for relatively small data retrievals -- it was very easy to write a very expensive bug.
I had wanted Amazon to wrap it in something where they managed that complexity for a long time. Looks like they finally did.
Now the only thing Amazon needs to do is expand free tiers on all of their services, or at least very low cost ones. I prototype a lot of things from home for work -- kinda 20% time style projects where I couldn't really budget resources for it. The free tier is great for that. All services ought to have it -- especially RDS. I ought to be able to have a slice of a database (even kilobytes/tens of accesses/not-guaranteed security/shared server) paying nothing or pennies.
> Under the previous model, if you made requests too rapidly, you might be hit with thousands of dollars in bills for relatively small data retrievals -- it was very easy to write a very expensive bug.
For something like 18 months now, Glacier has supported putting a policy on your vault that caps your maximum retrieval cost. Whenever a request would cause you to exceed that limit, it gets a throttle response that the SDK handles happily. I've used it when I needed to retrieve a whole bunch of data and wanted to do it faster than the free tier supported. I set it at $5 and just left the retrieval running.
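A rough sketch of what that looks like through the API (assuming boto3; note that at the API level the data retrieval policy is set per account and region, with the cap expressed in bytes per hour rather than dollars):

    import boto3

    glacier = boto3.client("glacier")
    # Cap retrievals at ~10 GB/hour; requests that would exceed the cap are
    # throttled rather than billed at a higher peak rate.
    glacier.set_data_retrieval_policy(
        accountId="-",  # "-" means the account that owns the credentials
        Policy={"Rules": [{"Strategy": "BytesPerHour",
                           "BytesPerHour": 10 * 1024 ** 3}]},
    )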
A t2.large costs 10 cents/hour, a t2.medium RDS instance costs 7 cents/hour. If you put in 50 hours/month on this side project, that's $8.50 for compute plus maybe $3 for ~ 30GB of storage.
$11.50/month doesn't sound too hard to budget for.
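The arithmetic, for anyone checking (prices as quoted above, not necessarily current):

    hours = 50
    compute = hours * (0.10 + 0.07)   # t2.large EC2 + t2.medium RDS, per hour
    storage = 3.00                    # roughly 30 GB of storage
    print(compute + storage)          # 8.5 + 3.0 = 11.5 dollars/month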
It might be more about bureaucracy than about cost. At my last job even small expenses would require printing forms and getting the CFO to sign them, so in the end it was pretty much not worth it for small tests.
Your work can't give you a few VMs in a lab somewhere that you can get free rein to prototype on? That is usually not too hard to get.
It seems the idea of the Amazon free tier is to give people a taste of AWS so they can decide whether to go in more or not. It's not really designed to be a free prototyping environment for existing large customers' products. Like the other poster said, you can host a tiny VM for $6 or so a month; not a big expense.
If you are asking for a $4,000-a-month production cluster, yes, that is harder to just get.
Dynamo might fill this need for you; you can provision a 1/1 table for less than a dollar per month and just put everything in it. Dynamo will bank some provisioned throughput for you to allow bursting, and the client APIs all implement exponential backoff in the event of a throttle, so a 1/1 table isn't nearly as scary as you might initially think.
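For concreteness, a minimal sketch of such a table (assuming boto3; the table and key names are made up):

    import boto3

    dynamodb = boto3.client("dynamodb")
    # 1 read capacity unit / 1 write capacity unit -- the cheapest provisioned table.
    dynamodb.create_table(
        TableName="prototype-scratch",
        AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
        ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    )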
Same about prototypes! I wonder how many other people have created really great products for their company doing the same thing. It would be in any company's best interest to provide free tiers for experimentation.
While I'm not going to complain about a price reduction, I'd honestly be more excited if S3 implemented support for additional headers and redirect rules. Right now, anyone hosting a single page app (e.g. Angular/React) behind S3 and Cloudfront is going to get an F on securityheaders.io.
And even worse, there is no way to prerender an SPA site for search engines without standing up an nginx proxy on EC2, which eliminates almost all of the benefits of CloudFront. This is because right now S3 can only redirect based on a key prefix or error code, not based on a user agent like Googlebot or whatever.
This means that even if you technically can drop a <meta name="fragment" content="!"> tag in your front end and then have S3 redirect on the key prefix '?_escaped_fragment_=', that will be a 301 redirect. As a result, Google will ignore any <link rel="canonical" href="..."> tag on the prerendered page and will instead index https://api.yoursite.com or wherever your prerendered content is being hosted, rather than your actual site.
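For reference, a sketch of the kind of key-prefix routing rule being described (assuming boto3; the bucket and hostnames are placeholders). This is the only style of redirect S3 website hosting supports, and it is what produces the 301:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(
        Bucket="www.example.com",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "RoutingRules": [{
                "Condition": {"KeyPrefixEquals": "?_escaped_fragment_="},
                "Redirect": {"HostName": "api.example.com",
                             "HttpRedirectCode": "301"},
            }],
        },
    )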
Not only is it a bunch of extra work to stand up an nginx proxy as a workaround, but it's also a whole extra set of security concerns, scaling concerns, etc. Not a good situation.
edit: For more info on the prerendering issues, c.f.:
Tell the CloudFront team to support adding custom HTTP headers to client responses (not the currently supported headers added to the origin requests), such as HSTS. CloudFlare and others already support this.
This is really something that needs to be set up/supported at the CloudFront layer (as CloudFlare and other CDNs already do) instead of S3. The danger is that somebody sets an HSTS or other security-related header on their bucket and inadvertently breaks access for other customers that fetch from the shared S3 domain.
> While I'm not going to complain about a price reduction, I'd honestly be more excited if S3 implemented support for additional headers and redirect rules. Right now, anyone hosting a single page app (e.g. Angular/React) behind S3 and Cloudfront is going to get an F on securityheaders.io.
You can setup "Origin Custom Headers" in CloudFront ;)
That's for sending headers to your web server or S3 though, not for sending headers to the user. There are a few extra headers you can send to the user, but not the security related ones.
Take a look at Netlify (disclaimer, I'm a co-founder), you'll get redirects, built-in prerendering, automated deployments, custom headers, etc. All with full cache-ability on our CDN.
I can vouch for Netlify. We use them for all our stuff at Graphcool and it is awesome. Going from S3+CF to netlify provides a step improvement similar to going from EC2 instances to a managed docker cluster.
Hmm, as someone who is about to move from using a webserver to hosting our single-page app on S3, you may have just convinced me not to.
What exactly are the benefits over simply setting up nginx if not simplicity? Yeah, it's great to just serve the asset from S3, but the complexity of what you just described negates it almost entirely.
> What exactly are the benefits over simply setting up nginx if not simplicity?
We weren't fully aware of these limitations when we decided to host our site on S3. Had we been, we may have just used nginx.
There are obviously a ton of benefits to S3 and CloudFront; it's just that in practice you can't really get them if you need Google to index your site. And while Google claims they can now execute JavaScript and include async content, in practice this isn't true for any real Angular or React site.
And even if every search engine were to magically execute js correctly, you'd still need to prerender your site in order for Facebook and Twitter to populate the preview cards for your site with the proper headline, summary, and image.
You should keep in mind that S3 really is an object store. It's named as such, advertised as such, priced as such, and the limitations you're hitting exist because it's built as such.
It does work if you want to host a static site, and it's nice that they offer a bunch of extra niceties to help make those work... but expecting things like user-agent-specific redirects is a bit much for what's essentially a filesystem.
I mean they let you put S3 behind Cloudfront with a TLS cert. Having read through all the documentation, hundreds of blog posts, etc., I've seen absolutely nothing that indicates that S3 + Cloudfront isn't meant for serious web hosting.
Amen! Have been baffled by the urge to move stuff to S3 for app hosting. I understand the convenience (sort of) and the scalability aspects (mostly) but you seem to lose loads of functionality.
Do Angular sites render correctly in Google Webmaster Tools but still not get indexed properly unless pre-rendered pages are served? I'm curious, since when I ask Google to index a specific page it shows everything properly loaded/rendered, and I was under the impression that was also what was being indexed.
I don't really understand the advantage. Getting a small VPS with nginx is fast to set up, needs very little maintenance, and can handle a large amount of traffic (for requesting static pages). I can't really see how S3 can be cheaper or easier.
Yes, it gives some scalability, but so do many cloud providers. DigitalOcean and Vultr both have SSD storage that you can attach to VMs; the speed I've seen is fast enough to easily saturate the bandwidth. Of course you'd need to scale up when you need more than 100Mbit of bandwidth, but this is still cheaper than paying for AWS bandwidth and servers.
Using S3 as a web server means managed reliability, scalability, security, maintenance, updates, HTTPS, and price efficiency (you pay only if pages are visited).
CloudFront or API Gateway help you integrate with non-static resources.
Confusingly, _escaped_fragment_= is always a prefix if you have a #! in your URL; otherwise I think it can come anywhere in the URL params.[1] Here is the exact system of redirects we were using, along with an explanation of why it didn't work:
Is anyone using either S3 or Glacier to store encrypted backups of their personal computer(s)? I've only used Time Machine to back up my machine for a long time, but I don't really trust it and would like to have another back up in the cloud. Any tools that automate back up and restore to/from S3/Glacier? What are your experiences?
I use Arq (https://www.arqbackup.com/) and it works very well. I've only tested retrieving small amounts of data from it so I can't comment much on a large bill. I only wish it worked on Linux. I've been thinking about seeing if it would work with Wine.
Also recommend Arq with any backend service. At a certain scale, especially after factoring in bandwidth for restores, Amazon Cloud Drive at a flat $60/year becomes more attractive.
I have about 150GB in /home backed up daily using Duplicity, which encrypts & compresses everything and saves to a second internal backup drive. Data is kept for 6 months minimum. After several years, the total backup size is 190GB which syncs daily to an S3 bucket and my monthly bills are about $11USD. If I ever had to restore all that from S3 it would cost extra but would not be prohibitively expensive.
Install the AWS CLI (https://aws.amazon.com/cli/) and choose whatever method you like for making an encrypted local backup. Then sync that backup partition to S3 every day.
Here's an example command you can adapt for cron to call via a shell script:
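(Something along these lines - the local path and bucket name here are placeholders:)

    aws s3 sync /mnt/backup s3://my-backup-bucket/backup --delete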
I use Tarsnap (https://www.tarsnap.com/), which uses S3. It's true, though, that I only backup a small enough subset of my files, not anywhere close to a complete image. OTOH, it's very easy to use (CLI only), and as secure as it gets.
Using S4cmd on my Debian box, I back up copies of my entire Lightroom folder structure. All my personal photos and videos get uploaded to a bucket on S3, then I transition the entire bucket to Glacier. Now with the new pricing of $0.004/GB and easy retrieval, it's a very nice setup.
To back up nearly a terabyte of photos costs about $4/month in storage. Uploading costs a bit extra due to the pricing for requests.
I was a masochist and hand-rolled incremental tar snapshots to Glacier using a cronjob. It works great for keeping the price down, though I learnt the "lots of retrievals" lesson the hard way once I actually tried to retrieve my data. I now compromise by doing a full snapshot monthly and incrementals daily, so I'm bounded at about 30 retrievals.
I'd highly recommend not repeating my mistake - use a real backup service for your actual data. Though rolling your own can be fun and interesting, it's probably a bad idea to bet your data on it.
I use my Synology NAS pointed at Google Nearline (through the included S3 support!). GCS also supports customer supplied encryption keys, but I store the bytes as encrypted on the NAS box itself.
As others have said, if you're just trying to back up your Mac, take a look at Arq.
Disclosure: I work on Google Cloud (and get $30/month of credit, so my first 3 TB are free).
I use duplicity to backup my VPS to S3, and Fast Glacier to archive stuff to Glacier from my Windows machine (things like photographs, important documents etc.)
Note: Glacier is not a backup service, it's an archive service. It's for long term backups, vs the relatively short term of standard backups (Full, delta, delta.. cycles). With Glacier if you delete an archive before 90 days, you'll still be charged for the full 90 days of storage.
I second this. It costs about $50 for a license, but they support incremental backups to any of a NAS, S3, S3/Glacier, various tiers of Google Cloud Storage, Dropbox, and Google Drive. Client-side encryption is built in. It's good software.
This is a really dumb question, but since I've never used Glacier, what does the workflow for a Glacier application look like? I'm used to the world of immediate access needs and fast API responses, so I can't imagine sending off a request to an API with a response of "Your data will be ready in 1-5 hours, come back later".
I work on quality control medical data (MRI images) and have huge data sets from machines going back over a decade. Most of the useful stuff is extracted metrics (stored in a db), but every now and then we need to pull up a data set and run updated analysis algorithms. We'll usually keep the latest couple of years in S3, and the rest in Glacier.
The data trove is fairly unique, and valuable in being the only of its kind, but we don't need anywhere near instant access to most of it.
It may be for backing up infrequently accessed data (compliance logs, etc) for example.
Hypothetical: you create a logging service for users to send all their log data to you. You promise 365 days of archives, but 30 days of data accessible at any time. You create a lifecycle rule on your S3 bucket to automatically archive data to Glacier 30 days after creation. On the 31st day, your user decides they want to look at an old log. They click the big Download button. You display a message saying they'll get an email from you when that data is ready to download.
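A minimal sketch of that lifecycle rule (assuming boto3; the bucket name and prefix are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="customer-logs",
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-after-30-days",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Objects transition to Glacier 30 days after creation.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]},
    )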
Why not? If you have ever worked with any asynchronous API, you have already been introduced to the "come back later" model. Does it really matter how much later it is?
Right, but if you want to retrieve some data through their API, how does it work? Normally you open the connection, ask for the data, then receive it and close the connection, does that change if there's a 5+ hour wait between the ask and the receive? Do you just leave the connection open? Provide them with a webhook to call when it's ready? I don't personally care about the answer but I'm pretty sure that's what they were asking.
With Glacier you submit an "InitiateJob" request to say "Fetch me this archive". That returns you a job ID in the response.
From there you can submit a "DescribeJob" request, with that Job ID as the parameter, and the Glacier service responds with the state of the job.
Once the job is marked as complete, you submit a "GetJobOutput" request with that Job ID. That response is the archive body. (similar to how you'd do a GET request from S3).
You've got 24 hours to start the download of the archive before you'll have to repeat the entire InitiateJob->GetJobOutput cycle again.
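Roughly, in code (a sketch assuming boto3; the vault name and archive ID are placeholders):

    import time
    import boto3

    glacier = boto3.client("glacier")

    # 1. Ask Glacier to stage the archive for download.
    job = glacier.initiate_job(
        vaultName="my-vault",
        jobParameters={"Type": "archive-retrieval",
                       "ArchiveId": "EXAMPLE-ARCHIVE-ID",
                       "Tier": "Standard"},  # or "Expedited"/"Bulk" under the new pricing
    )

    # 2. Poll until the job completes (typically hours, depending on the tier).
    while not glacier.describe_job(vaultName="my-vault", jobId=job["jobId"])["Completed"]:
        time.sleep(15 * 60)

    # 3. Download the staged archive; the output stays available for about 24 hours.
    output = glacier.get_job_output(vaultName="my-vault", jobId=job["jobId"])
    data = output["body"].read()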
At work our needs are simple: we manually run the aws cli to sync files up to S3, where there's a 1-day lifecycle policy to move them to Glacier. We don't use the API for restores; we do those through the web console and check back in a few hours to see if the files are downloadable.
I think through the API you do not leave the connection open; you check with whatever frequency you want, and when it's ready, the response will include the temporary location on S3 for the file.
Yes, they are high for all 3 major cloud providers. I guess it's to keep you from using only the cheapest part of each provider. That way you have to use their whole ecosystem, and they can offer some products below cost to lure people in.
What is the mechanism that makes it cheaper to take longer getting data out? Is it that they save money on a lower-throughput interface to the storage? Is it simply just market segmentation?
In theory, tape [1], optical [2], or spun-down disk [3] are cheaper but slower than spinning disk. Erasure coding [4] is also cheaper but slower than replication. One could even imagine putting cold data on the outer tracks of hard disks and warm on the inner tracks. In practice I suspect Glacier is market segmentation.
Pure speculation here, so I'm probably completely wrong, but I imagine it's so that they can essentially reduce the amount of seeking. Glacier uses magnetic tape storage, I think, and I believe each tape has to physically be inserted into a machine to be read, and then be removed afterward. So there would be some downtime as tapes are swapped in/out. Therefore, it would make sense to aggregate reads ahead of time and maybe even physically reorder whole tape accesses to reduce the time it takes to load them.
But this wouldn't explain why the read rate factors into cost. Maybe they scatter data across tapes as well, and higher bandwidth requires loading more tapes concurrently?
Again, total conjecture. Please let me know which parts are wrong and which are right.
I currently use S3 Infrequent Access buckets for some personal projects. These Glacier price reductions, along with the much better retrieval model look really great.
However using Glacier as a simple store from the command-line seems horribly convoluted:
If you don't want to mess with all of that, you could use the standard "aws s3" command to upload your files to your S3 bucket like normal, and just apply an archive policy to your bucket or archive/ prefix or whatnot, and it will automatically transition your files to Glacier for you.
Has anyone tried to migrate to Backblaze? Their pricing seems really aggressive, but I am not sure if we can compare Amazon and Backblaze when it comes to reliability.
I love the folks at backblaze but the single datacenter thing really worries me (and again, disclosure, I work on Google Cloud). If you're just using it as another backup, maybe that's less of a concern: your house would have to burn down at the same time that they have a catastrophic failure. But it is part of the reason you see S3 and GCS costing more (plus the massive amount of I/O you can do to either S3 or GCS; I'd be curious what happens when there's a run on the Backblaze DC).
Sorry if I wasn't clear: your bytes on GCS and S3 are stored across multiple buildings (GCS Regional, S3 Standard). More copies is more dollars not less ;).
As far as I am aware GCS does erasure coding across sites?
Backblaze could do multiple tiers of erasure coding and they would still be able to reduce prices given more scale, ceteris paribus.
It's not a question of number of replicas, data centers or technical implementation, but a question of pricing policy.
Does one want to use volume and scale to drive prices down (and cheaper prices to increase volume) or does one want to use volume and scale to bloat margins? Backblaze are arguably doing the former.
Does one want to lock customers into an ecosystem by enforcing excessive bandwidth prices or does one want to pass on bandwidth cost-savings to customers? Backblaze are arguably doing the latter.
Backblaze would continue to be cheaper because their pricing policy serves customers across all dimensions.
More scale is definitely less dollars not more (even if it means a fraction of a few more erasure coded shards across sites).
Anyone else finding their S3 bill consists mostly of PUT/COPY/POST/LIST requests? Our service has a ton of data going in and very little going out, and we're sitting with 95% of the bill being P/C/P/L requests and only the remaining 5% being storage.
Either way, good news on the storage price reductions :)
I hit that scenario when we create tons of small files (e.g. <10KB ones). In that use case it is often cheaper and easier to just use a database such as DynamoDB.
See my other comment; it has a link to an article about S3 cost optimizations with more detailed recommendations.
What app/site are you using to upload to S3? I use a combination of CloudBerry Backup and Arq Backup on my Macs/PCs here and the request counts aren't that high (on average about 30GB of data per machine in around 300K files).
I am guessing it comes down to the algorithm used to compare and upload/download files. I believe the two solutions above use a separate 'index' file on S3 to track file comparisons.
It's more that we have a pretty high throughput system, using Lambda.
Users authenticate with an API Gateway endpoint, we do a PUT to store a descriptor file and send a presigned PUT URL back so they can upload their file, we then process the file and do a COPY+DELETE to move it out of the "not yet processed" stage, and finally we do another PUT to upload the resulting processed file.
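For anyone unfamiliar with the presigned-URL step, a sketch (assuming boto3; bucket and key are placeholders):

    import boto3

    s3 = boto3.client("s3")
    # The client receives this URL and uploads with a plain HTTP PUT,
    # so each upload still counts as one billable PUT request on the bucket.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "uploads-incoming",
                "Key": "not-yet-processed/user-123/file.bin"},
        ExpiresIn=3600,
    )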
Despite a lot of data, the storage bill is barely scratching $40, but we're at almost $700/mo on API calls.
Heyo, that sounds not quite right; if you wanna shoot me an email at randhunt@amazon.com I'd be happy to try to figure it out. Your API calls shouldn't be that much more than the storage cost without really strange access behavior. I don't know the answer off the top of my head, but I'm down to try to find it out. GETs cost money and outbound bandwidth costs money, but PUTs/POSTs should be negligible.
Ah, thanks for the extra info. We have several web apps that take user uploads and store them in S3 buckets here too, but we still don't see an adversely high request load. Not sure if the handshaking involved in getting a pre-signed URL is upping your count?
We just use the AWS SDK on our Ruby back end. The user file is first uploaded to the (EC2) app server, then we use the SDK call to transfer it to the S3 bucket. Our storage and request costs are about equal at this stage.
Using Lambda/Node, I guess that the SDK is not an option and you have to use the pre-signed URL method? Or else use Python and the SDK library?
Any chance Google will match this price for their coldline storage? I was planning to archive a few TBs in Google coldline, but Glacier is now cheaper and has a sane retrieval pricing model.
In our startup, the biggest cost is bandwidth. We live in an age where videos can be created and streamed in seconds to millions of people. With bandwidth costs this high, it's very difficult for bootstrapped startups to grow as quickly as those who raise VC funding. I hope AWS can reduce the outgoing bandwidth cost by 50%.
I do miss my little 3TB Seagate that I rsync'ed from home to my office. That said, what if someone broke into our little office (it was an encrypted backup, of course)?
The reality though is that no business would be comfortable with the plan being "our data is replicated offsite at Steve's house". And, other than maybe a pair of NAS boxes (one at each end) the cheapness of the solution assumes you have a great network connection between the two, a machine to plug it into, and only need a single hard drive. That is, how would you do offsite, active backup of say 50 TB?
Disclosure: I work on Google Cloud (and do offsite backup to GCS Nearline, that I'll move to Coldline when I get a minute to play with our new per-object lifecycle rules).
It's kind of funny when Feral Hosting (I know I've been mentioning them a lot, but trust me, I'm in no way affiliated with them, I'm just a satisfied customer, though they have trouble now with OVH) offers 2TB plus a 10Gbps unlimited bandwidth connection (it really is unlimited; I abused it as much as I could with no warnings or anything) for 120 British pounds per year. That's good enough for me for mass storage of non-important data like movies, music, TV series, etc., that I convert and then serve to users.
I Googled "Feral Hosting", clicked on the link to their web site [0], and was redirected to (presumably) their status page [1], which reads:
> tl;dr www.feralhosting.com is down, database lost, slots are up and will remain up. We're moving to the honour system for paying bills. ETA 25th November.
Not exactly a good first impression, to say the least.
If costs matter to you, e.g. for home backups, don't buy Glacier (and heck, don't buy S3). A 3TB drive costs about 110 EUR, so even if you had to buy a new one every year (you don't) that'd cost 110/3/1000/12 ≈ 0.31 cents per gigabyte per month. S3? 7 times more expensive at 2.3ct.
Hardware is usually not a business's main cost, but it does matter for home users, small businesses, or startups that haven't been funded yet, some of whom might consider Tarsnap or some other online storage solution that uses Glacier at best and S3 at worst. You could suddenly be 7× cheaper if you do the upkeep yourself (read: buy a Raspberry Pi), even if you throw away drives after one year.
There is value to having off-premises replicated storage on something more durable than home-user targeted drives.
Google Cloud Nearline costs $0.12 per gigabyte-year, with prices that will continue to fall. For a typical 500GB hard drive that saw perhaps 700GB of unique data, that's $84/year to have an outside-the-house replicated backup using something like Arq.
They are not the same thing at all. Glacier is the backup of the backup. It's where you go if your house burns down and the offsite backup at a relatives house is destroyed as well.
If you want to compare them, you have to buy space on a different continent, and store your backup there.
> A 3TB drive costs about 110 EUR, so even if you had to buy a new one every year (you don't) that'd cost 110/3/1000/12 ≈ 0.31 cents per gigabyte per month. S3? 7 times more expensive at 2.3ct.
Your pricing assumes that the drive is never powered.
I had such a setup when the Joplin 2011 tornado hit: http://www.ancell-ent.com/1715_Rex_Ave_127B_Joplin/images/ and I got off lightly. But the separate room my BackupPC hard drives were in was breached (see 302-2nd-bathroom-with-hole-of-unknown-origin), those drives became fit only for Ontrack's $$$ recovery service, maybe, and one of my computers with e.g. my email was seriously damaged. The data on it was easily recovered from rsync.net's off-site Denver location, whose service I love and will continue to use for my most important and "hot" data.
LTO-4 tape had gotten capacious and cheap enough that I went back to tape (I'd outgrown DAT). If I didn't have a big sunk cost in a well-working tape system and a pool of tapes, which are very easy to put in e.g. a safe deposit box (they're a bit fragile, but nothing like a hard drive), I'd already be using one of S3, Glacier, or Backblaze, maybe even GCS, since suddenly and irretrievably losing access to my backup data because a bot decided I was evil would not likely coincide with a total data loss at home (Google simply cannot be trusted if you're small fry like myself, as HN has been discussing as of late).
As Glacier has gotten sane enough to use without twisting your mind into a pretzel, with the new price reduction for slow retrieval I can seriously think about adding it to the mix and switching to it when my LTO-4 tape drive dies someday (e.g. ~3TiB for ~$12/month per my quick calculation just now), instead of buying another tape drive.
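That quick calculation, spelled out (at the new $0.004/GB-month rate, ignoring per-archive metadata overhead):

    tib = 3
    print(tib * 1024 * 0.004)   # ~12.3 dollars/month for ~3 TiB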
The biggest difference is that Glacier is still a "suspend/resume" type of access. However, if you just want to compare pricing, it'll depend on your access pattern and object sizes.
Retrieval from all Google Cloud Storage classes is instant; for Coldline it is $.05/GB (and Nearline $.01/GB). If you value that instant access, the closest you'd get with the updates to Glacier is via the Expedited retrieval ($.03/GB and $.01/"request", which is per "Archive" in Glacier). Then you have to decide how much throughput you want to guarantee at $100/month for each 150MB/s. (It's naturally unclear, since it was just announced, what kind of best-effort throughput we're talking about without the provisioned capacity.)
If you're never going to touch the bytes, and each Archive is big enough to make the 40 KB of metadata negligible, then the new $.004/GB/month is a nice win over Coldline's $.007. Somewhere in between, and one of the bulk/batch retrieval methods might be a better fit for you.
But again, it's still a bit of a challenge to go in and out of Glacier while Coldline (and Nearline and Standard) storage in GCS is a single, uniform API. That's worth a lot to me, and our customers. But if Glacier were a good fit for a problem you have, and you're talking about enough money to make the pain worth it, you should seriously consider it.
Disclosure: I work on Google Cloud, so naturally I'd want you to use GCS ;).
Hey, did you mean Reduced Redundancy (not Infrequent Access)?
I just noticed that the new pricing for Standard (2.3¢) is now less than the pricing for Reduced Redundancy (2.4¢)! So there appears to be no reason to use Reduced Redundancy anymore.
- The price change on Glacier is a fucking disaster. They replaced the _single_ expensive Glacier fee with a choice among 3 user-selectable fee models (Standard, Expedited, Bulk). It's an absolute nightmare added on top of the current nightmare (e.g. try to understand the disk specifications & pricing; it takes months of learning).
I cannot follow the changes; too complicated. I cannot train my devs to understand Glacier either; too much of a mess.
AWS if you read this: Please make your offers and your pricing simpler, NEVER more complicated.
(Even a single pricing option would be significantly better than that, even if it's more expensive.)
I wrote the post after spending all of about 15 minutes learning about this new feature. We did make things simpler.
> AWS if you read this: Please make your offers and your pricing simpler, NEVER more complicated.
I am reading this. We made them simpler. Decide how quickly you need your data, express your need in the request, and that's that. If you read the post you will see that these options encompass three distinct customer use cases.
There are 3 pricing models for S3, the third one (Glacier) having 3 sub-pricing models to be chosen at request time.
I don't think you realize how insanely complex the entire S3 pricing model is once you get out of the "standard price".
Maybe I just have too much empathy for my poor devs and ops who try to understand how much what they're doing is going to cost. It's only one full page, both sides, of text, after all.
Is that really that much different from hard drives? No hard drive manufacturer uses the same standards to determine how much space there will actually be on the disk. You can get hard drives that spin at many different RPMs. You can get hard drives with many different connector types. Drives with different numbers of platters. An 8TB, 7200 RPM, SATA, Western Digital drive is not going to have the same seek time as a 1TB, 7200 RPM, SATA, Western Digital drive.
There are so many combinations of hard drives that will result in different performance for different situations all with different costs. Then you start talking about cold storage as well and you've moved into other media formats.
Just because there is a page worth of a pricing model doesn't mean AWS or any cloud provider is doing anything incorrectly. You're paying for on demand X and engineers who are going to utilize that should understand it as well as they would understand how to build an appropriate storage solution of their own. On demand just means now they don't have to take the time to design, implement and operate it themselves.
I'm not saying that S3 isn't complicated, or that the pricing model, even after the changes, isn't nuts.
I'm saying that anyone who thinks this is more complicated than it was does not understand just how crazy Glacier pricing was before. Three static Glacier pricing tiers is a lot better than the previous system, which was so complex that earlier versions of the AWS pricing page just gave up and called it "free".
(Briefly: The old Glacier model's pricing wasn't based around data transfer, but on your automatically retroactively provisioned data transfer capacity based on your peak data transfer rate, billed on a sliding scale as if you'd maintained that rate for the entire month. There's probably a less intuitive way to bill people for downloading data, but if so I've never seen it. It was a system seemingly designed to prevent users from knowing what any given retrieval would cost.)
I appreciate your enthusiasm for our products, but this is a huge step forward for S3 customers. I think it's fair to say you find Glacier too complicated (it's part of the reason Coldline doesn't look like that), but to say this new setup is worse just isn't true.
I can assure you that if AWS had announced a simple flat-rate structure that was more expensive for everyone there would be plenty of unhappy existing customers. It's a tough call to balance simplicity and efficiency, by making this complex for you, they allow you to opt-in to the "screw it, give me my bytes back as fast as you can" model. You could just pretend that's the only option ;).
Again, I'm not criticizing your unhappiness with the complexity of Glacier. But I think it's only fair to recognize that the folks at AWS have just released a major improvement that provides real value for some customers.
This is a good change. The stupid former "rate-based" system meant that if you stored a large file (say, 10GB), you'd have had to pay quite a bit of money if you tried to retrieve just that file. After all, you can't control the rate at which you retrieve one file.
I'm ignorant of exactly how Glacier's API works, but what's stopping you from reading bytes from the socket at any rate you want (for a single file)? E.g., what is stopping me from doing 128KB/s with this code:
// Read ~128 KB from the connection's InputStream, then pause for a second,
// throttling ourselves to roughly 128 KB/s.
byte[] buffer = new byte[128 * 1024];
InputStream in = socket.getInputStream();
while (in.read(buffer) != -1) {
    // do something with buffer here
    Thread.sleep(1000); // assume InterruptedException is handled by the caller
}