When I returned to JPL after working at Google for a year I was tasked with evaluating a Google Search Appliance. We ultimately decided not to keep it, and so we had to erase the disks, which now contained sensitive data. The appliance had a "self-destruct" feature that supposedly erased all the data, but there was no way to verify it. After lengthy negotiations with Google (some people just have a hard time grasping the idea that just because a file has been deleted doesn't mean the data is actually gone) we eventually got them to agree to let us open the enclosure and take out the disks. Forensic analysis revealed that they had not in fact been erased.
"...The data will be 256-bit encrypted on the host [running the Snowball client?] and stored on the appliance in encrypted form. The appliance can be hosted on a private subnet with limited network access."
So I assume the data is encrypted asymmetrically.
"...ship it back to us for ingestion. We’ll decrypt the data [using the private key specified in the job,] and copy it to the S3 bucket(s) that you specified when you made your request[/job]. Then we’ll sanitize the appliance in accordance with NIST Special Publication 800-88 (Guidelines for Media Sanitization)."
There are a few different types of sanitisation (clear/purge/destroy), and Amazon doesn't specify which type. I assume they would go with "clear", and maybe in a few select places (I'd hope storage media) "purge".
"Clear" is scary though, as for network devices, it is only "full manufacturer’s reset to reset the router or switch back to its factory default settings", and for HDD's it is "Overwrite media by using organizationally approved and validated overwriting technologies/methods/tools. The Clear procedure should consist of at least one pass of writes with a fixed data value, such as all zeros. Multiple passes or more complex values may optionally be used".
So what vector do you want to protect? Accidental data egress shouldn't happen as the data is encrypted. However there are more interesting vectors, such as getting hold of the public key and injecting your own data into another company's buckets...
But that's the whole point: Can you trust that the box does what Amazon says it does? Because the Google box did not do what Google said it did, but if I hadn't been very insistent about it (to the point of having a number of people think I was being a total dick), we would never have known.
As usual, the pricing is not very friendly, and apparently designed to lock your data into AWS or exploit your weak negotiating position once you buy in.
While you can send in 50TB for $200, taking the same 50TB out costs an additional $1500 charge (50000 * 0.03).
[assuming they are not transferring the data over the Internet, the cost to AWS should be the same or cheaper for reading]
According to multiple sources, Internet transit in the US now costs less than $1 per month for a 1 Mbps line for large deals, which translates to $1 for 324GB, or about $0.003/GB.
Amazon charges 15-30 times that.
(it appears that traffic can be much more expensive in places other than the US and presumably Europe)
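For anyone who wants to check the arithmetic above, here is a rough sketch. It assumes a 30-day month, a fully saturated link, and decimal gigabytes; the function names are made up for illustration. The "implied" rates are simply the 15-30x multiples applied to the transit figure, not quoted AWS prices.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month


def gb_per_month(mbps: float) -> float:
    """Decimal gigabytes moved by a link that is saturated for one month."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_MONTH / 1e9


def cost_per_gb(dollars_per_mbps_month: float) -> float:
    """Effective price per GB when paying a flat monthly rate per Mbps."""
    return dollars_per_mbps_month / gb_per_month(1.0)


transit = cost_per_gb(1.00)  # the $1 per Mbps figure from the comment
print(f"{gb_per_month(1.0):.0f} GB per Mbps-month")
print(f"${transit:.4f} per GB of transit")

# Per-GB rates implied by the "15-30 times" multiples in the comment.
for multiple in (15, 30):
    print(f"{multiple}x transit = ${multiple * transit:.3f} per GB")

# The $0.03/GB export fee mentioned earlier in the thread, for comparison.
print(f"$0.03/GB is about {0.03 / transit:.1f}x transit")
```

Note that the $0.03/GB Snowball export fee comes out closer to 10x transit under these assumptions; the 15-30x range lines up with per-GB rates of roughly $0.05 to $0.09.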
It may cost that much to buy that capacity, but it costs a lot more than that to run the large scale organizations (CAPEX/OPEX) that build and buy these services. You're not just paying for a pipe, you're paying for the corporation.
Exactly, I don't know why so many people seem to miss this. The charges may not reflect the cost of that particular service, but if one looks to the service as a whole, it's not that badly priced. Costs cover the infrastructure, which we reasonably expect to contain multiple redundancies, as well as a profit margin for the business.
> You're not just paying for a pipe, you're paying for the corporation.
You are paying their profit margin, yeah.
It is wildly overpriced, no matter how you look at it. Operating costs even when done on a much smaller and more inefficient scale than for AWS do not make the total cost for incremental bandwidth usage THAT much larger.
Don't bother, AWS defenders will continue to defend it and replace employees with a larger AWS bill and further lock-in until they shoot themselves in the foot.
A large AWS bill (which actually gets cheaper over time!) is much easier/cheaper to get rid of than employees. AWS also won't get recruited by a competitor and come into your office wanting a ton more money.
AWS is the employee that sits there and learns the industry and every inch of the system you have for 5 years, and then one day you wake up and they're not at work. You read the news that morning and they just got 50 million to do their version of your company, and way better than yours.
Or they start getting older, don't keep the skills sharp enough, and die off. Now you've got to painfully convince them to help you train a new employee to keep the show running, or pay tenfold your savings hiring the smartest in the world to fix it. But the system's too big, and by the time it's on the "latest and most popular cloud architecture with proprietary systems", you're irrelevant.
But you're right, they make the company's current CEO/CTO look good by cutting costs in the beginning, so who cares, right?
I think (based on your child comment too) that you're conflating AWS and Amazon. Amazon sells things and also allows you to sell things, and that creates a conflict of interest, which has apparently caused problems. I am unsure how much of the problem was direct maliciousness on Amazon's part, admittedly - I suspect naivete on the part of some marketplace users, imagining that Amazon would never sell the same product as them, or would never lower the price of a competing product. Certainly it would seem obvious not to try and compete with Amazon on price or availability; you will always lose eventually.
However, AWS is a different thing altogether - it is a set of services that can be used to run parts of your business. Now, if your business involves simply reselling those services, or is predicated on the availability and pricing of one of those services being a major part of the value for a service you sell (i.e. the value you add is marginal, with the majority of the product or service's value residing in the service AWS provides) then you are again in a situation where AWS may also decide to offer the same service, and will probably be able to do so more cheaply and profitably.
Again, this would not be a malicious act or directed attack on your business through inside information, and it seems naive to assume so. You must have been able to determine a market existed and a need could be fulfilled profitably by providing this service; there is no reason a company like AWS could not reach the same conclusion with its vastly larger amount of resources. The possibility of this sort of thing happening should have been determined during due diligence and market analysis anyway, so it should not be a surprise if it happens.
But companies in traditional lines of business using AWS to save money, or using AWS to build a product where the value provided to the customer is inherent in the service provided, not the infrastructure used, are going to be fine. Nobody worries about the electricity company stealing your idea for a product run using electricity...
For now. If you want to see the future have a look at the debacle that just went on with prime/appletv/google. If they get a strong enough dominance they will. Given that the IoT is a rising thing, and Amazon is placing itself at the apex of the internet's backbone and already has a stranglehold on the physical goods world, it should be easy to see the next 10 years if it continues. I can see the pitch now... imagine... a world where you can sell your hardware and put your servers all in the same stack! (at least until you POC it for us). If you want to see the future of that, have a look at companies that have had their physical goods ideas stolen and are now mass produced at Amazon for that "everyday low price" (and no, that jab at them v walmart isn't an accident).
I was hoping somebody would bring up Netflix. Netflix is AWS's poster child and they know it. They need AWS still just as much as AWS needs them. The statement you just made proves that. They will do everything up to and including take a loss to keep Netflix around. I can assure you that your company will not be getting the same price quote or technical support that Netflix does unless they can bring them just as many sales by being a status symbol and marketing tool for them.
There seems to be a trend of people thinking "AWS is my friend". No. They are a company, and they exist to make money. Have we not all been bitten enough by this thought pattern and loyalty to learn the lesson of "stay flexible"? I'm not advocating that AWS be done away with entirely or that nobody should use them ever. I'm advocating that putting your entire stack into their system is a bad idea and that using "black box" software as little as possible is a better approach. When it was just EC2 it was fine: you can build your stuff on an EC2 box with your favorite flavor of *nix and quickly throw boxes up elsewhere if things go south. Now I'm seeing companies put entire critical infrastructures onto Amazon's pre-built services like they've never seen a tech company go under, or a fad die, or strongarming with brute power.
AWS is already dominant, it's not even close between AWS and the next Cloud platform. Vendor lock-in is a foregone conclusion. Meanwhile lots of companies are building successful, profitable applications on AWS.
It's not cloud vs cloud. It's cloud vs vps vs colo vs in-house/datacenters. Cloud is one option, and a heavily mis-used one for multiple purposes, security being a popular one this year.
This is a complete non-issue for users of AWS unless they are essentially trying to resell AWS services.
> companies that have had their physical goods ideas stolen and are now mass produced at Amazon
AWS is not Amazon, they do different things - see my GP comment replying to you. I also wonder how many times this has actually happened (stealing ideas) in reality, versus the companies simply having an unsustainable business model, or an obvious product. It sounds like the classic case of looking for someone to blame for their own incompetence, and choosing Amazon instead of some other huge corporation or the government, which are also common scapegoats...
> Do you really think they won't give Netflix the shaft when it suits their strategic interest?
If it suited their strategic interest, sure, Netflix could get the shaft. Is that scenario likely to play out? It seems unlikely. Netflix is one of AWS's earliest and most prominent customers.
I would imagine Netflix would leave Amazon long before Amazon shifts their strategy to take out Netflix.
Their strategic needs of keeping AWS as a trusted platform override any potential gains from giving Netflix the shaft - particularly since they'd be stupid to assume that Netflix doesn't have a contingency plan.
Centurylink has a pretty formidable IaaS offering, it's very well priced and they charge very little (comparatively) for bandwidth. The only problem is their storage solution is extremely expensive but using S3 or soon Backblaze B2 as a storage layer is good enough for me.
I said their IaaS is a cheaper alternative to AWS IaaS with the exception of their S3 alternative. I suggested B2 is the cheapest alternative to S3; do you have a suggestion?
A few years ago, an enterprise network I helped run with about 40k users at several hundred locations cost something like $30-35 per user/month to operate. We had strong incentives to price below what an MSP would charge and beat them on 2 occasions.
About 40-50% of that cost was circuits and transit. The vast majority was tied up in labor and equipment costs. If I were making money on the whole stack with the market control that AWS has, I would want at least 60% margins on the business -- it's not like locked-in customers have easy options.
1 Mbps * 2 days = 21.6 gigabytes, so sure, you could transfer a lot of really stale data, but if latency is important the prices are dramatically higher. You also often end up paying for both the upload and download side if you need to regularly do large transfers.
Careful. My previous startup was a storage company that competed with Amazon and Google when they were charging 0.10/gb and I calculated it should be around 0.02/gb. A few months after launch, they both realized it too and dropped their prices. Leads dried up overnight.
The high prices seem to persist until one day they don't.
I sort-of agree with you, but in this case I believe it will persist for years to come, as they have faced competitors charging far less for bandwidth for years already and pretty much ignored it.
The price of something is determined by how much people are willing to pay for it, not the marginal cost. We don't pay people based on the marginal cost to keep them alive.
teraflop was right, it's $200 per snowball device job, which is currently limited to 50TB. Maybe that will change. You can order multiple jobs to import more data. You have 10 days to complete the transfer and ship it back and then it's $15 a day. $200 buys you 50TB per job, great encryption, and speedy migration. Sources:
https://aws.amazon.com/blogs/aws/aws-importexport-snowball-t... and
https://aws.amazon.com/importexport/pricing/
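To make the pricing discussed in this thread concrete, here is a minimal sketch that adds up the charges for a single job. It assumes only the figures quoted in the comments ($200 per job, 10 days included, $15 per extra day, $0.03/GB for export); the function and parameter names are made up for illustration and are not part of any AWS API, and the linked pricing page remains the authority.

```python
JOB_FEE = 200.00        # per-job charge quoted in the thread
INCLUDED_DAYS = 10      # days on site before the daily fee kicks in
EXTRA_DAY_FEE = 15.00   # per day beyond the included days
EXPORT_PER_GB = 0.03    # export charge quoted in the thread


def snowball_job_cost(days_on_site: int, export_gb: float = 0.0) -> float:
    """Estimated charge for one job under the thread's quoted prices."""
    extra_days = max(0, days_on_site - INCLUDED_DAYS)
    return JOB_FEE + EXTRA_DAY_FEE * extra_days + EXPORT_PER_GB * export_gb


print(snowball_job_cost(10))                    # import, returned on time
print(snowball_job_cost(14))                    # import, kept 4 extra days
print(snowball_job_cost(10, export_gb=50_000))  # 50TB export
```

Shipping and the ongoing S3 storage charges are deliberately left out, since the thread does not give numbers for them.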
After reading those pages, I think you're wrong, and the $200 is actually per-device.
For one thing, the documentation says you can use multiple Snowball devices, but carefully tiptoes around saying whether or not there's an extra charge for doing so. All of the language that actually talks about pricing just says "the device", singular. For another thing, the screenshots of the "create a job" workflow are missing any way to specify that you want multiple devices. It sure looks like one job == one device.
(This reminds me of the pricing issues around AWS's Glacier service. It's not even that the pricing model itself is bad -- it's that the marketing is obfuscatory to the point of being arguably deceptive.)
Unless I'm misjudging the scale of this thing, that screen looks a fair bit smaller than a Kindle. I think it's just TechCrunch using "kindle" as a generic word for e-ink display.
Weird - something about the form factor makes me want to violently toss it off a loading dock. Anyone know how much that would translate to in Gs? (The article says it will survive 6 Gs of shock.)
A whole lot. It depends on how fast it is going. See http://measurespeed.com/deceleration-calculator.php for a quick-and-dirty overview. Gravity accelerates at 9.8 m/s², so if it fell for one second before impacting the ground, and took (for instance) 0.05 seconds to deform before coming to a halt, it'd have experienced about 20 Gs of shock.
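As a sanity check on the drop scenario, here is a back-of-the-envelope sketch. It assumes free fall with no air resistance and a constant deceleration over the stopping time, which real impacts do not follow; the function names are made up for illustration.

```python
G = 9.8  # gravitational acceleration in m/s^2


def drop_height_m(fall_seconds: float) -> float:
    """Height fallen from rest in the given time (no air resistance)."""
    return 0.5 * G * fall_seconds ** 2


def impact_g(fall_seconds: float, stop_seconds: float) -> float:
    """Average deceleration, in multiples of g, assuming a constant stop."""
    impact_speed = G * fall_seconds           # m/s at the moment of impact
    deceleration = impact_speed / stop_seconds
    return deceleration / G


print(f"drop height: {drop_height_m(1.0):.1f} m")
print(f"average shock: {impact_g(1.0, 0.05):.1f} g")
```

Under these assumptions the result is just the ratio of fall time to stopping time, which is why a very short stop on a hard surface produces such large numbers.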
I always found it amusing that hard drives were rated in the hundreds of Gs until someone reminded me that 'time to stop' when dropped on a hard surface was very short indeed...
I have to admit, while it makes for good storage lock-in, I was impressed that they only charge 3 cents/GB to get the data back out.
Someone else in this thread thought $1500 was expensive to get 50TB back out. If you use this for disaster recovery, you could get all of your data back onsite quickly for a very low (comparative) cost, versus trying to provision high speed connectivity.
3 cents/GB is cheap. Go with S3's infrequent access class at 1 cent a GB (since you won't be incurring the charge for retrieval through S3, you'll be pulling back out through Snowball), and it's even cheaper.
$10/TB/month? Where else can I store data reliably that cheap? (Yes, Backblaze is half that price. I hope they become a worthy adversary to AWS S3 to drive prices further down).
If you have moderate volume, you can beat S3 pricing with object solutions like DDN or others. It all depends on your data center capacity and power costs.
If it cost you about $1000 to buy a diskpack (4*6TB drives) you could create backups and send them to at least a half dozen locations for less money than using S3 to store that data.
Yes, S3 is cheap(ish). But given Snowball is a snapshot backup service, it's not comparatively cheaper than it would be to distribute that same data by creating a clone and sending it to a safe place.
That's not really how business IT works (unless you're sending tape off to Iron Mountain, which has its own costs and storage fees).
S3 is the cheapest "real" business storage option besides Backblaze's new storage offering. S3 can't be compared to shipping disks someplace where they sit offline.
Using the calculator for that service, I put in 50TB and 25 hard drives, and the estimated charges came out to $2435.75.
The point where Snowball is cheaper is fairly low. 4TB on 2 drives is $194.86. Basically, with more than 5TB over 2 or more drives, it's cheaper to use Snowball. Plus you have to pay to have the drives returned to you.
"Sneakernet" is an amusing term to me, because prior to the advent of widespread, high-speed Internet access, this was the only way to transfer large amounts of data.
Actually the term was invented long before Internet access was widely available, let alone high speed. It describes transferring data between computers located in the same room (or building) using floppy disks. The alternative was to use some kind of networking, like coax Ethernet (10BASE5 or 10BASE2), or one of the many other competing standards that existed back in the early eighties.
And before the Internet was commonplace, people had to use data lines provisioned by the telephone company to link distant offices, and could also send data that way, as larger companies still do.
I'm curious about how they're going to approach this from the fraud perspective. This is a $200 charge for a device that has 50TB storage, which would probably cost you around $2000 to buy.
There are people out there that will sign a contract under a fake name / address with a phone provider and sell the phones, and the way the providers fight against it is usually by running credit checks and verifying the address against them. Ultimately, this is very hard to detect when it involves identity theft.
Let's say for the sake of argument that it costs about $100 per use to recover the machine from a labor/time perspective. That means they'll make back the initial investment on the machine in 20 uses. As long as the fraud rate is less than 5%, then this venture from a capital investment perspective still makes sense. Personally, I think people who intend to use fraud to get free hard drives will probably be a small minority of those who order this device. Plus, you need an AWS account for this.
It's almost literally a black box. That means that they can have all manner of location tracking directly in the box. Perhaps couple that with some type of auto-destruct of the hardware inside if someone tries to tamper with it, and it becomes very unappealing to try to steal this thing.
You can rent $10,000+ of cameras/lenses on your credit card through camera rental places, though they will take a deposit on your card. Perhaps after Amazon is burned a few times, they'll require a deposit.
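The break-even reasoning can be sketched in a few lines. This uses only the guesses from the comments ($2000 device, $200 fee, $100 handling cost per use) and a deliberately simple model in which a fraudulent job loses the whole device; the names are illustrative.

```python
def breakeven(device_cost: float, job_fee: float, handling_cost: float):
    """Return (uses to pay off the device, fraud rate that wipes out margin).

    Simple model: each honest job nets (job_fee - handling_cost), and each
    fraudulent job loses the full device_cost.
    """
    margin_per_job = job_fee - handling_cost
    uses_to_recover = device_cost / margin_per_job
    breakeven_fraud_rate = margin_per_job / device_cost
    return uses_to_recover, breakeven_fraud_rate


uses, rate = breakeven(device_cost=2000, job_fee=200, handling_cost=100)
print(f"uses to recover the device: {uses:.0f}")
print(f"break-even fraud rate: {rate:.0%}")
```

This ignores downstream S3 revenue, which would push the tolerable fraud rate higher.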
This isn't really all that useful unless you have a fairly large existing AWS deployment. If you try to open up a new account and order a 50TB storage device without provisioning anything else, they'll probably just refuse you. For normal users, amazon can just hold their AWS account hostage if they fail to return the device.
I can't imagine they care much. Snowball's value doesn't come from renting out these devices, it comes from companies dumping their data into AWS. If they lose a few devices they'll make it back in a couple months of S3 charges.
His MicroSD card calculations are off because it would take a month to insert an airplane-load of those cards one by one into some computer to copy them into your storage.
Which also makes me wonder why this Snowball device only does Ethernet. Seems like it should have an eSATA port.
Presumably they want a little more control over how data is written and stored on the device than they would get with a simple eSATA connected drive.
If they use a simple eSATA drive, they have to worry about formatting, layout, etc., whereas an ethernet connected drive with a bespoke client can hide/control all of that for Amazon.
It also gives them easier options for multi-drive, SSD caching, RAID, etc - things they may or may not use today but could without any impact on the end user.
* Also, it's 10Gb Ethernet. And how many desktops do you have lying around that will recognize a 50TB eSATA drive?
Ethernet is most ubiquitous. You don't need physical access to a server with data, just network access. Most environments outside of datacenters don't have 10GbE switches or NICs yet.
EDIT: It appears it supports 10GbE natively. I assume it'll also support lower Ethernet speeds.
The math for the time to transfer comparison is interesting:
"Even with high-speed Internet connections, it can take months to transfer large amounts of data. For example, 100 terabytes of data will take more than 100 days to transfer over a dedicated 100 Mbps connection. That same transfer can be accomplished in less than one day, plus shipping time, using two Snowball appliances."
With a 100 Mbps connection it takes over 100 days [1] but with a 100 times faster connection (10 Gbps) it takes less than a day :)
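Here is a quick sketch of that comparison. It assumes decimal terabytes and counts only raw line rate; the `utilization` and `links` parameters are my own additions for illustration, not figures from Amazon.

```python
def transfer_days(terabytes: float, mbps: float,
                  links: int = 1, utilization: float = 1.0) -> float:
    """Days needed to move the data over `links` parallel connections."""
    bits = terabytes * 1e12 * 8
    effective_bps = mbps * 1e6 * links * utilization
    return bits / effective_bps / 86_400


# 100TB over a dedicated 100 Mbps line, at full and at 80% utilization.
print(f"{transfer_days(100, 100):.1f} days at line rate")
print(f"{transfer_days(100, 100, utilization=0.8):.1f} days at 80% utilization")

# The same 100TB split across two appliances on 10 Gbps links.
print(f"{transfer_days(100, 10_000, links=2):.2f} days with two 10GbE devices")
```

At full line rate the 100 Mbps case comes in a little under 100 days, so the quoted "more than 100 days" presumably allows for protocol overhead or a link that is not fully saturated.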
In my last role we would often need to upload large amounts of data for our clients to AWS. When this data got into the terabytes we would ship a NAS box to our customer and then send that to Amazon. On more than one occasion Amazon fubar'ed the upload on their end (why would you move drives around in a RAID 5/6 array?). Maybe since this is an AWS-branded solution it will be more reliable.
This has interesting security implications for both sides. Is the device 100% offline or does it phone home when you connect it to your network or transmit any other data? What if someone gets the device and hacks it to scan Amazon's networks when sent back?
I'm gonna guess they have all sorts of physical tamper detection capabilities to prevent this. And perhaps a software load that gets wiped every time, so in case you find a bug in their software (iSCSI? NFS? whatever) it might be hard to escalate.
Tamper detection won't prevent anything. It would just be an indicator that something appears to have happened.
The software load that is wiped every time is a first, and extremely basic, line of defence.
Realistically I'd hope the OS is on an SD card that they can literally take out and throw away after they have the data off (you can pwn the micro-controller on an SD card) - and replace with a freshly baked card.
Presumably if their sensors say the system has been cracked open, they don't just ship it out to another user. (And they could have many layers of sensors, telling them if it was just via damage (hitting it with a forklift) or someone really getting in.) Considering the potential downside, I'm sure they've done some work here.
But the end user probably trusts the machine 99%, so what if you load it with something malicious, send it back to Amazon and wait for it to be sent to another customer and thus hack their network? That is if Amazon does not completely wipe their drives (Hide it somehow?).
Obviously this device has been designed to be a multi-time use device.
Amazon definitely do NOT have physical control over this box. They would need to do a complete low level reflash of every single bit of firmware on there, every time it came back. That's not actually that inconceivable in an enterprise-grade server, and I hope to see some interesting details about that.
But still... just imagine some of the fun HDD firmware hacks making their way onto this. Or NIC firmware. Or even just the embedded Kindle being rooted, and used to sniff out Wifi networks, report its location via 3G, etc.
Not to mention the obvious data recovery attacks if the disks have not been wiped to the highest levels.
I agree with parent comments, and assume Amazon would put this on its own untrusted VLAN when it comes back. But would they weigh it first to see if any pwnies have been inserted into the box? Visually inspect inside to see if physical components have been removed to ensure the weight is actually the same, despite a pwnie inside?
I really hope Amazon put out advice to their customers on how to connect this - ideally it should just be on a point to point link to a sacrificial server containing the data.
But yes, if you assume this device is treated as malicious at both ends - just like an unfiltered internet connection, but 10x worse - and that the client software that is doing the load/verification, or unload/verification, is doing decent input validation, and your assumption holds that the user is doing their own encryption prior to transferring data to the device, then I agree.
All the innovation at AWS is amazing. If they ever stop charging by the gigabyte for bandwidth and move to a flat model then I would be tempted to switch to them for all my sites.
Why would they do that? It makes sense to charge per GB. It gets people using them when they are small and it's cheap to use S3/EC2, and then as they scale up it's easier to just stick with AWS.
Peering links are charged on a consumed bandwidth basis (in a weird sliding scale fashion depending on your balance of egress vs ingress; the closer to 1:1 the better) rather than the fixed fee you pay your ISP.
I used to work for an ISP that would juggle which peering links the usenet servers were favouring to make things more favourable :D
Say I wanted to do something similar, but move data around locally between two NAS appliances, but not incur a double disk charge - (I've got a pack of ~20 disks in NAS brand A, but I want to move to NAS brand B. The disks work in both, but need reformatting).
Does anyone know of a service where I could rent a 20TB device like snowball but not push it to S3?
I know Dell has glorified portable hard drive arrays in Pelican-style cases for moving data between their SAN systems. Unfortunately I don't recall the name and I don't know if they are available for rental. If you have a Dell VAR, have them ask the Dell storage group.
Edit: Correction, it doesn't appear to be an array, just a single device, so 1.5-2TB max. Also basically targeted at this SAN solution only.
He wants to rent something temporarily, so that he can copy stuff off the drives, take them out of the nas, format them in the new nas so it's compatible, copy stuff off the rented appliance to the new nas, ship the rented thing back off.
He wants to avoid having to buy enough drives, etc., to hold all of the data at once.
> AWS Import/Export Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.
They demoed one on stage at re:invent just a bit ago, it is about the size of a desktop PC with hard plastic case all around it. I think he said it is ~47 lbs.
They did but you had to buy them yourself and most of the times you didn't have a use for them after. With snowball they manage the process end to end.
What's interesting is that they don't mention this on the marketing page for Snowball. As in "also if you want you can mail your own hard drive, see this page for details". While most would think "why rain on the parade of this new service by mentioning the old service", with Amazon it's more than that. It's this entire idea of weaning people off of legacy ways of doing things (with new names and new processes) so it's harder for any competitor to offer the same type of service, unique way of doing things or handholding. After all, anyone can accept (in theory) a mailed-in hard drive. Much harder to offer a solution like this with hardware and so on. So to me this is obviously deliberate and consistent with Amazon wanting to raise an entire generation on a new paradigm of getting things done.
Edit: And yes, this way it's easier for them as well and removes "missing power supplies" (big deal actually, but I get the point...)
There was also a lot of hassle in ensuring that you included the correct power adapter for each external drive, something that got complicated if you were shipping a lot of different models.
They mentioned in the keynote they are using a kindle as the display; given that Amazon sells refurbished kindles for $65, I imagine the actual cost is a good bit lower. They also mentioned that the kindle also functions as the user-interface, so it's more than a glorified shipping label.
Oh Amazon, please get your naming right. Why does every service have to be named so utterly confusingly? I was expecting some kind of cloud NLP library. Would have been cool.
Out of curiosity, does anyone have a real life example where they send petabytes over the wire? You know, outside the adult industry.
Lots of scientific experiments can easily generate huge data sets of raw measurements that need to be moved from the locations of measuring instruments back to the lab.
The customer won't keep it because the Snowball contract will put them on the hook for it.
Whether the shippers do or don't keep it, there are much more valuable things shipped every day. Where do you think the gargantuan Catalyst and Nexus switches come from? The line cards in those things probably cost more than a Snowball device.
I think the difference in price between what it costs to get it shipped and the market price of the components (even if you just strip out the hard drives) is what makes it fraud-prone.
From my day job, I know this is an issue for mobile operators
It has a kindle attached to it. Every kindle has a built in cell phone transceiver. I would not be surprised if they use that to send periodic location data.
I'm sure they don't cost Amazon $200 to produce, and presumably they can just bill corporate customers for them if they're not returned. Some amount of loss is probably included in the cost to consumers.
From the looks of it the delivery provider is UPS. If they lose it or a rogue employee guts it during transit, I think AWS will probably have clauses in their contracts to go at them full pelt with a lawsuit.
They charge $15/day, so I think the idea is that they'll keep charging you this until it's returned. In fact, if you have a need to do a "slow fill", it might make sense to hang onto this device for months at a time before sending it away.
Well, apart from meaning a ball of snow, such as might be thrown by children at each other for fun. Really, that's where my mind went with the word 'snowball', much like the rest of the anglophone world. Yes, there may be slang and other juvenile repurposings of the word, but in normal, everyday (well, wintertime everyday) usage snowball means exactly what the OED says it means. Not sure why anyone would imagine otherwise?
Caveat emptor.