Amazon Snowball (amazon.com)
384 points by polmolea on Oct 7, 2015 | 192 comments



But can you trust it?

When I returned to JPL after working at Google for a year I was tasked with evaluating a Google Search Appliance. We ultimately decided not to keep it, and so we had to erase the disks, which now contained sensitive data. The appliance had a "self-destruct" feature that supposedly erased all the data, but there was no way to verify it. After lengthy negotiations with Google (some people just have a hard time grasping the idea that just because a file has been deleted doesn't mean the data is actually gone) we eventually got them to agree to let us open the enclosure and take out the disks. Forensic analysis revealed that they had not in fact been erased.

Caveat emptor.


From: https://aws.amazon.com/blogs/aws/aws-importexport-snowball-t...

"...The data will be 256-bit encrypted on the host [running the Snowball client?] and stored on the appliance in encrypted form. The appliance can be hosted on a private subnet with limited network access."

So I assume the data is encrypted asymmetrically.

"...ship it back to us for ingestion. We’ll decrypt the data [using the private key specified in the job,] and copy it to the S3 bucket(s) that you specified when you made your request[/job]. Then we’ll sanitize the appliance in accordance with NIST Special Publication 800-88 (Guidelines for Media Sanitization)."

That links to http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP...

There are a few different types of sanitisation (clear/purge/destroy), and Amazon doesn't specify which type. I assume they would go with "clear", and maybe in a few select places (I'd hope storage media) "purge".

"Clear" is scary though, as for network devices, it is only "full manufacturer’s reset to reset the router or switch back to its factory default settings", and for HDD's it is "Overwrite media by using organizationally approved and validated overwriting technologies/methods/tools. The Clear procedure should consist of at least one pass of writes with a fixed data value, such as all zeros. Multiple passes or more complex values may optionally be used".

So what vector do you want to protect? Accidental data egress shouldn't happen as the data is encrypted. However there are more interesting vectors, such as getting hold of the public key and injecting your own data into another company's buckets...
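
For anyone wanting a sanity check on the single-pass zero overwrite quoted above for "Clear", here is a minimal Python sketch that scans a file or raw device for leftover non-zero bytes. The function name and the demo file are made up for illustration, and a scan through the OS cannot see remapped sectors or spare flash, so this is not a substitute for forensic analysis.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # read 1 MiB at a time


def first_nonzero_offset(path, chunk_size=CHUNK):
    """Return the offset of the first non-zero byte, or None if all zeros."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                return None  # reached the end without finding data
            stripped = block.lstrip(b"\x00")
            if stripped:
                # Number of leading zeros in this block locates the byte.
                return offset + len(block) - len(stripped)
            offset += len(block)


# Demo on a throwaway file. A real check would point at a raw device
# (a hypothetical path such as /dev/sdX) and would need root privileges.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"secret" + b"\x00" * 4096)
try:
    print(first_nonzero_offset(tmp.name))
finally:
    os.remove(tmp.name)
```

A result of None only says the readable area is zeroed; any other result is the offset where surviving data starts.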


But that's the whole point: Can you trust that the box does what Amazon says it does? Because the Google box did not do what Google said it did, but if I hadn't been very insistent about it (to the point of having a number of people think I was being a total dick), we would never have known.


"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."

- Andrew S. Tanenbaum


You can drive that station wagon right up to your chosen rsync.net location:

http://www.rsync.net/products/oob.html


As usual, the pricing is not very friendly, and apparently designed to lock your data into AWS or exploit your weak negotiating position once you buy in.

While you can send in 50TB for $200, taking the same 50TB out costs an additional $1500 charge (50000 * 0.03).

[assuming they are not transferring the data over the Internet, the cost to AWS should be the same or cheaper for reading]


$1700 is amazingly cheap when you consider that it costs $4300 [0] to transfer that same 50TB out of EC2.

[0] https://aws.amazon.com/ec2/pricing/

$4300 = 10000 * $0.09 + 40000 * $0.085
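
The two numbers being compared in this subthread can be reproduced with a few lines of Python. The rates are the ones quoted in the comments (2015 figures, treated here as assumptions), and the function and constant names are invented for the sketch.

```python
# Per-GB rates quoted in the thread; not checked against current pricing.
EC2_EGRESS_TIERS = [(10_000, 0.09), (40_000, 0.085)]  # (GB in tier, $/GB)
SNOWBALL_JOB_FEE = 200.0      # $ per job
SNOWBALL_EXPORT_RATE = 0.03   # $/GB when taking data out


def tiered_cost(gb, tiers):
    """Charge each GB at the rate of the tier it falls into."""
    total = 0.0
    remaining = gb
    for tier_size, rate in tiers:
        used = min(remaining, tier_size)
        total += used * rate
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("volume exceeds the tiers listed")
    return total


def snowball_export_cost(gb):
    """Flat job fee plus the per-GB export charge."""
    return SNOWBALL_JOB_FEE + gb * SNOWBALL_EXPORT_RATE


print(round(tiered_cost(50_000, EC2_EGRESS_TIERS), 2))  # about 4300
print(round(snowball_export_cost(50_000), 2))           # about 1700
```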


... which is also wildly overpriced.

According to multiple sources, Internet transit in the US now costs less than 1$ for a 1 Mbps line for large deals, which translates to 1$ for 324GB, which translates to 0.003$ / GB.

Amazon charges 15-30 times that.

(it appears that traffic can be much more expensive in places other than the US and presumably Europe)
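
The conversion from a per-Mbps price to a per-GB price is easy to check. This Python sketch assumes a 30-day month, decimal gigabytes and a link that is fully used the whole time, which is optimistic for real traffic; the function names are made up here.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def gb_per_month(mbps):
    """Decimal gigabytes moved by a fully utilised link in one month."""
    bits = mbps * 1_000_000 * SECONDS_PER_MONTH
    return bits / 8 / 1e9


def cost_per_gb(dollars_per_mbps_month, mbps=1.0):
    """Effective $/GB when the link is saturated for the whole month."""
    return dollars_per_mbps_month * mbps / gb_per_month(mbps)


print(gb_per_month(1))                  # 324.0
print(round(cost_per_gb(1.0), 4))       # 0.0031
print(round(0.09 / cost_per_gb(1.0)))   # ratio against the $0.09/GB tier
```

At lower utilisation the effective per-GB cost rises in proportion, so the gap shrinks for bursty workloads.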


It may cost that much to buy that capacity, but it costs a lot more than that to run the large scale organizations (CAPEX/OPEX) that build and buy these services. You're not just paying for a pipe, you're paying for the corporation.


Exactly, I don't know why so many people seem to miss this. The charges may not reflect the cost of that particular service, but if one looks to the service as a whole, it's not that badly priced. Costs cover the infrastructure, which we reasonably expect to contain multiple redundancies, as well as a profit margin for the business.


> You're not just paying for a pipe, you're paying for the corporation.

You are paying their profit margin, yeah.

It is wildly overpriced, no matter how you look at it. Operating costs even when done on a much smaller and more inefficient scale than for AWS do not make the total cost for incremental bandwidth usage THAT much larger.


Are there any alternatives for the service that are cheaper, with the same reliability?


Don't bother, AWS defenders will continue to defend it and replace employees with a larger AWS bill and further lock-in until they shoot themselves in the foot.


A large AWS bill (which actually gets cheaper over time!) is much easier/cheaper to get rid of than employees. AWS also won't get recruited by a competitor and come into your office wanting a ton more money.


AWS is the employee that sits there and learns the industry and every inch of the system you have for 5 years, and then one day you wake up and they're not at work. You read the news that morning and they just got 50 million to do their version of your company, and way better than yours.

Or they start getting older, don't keep the skills sharp enough, and die off. Now you've got to painfully convince them to help you train a new employee to keep the show running, or pay 10 fold your savings hiring the smartest in the world to fix it. But the system's too big, and by the time it's on the "latest and most popular cloud architecture with proprietary systems", you're irrelevant.

But you're right, they make the company's current CEO/CTO look good by cutting costs in the beginning, so who cares, right?


I think (based on your child comment too) that you're conflating AWS and Amazon. Amazon sells things and also allows you to sell things, and that creates a conflict of interest, which has apparently caused problems. I am unsure how much of the problem was direct maliciousness on Amazon's part, admittedly - I suspect naivete on the part of some marketplace users, imagining that Amazon would never sell the same product as them, or would never lower the price of a competing product. Certainly it would seem obvious not to try and compete with Amazon on price or availability; you will always lose eventually.

However, AWS is a different thing altogether - it is a set of services that can be used to run parts of your business. Now, if your business involves simply reselling those services, or is predicated on the availability and pricing of one of those services being a major part of the value for a service you sell (i.e. the value you add is marginal, with the majority of the product or service's value residing in the service AWS provides) then you are again in a situation where AWS may also decide to offer the same service, and will probably be able to do so more cheaply and profitably.

Again, this would not be a malicious act or directed attack on your business through inside information, and it seems naive to assume so. You must have been able to determine a market existed and a need could be fulfilled profitably by providing this service; there is no reason a company like AWS could not reach the same conclusion with its vastly larger amount of resources. The possibility of this sort of thing happening should have been determined during due diligence and market analysis anyway, so it should not be a surprise if it happens.

But companies in traditional lines of business using AWS to save money, or using AWS to build a product where the value provided to the customer is inherent in the service provided, not the infrastructure used, are going to be fine. Nobody worries about the electricity company stealing your idea for a product run using electricity...


Hasn't really impacted Netflix and pretty much every other customer using AWS.


For now. If you want to see the future have a look at the debacle that just went on with prime/appletv/google. If they get a strong enough dominance they will. Given that the IoT is a rising thing, and Amazon is placing itself at the apex of the internet's backbone and already has a stranglehold on the physical goods world, it should be easy to see the next 10 years if it continues. I can see the pitch now... imagine... a world where you can sell your hardware and put your servers all in the same stack! (at least until you POC it for us). If you want to see the future of that, have a look at companies that have had their physical goods ideas stolen and are now mass produced at Amazon for that "everyday low price" (and no that jab at them v walmart isn't an accident).

I was hoping somebody would bring up Netflix. Netflix is AWS's poster child and they know it. They need AWS still just as much as AWS needs them. The statement you just made proves that. They will do everything up to and including take a loss to keep Netflix around. I can assure you that your company will not be getting the same price quote or technical support that Netflix does unless they can bring them just as many sales by being a status symbol and marketing tool for them.

There seems to be a trend of people thinking "AWS is my friend". No. They are a company, and they exist to make money. Have we not all been bitten enough by this thought pattern and loyalty to learn the lesson of "stay flexible"? I'm not advocating that AWS is done away with entirely and nobody should use them ever. I'm advocating that putting your entire stack into their system is a bad idea and that using "black box" software as little as possible is a better approach. When it was just EC2 it was fine, you can build your stuff on an EC2 box with your favorite flavor of *nix and quickly throw boxes up elsewhere if things go south. Now I'm seeing companies put entire critical infrastructures off on Amazon pre built services like they've never seen a tech company go under, or a fad die, or strongarming with brute power.


AWS is already dominant, it's not even close between AWS and the next cloud platform. Vendor lock-in is a foregone conclusion. Meanwhile lots of companies are building successful, profitable applications on AWS.


It's not cloud vs cloud. It's cloud vs vps vs colo vs in-house/datacenters. Cloud is one option, and a heavily mis-used one for multiple purposes, security being a popular one this year.


> at least until you POC it for us

This is a complete non-issue for users of AWS unless they are essentially trying to resell AWS services.

> companies that have had their physical goods ideas stolen and are now mass produced at Amazon

AWS is not Amazon, they do different things - see my GP comment replying to you. I also wonder how many times this has actually happened (stealing ideas) in reality, versus the companies simply having an unsustainable business model, or an obvious product. It sounds like the classic case of looking for someone to blame for their own incompetence, and choosing Amazon instead of some other huge corporation or the government, which are also common scapegoats...


Amazon just went to war with Apple and Google. Do you really think they won't give Netflix the shaft when it suits their strategic interest?


> Do you really think they won't give Netflix the shaft when it suits their strategic interest?

If it suited their strategic interest, sure, Netflix could get the shaft. Is that scenario likely to play out? It seems unlikely. Netflix is one of AWS's earliest and most prominent customers.

I would imagine Netflix would leave Amazon long before Amazon shifts their strategy to take out Netflix.


Their strategic needs of keeping AWS as a trusted platform override any potential gains from giving Netflix the shaft - particularly since they'd be stupid to assume that Netflix doesn't have a contingency plan.


Mainframe Marty shows up.


Who is mainframe marty?


What's the alternative?


CenturyLink has a pretty formidable IaaS offering, it's very well priced and they charge very little (comparatively) for bandwidth. The only problem is their storage solution is extremely expensive, but using S3 or soon Backblaze B2 as a storage layer is good enough for me.


So you're saying S3 is the alternative to S3 ... ?


I said their IaaS is a cheaper alternative to AWS IaaS with the exception of their S3 alternative. I suggested B2 is the cheapest alternative to S3. Do you have a suggestion?


The Intercloud.


You're right, AWS just breaks even with their pricing.


> charges 15-30 times that.

I think you're going to be absolutely furious when you find out what the component costs of a bottle of any random drink are.


A few years ago, an enterprise network I helped run with about 40k users at several hundred locations cost something like $30-35 per user/month to operate. We had strong incentives to price below what an MSP would charge and beat them on 2 occasions.

About 40-50% of that cost was circuits and transit. The vast majority was tied up in labor and equipment costs. If I were making money on the whole stack with the market control that AWS has, I would want at least 60% margins on the business -- it's not like locked-in customers have easy options.


1 Mbps * 2 days = 21.6 gigabytes so sure you could transfer a lot of really stale data, but if latency is important the prices are dramatically higher. You also often end up paying for both the upload and download side if you need to regularly do large transfers.


My startup is working on this problem. We are working on an IaaS cloud designed for high-bandwidth users, with much lower pricing than amazon.


Careful. My previous startup was a storage company that competed with Amazon and Google when they were charging 0.10/gb and I calculated it should be around 0.02/gb. A few months after launch, they both realized it too and dropped their prices. Leads dried up overnight.

The high prices seem to persist until one day they don't.


Yea, we considered this - but if Amazon and Google compete with us I'll consider it a personal success :)


They compete against you whether they know you exist or not because your customers/prospects know Amazon/Google/etc exists.

It only takes one of the large players to break the pricing stalemate, and overnight they'll all follow to keep their position in the market.


I sort-of agree with you, but in this case I believe it will persist for years to come, as they have faced competitors charging far less for bandwidth for years already and pretty much ignored it.


I live and love AWS but I never understood why they don't tier bandwidth charges per customer.


what does "overpriced" mean in this context? "overpriced" compared to what? is there another company offering a similar service for a lower price?


Compared to marginal cost.

There are companies offering prices much more similar to cost, for example:

- Hetzner.de servers/colocation offers additional traffic at 1.39 EUR/TB = 0.00158 $/GB

- DigitalOcean offers 1TB traffic with $5/month instances = 0.005 $/GB


The price of something is determined by how much people are willing to pay for it, not the marginal cost. We don't pay people based on the marginal cost to keep them alive.


Actually, in many jobs, we do. That’s why several countries have defined a minimum wage, and why so many people in society work minimum-wage jobs.


In America. There are countries without minimum wage that do just fine. (Singapore, and eg pre-2014 Germany.)


And the current German minimum wage is still only in effect in some industries, and in those no one worked below minimum wage before anyway.

It’s done literally nothing.


teraflop was right, it's $200 per snowball device job, which is currently limited to 50TB. Maybe that will change. You can order multiple jobs to import more data. You have 10 days to complete the transfer and ship it back and then it's $15 a day. $200 buys you 50TB per job, great encryption, and speedy migration. Sources: https://aws.amazon.com/blogs/aws/aws-importexport-snowball-t... and https://aws.amazon.com/importexport/pricing/
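
A rough Python sketch of the import-side pricing as described in this comment: $200 per job, 50TB per device, 10 days included, then $15 a day. It assumes one device per job and that the daily fee applies per device; both are readings of the comment, not confirmed terms, and the function name is made up.

```python
import math


def snowball_import_cost(data_tb, days_on_site, device_tb=50,
                         job_fee=200.0, free_days=10, extra_day_fee=15.0):
    """Estimate the import cost under the assumptions stated above."""
    devices = math.ceil(data_tb / device_tb)          # one job per device
    extra_days = max(0, days_on_site - free_days)     # days past the free window
    return devices * (job_fee + extra_days * extra_day_fee)


print(snowball_import_cost(50, 10))    # one device, returned on time
print(snowball_import_cost(100, 12))   # two devices, each two days late
```

This leaves out shipping and the S3 storage charges that start once the data lands.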


After reading those pages, I think you're wrong, and the $200 is actually per-device.

For one thing, the documentation says you can use multiple Snowball devices, but carefully tiptoes around saying whether or not there's an extra charge for doing so. All of the language that actually talks about pricing just says "the device", singular. For another thing, the screenshots of the "create a job" workflow are missing any way to specify that you want multiple devices. It sure looks like one job == one device.

(This reminds me of the pricing issues around AWS's Glacier service. It's not even that the pricing model itself is bad -- it's that the marketing is obfuscatory to the point of being arguably deceptive.)


You can request multiple devices at once. I didn't include that screen in the blog post.


Ah, good to know! I would edit my comment with a correction if the option was still available.


You are right. I think that the value add is that in addition to getting to use 50TB of storage you also get good encryption and usable software.


Well if you sent it in you already have a copy. Don't delete it and you can save the $1500.


I think the idea is that if one needs to recall the data from AWS it's due to something like data loss


If I lost 50TB of data and could fix it for $1500 I would consider myself the luckiest person on Earth.


Well, second luckiest I guess? The luckiest one wouldn't lose 50TB of data in the first place. ;)


Same service needs to run the other way for the same price. $200 to get all your data shipped to you on an appliance/hard drive.


The service is called AWS Import/Export so it seems you probably can get them to export the data to a snowball for you to get your data.


Right, you can, but you pay for export at $0.03 per GB.


i.e. Snowball is the price model of AWS.


plus $200 for the job


The e-ink display showing a shipping label is brilliant.


An AWS service with free shipping and a Kindle glued on. It's the ultimate synergy for Amazon.


This article says it's not e-ink but just a Kindle strapped to the side.

>It has a Kindle on the side, which functions as an automatic shipping label.

http://techcrunch.com/2015/10/07/amazon-launches-snowball-a-...


> This article says it's not e-ink but just a Kindle strapped to the side.

Guess what a Kindle's display is?


Unless I'm misjudging the scale of this thing, that screen looks a fair bit smaller than a Kindle. I think it's just TechCrunch using "Kindle" as a generic word for e-ink display.


The Amazon exec who introduced the Snowball during the keynote also said it included a "Kindle".


Perhaps a mini kindle, I estimate by the image that's about 2x4" max.


Kept scrolling and looking for a picture of said label, to no avail.



Weird - something about the form factor makes me want to violently toss it off a loading dock. Anyone know how much that would translate to in Gs? (The article says it will survive 6 Gs of shock.)


A whole lot. It depends on how fast it is going. See http://measurespeed.com/deceleration-calculator.php for a quick-and-dirty overview. Gravity accelerates at 9.8m/s², so if it fell for one second before impacting the ground, and took (for instance) 0.05 seconds to deform before coming to a halt, it'd have experienced about 20 Gs of shock (9.8 m/s shed in 0.05 s is 196 m/s²).
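
The estimate can be worked through in a few lines of Python, assuming free fall from rest with no air resistance and a constant deceleration while the case deforms. Both are simplifications (real impacts peak well above the average), and the function names are invented for the sketch.

```python
G = 9.8  # gravitational acceleration in m/s^2


def impact_g(fall_time_s, stop_time_s):
    """Average deceleration, in multiples of g, for a drop from rest."""
    impact_speed = G * fall_time_s          # speed reached at the ground, m/s
    decel = impact_speed / stop_time_s      # constant-deceleration assumption
    return decel / G


def drop_height_m(fall_time_s):
    """Height fallen from rest in the given time."""
    return 0.5 * G * fall_time_s ** 2


print(round(impact_g(1.0, 0.05), 1))          # 20.0
print(round(drop_height_m(1.0) / 0.3048, 1))  # height in feet, about 16
```

Under this model the result is just the ratio of fall time to stop time, so staying under a 6 G rating after a one-second fall would need about a sixth of a second to come to rest.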


> 0.05 seconds to deform before coming to a halt

I always found it amusing that hard drives were rated in the hundreds of Gs until someone reminded me that 'time to stop' when dropped on a hard surface was very short indeed...


One second is a long time. That's a drop of 16 feet?


For the purpose of conjecture, yes :-)


Really? Why? I mean, Kindles are pretty cheap, but sticky shipping labels are a few bucks for rolls of 500, and very hard to damage in transit.


Oh, this label is smart. It knows when to change.

If you had sticky shipping labels, you'd need some intelligence.


Especially when you consider the fact that they've been making Kindles for years... perfect way to merge existing Amazon technologies.


Yes, this is the most interesting aspect of snowball.


An AWS version of a sneakernet.

https://en.wikipedia.org/wiki/Sneakernet


Yep, all I could think was "Wow, that's a good deal fancier than a station wagon full of tapes".

Cloud backup companies have had similar services for a while now, but it's nice to see AWS adopting it.


I have to admit, while it makes for good storage lock-in, I was impressed that they only charge 3 cents/GB to get the data back out.

Someone else in this thread thought $1500 was expensive to get 50TB back out. If you use this for disaster recovery, you could get all of your data back onsite quickly for a very low (comparative) cost, versus trying to provision high speed connectivity.


Don't forget, there's the ongoing storage cost at S3, which also adds up really quickly.


3 cents/GB is cheap. Go with S3's infrequent access class at 1 cent a GB (since you won't be incurring the charge for retrieval through S3, you'll be pulling back out through Snowball), and it's even cheaper.

$10/TB/month? Where else can I store data reliably that cheap? (Yes, Backblaze is half that price. I hope they become a worthy adversary to AWS S3 to drive prices further down).


iCloud is $10/TB/mo, albeit for different use cases.


If you have moderate volume, you can beat S3 pricing with object solutions like DDN or others. It all depends on your data center capacity and power costs.


If it cost you about $1000 to buy a diskpack (4*6TB drives) you could create backups and send them to at least a half dozen locations for less money than using S3 to store that data.

Yes, S3 is cheap(ish). But given Snowball is a snapshot backup service, it's not comparatively cheaper than it would be to distribute that same data by creating a clone and sending it to a safe place.


That's not really how business IT works (unless you're sending tape off to Iron Mountain, which has its own costs and storage fees).

S3 is the cheapest "real" business storage option besides Backblaze's new storage offering. S3 can't be compared to shipping disks someplace where they sit offline.


If you are just using it for backup, you wouldn't use S3. You'd use Glacier.

What this offers is a useful way to get TBs of data up to Amazon easily, cheaply, and quickly.


If it is cold storage, you could use AWS Glacier (https://aws.amazon.com/glacier/pricing/) which is way cheaper than S3.


There was already a service where you could mail AWS removable hard disks and have them load it into S3 for you.

https://aws.amazon.com/blogs/aws/send-us-that-data/


Using the calculator for that service, I put in 50TB and 25 hard drives, and the estimated charges came out to $2435.75. The point where snowball is cheaper is fairly low. 4TB on 2 drives is 194.86. Basically more than 5TB over 2 or more drives and it's cheaper to use snowball. Plus you have to pay to have the drives returned back to you.


"Sneakernet" is an amusing term to me, because prior to the advent of widespread, high-speed Internet access, this was the only way to transfer large amounts of data.


Actually the term was invented long before Internet access was widely available, let alone high speed. It describes transferring data between computers located in the same room (or building) using floppy disks. The alternative was to use some kind of networking, like 10BASE5 or 10BASE2 coax Ethernet, or one of the many other competing standards that existed back in the early eighties.

And before the Internet was commonplace, people had to use data lines provisioned by the telephone company to link distant offices, and could also send data that way, as larger companies still do.


Never underestimate the bandwidth of a truck full of harddisks...!


They referenced this concept before introducing the product.



I'm curious about how they're going to approach this from the fraud perspective. This is a $200 charge for a device that has 50TB storage, which would probably cost you around $2000 to buy.

There are people out there who will sign a contract under a fake name/address with a phone provider and sell the phones, and the way the providers usually fight this is by running credit checks and verifying the address against them. Ultimately, this is very hard to detect when it involves identity theft.


Let's say for the sake of argument that it costs about $100 per use to recover the machine from a labor/time perspective. That means they'll make back the initial investment on the machine in 20 uses. As long as the fraud rate is less than 5%, then this venture from a capital investment perspective still makes sense. Personally, if people intend on using fraud to get free hard drives it will probably be a minority of the population of people who order the services of this device. Plus, you need an AWS account for this.
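
The back-of-the-envelope reasoning above can be written out in Python. The $2000 device cost, $200 fee and $100 handling cost are the thread's guesses, not published figures, and the function names are made up for the sketch.

```python
def uses_to_break_even(device_cost, fee_per_use, handling_cost_per_use):
    """Rentals needed before the net fees have paid for one device."""
    margin = fee_per_use - handling_cost_per_use
    if margin <= 0:
        raise ValueError("fee does not cover the handling cost")
    return device_cost / margin


def max_tolerable_loss_rate(device_cost, fee_per_use, handling_cost_per_use):
    """Losing one device per N rentals is survivable if N covers its cost."""
    return 1 / uses_to_break_even(device_cost, fee_per_use,
                                  handling_cost_per_use)


print(uses_to_break_even(2000, 200, 100))       # 20.0
print(max_tolerable_loss_rate(2000, 200, 100))  # 0.05
```

This ignores the S3 revenue that follows a successful import, which would push the tolerable loss rate higher.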


It's almost literally a black box. That means that they can have all manner of location tracking directly in the box. Perhaps couple that with some type of auto-destruct of the hardware inside if someone tries to tamper with it, and it becomes very unappealing to try to steal this thing.


You can rent $10,000+ of cameras/lenses on your credit card through camera rentals places, though they will take a deposit on your card. Perhaps after Amazon is burned a few times, they'll require a deposit.

http://www.borrowlenses.com/product/Canon-EOS-C300-Mark-II-E...


This isn't really all that useful unless you have a fairly large existing AWS deployment. If you try to open up a new account and order a 50TB storage device without provisioning anything else, they'll probably just refuse you. For normal users, Amazon can just hold their AWS account hostage if they fail to return the device.


I can't imagine they care much. Snowball's value doesn't come from renting out these devices, it comes from companies dumping their data into AWS. If they lose a few devices they'll make it back in a couple months of S3 charges.


Couldn't you just put a hold for $2k and release it when the device is returned?


Yes, you could, the pricing page doesn't mention anything about it though, hence why I wonder.


Yes.


You still need a credit card attached to the account. A car rental company will give you a $30k vehicle for under $100 on just a credit card.


That's because you pick it up in person and they check / scan your id once you're there.


The XKCD about FedEx's bandwidth seems particularly appropriate: https://what-if.xkcd.com/31/


His MicroSD card calculations are off because it would take a month to insert an airplane-load of those cards one by one into some computer to copy them into your storage.

Which also makes me wonder why this Snowball device only does Ethernet. Seems like it should have an eSATA port.


Presumably they want a little more control over how data is written and stored on the device than they would get with a simple eSATA connected drive.

If they use a simple eSATA drive, they have to worry about formatting, layout, etc., whereas an ethernet connected drive with a bespoke client can hide/control all of that for Amazon.

It also gives them easier options for multi-drive, SSD caching, RAID, etc - things they may or may not use today but could without any impact on the end user.

* Also, it's 10Gb ethernet. And how many desktops do you have lying around that will recognize a 50TB eSATA drive?


Maybe it'll be a 10GbE interface?


Ethernet is most ubiquitous. You don't need physical access to a server with data, just network access. Most environments outside of datacenters don't have 10GbE switches or NICs yet.

EDIT: It appears it supports 10GbE natively. I assume it'll also support lower Ethernet speeds.


> I assume it'll also support lower Ethernet speeds.

Yeah, ethernet has pretty much always downgraded well. That's why a 10GBASE-T interface is perfect :P

(I should have been more clear with my initialisms)


The math for the time to transfer comparison is interesting:

"Even with high-speed Internet connections, it can take months to transfer large amounts of data. For example, 100 terabytes of data will take more than 100 days to transfer over a dedicated 100 Mbps connection. That same transfer can be accomplished in less than one day, plus shipping time, using two Snowball appliances."

With a 100 Mbps connection it takes over 100 days [1] but with a 100 times faster connection (10 Gbps) it takes less than a day :)

[1] Assuming no network overhead it is 92.6 days
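
The footnote's figure is quick to verify in Python, assuming decimal units (1 TB = 10^12 bytes), no protocol overhead and a link that stays saturated; the function name is made up for the sketch.

```python
def transfer_days(terabytes, mbps):
    """Days needed to move the data over a fully saturated link."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (mbps * 1e6)
    return seconds / 86_400


print(round(transfer_days(100, 100), 1))     # 92.6 days over 100 Mbps
print(round(transfer_days(50, 10_000), 2))   # 50 TB per Snowball at 10 Gbps
```

With two appliances loading 50 TB each in parallel, the second figure is the "less than one day" in the quote, provided the source can actually feed 10 Gbps.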


Good luck trying to saturate the 10Gbps port (on both the read and write sides) however.


Snowball is just a device with a 10Gbps port – so if you have 10Gbps, you won’t need Snowball ;)


In my last role we would often need to upload large amounts of data for our clients to AWS. When this data got into the terabytes we would ship a NAS box to our customer and then send that to Amazon. On more than one occasion Amazon fubar'ed the upload on their end (why would you move drives around in a RAID 5/6 array?). Maybe since this is an AWS-branded solution it will be more reliable.


This has interesting security implications for both sides. Is the device 100% offline or does it phone home when you connect it to your network or transmit any other data? What if someone gets the device and hacks it to scan Amazon's networks when sent back?


I'm gonna guess they have all sorts of physical tamper detection capabilities to prevent this. And perhaps a software load that gets wiped every time, so in case you find a bug in their software (iSCSI? NFS? whatever) it might be hard to escalate.


Tamper detection won't prevent anything. It would just be an indicator that something appears to have happened.

The software load that is wiped every time is a first, and extremely basic, line of defence.

Realistically I'd hope the OS is on an SD card that they can literally take out and throw away after they have the data off (you can pwn the micro-controller on an SD card) - and replace with a freshly baked card.


Presumably if their sensors say the system has been cracked open, they don't just ship it out to another user. (And they could have many layers of sensors, telling them if it was just via damage (hitting it with a forklift) or someone really getting in.) Considering the potential downside, I'm sure they've done some work here.


Most likely, they'll get the same results as if they were scanning from an EC2 machine. I doubt Amazon would put it in a trusted (V)LAN.


But the end user probably trusts the machine 99%, so what if you load it with something malicious, send it back to Amazon and wait for it to be sent to another customer and thus hack their network? That is if Amazon does not completely wipe their drives (Hide it somehow?).


This is 100% the scenario I was just imagining.

Obviously this device has been designed to be a multi-time use device.

Amazon definitely do NOT have physical control over this box. They would need to do a complete low level reflash of every single bit of firmware on there, every time it came back. That's not actually that inconceivable in an enterprise-grade server, and I hope to see some interesting details about that.

But still... just imagine some of the fun HDD firmware hacks making their way onto this. Or NIC firmware. Or even just the embedded Kindle being rooted, and used to sniff out Wifi networks, report its location via 3G, etc.

Not to mention the obvious data recovery attacks if the disks have not been wiped to the highest levels.

I agree with parent comments, and assume Amazon would put this on its own untrusted VLAN when it comes back. But would they weigh it first to see if any pwnies have been inserted into the box? Visually inspect inside to see if physical components have been removed to ensure the weight is actually the same, despite a pwnie inside?

I really hope Amazon put out advice to their customers on how to connect this - ideally it should just be on a point to point link to a sacrificial server containing the data.


Nobody needs to trust the device, actually.

Assume Amazon only loads it up with encrypted data over an untrusted network, and the customer only takes off the encrypted data over an untrusted network.


Depends on your definition of trust.

But yes, I agree, provided that this device is treated as malicious at both ends (just like an unfiltered internet connection, but 10x worse), that the client software doing the load/verification or unload/verification does decent input validation, and that, per your assumption, the user does their own encryption before transferring data to the device.
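
To make the "load/verification" step concrete, here is a minimal Python sketch of checking data against SHA-256 digests recorded before it went onto the device. This is not how the Snowball client actually works (nothing in this thread documents that); `build_manifest` and `find_corrupted` are hypothetical helper names, and the in-memory dict stands in for files on disk.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(blobs: dict[str, bytes]) -> dict[str, str]:
    """Record a digest per object before handing data to the untrusted device."""
    return {name: sha256_hex(data) for name, data in blobs.items()}


def find_corrupted(blobs: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names that are missing or whose digest no longer matches."""
    bad = []
    for name, expected in manifest.items():
        data = blobs.get(name)
        if data is None or not hmac.compare_digest(sha256_hex(data), expected):
            bad.append(name)
    return sorted(bad)


# Hypothetical example data: ciphertext blobs as they were before shipping.
before = {"a.enc": b"ciphertext-1", "b.enc": b"ciphertext-2"}
manifest = build_manifest(before)

# Simulate the device flipping bytes in one object and dropping nothing.
after = {"a.enc": b"ciphertext-1", "b.enc": b"tampered"}
print(find_corrupted(after, manifest))
```

The manifest has to travel out of band (not on the device), otherwise a malicious device could simply rewrite it along with the data.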


All the innovation at AWS is amazing. If they ever stop charging by the gigabyte for bandwidth and move to a flat model then I would be tempted to switch to them for all my sites.


Why would they do that? It makes sense to charge per GB. It gets people using them when they are small and it's cheap to use S3/EC2, and then as they scale up it's easier to just stick with AWS.


Peering links are charged on a consumed-bandwidth basis (on a weird sliding scale depending on your balance of egress vs. ingress; the closer to 1:1 the better) rather than the fixed fee you pay your ISP.

I used to work for an ISP that would juggle which peering links the usenet servers were favouring to make things more favourable :D


> Snowball currently supports importing data to AWS. Exporting data out of AWS will be supported in a future release.

I'm interested in hearing about how they are going to do this.


Forgive me, but what exactly is 'petabyte-scale' about a 50TB NAS with a dog-slow link?


Say I wanted to do something similar: move data locally between two NAS appliances without incurring a double disk charge. (I've got a pack of ~20 disks in NAS brand A, but I want to move to NAS brand B. The disks work in both, but need reformatting.)

Does anyone know of a service where I could rent a 20TB device like Snowball but not push it to S3?


I know Dell has glorified portable hard drive arrays in Pelican-style cases for moving data between their SAN systems. Unfortunately I don't recall the name and I don't know if they are available for rental. If you have a Dell VAR, have them ask the Dell storage group.

Edit: Correction, it doesn't appear to be an array, just a single device, so 1.5-2TB max. Also basically targeted at this SAN solution only.


Yeah, I need to move 20TB in one go - so I can repurpose the disks.


https://www.synology.com/en-us/products/DS1515+#spec

5 * 6TB drives, RAID5 would put you at nearly 24TB of storage, if my brain is in gear.
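
The arithmetic behind that estimate, as a quick Python sketch. RAID 5 spends one drive's worth of capacity on parity, so usable space is (n - 1) times the drive size. The function name is just for illustration, and the "nearly" presumably covers filesystem overhead and the TB-vs-TiB gap, which this ignores.

```python
def raid5_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_tb


print(raid5_usable_tb(5, 6))  # (5 - 1) * 6 = 24
```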


He wants to rent something temporarily, so that he can copy stuff off the drives, take them out of the NAS, format them in the new NAS so they're compatible, copy stuff off the rented appliance to the new NAS, and ship the rented thing back.

He wants to avoid having to buy enough drives, etc., to hold all of the data at once.


Getting terabytes of data into AWS/Hotel California is great, but I wish there was a way to get the data out just as quickly!


> AWS Import/Export Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.


Yes, but "Currently, Snowball doesn't support exporting data."

http://docs.aws.amazon.com/AWSImportExport/latest/DG/introdu...


The picture on the page doesn't give me an accurate estimate of the size. They are actually 50 pounds (says on the blog)!


They demoed one on stage at re:Invent just a bit ago. It is about the size of a desktop PC with a hard plastic case all around it. I think he said it is ~47 lbs.


It's amazing that Amazon has vertically integrated so many of both its own and external products/services into this appliance:

- Kindle's E-Ink

- AWS IAM / KMS / SNS

- Amazon Carrier? (perhaps in the future?)

- GPS-powered chain-of-custody tracking (AWS working on it, perhaps Amazon Drone delivery in the future?)


When I export data from S3, what do I get for a given bucket? Just basically a file system? How is the metadata stored? What about object versions?

I'm curious what the end result looks like in doing this.


"Currently, Snowball doesn't support exporting data."

http://docs.aws.amazon.com/AWSImportExport/latest/DG/limits....


Wow. Amazon is really embracing the ship fast, break things, roll with the MVP even when it lacks 50% of the features.


Amazon wants to get companies' data into AWS so they're locked in. They don't want that data flowing the other way.


The thing about an MVP is that it is still a VP.


I thought they already allowed you to mail in hard drives?


They did, but you had to buy them yourself and most of the time you didn't have a use for them after. With Snowball they manage the process end to end.


What's interesting is that they don't mention this on the marketing page for Snowball. As in "also if you want you can mail your own hard drive, see this page for details". While most would think "why rain on the parade of this new service by mentioning the old service", with Amazon it's more than that. It's this entire idea of weaning people off of legacy ways of doing things (with new names and new processes) so it's harder for any competitor to offer the same type of service, unique way of doing things or hand-holding. After all, anyone can accept (in theory) a mailed-in hard drive. It's much harder to offer a solution like this with hardware and so on. So to me this is obviously deliberate and consistent with Amazon wanting to raise an entire generation on a new paradigm of getting things done.

Edit: And yes, this way it's easier for them as well and removes "missing power supplies" (big deal actually, but I get the point..)


There was also a lot of hassle in ensuring that you included the correct power adapter for each external drive, something that got complicated if you were shipping a lot of different models.


> the E Ink shipping label will automatically update

Are you kidding me? Instead of a 25 cent shipping label they use a $100 e-ink display?

The display alone might get the device stolen.


They mentioned in the keynote that they are using a Kindle as the display; given that Amazon sells refurbished Kindles for $65, I imagine the actual cost is a good bit lower. They also mentioned that the Kindle functions as the user interface, so it's more than a glorified shipping label.


Tearing apart the system would yield significantly more valuable hardware than just the Kindle.


I am sure most companies which are the target of this will welcome plugging in an outside device into their internal networks with open arms.


Relying on securing the perimeter is gonna blow up in their face sooner or later.

But in any case, you can just connect it to a sacrificial server on its own network that has nothing else on it.


Oh Amazon, please get your naming right. Why does every service have to be named so utterly confusingly? I was expecting some kind of cloud NLP library. Would have been cool.

Out of curiosity, does anyone have a real life example where they send petabytes over the wire? You know, outside the adult industry.


Lots of scientific experiments can easily generate huge data sets of raw measurements that need to be moved from the locations of measuring instruments back to the lab.
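
For a sense of why shipping disks can beat the wire at that scale, here is a back-of-the-envelope Python sketch. The link speeds and the `utilization` knob are assumptions for illustration, not figures from the thread, and it uses decimal units (1 TB = 10^12 bytes).

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to push size_tb terabytes over a link of link_mbps megabits/s."""
    bits = size_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400                        # seconds -> days


# One 50 TB appliance's worth of data, and a full petabyte, over 1 Gbit/s.
print(round(transfer_days(50, 1000), 1))
print(round(transfer_days(1000, 1000), 1))
```

Even with a fully saturated 1 Gbit/s link, a petabyte takes on the order of three months, which is the gap a shipped appliance is meant to close.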


Named after the horse in Animal Farm?


I assume named after the fact that a snowball is a lump of cloud matter, in a flight ready state?


Named for what your AWS bill will do after you lock up all your data inside of it.


Isn't this a bit risky? What happens if someone keeps it? 50TB is a lot more than $200.


The customer won't keep it because the Snowball contract will put them on the hook for it.

Whether the shippers do or don't keep it, there are much more valuable things shipped every day. Where do you think the gargantuan Catalyst and Nexus switches come from? The line cards in those things probably cost more than a Snowball device.


I think the difference in price between what it costs to get it shipped and the market price of the components (even if you just strip out the hard drives) is what makes it fraud-prone.

From my day job, I know this is an issue for mobile operators.


It might not be the best idea to steal from someone who has your full name and credit card, which Amazon does for AWS customers as far as I know.


The way this usually goes is that you pay with a prepaid debit card (that doesn't verify name or address).

Then you get it shipped to a name and address that are not yours, and you intercept the package before it gets there.

It becomes even harder to trace if the name / address / card all match, for instance if you've taken out a credit card in your elderly neighbor's name.


It has a Kindle attached to it. Every Kindle has a built-in cell phone transceiver. I would not be surprised if they use that to send periodic location data.


I'm sure they don't cost Amazon $200 to produce, and presumably they can just bill corporate customers for them if they're not returned. Some amount of loss is probably included in the cost to consumers.


"The data will be 256-bit encrypted on the host and stored on the appliance in encrypted form", key is stored in KMS, according to https://aws.amazon.com/blogs/aws/aws-importexport-snowball-t...


I think the poster was talking about someone simply gutting the appliance for the 50TB drives. With a bit of fraud here and there on top.


From the looks of it the delivery provider is UPS. If they lose it or a rogue employee guts it during transit, I think AWS will probably have clauses in their contracts to go at them full pelt with a lawsuit.


I assume they have your credit card and will charge you a larger amount if they don't get it back.


They charge $15/day, so I think the idea is that they'll keep charging you this until it's returned. In fact, if you have a need to do a "slow fill", it might make sense to hang onto this device for months at a time before sending it back.
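
A rough Python sketch of what a "slow fill" would cost under that model. The $15/day rate is from the comment above and the $200 per-job figure is the one quoted elsewhere in this thread; `free_days` is a hypothetical parameter for any grace period, so check the actual pricing page before relying on any of these defaults.

```python
def snowball_job_cost(days_held: int, job_fee: float = 200.0,
                      daily_rate: float = 15.0, free_days: int = 0) -> float:
    """Flat job fee plus a daily charge for each day past any free period."""
    billable_days = max(0, days_held - free_days)
    return job_fee + billable_days * daily_rate


# Holding the device for about three months with no grace period assumed.
print(snowball_job_cost(90))  # 200 + 90 * 15 = 1550.0
```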


Safer than Google's option of asking you to ship the unencrypted drive.


encrypt it before shipping


then they can't import it.


Sneakernet as a Service!


They should have called it 'speedball'


The biggest message this sends is something nobody is talking about: Amazon is not afraid of sending hardware on-premises.


Can this work with Arq?


interesting how much we need to pay for the GB


Another easy method to move your customers' data to AWS - where I'm sure some three-letter agencies feast over each newly arrived platter of data.

I'm still waiting for the big leak on how AWS cooperates with NSA at large.


Well, that's an unfortunate name.

http://www.urbandictionary.com/define.php?term=snowball

Reminiscent of when Microsoft called an overlay dialog a "floater" and all the South Africans and Brits in the room started laughing.

http://www.urbandictionary.com/define.php?term=floater (the 2nd definition)


I really don't think there's a word that doesn't mean /something/ dirty in some part of the world.


They could've called it what most people think it is: http://www.urbandictionary.com/define.php?term=sneakernet

No commonly understood dirty version of that, and I suspect most people would've thought it a very cool name as well as appropriate.


Might not be defensible as a trademark.


Yeah, but this is a bit like calling it Amazon Cumswapper. There's no other valid technical use of the word.


Well, apart from meaning a ball of snow, such as might be thrown by children at each other for fun. Really, that's where my mind went with the word 'snowball', much like the rest of the anglophone world. Yes, there may be slang and other juvenile repurposings of the word, but in normal, everyday (well, wintertime everyday) usage snowball means exactly what the OED says it means. Not sure why anyone would imagine otherwise?


> There's no other valid technical use of the word.

Now there is.


My first connection was "Snowball's chance in hell".


Remember when the iPad came out and everyone made tampon jokes? Fast forward 5 years...



