Or a shipping container filled with hard drives. Conservatively, you could fit more than 60 PB of 3.5" 2 TB hard drives in a typical shipping container and still have plenty of room left over for padding. Assume it takes a leisurely week to get from, say, one coast of the US to the other; that's nearly 100 GB/s, or roughly 800 Gbit/s, of sustained throughput.
I've been rounding down here; the real bandwidth would be quite a bit higher.
If you have to copy onto the hard drives and off of them again at the other end, you might be limited by the aggregate I/O bandwidth of however many drives you can read and write at once, times the speed of the I/O bus.
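For the curious, the container math above works out like this (a quick back-of-envelope sketch, assuming exactly 60 PB of payload and a one-week transit):

```python
PB = 10**15                      # bytes in a petabyte
payload = 60 * PB                # 60 PB of 2 TB drives in the container
transit = 7 * 24 * 3600          # one week, in seconds

bytes_per_sec = payload / transit
print(f"{bytes_per_sec / 10**9:.0f} GB/s")        # ~99 GB/s
print(f"{bytes_per_sec * 8 / 10**9:.0f} Gbit/s")  # ~794 Gbit/s
```

And that's with the rounded-down 60 PB figure; pack the container tighter and the number only goes up.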
Not too long ago I realized that my 1 Gbps Ethernet network sustains higher data throughput than my 5400 rpm laptop hard drive. You'd have the same problem with the crate of hard drives, depending on how many times you have to copy the data...
Yep, in practice your HDs would be in NetApp shelves or equivalent, and you would also need to account for the volume of the NetApp heads. That reduces the density by 5-10x. Then there is also the overhead of RAID-DP.
Or you could stuff more than one container on a boat. The Emma Mærsk, one of the largest ships, has over 420,000 m^3 of space, which could hold 81 ZB (8.1 million PB) if Retric's (sibling) comment is correct, moving data at just over 1 Eb (exabit) per second. According to Wolfram Alpha, that's big enough to hold the entirety of human knowledge 7000 times.
I looked incredulously at your dimensions for the 3.5-inch HDD, only to search for the correct ones and find the exact same specs on Wikipedia, then realized that's probably where you got yours. Looking at hard drive manufacturers' sites, I see those are in fact the correct dimensions.
What does the 3.5 inch figure represent, then? The size of the plates?
The 3.5 inch figure refers to the floppy disk, remember those? :)
A "3.5in HDD" is a hard disk that fits into the bay originally designed to hold a drive for 3.5" floppy disks.
I thought it was because the platters are 3.5", just as a 2.5" drive has 2.5" platters.
But no: looked it up, and 3.5" drives apparently have 3.74" platters, while 5.25" drives had 5.12" platters. Which makes sense when you look at an HDD with the top off.
I feel like there's a market there in a post-network-neutrality world. Backups are a great example of internet traffic that is large but inherently time-insensitive, so long as it's secure. I imagine that a lot of international email might also be similarly time-insensitive. If it were just a switch between "picodollars for arrival in three days" and "nanodollars for arrival in 10 seconds," I wonder how often people would leave it on sneakernet.
Why would you want to use flash drives to transport files when you can just use the campus network? We used DC++ when I was at uni and that worked nicely.
Same story here too... after hearing about multiple friends busted for torrenting, I set up a DC++ server under my dorm desk and the rest is history. With some help from friends in each dorm area on campus, we mapped out the IP address blocks for the residential network and whitelisted all traffic to the server, to avoid the heat that the UCLA DC++ servers were experiencing (circa 2003-2004).
The topper, however, is that almost 6 years later I was referred to and hired by a biotech startup because the CEO remembered me as the DC++ admin from his days in the dorms.
Kids these days... I remember when DirectConnect was a crappy VB program that crashed continuously, and DC++ was just getting started as a replacement (they didn't even have a server at the time, only a client).
We did the same, but used iTunes shares. My machine ran a really fantastic service called mt-daapd, which would periodically scan a folder and offer everything up as an iTunes share. (To clarify: iTunes was the client used to listen to the music; mt-daapd ran on a Linux box under my desk.)
There was a piece of software called "ourtunes" that would let you download things from these shares. It was amazing. People would name their libraries things like "room 204"... it had a great community feel to it :).
A part of me faded away when Apple updated iTunes in 2006 and effectively blocked MyTunes/OurTunes [1]. What a great facilitator it was for introducing you to cool people with a variety of musical tastes.
Kudos for taking the effort to do it. Nevertheless I'd still prefer the network for transport. As long as you are allowed an encrypted connection to your buddy's machine, there is no reason to resort to flash drives.
This reminded me a little bit of the xbox-based Internet alternative from "Little Brother." Trying to imagine a situation in which this could be useful, I could only think of scenarios involving political oppression.
USB flash drives and memory cards are in the range of 10 MB/s write and 15 MB/s read.
Since you need to both write and read to transfer a file, you are looking at about 6 MB/s effective, which is about 48 Mbit/s (wireless order of magnitude).
Combined with the inconvenience, the benefits of this system are questionable.
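To spell out that arithmetic (a sketch using the rough speeds quoted above): the write and the read happen one after the other, so the effective rate is the harmonic-style combination of the two, not their average:

```python
write, read = 10, 15   # MB/s, rough USB flash speeds quoted above

# Total time = size/write + size/read, so the effective rate is:
effective = 1 / (1 / write + 1 / read)
print(effective)       # 6.0 MB/s
print(effective * 8)   # 48.0 Mbit/s, i.e. wireless order of magnitude
```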
Sounds inspiring. I'm already thinking of some kind of indexed communications medium via a network of small, cheap and disposable usb sticks. Something in the style of freenet and tor.
The Jargon File's "tape" reference more likely means paper tape than magnetic 9-track, so the term possibly goes back further, to card-punch days. (I first heard the term in 1985, at my first job during high school: a PC consulting shop.)
I'd hazard a guess that the term was lifted from the Navy or some other such source far predating the movement of electronic bits.
Sounds amusing, but stuff like this used to happen before the advent of high-bandwidth internet a decade or so ago. In the olden days it was physically swapping cassette tapes, then later posting floppy disks. Flash drives are just the modern floppy disk or CD.
So the goal of this is for the program to tell the person with files to load them up on a hard disk and then give them to the person requesting the files, who then picks them up from the dropbox, loads them on her computer, and then puts the flash drive back into the dropbox?
If so, then I guess all this really does is give you a nice way of understanding who has which files, right?
If this is how it works, then I don't really understand what the external caches do for anyone.
The system you've described doesn't really scale to, say, 1000 people with 10TB of files. Even with a lot of flash drives in that cache.
The sneakernet is engineered basically to let you keep making trips to that cache outside your dorm room, with files magically appearing from a block or two away. Likewise, files from you magically appear a block or two away when other people ask for them. The sneakernet takes care of all the routing to make lots of files show up in lots of places geographically close to you without too much overhead.
I've found lots of times I will request a file that the system in its cleverness has already cached locally on my HD. So "getting it off of the sneakernet" is actually instantaneous.
> I've found lots of times I will request a file that the system in its cleverness has already cached locally on my HD. So "getting it off of the sneakernet" is actually instantaneous.
Huh? I thought that the point was that you request files and then get them. How do you already have a file that you didn't request? Is this distributing files across people (i.e. Freenet-style), so you have a bunch of files that are only on your machine for the purpose of being an intermediary for others? Or is this running some sort of predictive algorithm that tries to figure out what you will request before you do?
In a very generic answer to your question, suppose you're moving a 1GB file from A to B with an 8GB flash drive. It doesn't cost anyone anything to move an extra 7GB of data that somebody might need later. You can 8x your throughput for free.
But, at the same time, you don't want a flash drive to be in a constant state of being near-full so it can't fit the new StarCraft beta.
So there's some logic that takes files that seem to be spreading around anyway and sort of spreads them around ahead of when they're requested. This both increases redundancy and increases network throughput. It's not exactly machine learning, but it does, in practice, provide decent results.
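The spreading logic could be as simple as a greedy top-up: after loading what was actually requested, fill the drive's spare space with whatever seems popular. This is a hypothetical sketch of that idea; `fill_spare_space` and its popularity ordering are invented for illustration and are not the project's actual algorithm:

```python
def fill_spare_space(drive_capacity, requested, popular):
    """Greedy opportunistic prefetch: after loading the requested
    files, top up the drive with popular files that still fit.
    `requested` and `popular` are {name: size_in_bytes} dicts,
    with `popular` ordered most-requested first.
    """
    load = dict(requested)
    free = drive_capacity - sum(load.values())
    for name, size in popular.items():
        if name not in load and size <= free:
            load[name] = size
            free -= size
    return load

GB = 10**9
load = fill_spare_space(8 * GB, {"requested.iso": 1 * GB},
                        {"hot1.zip": 4 * GB, "hot2.zip": 5 * GB, "hot3.zip": 2 * GB})
# The 1 GB request plus hot1 (4 GB) and hot3 (2 GB) fit; hot2 does not.
```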
By "magic" do you mean "through the internet" or is something cleverer going on?
Also, by cache are you generally referring to some place where files are stored? Or does it mean something else?
I understand the system I just described doesn't really scale well; I'm just not really understanding how it does work from the documentation in the wiki.
cache == a flash drive or group of flash drives (although it could also be a USB HD, etc.)
Not through the internet. Through moving from flash drive to flash drive. If you follow the "testing" section along with the network maps for both the "basic configuration guide" and the "large configuration guide", that should clear up a lot of the magic.
You might want to say "storage device" instead of "cache"; that would have made it less confusing for me to understand when reading the wiki.
It also seems like the system is just begging for caches to go missing (though I understand that since all the files are encrypted, it should be hard in theory to actually get at the data if you aren't authorized to access the sneakernet).
Are you hashing the files to make sure they aren't getting corrupted in transit or when taken from someone's external cache?
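One straightforward way to do that would be to carry a manifest of digests alongside the files and verify on arrival. A minimal sketch (the chunked SHA-256 hashing is standard Python; the manifest idea is my assumption, not necessarily what the project does):

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so huge files never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# On arrival, compare sha256_of("payload.iso") against the digest
# recorded in a manifest before the drive left the source machine.
```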