Sneakernet: 99% offline file sharing network

einarvollset · on June 16, 2010

“Never underestimate the bandwidth of a station wagon filled with magtape, or a 747 filled with CD-ROMs.”

pjscott · on June 17, 2010

Or a shipping container filled with hard drives. Conservatively speaking, you could easily get more than 60 PB in a typical shipping container and have plenty of room left over for padding, using 3.5" 2 TB hard drives. Assume it takes a leisurely week to get from, say, one coast of the US to the other; that's a little over 100 Gbit/s sustained throughput.

I've been rounding down here; the real bandwidth would be quite a bit higher.
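
The arithmetic above can be checked with a minimal sketch. It assumes decimal units (1 PB = 10^15 bytes), a one-way trip of exactly seven days, and no time spent loading or unloading the drives; the function name is made up for illustration. Under those assumptions, 60 PB per week comes out to roughly 99 GB/s, or just under 800 Gbit/s, so the figure of 100 quoted above matches gigabytes per second more closely than gigabits.

```python
# Back-of-the-envelope throughput for data shipped on physical media.
# Inputs are the figures quoted in the comment above, in decimal units.

PB = 10**15  # bytes in one petabyte (decimal, assumed)


def sustained_throughput(payload_bytes, transit_seconds):
    """Return (bytes per second, bits per second) for a one-way shipment."""
    bytes_per_s = payload_bytes / transit_seconds
    return bytes_per_s, bytes_per_s * 8


payload = 60 * PB          # "more than 60 PB" per container
week = 7 * 24 * 3600       # "a leisurely week" in seconds
bytes_per_s, bits_per_s = sustained_throughput(payload, week)
print(f"{bytes_per_s / 1e9:.1f} GB/s, {bits_per_s / 1e9:.1f} Gbit/s")
```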

rdtsc · on June 17, 2010

If you have to copy to the hard drives and copy from the hard drives on the other end, you might be limited to the total IO bandwidth of however many hard drives you can read/write * speed of IO bus.

Not too long ago I realized that my 1 Gbps Ethernet network gives me higher sustained data throughput than my 5400 rpm laptop hard drive. You would have the same problem with the crate of hard drives, depending on how many times you have to copy the data...

gaius · on June 17, 2010

Yep, in practice your HDs would be in NetApp shelves or equivalent, and you would also need to account for the volume of the NetApp heads. That reduces the density by 5-10x. Then there is also the overhead of RAID-DP.

Still, that's a shitload of bandwidth.

sophiebits · on June 17, 2010

Or you could stuff more than one container on a boat. The Emma Mærsk, one of the largest ships, has over 420,000 m^3 of space, which could hold 81 ZB (81 million PB) if Retric's (sibling) comment is correct, moving data at just over 1 Eb (exabit) per second. According to Wolfram Alpha, that's big enough to hold the entirety of human knowledge 7000 times.

Retric · on June 17, 2010

Last I checked, 32GB micro SD cards had the highest memory density.

http://www.sandisk.com/products/mobile-memory-products/sandi...

32GB Micro SD = 1 mm * 11 mm * 15 mm = ~6 million / m^3. So 32GB * 6,000,000 ~= 192 PB per m^3.

2TB 3.5 inch HDD = 4 in × 1 in × 5.75 in = ~3000 / m^3. So 3,000 * 2TB = ~6 PB per m^3.

PS: Granted, that m^3 of SD cards would cost around 1.2 billion and would take a while to upload at the other end.
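
For anyone who wants to redo the density numbers, here is a rough sketch. It assumes perfect packing with no casing, padding, or air gaps, uses decimal GB/TB/PB, and takes the 420,000 m^3 ship volume from elsewhere in the thread; the helper name is hypothetical. Without rounding, it gives roughly 6 million cards and about 194 PB per m^3, roughly 2,650 drives and about 5.3 PB per m^3, and on the order of 80 ZB for the whole ship.

```python
# Storage density per cubic metre, assuming ideal packing (no gaps, no casing).

GB, TB, PB, ZB = 10**9, 10**12, 10**15, 10**21  # decimal units (assumed)
M3_PER_MM3 = 1e-9            # one cubic millimetre in cubic metres
M3_PER_IN3 = 0.0254 ** 3     # one cubic inch in cubic metres


def density_per_m3(unit_volume_m3, unit_capacity_bytes):
    """Return (units per m^3, bytes per m^3) for one kind of storage unit."""
    units = 1.0 / unit_volume_m3
    return units, units * unit_capacity_bytes


# 32GB micro SD card: 1 mm x 11 mm x 15 mm
sd_units, sd_bytes = density_per_m3(1 * 11 * 15 * M3_PER_MM3, 32 * GB)

# 2TB 3.5 inch hard drive: 4 in x 1 in x 5.75 in
hdd_units, hdd_bytes = density_per_m3(4 * 1 * 5.75 * M3_PER_IN3, 2 * TB)

# Whole-ship capacity if every cubic metre were filled with SD cards
ship_bytes = 420_000 * sd_bytes

print(f"SD:   {sd_units:,.0f} per m^3, {sd_bytes / PB:.1f} PB per m^3")
print(f"HDD:  {hdd_units:,.0f} per m^3, {hdd_bytes / PB:.1f} PB per m^3")
print(f"Ship: {ship_bytes / ZB:.1f} ZB")
```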

sophiebits · on June 17, 2010

I looked incredulously at your dimensions for the 3.5 inch HDD, only to search for the correct ones and find the exact same specs on Wikipedia, then realize that's probably where you got yours. Looking at hard drive manufacturers' sites, I see that those are in fact the correct dimensions.

What does the 3.5 inch figure represent, then? The size of the plates?

c3o · on June 17, 2010

The 3.5 inch figure represents the floppy, remember that? :) A "3.5in HDD" is a hard disk that fits into the bay originally designed to hold a drive for 3.5" floppy disks.

megablast · on June 17, 2010

I thought it was due to the fact that the platters are 3.5", just as the 2.5" hd has 2.5" size platters.

But no, looked it up, 3.5" have platter sizes of 3.74" apparently, 5.25" had 5.12" platters. Which makes sense when you look at a hdd with the top off.

chancho · on June 17, 2010

So then why are laptop hard drives called 2.5in?

SamReidHughes · on June 17, 2010

Because they're smaller.

plorkyeran · on June 17, 2010

Because 2.4" sounds worse.

(3.5 * 69.9 / 102 = 2.3985)

crsmith · on June 17, 2010

Or pigeon

http://www.zdnet.com/blog/btl/in-south-africa-carrier-pigeon...

sjs · on June 17, 2010

A USB stick? How do they interoperate if they don't follow the spec[1]?!!

[1] http://tools.ietf.org/html/rfc1149

TeHCrAzY · on June 16, 2010

Reminds me of an interesting article re. the bandwidth of a station wagon: http://www.dansdata.com/gz105.htm

crististm · on June 17, 2010

but what about the latency?

tel · on June 17, 2010

I feel like there's a market there in a post network-neutrality world. Backups are a great example of internet traffic that is large but inherently time insensitive so long as it's secure. I imagine that a lot of international email might also be similarly time insensitive. If it were just a switch between "picodollars for arrival in three days" and "nanodollars for arrival in 10 seconds" I wonder how often people would leave it on sneakernet.

garibaldi · on June 16, 2010

Why would you want to use flashdrives to transport files when you can just use the campus networks? We used DC++ when I was at uni and that worked nicely.

ellisd · on June 16, 2010

Same story here too... after hearing of multiple friends busted for torrenting, I set up a DC++ server under my dorm desk and the rest is history. With some help from friends in each dorm area on campus, we mapped out the IP address blocks for the residential network and white-listed all traffic to the server to prevent the heat that the UCLA DC++ servers were experiencing (circa 2003-2004).

The topper however is how almost 6 years later I was referred to and hired by a bio tech start up based on the fact that the CEO remembered me as the DC++ admin during his days in the dorms.

pyre · on June 17, 2010

Kids these days... I remember when DirectConnect was a crappy VB program that crashed continuously, and DC++ was just getting started as a replacement (they didn't even have a server at the time, only a client).

blhack · on June 16, 2010

We did the same, but used iTunes shares. My machine ran a really fantastic service called mt-daapd, which would periodically scan a folder, then offer everything up on iTunes (I should clarify. iTunes was used as a client to listen to the music, mt-daapd ran on a linux box under my desk).

There was a piece of software called "ourtunes" that would let you download things from these shares. It was amazing. People would name their libraries things like "room 204"...it had a great community feel to it :).

kobs · on June 17, 2010

A part of me faded away when Apple updated iTunes in 2006 and effectively blocked MyTunes/OurTunes [1]. What a great facilitator it was for introducing you to cool people with a variety of musical tastes.

[1] http://en.wikipedia.org/wiki/OurTunes

drewcrawford · on June 16, 2010

It's fun.

Also, my campus just shut down (or "officialized", take your pick) DC++.

praptak · on June 17, 2010

Kudos for taking the effort to do it. Nevertheless I'd still prefer the network for transport. As long as you are allowed an encrypted connection with your buddy's machine there is no reason to resort to flashdrives.

irskep · on June 17, 2010

My school's IT department just banned DC++ and implemented some technical measures against it. This may well be the replacement.

BrandonM · on June 17, 2010

It worked nicely at OSU when I was a freshman (2002-2003) until the student responsible for setting it up got raided and prosecuted.

surlyadopter · on June 16, 2010

Ditto. The day after campus IT borked Morpheus, 2 DC++ networks sprang up. Good times.

natfriedman · on June 17, 2010

This reminded me a little bit of the xbox-based Internet alternative from "Little Brother." Trying to imagine a situation in which this could be useful, I could only think of scenarios involving political oppression.

Nevertheless, good hack.

barmstrong · on June 17, 2010

Seriously thought this was a joke at first. I guess it's not.

Installation instruction: 1. sudo gem install sneakernet 2. put your movie on a flashdrive and give it to your friend

DTrejo · on June 17, 2010

Where did you find the installation instructions and build and install guide? I didn't see them on the github page.

drewcrawford · on June 17, 2010

They're in the wiki. Maybe I should link to them from the read me...

lanstein · on June 16, 2010

You've got a non-utf-8 wink in there, Drew.

drewcrawford · on June 17, 2010

Patches accepted ;)

nirai · on June 17, 2010

USB drives or memory are in the range of 10MB/s write and 15MB/s read.

Since you need to both write & read to transfer a file, you are looking at about 7MB/s file transfer which is about 50Mb/s (wireless order of magnitude).

Combined with the inconvenience, the benefits of this system are questionable.
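
The estimate above can be reproduced with a small sketch. It assumes the file is written to the stick in full and then read off in full, one after the other, and it ignores the time spent carrying the stick; the function name is made up for illustration. With 10 MB/s write and 15 MB/s read, the effective rate comes out near 6 MB/s (about 48 Mbit/s), in the same ballpark as the figures quoted.

```python
# Effective transfer rate when data is written to a drive and then read back.


def effective_rate(write_mb_s, read_mb_s):
    """Return the end-to-end MB/s for a sequential write followed by a read.

    Each megabyte costs 1/write seconds to store and 1/read seconds to
    retrieve, so the combined rate is the reciprocal of the summed costs.
    """
    seconds_per_mb = 1.0 / write_mb_s + 1.0 / read_mb_s
    return 1.0 / seconds_per_mb


rate_mb_s = effective_rate(10, 15)
print(f"{rate_mb_s:.1f} MB/s, {rate_mb_s * 8:.0f} Mbit/s")
```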

someone_here · on June 17, 2010

Uhm, modern USB hard drives are much faster.

pixel · on June 16, 2010

The implementation idea is much like FIDOnet, but with flash drives rather than modems. nice.

zokier · on June 16, 2010

Sounds inspiring. I'm already thinking of some kind of indexed communications medium via a network of small, cheap and disposable usb sticks. Something in the style of freenet and tor.

seancron · on June 17, 2010

Reminded me of Paperbak (www.ollydbg.de/Paperbak/) which prints out paper copies of your files that you can later scan back into your computer.

makmanalp · on June 17, 2010

For those who don't know, the term sneakernet is actually relatively old. The oldest reference I could find is 20 years ago: http://www.catb.org/~esr/jargon/oldversions/jarg211.txt

dpezely · on June 17, 2010

The jargon file's "tape" reference is more likely to mean paper tape than magnetic 9-track, so the term possibly goes back further to card-punch days. (I first heard the term in 1985 at my first job during high school: a PC consulting shop.)

I'd hazard a guess that the term was lifted from Navy or other such source far predating movement of electronic bits.

Anyone?

motters · on June 17, 2010

Sounds amusing, but stuff like this used to happen before the advent of high bandwidth internet a decade or so ago. In the olden days it was physically swapping cassette tapes, then later posting floppy disks. Flash drives are just the modern floppy disk/CD.

mrshoe · on June 16, 2010

Run this over IP/AC to make it even harder to trace (http://www.faqs.org/rfcs/rfc1149.html).

Groxx · on June 16, 2010

Nooooo, didn't you read the Discussion portion?

Audit trails are automatically generated, and can often be found on logs and cable trays.

It automatically logs everything! You'd also be able to identify people with broadband connections pretty easily.

mkramlich · on June 17, 2010

That's so 80's. The cool kids now use Tor over dragonfly UAV's.

swores · on June 17, 2010

Could expand into wider areas with a daily first-class envelope delivery, containing a usb stick or two..

surlyadopter · on June 16, 2010

We await silent Tristero's empire.

zandorg · on June 16, 2010

I had this idea in 2005: Sneakershare.com was going to be my domain. Never mind.

drewcrawford · on June 16, 2010

I can't figure out how to profit from it. Hence, now run your own for free.

zandorg · on June 17, 2010

Just read the article. Looks like a good project.

sschueller · on June 16, 2010

Wouldn't truecrypt + flashdrive do the same thing without all the hassle?

zokier · on June 16, 2010

You still need to negotiate the swapping of disks and keys, and handle indexing somehow.

thefool · on June 17, 2010

I don't really understand.

So the goal of this is for the program to tell the person with files to load them up on a hard disk and then give them to the person requesting the files, who then picks them up from the dropbox, loads them on her computer, and then puts the flash drive back into the dropbox?

If so, then I guess all this really does is give you a nice way of understanding who has which files, right?

If this is how it works, then I don't really understand what the external caches do for anyone.

drewcrawford · on June 17, 2010

The system you've described doesn't really scale to, say, 1000 people with 10TB of files. Even with a lot of flash drives in that cache.

The sneakernet is engineered basically to let you keep making trips to that cache outside your dorm room and somehow magical files appear from a block or two away. Also, files from you magically appear a block or two away when other people ask for them. The sneakernet takes care of all the routing to make lots of files show up in lots of places geographically close to you without too much overhead.

I've found lots of times I will request a file that the system in its cleverness has already cached locally on my HD. So "getting it off of the sneakernet" is actually instantaneous.

pyre · on June 17, 2010

> I've found lots of times I will request a file that the system in its cleverness has already cached locally on my HD. So "getting it off of the sneakernet" is actually instantaneous.

Huh? I thought that the point was that you request files and then get them. How do you already have a file that you didn't request? Is this distributing files across people (i.e. FreeNet) so you have a bunch of files that are only on your machine for the purpose of being an intermediary to others? Or is this running some sort of predictive algorithm that tries to figure out what you will request before you do?

drewcrawford · on June 17, 2010

In a very generic answer to your question, suppose you're moving a 1GB file from A to B with an 8GB flash drive. It doesn't cost anyone anything to move an extra 7GB of data that somebody might need later. You can 8x your throughput for free.

But, at the same time, you don't want a flash drive to be in a constant state of being near-full so it can't fit the new StarCraft beta.

So there's some logic that takes files that seem to be spreading around anyway and sort of spreads them around ahead of when they're requested. This both increases redundancy and increases network throughput. It's not exactly machine learning, but it does, in practice, provide decent results.
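
To make the trade-off concrete, here is a toy sketch of one way such a policy could look. It is not sneakernet's actual logic: the `File` fields, the popularity score, the `plan_drive` function, and the 25% reserve are all assumptions invented for illustration. Requested files go on first, then popular extras ride along for free until a reserve of empty space would be eaten into.

```python
# Toy "fill the spare space, but keep headroom" policy for one flash drive trip.
from dataclasses import dataclass


@dataclass
class File:
    name: str
    size_gb: float
    popularity: int  # hypothetical score, e.g. how often the file was requested


def plan_drive(capacity_gb, requested, candidates, reserve_fraction=0.25):
    """Choose files for one trip.

    Requested files may use the whole drive. Extra files are added in order
    of popularity, but only while total usage stays under the capacity minus
    a reserve kept free for future large requests.
    """
    plan, used = [], 0.0
    for f in requested:
        if used + f.size_gb <= capacity_gb:
            plan.append(f)
            used += f.size_gb

    budget = capacity_gb * (1.0 - reserve_fraction)
    for f in sorted(candidates, key=lambda c: c.popularity, reverse=True):
        if f not in plan and used + f.size_gb <= budget:
            plan.append(f)
            used += f.size_gb
    return plan


wanted = [File("requested.iso", 1.0, 0)]
extras = [File("b", 3.0, 5), File("c", 4.0, 9), File("d", 2.0, 1)]
chosen = plan_drive(8.0, wanted, extras)
print([f.name for f in chosen])
```

In this example the 8 GB drive carries the 1 GB request plus the most popular 4 GB extra, and leaves the rest empty because adding anything else would cut into the reserve.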

JoachimSchipper · on June 17, 2010

I think these files were on the disk for the purpose of distributing them to others who had requested them.

thefool · on June 17, 2010

By "magic" do you mean "through the internet" or is something cleverer going on?

Also, by cache are you generally referring to some place where files are stored? Or does it mean something else?

I understand the system I just described doesn't really scale well, I'm just not really understanding how it does work from the documentation in the wiki.

drewcrawford · on June 17, 2010

cache == a flash drive or group of flash drives (although it could also be a USB HD, etc.)

Not through the internet. Through moving from flash drive to flash drive. If you follow the "testing" section along with the network maps for both the "basic configuration guide" and the "large configuration guide", that should clear up a lot of the magic.

thefool · on June 17, 2010

Oh ok, now it makes sense.

You might want to say like storage device instead of cache, that would have made it less confusing for me to understand from reading the wiki.

It also seems like the system is just begging for caches to go missing (though I understand that as all the files are encrypted, it should be hard in theory to actually get data if you aren't authorized to access sneakernet).

Are you hashing the files all the time to make sure that files aren't getting corrupted in transit or when taken from someone's external cache?
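
For reference, the kind of check being asked about could look like the sketch below. Nothing in the thread says whether sneakernet does this; the function names and the choice of SHA-256 are assumptions. The idea is to record a digest when a file is first shared and compare it again after the file comes off a flash drive.

```python
# Verify that a file copied off a drive still matches a previously recorded digest.
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """True if the file's current digest equals the one recorded earlier."""
    return sha256_of(path) == expected_hex
```

Reading in chunks keeps memory use flat even for multi-gigabyte files, which matters when the payload is a movie on a flash drive.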