Sneakernet: 99% offline file sharing network (github.com/drewcrawford)
182 points by drewcrawford on June 16, 2010 | 59 comments



“Never underestimate the bandwidth of a station wagon filled with magtape, or a 747 filled with CD-ROMs.”


Or a shipping container filled with hard drives. Conservatively speaking, you could easily get more than 60 PB in a typical shipping container and have plenty of room left over for padding, using 3.5" 2 TB hard drives. Assume it takes a leisurely week to get from, say, one coast of the US to the other; that's roughly 100 GB/s (about 800 Gbit/s) sustained throughput.

I've been rounding down here; the real bandwidth would be quite a bit higher.
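
A quick way to check this kind of estimate, as a minimal sketch: it assumes decimal units (1 PB = 10^15 bytes) and a one-way trip, and the function name is made up for illustration. For 60 PB in 7 days it gives roughly 794 Gbit/s, i.e. about 99 GB/s.

```python
# Back-of-the-envelope throughput of physically shipped storage.
# Assumption: decimal units (1 PB = 10**15 bytes), one-way trip.

SECONDS_PER_DAY = 86_400


def shipping_throughput_bits_per_s(capacity_bytes: float, transit_days: float) -> float:
    """Sustained throughput in bits per second for a single shipment."""
    return capacity_bytes * 8 / (transit_days * SECONDS_PER_DAY)


container_bytes = 60e15  # 60 PB, the figure quoted in the comment
rate = shipping_throughput_bits_per_s(container_bytes, transit_days=7)
print(f"{rate / 1e9:.0f} Gbit/s, {rate / 8e9:.0f} GB/s")
```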


If you have to copy to the hard drives and copy from the hard drives on the other end, you might be limited to the total IO bandwidth of however many hard drives you can read/write * speed of IO bus.

Not too long ago I realized that my 1 Gbps Ethernet network gives me higher sustained data throughput than my 5400 rpm laptop hard drive. You would have the same problem with the crate of hard drives, depending on how many times you have to copy the data...


Yep, in practice your HDs would be in NetApp shelves or equivalent, and you would also need to account for the volume of the NetApp heads. That reduces the density by 5-10x. Then there is also the overhead of RAID-DP.

Still, that's a shitload of bandwidth.


Or you could stuff more than one container on a boat. The Emma Mærsk, one of the largest ships, has over 420,000 m^3 of space, which could hold 81 ZB (81 million PB) if Retric's (sibling) comment is correct, moving data at just over 1 Eb (exabit) per second. According to Wolfram Alpha, that's big enough to hold the entirety of human knowledge 7000 times.


Last I checked 32 GB micro SD cards had the highest memory density.

http://www.sandisk.com/products/mobile-memory-products/sandi...

32GB Micro SD = 1 mm * 11 mm * 15 mm = ~6 million / m^3. So 32GB * 6,000,000 ~= 192 PB per m^3.

2TB 3.5 inch HDD = 4 in × 1 in × 5.75 in = ~3000 / m^3. So 3,000 * 2TB = ~6 PB per m^3.

PS: Granted, that m^3 of SD cards would cost around 1.2 billion and would take a while to upload at the other end.
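
The density figures can be reproduced with a short sketch. It assumes perfect packing with no air gaps, trays, or cabling, and decimal units; the helper names are hypothetical. Without rounding, it comes out to about 194 PB per m^3 for the SD cards and about 5.3 PB per m^3 for the hard drives.

```python
# Storage density per cubic metre, assuming perfect packing (no gaps).
MM3_PER_M3 = 1e9
MM_PER_INCH = 25.4


def units_per_m3(w_mm: float, d_mm: float, h_mm: float) -> float:
    """How many boxes of the given size fit into one cubic metre."""
    return MM3_PER_M3 / (w_mm * d_mm * h_mm)


def petabytes_per_m3(capacity_gb: float, w_mm: float, d_mm: float, h_mm: float) -> float:
    """Capacity per cubic metre in PB (1 PB = 10**6 GB)."""
    return capacity_gb * units_per_m3(w_mm, d_mm, h_mm) / 1e6


# 32 GB micro SD card, 1 mm x 11 mm x 15 mm
sd_pb = petabytes_per_m3(32, 1, 11, 15)
# 2 TB 3.5 inch HDD, 4 in x 1 in x 5.75 in
hdd_pb = petabytes_per_m3(2000, 4 * MM_PER_INCH, 1 * MM_PER_INCH, 5.75 * MM_PER_INCH)

print(f"micro SD: {sd_pb:.0f} PB/m^3, HDD: {hdd_pb:.1f} PB/m^3")
```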


I looked incredulously at your dimensions of the 3.5 inch HDD, only to search for the correct ones and find the exact same specs on Wikipedia, then realize that's probably where you got yours. Looking at hard drive manufacturers' sites, I see that those are in fact the correct dimensions.

What does the 3.5 inch figure represent, then? The size of the plates?


The 3.5 inch figure represents the floppy, remember that? :) A "3.5in HDD" is a hard disk that fits into the bay originally designed to hold a drive for 3.5" floppy disks.


I thought it was due to the fact that the platters are 3.5", just as the 2.5" HD has 2.5" platters.

But no, looked it up, 3.5" have platter sizes of 3.74" apparently, 5.25" had 5.12" platters. Which makes sense when you look at a hdd with the top off.


So then why are laptop hard drives called 2.5in?


Because they're smaller.


Because 2.4" sounds worse.

(3.5 * 69.9 / 102 = 2.3985)



A USB stick? How do they interoperate if they don't follow the spec[1]?!!

[1] http://tools.ietf.org/html/rfc1149


Reminds me of an interesting article re. the bandwidth of a station wagon: http://www.dansdata.com/gz105.htm


but what about the latency?


I feel like there's a market there in a post-network-neutrality world. Backups are a great example of internet traffic that is large but inherently time insensitive so long as it's secure. I imagine that a lot of international email might also be similarly time insensitive. If it were just a switch between "picodollars for arrival in three days" and "nanodollars for arrival in 10 seconds", I wonder how often people would leave it on sneakernet.


Why would you want to use flashdrives to transport files when you can just use the campus networks? We used DC++ when I was at uni and that worked nicely.


Same story here too... after hearing of multiple friends getting busted for torrenting, I set up a DC++ server under my dorm desk and the rest is history. With some help from friends in each dorm area on campus, we mapped out the IP address blocks for the residential network and white-listed all traffic to the server to prevent the heat that the UCLA DC++ servers were experiencing (circa 2003-2004).

The topper however is how almost 6 years later I was referred to and hired by a bio tech start up based on the fact that the CEO remembered me as the DC++ admin during his days in the dorms.


Kids these days... I remember when DirectConnect was a crappy VB program that crashed continuously, and DC++ was just getting started as a replacement (they didn't even have a server at the time, only a client).


We did the same, but used iTunes shares. My machine ran a really fantastic service called mt-daapd, which would periodically scan a folder, then offer everything up on iTunes (I should clarify. iTunes was used as a client to listen to the music, mt-daapd ran on a linux box under my desk).

There was a piece of software called "ourtunes" that would let you download things from these shares. It was amazing. People would name their libraries things like "room 204"...it had a great community feel to it :).


A part of me faded away when Apple updated iTunes in 2006 and effectively blocked MyTunes/OurTunes [1]. What a great facilitator it was for introducing you to cool people with a variety of musical tastes.

[1] http://en.wikipedia.org/wiki/OurTunes


It's fun.

Also, my campus just shut down (or "officialized", take your pick) DC++.


Kudos for taking the effort to do it. Nevertheless I'd still prefer the network for transport. As long as you are allowed an encrypted connection with your buddy's machine there is no reason to resort to flashdrives.


My school's IT department just banned DC++ and implemented some technical measures against it. This may well be the replacement.


It worked nicely at OSU when I was a freshman (2002-2003) until the student responsible for setting it up got raided and prosecuted.


Ditto. The day after campus IT borked Morpheus, 2 DC++ networks sprang up. Good times.


This reminded me a little bit of the xbox-based Internet alternative from "Little Brother." Trying to imagine a situation in which this could be useful, I could only think of scenarios involving political oppression.

Nevertheless, good hack.


Seriously thought this was a joke at first. I guess it's not.

Installation instruction: 1. sudo gem install sneakernet 2. put your movie on a flashdrive and give it to your friend


Where did you find the installation instructions and build and install guide? I didn't see them on the github page.


They're in the wiki. Maybe I should link to them from the read me...


You've got a non-utf-8 wink in there, Drew.


Patches accepted ;)


USB drives or memory are in the range of 10MB/s write and 15MB/s read.

Since you need to both write & read to transfer a file, you are looking at about 6MB/s file transfer, which is about 50Mb/s (wireless order of magnitude).

Combined with the inconvenience, the benefits of this system are questionable.
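
For what it's worth, the combined rate is a harmonic-style sum, since the write and the read happen one after the other. A minimal sketch, assuming the 10MB/s and 15MB/s figures above and ignoring the time spent walking (the function name is made up):

```python
def store_and_forward_rate(write_mb_s: float, read_mb_s: float) -> float:
    """Effective MB/s when every byte is written once and then read once.

    Time per MB is 1/write + 1/read, so the rate is the inverse of that sum.
    """
    return 1 / (1 / write_mb_s + 1 / read_mb_s)


rate_mb_s = store_and_forward_rate(write_mb_s=10, read_mb_s=15)
print(f"{rate_mb_s:.1f} MB/s = {rate_mb_s * 8:.0f} Mb/s")
```

With these inputs the result is 6 MB/s, or 48 Mb/s.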


Uhm, modern USB hard drives are much faster.


The implementation idea is much like FIDOnet, but with flash drives rather than modems. nice.


Sounds inspiring. I'm already thinking of some kind of indexed communications medium via a network of small, cheap and disposable usb sticks. Something in the style of freenet and tor.


Reminded me of Paperbak (www.ollydbg.de/Paperbak/) which prints out paper copies of your files that you can later scan back into your computer.


For those who don't know, the term sneakernet is actually relatively old. The oldest reference I could find is 20 years ago: http://www.catb.org/~esr/jargon/oldversions/jarg211.txt


The jargon file's "tape" reference is more likely to mean paper tape than magnetic 9-track, so the term possibly goes back further to card-punch days. (I first heard the term in 1985 at my first job during high school: a PC consulting shop.)

I'd hazard a guess that the term was lifted from Navy or other such source far predating movement of electronic bits.

Anyone?


Sounds amusing, but stuff like this used to happen before the advent of high bandwidth internet a decade or so ago. In the olden days it was physically swapping cassette tapes, then later posting floppy disks. Flash drives are just the modern floppy disk/CD.


Run this over IP/AC to make it even harder to trace (http://www.faqs.org/rfcs/rfc1149.html).


Nooooo, didn't you read the Discussion portion?

Audit trails are automatically generated, and can often be found on logs and cable trays.

It automatically logs everything! You'd also be able to identify people with broadband connections pretty easily.


That's so 80's. The cool kids now use Tor over dragonfly UAV's.


Could expand into wider areas with a daily first-class envelope delivery, containing a usb stick or two..


We await silent Tristero's empire.


I had this idea in 2005: Sneakershare.com was going to be my domain. Never mind.


I can't figure out how to profit from it. Hence, now run your own for free.


Just read the article. Looks like a good project.


Wouldn't truecrypt + flashdrive do the same thing without all the hassle?


You still need to negotiate the swapping of disks and keys, and handle indexing somehow.


I don't really understand.

So the goal of this is for the program to tell the person with files to load them up on a hard disk and then give them to the person requesting the files, who then picks them up from the dropbox, loads them on her computer, and then puts the flash drive back into the dropbox?

If so, then I guess all this really does is give you a nice way of understanding who has which files, right?

If this is how it works, then I don't really understand what the external caches do for anyone.


The system you've described doesn't really scale to, say, 1000 people with 10TB of files. Even with a lot of flash drives in that cache.

The sneakernet is engineered basically to let you keep making trips to that cache outside your dorm room and somehow magical files appear from a block or two away. Also, files from you magically appear a block or two away when other people ask for them. The sneakernet takes care of all the routing to make lots of files show up in lots of places geographically close to you without too much overhead.

I've found lots of times I will request a file that the system in its cleverness has already cached locally on my HD. So "getting it off of the sneakernet" is actually instantaneous.


> I've found lots of times I will request a file that the system in its cleverness has already cached locally on my HD. So "getting it off of the sneakernet" is actually instantaneous.

Huh? I thought that the point was that you request files and then get them. How do you already have a file that you didn't request? Is this distributing files across people (i.e. FreeNet) so you have a bunch of files that are only on your machine for the purpose of being an intermediary to others? Or is this running some sort of predictive algorithm that tries to figure out what you will request before you do?


In a very generic answer to your question, suppose you're moving a 1GB file from A to B with an 8GB flash drive. It doesn't cost anyone anything to move an extra 7GB of data that somebody might need later. You can 8x your throughput for free.

But, at the same time, you don't want a flash drive to be in a constant state of being near-full so it can't fit the new StarCraft beta.

So there's some logic that takes files that seem to be spreading around anyway and sort of spreads them around ahead of when they're requested. This both increases redundancy and increases network throughput. It's not exactly machine learning, but it does, in practice, provide decent results.
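
To make the idea concrete, here is a rough sketch of one way such a fill policy could look. This is an illustration only and not sneakernet's actual logic: the `FileEntry` type, the `popularity` score, and the `reserve_gb` parameter are all assumptions invented for the example. Requested files go on first, then the most popular extras are added until only the reserved free space is left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    size_gb: float
    popularity: int  # hypothetical score, e.g. how often the file has been requested


def plan_drive(requested, candidates, capacity_gb, reserve_gb):
    """Choose files for one trip: requested files first, then popular extras.

    reserve_gb is kept free so a new large request can still fit later.
    """
    plan = []
    used = 0.0
    # Explicit requests may use the whole drive.
    for f in requested:
        if used + f.size_gb <= capacity_gb:
            plan.append(f)
            used += f.size_gb
    # Speculative extras must leave the reserve untouched.
    budget = capacity_gb - reserve_gb
    for f in sorted(candidates, key=lambda f: f.popularity, reverse=True):
        if f not in plan and used + f.size_gb <= budget:
            plan.append(f)
            used += f.size_gb
    return plan


requested = [FileEntry("requested.bin", 1.0, 1)]
candidates = [
    FileEntry("popular.bin", 3.0, 9),
    FileEntry("big.bin", 4.0, 5),
    FileEntry("small.bin", 2.0, 2),
]
plan = plan_drive(requested, candidates, capacity_gb=8.0, reserve_gb=2.0)
print([f.name for f in plan])
```

In this example the 4GB file is skipped because it would eat into the 2GB reserve, while the smaller and less popular file still fits.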


I think these files were on the disk for the purpose of distributing them to others who had requested them.


By "magic" do you mean "through the internet" or is something cleverer going on?

Also, by cache are you generally referring to some place where files are stored? Or does it mean something else?

I understand the system I just described doesn't really scale well, I'm just not really understanding how it does work from the documentation in the wiki.


cache == a flash drive or group of flash drives (although it could also be a USB HD, etc.)

Not through the internet. Through moving from flash drive to flash drive. If you follow the "testing" section along with the network maps for both the "basic configuration guide" and the "large configuration guide", that should clear up a lot of the magic.


Oh ok, now it makes sense.

You might want to say something like "storage device" instead of "cache"; that would have made the wiki less confusing for me.

It also seems like the system is just begging for caches to go missing (though I understand that as all the files are encrypted, it should be hard in theory to actually get data if you aren't authorized to access sneakernet).

Are you hashing the files all the time to make sure that files aren't getting corrupted in transit or when taken from someone's external cache?
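
For reference, the kind of check being asked about could be as simple as the sketch below. It assumes a SHA-256 digest recorded when the file was first shared; whether sneakernet does anything like this is not stated in the thread, and the function names are made up.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_hex: str) -> bool:
    """True if the file on the drive still matches the recorded digest."""
    return sha256_of(path) == expected_hex
```

The receiving side would call `is_intact` after copying a file off the drive and re-request the file on a mismatch.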



