Build An Opensource Dropbox Clone

telemachos · on Sept 1, 2010

This is an off-topic short rant. Vote as you like (obvious enough), but skip reading this if you want on-topic discussion.

Shoot me if this is how the web is heading. (Recent browsing suggests maybe it is...) We get not only the insufferable "bottom bar" (is there a standard name for those yet?), but also a "top bar", a hover "Click Here to Share" box and a pop-up "Learn More, Instantly" box (not to mention the general clutter and overdense design).

Man, that's foul.

w1ntermute · on Sept 1, 2010

Add the following to your hosts file:

  # Web toolbars/widgets
  127.0.0.1       wibiya.com
  127.0.0.1       cdn.wibiya.com
  127.0.0.1       apture.com
  127.0.0.1       www.apture.com
  127.0.0.1       cim.meebo.com

In this case, the top toolbar is from Apture and the bottom from Wibiya.

telemachos · on Sept 1, 2010

Thanks for taking the time to track these down. (I love a rant with a happy ending.)

sprout · on Sept 2, 2010

Or simply install NoScript, and worry about whitelists, not blacklists.

I can get at the content there with no trouble, and don't have to worry about any of those toolbars.

w1ntermute · on Sept 2, 2010

I've used NoScript, and it's extremely annoying having to enable it on each site you want it on. Not to mention blocking it in the hosts file automatically blocks it in all browsers, and that I don't agree with some of the things the NoScript author has done (http://news.slashdot.org/article.pl?sid=09/05/01/236248).

jpr · on Sept 2, 2010

NoScript is a symptom, not a cure. I haven't yet figured out a good name for the disease though.

pbhjpbhj · on Sept 2, 2010

>NoScript is a symptom, not a cure. I haven't yet figured out a good name for the disease though.

Web3.0?

spirulina · on Sept 2, 2010

Thanks for the list

Just added these to the "Adblock Plus" Preferences in Firefox (under Tools).

Worked like a charm.

stevelosh · on Sept 1, 2010

I activated Safari Reader as soon as I saw those walls of 12px text, so I had no idea what you were talking about until I went back and let the page finish loading.

"Foul" is not a strong enough word for that. Maybe "vile" is better?

mkramlich · on Sept 2, 2010

Same here. And I love Safari Reader aka Readability!

A good rule of thumb I have is that if you "have to put a bag over the head" of something in order to make that thing pleasant to digest or access (whether it's a UI or a bureaucracy or paperwork or a person or physical widget or whatever), then really the source thing itself should be designed that way in the first place. In this case, my own ideal style of web design is very close to what Readability/Reader produces. Thus making it redundant to the extent I've achieved my ideal.

samstokes · on Sept 1, 2010

Those bars are even more irritating on a mobile device. The fixed height and position means they obscure half of the actual content and you can't scroll away from them. Do not want.

jpr · on Sept 2, 2010

They also break the "scroll one whole page at a time" paradigm that many are used to.

EDIT:

I just realized that Chrome devs have apparently already decided that it's not worth fighting the idiotic bars, so Chrome scrolls less than one page.

Groxx · on Sept 1, 2010

I think I'll call them "Mercuries", now that I've thought about it (get it? floating footers!).

dododo · on Sept 1, 2010

lsyncd does the brunt of the work. it uses inotify, which is a facility of the linux kernel, to tell which files should be copied. but i think there is a problem.

inotify is not guaranteed to see every file system access. in fact, it often misses them when the file system is busy because there is an upper bound upon the number of inotify events that can be queued, set in /proc/sys/fs/inotify/max_queued_events.

i.e., you could get data loss using this mechanism alone. you need to also run rsync every so often, i think.

the lsyncd guys are aware of this: http://code.google.com/p/lsyncd/

(maybe this is how dropbox works under linux too, if so it presumably has the same problem..)
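A rough sketch of the "also run rsync every so often" idea above, in plain Python rather than rsync itself: a periodic pass that walks the source tree and reports anything the event-driven sync missed. The function name `find_missed_files` and the two-directory layout are assumptions for illustration, not part of lsyncd.

```python
import filecmp
import os


def find_missed_files(source_dir, mirror_dir):
    """Return relative paths that exist in source_dir but are missing
    from, or differ in content from, mirror_dir."""
    missed = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(mirror_dir, rel)
            # shallow=False compares file contents, not just size and mtime
            if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
                missed.append(rel)
    return sorted(missed)
```

Anything this returns would have to be re-copied by the periodic job. It only covers files missing or stale on the mirror side, not deletions.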

brlewis · on Sept 1, 2010

I would guess Dropbox has this same problem on Linux, though I can't say with certainty. What I can say with certainty is that they're thinking about the problem.

As is often the case, you learn a lot about a company through its jobs page. Their latest challenge for applicants looks very much like this problem.

noja · on Sept 1, 2010

Any files not moved elsewhere within X seconds can be handled by a cleanup daemon.

beza1e1 · on Sept 1, 2010

Well, Ubuntu One seems to rely on checking every so often alone ...

rw- · on Sept 1, 2010

inotify is also unable to inherit handlers. after a "mkdir -p a/b/c/d" it is very likely that "d" is untracked. therefore lsyncd is useless on real systems. :(
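One common workaround for the "mkdir -p" race described above is to watch a new directory first and only then list its contents, repeating for every subdirectory found. A minimal sketch follows; `add_watch` is a hypothetical callback standing in for whatever inotify binding is in use, and nothing here is taken from lsyncd.

```python
import os


def watch_new_directory(path, add_watch):
    """Watch path and every subdirectory that already exists beneath it.

    add_watch is a hypothetical callback (a stand-in for a real inotify
    binding). Each directory is watched *before* it is listed, so a child
    created in between shows up either in the listing or as an event.
    """
    watched = []
    pending = [path]
    while pending:
        current = pending.pop()
        add_watch(current)  # watch first ...
        watched.append(current)
        with os.scandir(current) as entries:  # ... then list
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
    return watched
```

The caller may then see the same directory twice (once from the scan, once from an event), so it would need to tolerate duplicates.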

tommynazareth · on Sept 1, 2010

I don't plan on ditching Dropbox, but I'm definitely interested in having a similar sort of automatic backup with diffs that I can host myself. This looks like it might be an option.

Does anyone else handle this in a different way?

aidanf · on Sept 1, 2010

I've used Unison (http://www.cis.upenn.edu/~bcpierce/unison/) to get a dropbox type setup running across several machines.

larelli · on Sept 1, 2010

I've thought about that as well. Could you give some details on the topology used? Did you run unison in automatic mode via cron?

durin42 · on Sept 2, 2010

I use unison for my code repositories (I use dropbox for other things) and I don't bother with it in cron. It's been fast enough in my experience that I don't mind waiting, and it sometimes (rarely) requires manual conflict resolution.

gvb · on Sept 1, 2010

I use rsnapshot (also built on top of rsync/ssh) http://www.rsnapshot.org

patrickaljord · on Sept 1, 2010

I just use http://rdiff-backup.nongnu.org/

gvb · on Sept 1, 2010

What about using unison http://www.cis.upenn.edu/~bcpierce/unison/ instead of lsyncd? It seems like it would be more resilient. It does look like unison would require a cron job or human trigger where lsyncd is triggered by a file/directory change. On the plus side, it seems like unison would work better in an offline mode.

res0nat0r · on Sept 1, 2010

I've tried using this to sync my mp3 directory between two boxes and it seemed to be horribly slow. It needs to stat each file for a second or two before it even begins to sync, which takes way too long.

dagar · on Sept 2, 2010

  -fastcheck
    do fast (and slightly unsafe) update detection on windows

res0nat0r · on Sept 3, 2010

I think I remember doing that and it didn't seem to help, I was using my Ubuntu box if that matters.

mikeyg · on Sept 1, 2010

Zimbra's open source edition's WebDAV sharing works well once you get the URLs figured out. You get a lot more fine-grained permissions. It's cross-platform, there are shared folders, etc., if that's what you're looking for. You can make folders public to share with others, and so on. With bandwidth as fat as it is these days, DAV's heaviness doesn't feel so heavy anymore. Am I the only one that shares these sentiments? Did I mention it works on MacOS 10.6, Linux, and Windows 7?

If you're using Dropbox to back up, some of these other solutions might make more sense. Crafting a script using hard links and rsync can give you daily snapshots of any filesystem at only the storage cost of the delta (and it can easily be turned into a cron job). Here's a decent resource for that:

http://www.mikerubel.org/computers/rsync_snapshots/
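To make the hard-link snapshot idea concrete, here is a minimal pure-Python stand-in (not rsync, and not the script from the linked page): files unchanged since the previous snapshot are hard-linked, so only changed files cost new storage. The function name `snapshot` and its arguments are assumptions for illustration.

```python
import filecmp
import os
import shutil


def snapshot(source_dir, previous_snapshot, new_snapshot):
    """Build new_snapshot from source_dir. Files identical to the copy in
    previous_snapshot are hard-linked; everything else is copied.
    Returns (linked_count, copied_count)."""
    linked = copied = 0
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        os.makedirs(os.path.normpath(os.path.join(new_snapshot, rel_root)),
                    exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            src = os.path.join(source_dir, rel)
            new = os.path.join(new_snapshot, rel)
            old = os.path.join(previous_snapshot, rel) if previous_snapshot else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)  # unchanged: share storage with the old snapshot
                linked += 1
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy
                copied += 1
    return linked, copied
```

Pass `None` as `previous_snapshot` for the first run. Both snapshots have to live on the same filesystem for the hard links to work.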

please · on Sept 1, 2010

except that it misses 90% of what dropbox does. like multi client sync, cross platform, shared folders, web interface and so on

fluidcruft · on Sept 1, 2010

except that it covers 99% of what I actually use dropbox for.

kgo · on Sept 2, 2010

Don't forget thirty days of backups and redundant hardware...

res0nat0r · on Sept 1, 2010

I've looked at this project before, it looks like a nice implementation of rsync with inotify, but it seems to be just for a typical use case for rsync, ie, push your local changes to a remote host.

I think if you try to use this like a real dropbox, ie changes occurring on multiple hosts at or around the same time, you are going to run into issues. Also, does lsyncd handle deleting data on the remote end and propagating those changes out (rsync --del)? I'm also wondering how it will handle conflicts and open files (which it says isn't recommended). I see the .dropbox dir using some type of uuids or sha1 sigs, so I'm sure it's doing something more sophisticated to keep things in sync. I've noticed my daemon has died on my linux box every once in a while, and it somehow tracks multiple changes from multiple hosts and replays those transactions correctly and everything rolls up fine.
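For illustration only, one generic way to spot the multi-host conflicts mentioned above is a three-way comparison of content hashes: the hash recorded at the last successful sync versus the current local and remote hashes. This is an assumption about how such a check could look, not a description of what Dropbox or lsyncd actually does; `digest` and `classify` are made-up names.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-1 hex digest of a file's content."""
    return hashlib.sha1(data).hexdigest()


def classify(base, local, remote):
    """Decide what to do with one file, given three content hashes:
    base (at last sync), local (this host now), remote (other host now)."""
    if local == remote:
        return "in_sync"   # both sides already agree
    if local == base:
        return "pull"      # only the remote side changed
    if remote == base:
        return "push"      # only the local side changed
    return "conflict"      # both sides changed, and differently
```

A plain one-way push has no stored base hash to compare against, which is why concurrent edits on two hosts can silently overwrite each other.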

xxpor · on Sept 1, 2010

I don't understand why dropbox doesn't open source their client. Their business isn't built off of their technology, which, while nice, isn't exactly novel. They make their money from selling storage. Open sourcing their client doesn't change the fundamental fact people need the space to store stuff.

mrduncan · on Sept 1, 2010

From Drew's original YC application [0]:

# Why would your project be hard for someone else to duplicate?

This idea requires executing well in several somewhat orthogonal directions, and missteps in any torpedo the entire product.

For example, there's an academic/theoretical component: designing the protocol and app to behave consistently/recoverably when any power or ethernet cord in the chain could pop out at any time. There's a gross Win32 integration piece (ditto for a Mac port). There's a mostly Linux/Unix-oriented operations/sysadmin and scalability piece. Then there's the web design and UX piece to make things simple and sexy. Most of these hats are pretty different, and if executing in all these directions was easy, a good product/service would already exist.

I'd find it hard to believe that there isn't some very tricky stuff being done in the clients which gives them a big competitive advantage over competitors that pop up.

[0] http://files.dropbox.com/u/2/app.html

pgbovine · on Sept 1, 2010

> Their business isn't built off of their technology, which, while nice, isn't exactly novel.

well, be careful with such claims. many technologies don't seem novel at first glance, but i'm sure there are lots of novel tricks they play under-the-hood to get their system to run quickly and reliably.

e.g., google's pagerank is now well-known and public, but i'm sure that they have lots of 'secret sauce' internal algorithms to make their search index work even better, and they definitely don't want to share those secrets

edanm · on Sept 1, 2010

Showing it doesn't necessarily hurt Dropbox isn't enough. You have to show that it actually helps Dropbox to open-source the client. Otherwise, why should they?

xxpor · on Sept 1, 2010

2 things that I can think of immediately

1. Would increase adoption on Linux clients

2. Increased security (the whole "security through obscurity doesn't work" thing)

edanm · on Sept 1, 2010

Well, I don't think the Linux market is big enough to make it a compelling argument (but I really don't know; I could be dead wrong).

As for security, I'm not sure how big a problem Dropbox has with security. And I wouldn't open-source a client just to get better security for it, but maybe that's just me.

Open sourcing the client is betting on the fact that the client doesn't matter. Maybe it doesn't matter, but maybe it does. More importantly, maybe we think it doesn't matter right now, but we'll find out that we were wrong in 2 years, after it's too late.

michaelbuckbee · on Sept 1, 2010

Linux as a desktop client is potentially too small a market to address, but I'm not sure the same could be said about the Linux server market.

edanm · on Sept 1, 2010

Good point, I hadn't thought of that.

Are people using Dropbox on servers, though? A lot of the benefits of Dropbox don't strike me as helping for servers, for the most part. I don't have too much experience, but it seems to me servers are one place you have to actively manage your backups, and can't rely on the "it just works" philosophy of Dropbox.

SwellJoe · on Sept 1, 2010

I think I'm more likely to pay for a service like Dropbox on the server than on my desktop.

Reliable backups are a big deal on a server, and maybe not as much on the desktop. I mean, I'm not too fussed if my mp3 collection gets stomped by a hard disk failure (I just download it again from emusic), but if tens of thousands of forum and ticket tracker posts are lost, I'm gonna be devastated. And, having it in a shared location that makes it easy to copy the database and website to a development machine might be cool. (I'm kinda talking abstractly here, as my own products provide tools to solve all of these problems in different ways, but they're things I've had problems with in the past. And, if Dropbox had a server product, we'd probably add it as a backup target option.)

brlewis · on Sept 1, 2010

Not many are using Dropbox on servers yet, but if their users upload files, they should.

My itty bitty part-time startup has a better uploader than Flickr, thanks to Dropbox on the server. Users share a folder with the server and copy photos into it that they want uploaded. They can be choosing photos to upload even with no connection, e.g. traveling with a laptop.

edanm · on Sept 2, 2010

That's a great idea. I wonder how many people are using Dropbox in this way, i.e. as a framework for sharing things with the server.

brlewis · on Sept 3, 2010

I think there's an audio site out there using Dropbox this way. Other than that I haven't heard of it. Maybe when their API includes folder-sharing functionality more people will do it. Right now I manually accept the shared folder request.

bgaluszka · on Sept 1, 2010

I share folders on the server with other Dropbox users (developers, designers). For me it's a better alternative than webdav, ftp or scp. They work locally and everything syncs between them and to the server, plus it has some version control.

I would even use it commercially in my products, but the API lacks the ability to share folders. For now you have to do it manually.

tofumatt · on Sept 1, 2010

Open sourcing the client could mean I run it off my own server, sitting in my apartment, for free. Granted, mostly geeks would be doing that, but I get the impression that many of Dropbox's user base _are_ geeks.

Like someone else said: it's not just whether or not it would HURT Dropbox (though, as I've said, I'd argue it could), but whether or not open-sourcing their clients would actually HELP them. I doubt it would.

dagw · on Sept 2, 2010

> I get the impression that many of Dropbox's user base _are_ geeks.

This is not my impression at all. Basically everybody I know uses dropbox, and very few of them are geeks. I was actually introduced to Dropbox by a non-geek boss I had who thought it was the greatest thing she'd ever used and insisted I also get it so we could share files.

Dropbox seems to have done a very good job of penetrating the general market, thanks in no small part to their pretty amazing and minimal UI. It's at the point now where I hear 'normal' people saying things like "can you share it to my dropbox" with the automatic assumption that the other person will know what it means.

wrs · on Sept 1, 2010

I've been involved in a few consumer-oriented file synchronization projects, and I can assure you that the client technology is not at all simple or easy. Consider that all existing applications have no idea that you're changing the semantics of the filesystem, but you have to make them act in a way that makes sense to the end user (not a filesystem expert). Not to mention all the little issues of conflicts, deletion, renaming, crazy stuff that apps do for "atomic save", and on and on.

It's a combination of having to understand app-filesystem interactions as well as user expectations, both of which are largely undocumented and unspoken -- but if you mess with either in the slightest way, somebody will freak out.

huhtenberg · on Sept 1, 2010

> crazy stuff that apps do for "atomic save"

Care for an example?

wrs · on Sept 2, 2010

It's been a while since I worked on this, but if I recall correctly...here's an example: Some Windows apps save by (1) writing a new file, (2) deleting the old file, (3) renaming new to old. The idea is that if it dies in the middle, you don't end up with no complete files. This sequence of events is actually recognized in the Windows filesystem code (if it happens quickly enough) and turned into a file-replace operation, which has very different semantics, particularly the effect of copying the metadata (think extended attributes) from old to new.

Actually the whole question of what to do with "advanced" filesystem semantics (like extended attributes) is pretty hard, especially if you're trying to sync across platforms where those semantics differ. You could just ignore them, but then if an app actually uses them (and why else would they exist?) then the file won't round-trip through the sync platform.
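As a rough sketch of the "atomic save" pattern discussed above: the version below writes the new content to a temporary file and then swaps it in with a single `os.replace` call, rather than the separate delete-then-rename steps described for some Windows apps. The function name `atomic_save` and the `.tmp-` prefix are assumptions for illustration.

```python
import os
import tempfile


def atomic_save(path, data):
    """Write data (bytes) to path so that a crash mid-save leaves either
    the complete old file or the complete new file on disk."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # replaces path in a single step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

From a sync client's point of view, a save like this shows up as a create, a write and a rename of an unrelated-looking temp file, which is part of why watching the filesystem is harder than it sounds.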

huhtenberg · on Sept 2, 2010

> This sequence of events is actually recognized in the Windows filesystem code.. and turned into a file-replace operation

Woah. I have quite a bit of experience digging through the crap in Windows kernel, but this seems to top all of that. Do you happen to have any references? MSDN, some forum threads, anything? Frankly I just find it hard to believe.

wrs · on Sept 3, 2010

This is documented now, at least for creation times, short names, and file IDs. Apparently the filesystem can choose what to "tunnel" across the delete+rename.

http://support.microsoft.com/kb/172190

http://www.osronline.com/article.cfm?id=22

History: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/15/43926...

huhtenberg · on Sept 3, 2010

Reading... thanks.

rmc · on Sept 1, 2010

If it's open source, anyone can run a "Dropbox server", and charge what they want. This would mean Dropbox does all the hard work and is then undercut by someone else charging less than them.

dagw · on Sept 2, 2010

Dropbox does however have a network effect. One of the big points, at least for me and most dropbox users I know, is sharing folders. That only works if you're on the same server as everybody else is.

dochtman · on Sept 2, 2010

I thought http://www.sparkleshare.org/ was more interesting.

khingebjerg · on Sept 1, 2010

Old news. http://news.ycombinator.com/item?id=1196421