Zsync: Differential file downloading over HTTP using the rsync algorithm (2010) (moria.org.uk)
215 points by pmoriarty on Oct 17, 2017 | 46 comments



We are using this for addon synchronization in our community through Arma3Sync. On the server side we need to "build" the repository - it just generates .zsync files - and then clients download just the diff. Update size came down from 10-15 GB to under 3 GB.


Fossil (https://www.fossil-scm.org) is an SCM tool that uses the rsync algorithm for syncing repositories. It has a built-in web server, and can also be accessed via CGI from any CGI-capable web server. It also has an SSH option.

Its features make it very handy for a number of file transfer/sync tasks, over and above its chief SCM role.


If you like rsync and javascript, I wrote https://github.com/claytongulick/bit-sync which is kinda fun.


2005. So I guess this didn't catch on.


Ubuntu still provides zsync downloads of its installation media: http://cdimage.ubuntu.com/ubuntu/releases/17.04/release/

That said, I'm not sure I know of any other major users of it -- most people just use a .torrent (which similarly has checksums of each piece so you know which pieces need to be downloaded).


Not a major user, but we're using zsync for system updates of our Raspberry Pi based digital signage operating system (https://info-beamer.com/hosted). It's pretty great and offers a few things we couldn't do with bittorrent: Every time we have a new release we put together an install.zip file of everything required (kernel, firmware files, initrd, squashfs). Users can download this file directly and unzip it onto their SD card and it will boot our OS. For updates we use a previous (see below) version of our install.zip already available on the device and only download the changes. We then unzip that into a new partition for A/B booting.

Zsync is awesome as we can specify any number of existing files already available on the device (with the -i command line option) and zsync will try to make use of them to minimize downloads. We really use this feature to our advantage: zsync by default will keep the previous version of a file if it's going to overwrite it. So we have two versions of install.zip on a device. When switching between OS releases (stable / testing...) we can switch back and forth with zero additional downloads, as both versions are available and zsync makes use of that. Similarly, after a user installs our OS, we just have the unpacked artefacts (kernel, etc.) on the SD. We can quickly recreate an initial version of the install.zip file on the device by seeding the download with those files. It usually takes just 500k to construct an initial install.zip file that we then later use to minimize all future updates.
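
For reference, the update step on a device looks roughly like this (paths and the URL are made up for illustration; the real setup differs):

    # seed the download with both install.zip versions already on the SD card;
    # only the blocks not found locally are fetched over HTTP
    zsync -i /sdcard/install.zip \
          -i /sdcard/install.zip.old \
          -o /sdcard/install.zip.new \
          https://updates.example.com/release/install.zip.zsync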


Did any IoT device management service fit the bill too (back then or now)? We are heading towards a similar use case.


OS development for info-beamer started in 2013. I'm fairly sure nothing even close was available back then. I'm not sure about today. So far I don't regret the NIH approach we took.


Yes, this seems to have been replaced mostly by bittorrent.

On the other hand, HTTP has the advantage that it works through corporate proxies and is usually not blocked by over-cautious admins.


I'll note that bittorrent uses HTTP for a few different things:

- HTTP(s) based trackers (although UDP is more common these days)

- HTTP webseeds/mirrors (BEP-17)

- (if you count it) webtorrent uses Websocket trackers and can support HTTP webseeds (although really of course the P2P is WebRTC)


> which similarly has checksums of each piece so you know which pieces need to be downloaded

Can you achieve differential downloads with bittorrent?


The AntiX Linux distro also provides zsync downloads:

https://sourceforge.net/projects/antix-linux/files/Final/ant...


A long time ago on a crappy restricted internet I used this and jigdo at different times to download the Ubuntu ISO. Being able to do so for Ubuntu and not other distributions was another factor in my using Ubuntu back then.


Thank you to the sibling comments for mentioning that you're using zsync. I'm sorry my comment was so negative.


I'm not affiliated with Itchio.

If you are looking for a maintained system for providing software updates online, I would look into https://github.com/itchio/wharf-spec.

Wharf is used by Itchio to sync folder structures differentially / incrementally. It uses the latest compression algorithms. It has a reference server.

Alternatively, https://github.com/salesforce/zsync4j is a port of zsync written in Java.

I had trouble compiling zsync for windows.


How is the wharf protocol better than zsync?


https://itch.io/docs/wharf/design-goals.html describes the goals.

https://itch.io/docs/wharf/algorithms/diff.html and https://itch.io/docs/wharf/algorithms/apply.html describe patching.

The most important thing is that Itchio runs a business on the usability of this system.


Is it outperforming zsync in terms of bandwidth? That's the most important question for me.


This makes the claim that you don't need to run any additional software on the server, other than having an HTTP/1.1 compliant host.

How does zsync diff against the local file without downloading the contents from the server?


"zsync is only useful if people offer zsync downloads."

It works by expecting a .zsync metadata file which gets downloaded first and is used to guide the differential download.
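
A minimal sketch of that flow, with made-up file names and URLs:

    # publisher: pre-compute the block checksums once, next to the file being served
    zsyncmake -u https://example.com/dist/big.iso big.iso   # writes big.iso.zsync

    # client: reuse the old local copy, fetch only the blocks that changed
    zsync -i old-big.iso https://example.com/dist/big.iso.zsync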


It uses a special file that has to be installed on the server. Perhaps you thought (as did I) that this was supposed to work with any file over http? That does not appear to be the case.

http://zsync.moria.org.uk/server


Installed on? Uploaded to.

It's just a pair of files: the big thing you're trying to transfer, and the .zsync file that details the content of that first file, to guide the downloader.


I'm not sure I see the difference?


You can upload the .zsync file to anywhere Apache, Nginx, etc., can read, and it all just works.

You don't need a plugin, you don't need to change your httpd.conf, ...


Yeah. I assumed that.


Within the wider context of your comment, "install" implied a setup process or configuration change.


I see, this really threw me off:

"Rsync over HTTP — zsync provides transfers that are nearly as efficient as rsync -z or cvsup, without the need to run a special server application. All that is needed is an HTTP/1.1-compliant web server. So it works through firewalls and on shared hosting accounts, and gives less security worries."

The other child comment also touches on something that I seem to have skipped over: the .zsync metadata file, which contains pre-calculated rsync hashes. Using this means no server-side software is needed, but at upload time the file needs to be processed to produce this metadata.


Interesting note on the site is that these files don't need to be on the original server. Anyone can generate and host them.

> in fact, the .zsync can be generated and offered by a third party, while still leaving most of the download to the original distribution site.


Interesting. How would that work for clients? If the destination file already exists on the client, look for a .zsync on the hosting server; if it's not there, look for one at https://thirdpartyzsyncs.com?url=someurl ? What happens if the .zsync is out of sync with the resource?


The client, from my very limited testing, expects the URL of the .zsync file as an input. The .zsync file can point anywhere else for the canonical version.

So you'd zsync http://third-party/file.zsync and this file would contain the URL for the main file.
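
From memory, the plain-text header at the top of a .zsync file looks roughly like this (field values are illustrative); the URL line is what lets a third-party-hosted .zsync still point back at the original distribution site:

    zsync: 0.6.2
    Filename: big.iso
    Blocksize: 2048
    Length: 1485881344
    URL: http://original-site.example/dist/big.iso
    SHA-1: <hash of the whole file>
    <binary block checksums follow>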

> What happens if the .zsync is out of sync with the resource?

I'm not sure, I've not managed to get it working on a simple file yet.


My uninformed guess is that the .zsync URL would be provided in some HTTP header.


This reminds me of rdiff, the demo utility that comes with librsync. It approaches this problem in a less practical way, although it shows off librsync better.

It works as follows (rough commands are sketched after the steps):

- Let's say you already have a file that is an older version, or perhaps corrupted; you use rdiff to generate its signature

- you then go to the place which contains the proper file and use the signature file to generate a patch file

- then you use the patch file to fix your local file
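
Roughly, with librsync's rdiff tool (file names made up):

    # 1. on the machine with the old/corrupted copy
    rdiff signature old.file old.sig

    # 2. on the machine with the good copy, using that signature
    rdiff delta old.sig good.file old-to-good.delta

    # 3. back on the first machine, apply the delta to repair the local copy
    rdiff patch old.file old-to-good.delta repaired.file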


Not that I would know any better, but I always saw a user-controlled approach built around rdiff as a better alternative to surrendering files to a non-transparent third party such as Dropbox (who, go figure, used librsync originally).

There is an elegant simplicity to rdiff, IMHO.


http://duplicity.nongnu.org/ implements this approach, FWIW.


I am aware of another project called "rdiff-backup", also at nongnu.org:

rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership (if it is running as root), and modification times. Finally, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted.
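
Typical usage is something like this (host and paths made up):

    # back up a local directory to a remote host over ssh
    rdiff-backup /home/alice alice@backuphost::/srv/backups/alice

    # restore a file as it was 10 days ago
    rdiff-backup -r 10D alice@backuphost::/srv/backups/alice/notes.txt /tmp/notes.txt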


Yes! I use this and like that it's so simple, and that the latest version of the backup is easily available as plain files. (Any metadata that the filesystem doesn't support is stored separately in files, so it works across different types of filesystems and operating systems.) There is even a FUSE filesystem, "rdiff-backup-fs", for mounting the whole backup history, with each backup point in a subdirectory of its own, like it should be!

Unfortunately, it seems not to be developed any longer, and it has a few things that would need ironing out:

* You can't pause a backup and continue later.

* Some operations (notably recovery after an aborted backup run) are excruciatingly slow. It takes tens of hours for me with a backup of 40 GB or so (on a low-powered computer as server, though). I think rdiff-backup-fs is resource hungry as well, which is perhaps partly understandable, since it has to go through a series of reverse diffs to present old versions of a file.

* I tried it on Windows once, and it could apparently not handle paths longer than a few hundred characters (due to using that older Windows API, whatever it's called).

* You can't delete intermediate backups, only the oldest one.


Have you tried rsnapshot?


ISTR they may have started with librsync, but even years ago their (modified, actively developed) lib bore little resemblance to the canonical librsync.


Lennart Poettering has an interesting tool, casync, which overlaps with zsync. Claims to be a better solution for image syncing. Reasons given here: http://0pointer.net/blog/casync-a-tool-for-distributing-file...


There's also xdelta which is just an algorithm / program for calculating and applying binary diffs. I suppose the advantage of zsync is that you can always point "new" users to "/current.file", and use zsync to patch "up" to the latest version - with xdelta people would need to explicitly get "my..current.xdelta".

http://xdelta.org/
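
For comparison, the xdelta3 round trip looks roughly like this (file names made up, following the naming in the comment above):

    # create a binary delta between two versions
    xdelta3 -e -s my.old.file my.current.file my.old-to-current.xdelta

    # apply it on the receiving end to reconstruct the current version
    xdelta3 -d -s my.old.file my.old-to-current.xdelta my.current.file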


The big difference is that xdelta (much like any diff program) needs both the old and the new versions to create a patch. With zsync, the server only needs to have the new version (which is the only one of interest), and the clients then get only the parts they need, because they only have the old version.

zsync also does all the fetching stuff directly.


On a LAN or UDP-based "Layer 2 overlay", there is mrsync for this purpose. One could efficiently distribute regular updates to some data that everyone on the network needs, e.g., "domain names" and IP addresses.


I've been thinking that we should replace "cp" with "scp", and then replace both of them with "rsync". Is that a bad idea?


For what use case? We do use `rsync` instead of `cp` in some capacity, even for local-to-local file copy - as there is slightly more verification of a successful copy, and the destination is a quirky flash medium. Not sure how SCP would help here.
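
Presumably something along these lines (paths made up); a second pass with --checksum re-checks the destination by content rather than by size/mtime:

    # local-to-local copy; archive mode preserves permissions and timestamps
    rsync -a /data/source/ /mnt/flash/dest/

    # re-run with --checksum to verify (and repair) the copy by content
    rsync -a --checksum /data/source/ /mnt/flash/dest/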


For all use cases.

> Not sure how SCP would help here.

I mean that my train of thought was first "scp, it seems, can do everything cp can, and more, so let's drop cp and use scp instead", then "but hey, rsync, it seems, can do everything that scp can, and more, so let's drop scp too and use rsync instead".


Possibly. `cp` is ancient and rather basic; OTOH, it is everywhere (as opposed to `rsync`, which I found out the hard way) and it is tiny (fewer toggles to push - less stuff to break).



