Another backup possibility I currently use: ZFS on a backup server (the system being backed up doesn't need ZFS itself). The backup host pulls the data with rsync into a ZFS dataset, then takes a snapshot to serve as an "incremental backup".
With zfSnap (https://github.com/graudeejs/zfSnap) you can specify how long snapshots are kept, e.g. "rsync && zfSnap -d -a 1w backup"
You can take advantage of the /backup/.zfs/snapshot directory to access all snapshots, plus ZFS's built-in compression and (optionally) data deduplication.
If you also have ZFS on the remote host, you can use zfs send and zfs receive to transfer the snapshot directly to the backup server, instead of using rsync for the diff.
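A rough sketch of that send/receive variant (the dataset names, snapshot labels, and host name are all made up, and the command is echoed here as a dry run rather than executed):

```shell
#!/bin/sh
# Hypothetical names throughout; echoed as a dry run rather than executed.
PREV="tank/data@2014-03-01"   # last snapshot already on the backup server
CURR="tank/data@2014-03-02"   # snapshot just taken on the source host

# "zfs send -i" streams only the blocks that changed between the two
# snapshots; "zfs receive" replays them into the backup pool over ssh.
SEND="zfs send -i $PREV $CURR | ssh backupserver zfs receive backup/data"
echo "$SEND"
```

Once the receive completes, the backup server holds an identical @2014-03-02 snapshot, which becomes the incremental base for the next run.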
I do the same with btrfs. I actually tried bup first, but had a series of problems (see mailing list) and switched to btrfs snapshots. My main disk is also btrfs so I send incremental snapshots ("btrfs send -p"; faster than rsync and I can keep using the machine without making the backed-up state inconsistent), but the rsync method is fine for other source file systems.
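The btrfs version looks much the same; here's a sketch with invented snapshot paths (echoed as a dry run, not executed):

```shell
#!/bin/sh
# Hypothetical snapshot paths and host; echoed rather than executed.
PARENT="/snapshots/home.2014-03-01"   # read-only snapshot both sides have
CHILD="/snapshots/home.2014-03-02"    # new read-only snapshot to ship

# Because the snapshots are read-only, the live filesystem can keep
# changing underneath without making the backed-up state inconsistent.
CMD="btrfs send -p $PARENT $CHILD | ssh backupserver btrfs receive /backup/home"
echo "$CMD"
```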
All accounts get 7 days (as the parent shows) and all 1TB+ accounts get 7 days + 4 weeks.
However, if you want a custom schedule (say, 30 days, 8 weeks, and 6 months), it only costs more if the space the snapshots use exceeds your existing account's quota.
So if you have a 100 GB account, use 60 GB, and your fancy snapshot schedule (of which the first 7 days is always free) only uses another 35 GB, then there is no additional cost at all.
Bup is lovely. I used it to back up my huge home folder and only switched away to rdiff-backup because (at the time) there was no support for deleting old revisions.
Is there any support for that? (Of course, for a large enough hard drive, it's not much of a problem...)
People have been actively working on a "prune" feature but it seems to never quite get finished. This would indeed be nice to have, although it's less important than you might think, given really good deduplication (which bup has). Currently bup has a very simple model - never delete anything - which is hard to screw up, so you're very unlikely to lose data.
See my github link for the latest source. The other page predates me uploading to github, so there's only a tarball source download from there. I should move everything over to github now, really.
Looking at the README under "Stuff that is stupid":
> bup currently has no way to prune old backups
Thanks for the rdiff-backup shout-out. I'm looking for a nice way to do system backups to my NAS of large VM images without having to install Crashplan. Bup and rdiff-backup both look pretty good.
How do people who would use this kind of thing manage to have remote servers with terabytes of available disk space on them?
Anything is possible with money, of course, but how is this anything other than really expensive?
For example AWS S3 would be $235/month (that's $2,820/year!) for 3TB not even including any data-out transfer charges. Sure there are others that are cheaper but only marginally so.
Is this really what people are doing? Makes the commercial services sound really cheap.
My suggestion is to backup your cloud servers, which are expensive and redundant and have good uplink speeds, to home servers which are cheap and have good downlink speeds. You don't need your backup file server to be ultra-reliable or even up all the time, so the cheapest possible PC sitting on a home internet connection is a pretty good choice. That way, 3TB is just $150 or so plus your electricity, and it's not a per-month fee.
Unless you don't need any redundancy (the cloud options include some), that's more likely about 39 TB (in RAID 50, if the controller supports such a setup).
If so, that's still less than €8/TB/month, which beats what most cloud storage providers offer. As a bonus, you also have spare memory and CPU (which you could resell, e.g. as memcached instances) and the possibility of a proper SLA.
1. Regularly back up "important" directories (code/, papers/, web/, etc.) to fairly safe/redundant cloud storage with incremental history. I have relatively little of this, <50 GB, so it's not super-expensive.
2. Occasionally exchange bulk but less-important backups with my brother, so we're each the other's high-latency, questionable-durability "off-site backup". No incremental dumps here, just rsync. This is where my MP3 collection, DVD rips, and the like go.
3. Photos, which are important but also bulk, go to Flickr, which is free.
4. Don't back up stuff I can re-acquire, e.g. big public datasets I've downloaded to work on, or Debian ISOs. Also, I don't back up the OS, just my data.
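Item 2 above is just a plain mirror; something like the following, run from cron on each side (the host and paths are invented, and the command is echoed as a dry run):

```shell
#!/bin/sh
# Hypothetical host and paths; echoed as a dry run rather than executed.
SRC="$HOME/media/"                                  # MP3s, DVD rips, etc.
DEST="me@siblings-nas.example.net:backups/my-media/"

# --delete keeps the remote copy an exact mirror: no history is retained,
# matching the "no incremental dumps here, just rsync" approach.
MIRROR="rsync -az --delete $SRC $DEST"
echo "$MIRROR"
```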
There do, however, seem to be some cloud services that offer big full-disk backups for a surprisingly low flat price, e.g. http://www.backblaze.com/ is $5/mo/machine.
We use a dedicated server. It's a fairly basic machine with a single Xeon CPU, 32 GB of RAM, and a lot of drive bays, which doesn't cost much. It currently has 16 x 3 TB drives in RAID 50. RAID 50 is not great on performance, but it's fast enough to saturate gigabit in sequential operations (which backups are). So it has 32 TB of usable storage for around €700 per month (a leased server at a high-end hosting company, so it could be much cheaper if you buy the hardware yourself or use a cheaper provider). Per TB, that's around €21.50 per month. Our reasons for doing this weren't really about the price of cloud storage, though: they were about having the data on our own machine, with disk encryption, connected only to our internal management network and not the public internet.
You shouldn't think about backups as paying $/GB. You should think of it as paying for the ability to restore correctly, instantly, whenever you need to. That's where the value really lies.
Remember that there's a huge difference between files you need to read and write at any time with low latency, and backups, which are written at larger intervals and read infrequently, with looser latency demands. For backups, the service to compare is Glacier, which is about an order of magnitude cheaper.
What I'd consider is essentially the Crashplan model: P2P / external backups locally (i.e. full LAN speed) and an off-site replica which can be cheaper and slower as long as you have a high confidence that it'll be available eventually. This way normal operations are fast but if the building burns down you're covered and presumably have higher priorities than waiting for a restore to run.
It's not too hard, actually. A line of Python is roughly 80x slower than a line of C (no exaggeration), but a typical line of Python also does a lot more than a typical line of C. So things you can do with a "loose" loop (say, once per 64k block) are usually OK in Python; things you have to do with a "tight" loop (once per byte) need to be in C.
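As an illustration of the loose-loop point (my own example, not bup's actual code): hashing a file in 64 KiB blocks keeps Python down to one iteration per block, while the per-byte work runs inside hashlib's C implementation.

```python
import hashlib

def file_digest(path, block_size=64 * 1024):
    """Hash a file one 64 KiB block at a time.

    The Python loop runs once per block ("loose"); the per-byte loop
    ("tight") happens inside hashlib's C code via .update().
    """
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            h.update(block)  # per-byte work delegated to C
    return h.hexdigest()
```

A rolling-checksum chunker (what bup actually needs for splitting files) is the per-byte case, which is why that part lives in C.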
I'd be curious to know if your stance on PyPy has changed at all since 2011 (if indeed it's something that you've taken any new long looks at since) given their progress in that time.
I know that I would humbly submit at the least that my position has moved to believing that PyPy is a viable option for high-speed code (albeit in substantial part due to better interaction with C, nowadays).
Not the author either, but I've made that decision based on profiling the code. It's generally easy to see which functions are slowing down performance and can be refactored to a more performant language.
I love Tarsnap, but S3 storage costs aren't exactly brilliant. Figuring out exactly how you want to store your keys is also worth thinking through up front (albeit a question that arises from the increased security you get 'for free').
Alternately, CrashPlan and other consumer-style services have a bad habit of using very slow, heavy, world-slowing systemwide file update scanning. :/
That said, I'm fairly certain the merits and flaws of Tarsnap and similar backup services have already been discussed at length on similar HN posts.
So, it's 2014 and people still use homegrown variations of tar, rsync, git, and whatnot. Or a half-done solution like this, or an abandoned one like Box Backup.
Why on earth isn't there already a perfect cross-platform open source backup program? :)
I know, I know... why don't I make one myself? Because we don't need another half-done solution :-b
Every "perfect cross platform open source program" had to start as a homegrown variation of tar, rsync, git, and whatnot. It's not like they fall out of the sky.
Of course things don't come out of nothing; I just find it so odd that we have top-quality open source OSes, monitoring systems, programming languages, IDEs, browsers, graphics suites, etc., but no backup tool.
Seriously. I write all my critical software in assembly so that my super-fast disks and networks aren't bottlenecked by unnecessary CPU instructions! Backup software always values speed over correctness!
You mean like rdiff-backup, which I've been using in production against millions of files for more than half a decade without a single problem?
It may be that some languages or runtimes host consistently more reliable software than others, but I'd bet that the individual programmer, coding style and practice have more of an effect on reliability.
So, simplified, it's something like: rsync -avx remote:/etc /backup/ && zfs snapshot backup@`date`