Backups suck (2013)

danieldk · on Nov 28, 2014

I agree with many points of the article. I am slowly moving back from cloud-based services to simple local (though synced) text-based files. E.g. I switched from Google Mail to plain IMAP, which I fetch with isync, read with mutt(-kz) and index with notmuch.

I generally do not do presentations and documents in Google Drive of Microsoft Office anymore. When possible, I stick to Markdown and use pandoc to convert to PDF. Since it's plain files, it is easy to backup. And git provides excellent versioning. Most people can also read and write Markdown without too much trouble.

I also decided to move my task planning back to local, using Task Warrior and Dropbox for sync.

For backups of my MacBook and Mac Mini, I am now mostly using Arq [1], which can backup to nearly anywhere (ssh to Amazon S3), and has a sensible interface. The data format is documented, and they have an open source command-line restore utility for the case Arq disappears. It can also restrict backups when connected to known Wifi networks, etc.

If I wasn't on Mac, I would probably look at Attic.

[1] http://www.haystacksoftware.com/arq/

ekianjo · on Nov 28, 2014

> I also decided to move my task planning back to local, using Task Warrior and Dropbox for sync.

Smart, did not think about using TaskWarrior within dropbox. Pretty good idea!

XERQ · on Nov 28, 2014

I've seen many of my clients set up their own backup systems and have those fail at the worst times. Last month a large client of ours called our managed support team at 3AM saying they hired the wrong developer who completely trashed their database and hosed their entire application. They had their own backup system in place and it silently failed, but luckily they ordered our internal backup solution as a secondary. We were able to get them restored in 5 minutes, if they didn't have our solution in place they would've had to spend weeks fixing what the developer broke.

Current Linux backup solutions are not made for humans. Have a look at the mondorescue guide[1], nobody is going to read that and comprehend it with full mastery, meaning you're leaving yourself open to losing data. VPS providers offer backups that are usually in the same datacenter, which means you're SOL if there's a disaster. Those same providers also don't allow you to restore single files/directories from snapshots, usually you have to launch a new instance or revert everything back to snapshot.

[plug] We ended up creating a simple Linux backup solution[2] that's as simple as copying and pasting a single command to get installed, notifies you if your backups aren't running, handles snapshots, and is secure. Restoring your data is a single command away, so you can focus instead on building your startup rocketship. Our mission is to make data loss a thing of the past. [/plug]

[1] http://www.mondorescue.org/docs/mondorescue-howto.html

[2] https://jarvys.io

donmcronald · on Nov 28, 2014

How do you handle a scenario like this...

I have a Redmine install that I want to backup. It uses both a database and the file system. If I attach a file to an issue, the attachment is stored on the FS with a reference in the DB.

I don't know how Redmine deals with keeping the FS vs the DB consistent, but assume it uses some type of transaction across both when an attachment is added.

How do you back that up without possibly getting the FS and the DB out of sync. Ex: What if you snapshot the FS right after an attachment is written, but before the reference is added to the DB? The transaction in the app will succeed, but your backup is inconsistent compared to what the app expects.

How can you get a truly consistent backup without either a) stopping the app or b) integrating with the app to make sure you're not breaking assumptions needed for consistency?

Basically, almost every backup solution I've ever seen is crash consistent at best. How is yours different?

derekp7 · on Nov 28, 2014

I've been struggling with this myself in various cases. The conclusion I've come down to, is that the app has to take some responsibility to provide for consistent backups, either via checkpoints / write barriers, or via a quiesce command that can finalize in-flight transactions, and pause / buffer operations long enough for a volume snapshot to take place. The other thing I've seen apps do is provide a data export functionality, so you end up backing up the export file not the live data. But that requires extra time and disk space. Another option is to run the app in a VM, and do a live VM snapshot which includes the running state of the VM.

Bottom line is if you can't live backup your app, file a bug report with the app vendor. Although it would be really nice if there were standards in this area, so that a backup tool would just need to call one operation to put all supported apps into a hot-backup mode.

dspillett · on Dec 1, 2014

This is why I prefer to have objects in the DB if they are not too large or numerous for that to be practical, that way the transactional integrity of the DB and its backup handles this. Unfortunately it is not always practical or under your control.

From a developers PoV you can increase the consistency of your users backups by being careful how you order operations. If you make sure the DB is updated after the file is in place and tell the app administrators to backup the database first, and your blobs are insert-and-soft-delete only (you can do updates in an insert & soft delete only way with versioning instead of in-place updates). That way you are never in a position where you have a missing blob, though you might end up with orphans where inserts happen during the time the backup took to run.

Other than that, as you say, you have to stop the app or the app has to have built in support for consistent backups (or integration into your chosen solution).

To minimise the downtime associated with this you can use tricks like LVM snapshots or your OS's equivalent. That way the downtime is only as long as it takes to stop the app, take the snapshot, and restart the app, instead of the full length of time it takes to make the backup (which could be a fairly long time if it involves a full backup of a large DB), and it means you can coordinate the backups of apps that are integrated (or just used in sync) but not in a transactionally safe way, without having to stop them all for the full backup run.

That is the best you can do without explicit support in the app - there is only so much a backup solution can be in control of.

XERQ · on Nov 28, 2014

Our core value is around the simplicity of use, so instead of having to read a novel of a manual written by a crusty UNIX sysadmin, along with combining it with storage, you can create an account and copy-and-paste a single command that does all the work for you. We've even tested it with people who've never used Linux before and they were able to install it and restore a file without much direction.

With that said, we do provide the ability to hook your own scripts and commands into the backup process itself, for instance backing up MySQL you'd just put in the mysqldump command (with relevant DB and user/pass info) into the hook script uncleverly titlted 'run-before-backup.sh' in the config directory.

I personally don't have any experience with Redmine. Application-specific backups are outside the scope of our initial launch, but as we get more user feedback we'll be able to build plugins that integrate with specific applications. In the mean time, I'd both: 1) pester the developers to provide a simple backup solution for Redmine, and 2) look into either putting it on a VM that you can snapshot, or use something like ZFS snapshots and send/receive it to a remote location.

moe · on Nov 28, 2014

Um. From your docs:

C. RESTORE FROM ANOTHER SERVER'S BACKUP

This feature is currently in development.

So... My server goes up in a puff of smoke and I can't access the backup?

XERQ · on Nov 28, 2014

This is a beta product, and we put that up on the docs so people can understand in its current state what it's capable of and what it isn't. One thing we didn't put in the docs is that we can change the target of a new system to an old server's backup. So in the case of your server going up in smoke, you would simply open a uservoice ticket (or email me using my profile info), fire up a new server installed with JARVYS (disabling the cronjob until you're restored), and we'd change the target UUID to that of the new server, allowing you to restore from your old backups.

Soon we will have it where you can simply select any other server from your account to restore from it to any other server on your account, and the docs will be updated with how to do it.

mercurial · on Nov 28, 2014

> Last month a large client of ours called our managed support team at 3AM saying they hired the wrong developer who completely trashed their database and hosed their entire application.

Meaning that they didn't have a QA process in place before putting things in production, or that "dev" is also "production"?

acveilleux · on Nov 28, 2014

That's surprisingly common for "internal" applications. And quite frankly, the type of dev that screws everything up like that is also the type of dev that doesn't know any better, doesn't version things, doesn't have an issue tracker and doesn't set up DB backups, etc., etc.

mercurial · on Nov 30, 2014

That's an institutional failure in this case. I certainly wouldn't expect a dev to setup backups for the company database, for instance. A VCS and issue tracker should be a company-wide (or at least department-wide) thing, not something you setup (or not) from project to project.

mattbee · on Nov 28, 2014

Good rant :)

As the co-owner of a hosting provider with its own data centre I might be biased but I totally reject "You can’t trust your datacenter anymore when you’re in the cloud" argument. You _need_ your backup server to do some lifting for you to make half of that wishlist work. If you can't trust your provider not to read your unencrypted some of the time data, you're using the wrong provider, and making backups much harder than they need to be.

Also a lot of that wishlist seems to ignore network realities - you can do a lot more with a backup server in the next rack, or one that's the end of a private link, than you can with a very remote storage provider, so insisting on the same tools no matter where your backups are located seems a bit hopeful.

I've worked on a backup system for our customers called "byteback" which I'm slowly finishing off and documenting - will only be 2-3000 lines of Ruby when finished.

It currently leans on rsync, ssh and btrfs's copy-on-write snapshots to keep efficient copies of entire servers. There's a server-side pruning algorithm that allows the disc to fill up with daily snapshots for multiple servers, then prunes the least "useful" ones.

It's trying to be zero-configuration, so it builds its list from a list of "local" filesystems, so that you can copy the whole snapshot back quickly to restore a system.

The only other feature I'm going to need is to automatically drive "snapshot" functionality when it finds it on the server - e.g. LVs, btrfs subvolumes and other points on the filesystem where it can make a safe snapshot, it should do that automatically.

The zero-configuration rationale is just that where we've had backup failures, it's been manual misconfigurations and misguided attempts to be "efficient" that caused important files to be missed. So I'm trying to bake in defaults and features to cover every mistake we've ever made :)

As the name implies it's made for our customers, and our defaults, but I'm pretty sure it'll work for a lot of other server use cases as well. I'm still going over a few 10s of live backups fixing problems and adding new defaults as I find them. If anyone's interested, shout and I'll try to push on with the documentation and put it up in its current state.

mpasternacki · on Nov 28, 2014

Original ranter here, thanks :)

Regarding "trusting the datacenter": it was an overstatement, but while I can risk my own data, I am double-wary with my clients' data, and triple-wary with my clients' users' data. And while I don't assume that cloud provider is necessarily evil and unreliable, I need to survive losing a provider (be it because provider went bankrupt, or because I tried out GAE and got locked out in a Google Checkout mishap, or because of whatever reason), and I cannot rule out possibility of a leak (think Dropbox: hindsight's 20/20, but I don't believe I can predict which one of cloud storages will be next Dropbox). An extra bonus of good encryption is that I can spread storage over cheaper, but less reliable providers.

Your remark about ignoring network realities is good, thanks! I work mostly with cloud or remote servers, it's been at least 10 years since lat time I in the same room as a server I manage (and back then it was an office Samba file/print server). Maybe my perspective shows here, and it may be just as limiting as older tools just phrasing everything as "tapes" and "autochangers". I am closely looking at FreeBSD/ZFS right now, and this may not fit well with `zfs send`-based backups.

I'd be definitely interested in taking a closer look at Byteback! Since it's based on btrfs' snapshots, it may also work well with zfs (and may be exactly the plumbing I am about to write soon). If you manage to publish it, please let me know. Not sure if my email is listed on my HN profile: it's maciej at pasternacki dot net. Thanks!

ScottBurson · on Nov 28, 2014

Have you looked at Tahoe-LAFS? I don't think it's a complete solution, but it might be part of one.

rev_bird · on Nov 29, 2014

> If you can't trust your provider not to read your unencrypted some of the time data, you're using the wrong provider, and making backups much harder than they need to be.

This could just be me acting like a post-Snowden jaded American, but the providers themselves aren't whom I'm skeptical of anymore. When the author wrote that a company's "cloud" could be spread out between "five data centers, three continents, three hosting providers, and two storage providers," they seemed to be implying that there are a lot of opportunities for a company other than yours to hire someone just shady enough to do some damage.

What I first thought of, though, was that were sooo many companies for a government to confront, demand access, and then threaten to ruin them if they told anybody.

mattbee · on Nov 29, 2014

Yeah, it's important to know where your service provider is, both physically and judicially. The majority of hosting providers are surely located in just one country. Amazon, Google, Microsoft are the exceptions here.

Mainly because of http://www.rte.ie/news/business/2014/0904/641416-us-microsof... - I wonder whether this is still being fought?

prohor · on Nov 28, 2014

There is another very interesting tool: http://www.boxbackup.org/ . Unfortunately, it looks like it is dead now, like many other open-source tools of this kind. So at some point I've done a backup script around rsync, which also works with snapshots, so has strong data deduplication. Well, it is also dead, but at least it is short enough for anyone to fix it. If anyone is interested, here it goes: http://okrasz-techblog.blogspot.com/2011/02/backing-up-with-...

dspillett · on Nov 28, 2014

There is a decade old guide to using rsync for backups and snapshots at http://www.mikerubel.org/computers/rsync_snapshots/ which I originally based my hand-rolled arrangements on. It hasn't been updated since 2004 but is still relevant. There are tools that make this more hand-holdy if you prefer to do less work/thinking yourself, like rsnapshot.

For extra safety against hack+delete+ransom attacks, I make sure my backup servers and main kit have different credentials can't talk directly to each other at all - this way if someone hacks into my mains they can't automatically get at my backups and vice versa. I have an intermediate backup location: the active machines push data to that and the backup services pull from there, it can't login to either of the other sets. For automated backup testing, which I recommend you find time to setup, some data goes the other way (backups push to intermediate, other sites pull from there).

rsync · on Nov 28, 2014

The "Mike Rubel" guide is a great one, and one that we have pointed customers at for years - especially for his explanations of "rsync snapshots".

FWIW, we finally wrote our own "rsync HOWTO", which is ironic, given that we ran rsync.net for almost a decade without one.[1] It is NOT rsync.net specific, which is why I am mentioning it here. Just our attempt at a simple, concise rsync HOWTO. It includes crontab explanations and examples, as well as all of the SSH key generation steps.

"For extra safety against hack+delete+ransom attacks..."

rsync.net customers get protection from this in two ways. First, all accounts have ZFS snapshots enabled by default, and ZFS snapshots are absolutely immutable. Only local root can destroy them, and only with a snapshot-specific destruction command.

Second, we do server side "pulls" for all customers who request it, so you can have your backups at rsync.net without any credentials on your end for an attacker to use.

[1] http://rsync.net/resources/howto/rsync.html

dspillett · on Dec 1, 2014

Nice. I'll have to remember that next time I'm rearranging things and reconsidering options.

jamesog · on Nov 28, 2014

Box Backup isn't dead, but there is pretty much only one guy maintaining the code these days and he doesn't get time to release very often.

It's good software but still not really feature-complete. For example, doing mass-restore (e.g. an entire directory) is very tedious. Single files are fine, but this doesn't make it very useful.

(Point of note, I maintain the servers for the project, but don't really contribute to the project itself these days.)

freework · on Nov 28, 2014

The problem is that backing up data is fundamentally a management task. It requires skills to actually do right. People view backing up their computer in the same way they view changing the oil in their car. Just like there is no button you press inside the car that says "change oil", there is never going to be a backup solution that is easy to use for the regular user. At least not one that does a complete job is backing everything up.

Here is how I handle 'backups': I never do it. If the data is important, I make sure I store it on a device that is already redundant. If I take a photo that I really want to keep, I'll send it to dropdox or google drive or gmail or some place like that. Neither my macbook, nor my iphone, nor my linux laptop is the canonical living place of anything important. If any of those devices were to disappear, I will lose stuff, but nothing important. This system I've sort of subconsciously migrated towards over the past decade or so of hard drive failures and lost phones.

acveilleux · on Nov 28, 2014

Redundancy != Backups. As a recovering sys admin, I want to state that fact again and again until it syncs in.

The difference is that backups should provide point-in-time snapshots.

Now some of the things you suggest provide some of that (Dropbox has some versioning) and manually copying things to multiple locations effectively makes a snapshot of that thing. Backup systems should do this automatically, on a schedule and comprehensively.

It says a lot that the best I've used was time machine and it's not especially powerful. It does however "just work." I've wasted hundreds of hours of my life managing various backup software and they all pretty much sucked. They did however save my bacon a number of times.

Rapzid · on Nov 28, 2014

Backing up a few servers isn't that hard. I managed the backups(and pretty much re-wrote the entire system) at a VPS provider. We essentially went in and backed up every LV found that didn't match certain patterns. You definitely want some sort of snapshot functionality to help you out and lvm provides that to you. It also makes it a piece of cake to add another server to the backups; as long as it has LV's they will get backed up. There was of course a lot of logic to handle edge cases, DRBD, ntfs, alerting, rsync includes/excludes, etc.

The trouble comes when you need to backup an entire business, including all the VPS's, every night. The shear volume of files and the LSTAT's was killing. I remember when we crossed the number of files we had enough RAM for XFS to cache the metadata for and performance started to plummet. In the end we ended up moving to ZFS backup boxes, tons of RAM, rsync inplace, and snapshot send/receive to replicate between data centres.

iamtew · on Nov 28, 2014

Over a year old article, but I'm still curious; what options are there for backups these days?

rakoo · on Nov 28, 2014

Simple, Secure Backup That Just Works:

https://www.tarsnap.com/

EDIT: it will require some manual fiddling, especially if have multiple machines, so it's not an option that passes all criterias from the article.

derekp7 · on Nov 28, 2014

Is it possible to set up a local server for tarsnap, or is the client only for usage with the tarsnap service? And since the client isn't open source (but is distributed as source code), would it be a license violation if someone reverse-engineered the protocol to write an open server component?

For cloud-based backups, my preference would be to backup locally first, then sync that to a cloud service, since restoring from a local box is faster than remote. It might be a fun project to create an intermediary server which the tarsnap client talks to, and have that server then forward everything to the cloud service.

icebraining · on Nov 28, 2014

Is it possible to set up a local server for tarsnap, or is the client only for usage with the tarsnap service?

It's just for the Tarsnap service.

And since the client isn't open source (but is distributed as source code), would it be a license violation if someone reverse-engineered the protocol to write an open server component?

It's not clear: https://www.eff.org/issues/coders/reverse-engineering-faq

xorcist · on Nov 28, 2014

That depends on what you requirements are. Do you need application specific backups (which?), windows support, mac support, hierarchical storage, tape support?

For personal/development use on Linux, you should absolutely take a look at obnam (and the similar attic and bup). They are simple software that do dedup, encryption and cloud storage.

iamtew · on Nov 28, 2014

Right, I was quite sparse with details...

I do mainly Linux stuff, and obnam looks nice and feature-packed enough to fit my current needs, I'll dig in to that a bit. Cheers :-)

ThatGeoGuy · on Nov 28, 2014

To be honest, I had the same question last year, and wrote a short guide on how I (eventually) came to do backups [1]. Funny enough, I didn't even notice the topic article until today, which is kind of funny that I see it almost exactly a year later (only off by two days, really).

Of course, this is very Linux centric, but it fits my needs and likewise allows me to rsync back anything on the backup at any time, even if I accidentally all of /bin/ or something equally stupid. Though now I've somewhat updated it (I use a fully encrypted system and set cryptsetup to use pass-files on the main disk for the backups in addition to a set password), but the fundamentals are all there.

[1] https://thatgeoguy.ca/blog/2013/12/26/encrypted-backups-in-d...

pyre · on Nov 28, 2014

Is there anything about obnam that is Linux-specific? As long as I don't need any HFS-specific features, it should work on OSX, right?

wazoox · on Nov 28, 2014

Apparently BareOS is making progress, and has now scripting capabilities. Go BareOS! Die, Bacula!

derekp7 · on Nov 28, 2014

There is my entry, Snebu, which I've been using for the past couple years, at http://www.snebu.com (hosted at Github). There are two things that it does not do, encryption and block-level deduplication. For encryption, my preference is to write it to an encrypted volume. And block-level deduplication would require inventing another proprietary storage format, which I try to avoid.

It works a lot like rsnapshot (each backup is a self contained snapshot of a system at that point in time, and only the incremental changes are transferred to the backup server), but with compression and full file level deduplication (not only across backup generations, but across all files including ones from multiple clients). It uses GNU `find` and `tar` on the client to do the heavy lifting of getting the files (so only a simple shell script is needed on the client, and then only if you are doing push backups). On the backend side, it stores the metadata in an SQLite table, and individual files are stored in a vault directory using lzop compatible compression.

The advantages over tools like rsnapshot is that it has compression, full file level deduplication, and metadata is stored in a DB file (which avoids a large hardlink-farm for dedup (avoiding problems like https://news.ycombinator.com/item?id=8305283), also removes the need for root privileges on the backup server to store file ownership/permissions). The disadvantages compared to other tools, is it doesn't do encryption on the client (this would break dedup if done right, and I'd rather leave encryption to the experts), and it only does file level dedup, so isn't suitable for backing up VM image files (unless you combine it with something like libguestfs, something I'll document in an upcoming release).

There are also some nice-to-have features, for example if you have a job to expire all daily backups older than 10 days, and monthly older than 6 months -- and you have a host that hasn't been running / backed up for a year, the expire job will keep a minimum number of backups for each host (default 3 backups, but is tuneable). And the next major release will support multiple repositories, so you can move old backup sets to archival storage (external USB disks, possibly tape library support, etc). Also included is support for storing extended / SElinux attributes, and the current client shell script has a plugin architecture to execute pre/post backup steps for app-specific backups, such as Oracle hot backups.

Edit: for Windows backups, I'm currently running it under Cygwin. But I'm not able to preserve Windows specific permissions. To fix this, I need a "tar" implementation for Windows which stores the extended ACL information in PAX headers in the tar file, similar to how Red Hat's patched GNU tar does SElinux attributes. If I can't find something, I may take a crack at it myself.

PhantomGremlin · on Nov 28, 2014

Let's question a major premise. Does backup have to be open source??? Maybe there are good commercial solutions available?

There hasn't been a great deal of discussion of that topic so far. Surely there are people using commercial software who can offer "non marketing copy" opinions on those products?

arca_vorago · on Nov 28, 2014

"Does backup have to be open source??? Maybe there are good commercial solutions available?"

I've dealt with almost every major commercial backup solution, and in my opinion, all of them suck. Why? Mostly because of intuitiveness. I've seen knowledgeable techs get confused by terminology during setup and during testin/restoring. Differential, incremental, archive bit, etc.

Another note is cost. At one the support startups I was at, we licensed a backup solution that we customized so the customers thought it was our own in house, when it wasn't really. Many of the bigger names can get really $$$ really fast. (looking at you EMC)

Backup doesn't have to be open-source, but honestly I don't know how much I would trust anything else? I have, reluctantly at first, recently embraced the google drive ecosystem, combined with vault for regulatory compliance, and the versioning that comes with google, makes me feel more confident regarding documents stored there than documents put on random servers and being backed up to other places.

The problem with your original question though, is that open-source solutions still suck as well. That's why, depending on the complexity and needs of the situation, will generally just roll my own using rsync, git, rsnapshot, etc, in a transparent way to the user. eg, user uses file server A, wich backs up to backup 1, with backup 1 backing up to backup 2.

I would also like to point out the growing issue of raid. In my mind, at hdd sizes where they are, ZFS or a similar solution is the only way to go. Raid-5 is dead to me.

billybofh · on Nov 28, 2014

I have suffered a few commercial backup packages, and by and large they've been pretty grim to use. Veritas was the last one and (at the time) its interface was a horrid java lump combined with a fairly arcane backend of config files. And it was eye-wateringly expensive for what we needed (or had the budget for).

If I was still backing up to tape units I might still consider them as last time I looked at OSS tape-backed backup it was pretty spartan.

derekp7 · on Nov 28, 2014

The only commercial tool I've heard people excitedly rave about is Veeam, which is specifically targeted at backing up VMs. The big feature draw of it is that it backs up the .vmdk image files directly, and incrementally. And you can spin up a VM directly from a backed up image without having to restore it first.

sustrai · on Nov 29, 2014

If you like RSnapshot maybe you like ElkarBackup http://github.com/elkarbackup/elkarbackup

It uses RSnapshot and the GUI is an easy-to-use web interface.

junto · on Nov 28, 2014

Something that would make cloud backups easier would be the ability to manage my cable connection's upload versus download speed.

Sometimes I just want to upload backups and it not take 3 weeks.