
If it is a backup, it doesn't need to be super safe. The risk that you lose your primary data, (optionally) your local backup, and your online backup at the same time is pretty insignificant, given that the online backup is uncorrelated by being in a different physical location (primary and local backup are very correlated, I agree).



> The risk that you lose your primary data, (optionally) your local backup, and your online backup at the same time is pretty insignificant, given that the online backup is uncorrelated by being in a different physical location (primary and local backup are very correlated, I agree).

Actually, that reasoning is exactly what I wanted to talk about. When the primary is lost, there is a much higher chance than you'd expect that a secondary contains unrecoverable errors. This is why RAID 5 arrays fail to rebuild after a disk failure so often—they're supposed to be able to tolerate a single disk failing, but they can't tolerate a disk failing and any other IO error at the same time. Part of this is due to how short the timeout is for failed reads in RAID setups, but I've still seen a lot of RAID 5 arrays fail, and I've seen a few RAID 6 arrays fail too.

On top of that, there's the high chance of configuration errors in DIY systems.
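
To put a rough number on the RAID 5 rebuild point above: with the commonly quoted consumer-drive figure of one unrecoverable read error (URE) per 10^14 bits, rebuilding an array of large disks means reading enough data that hitting at least one URE is more likely than not. The drive size, disk count and URE rate below are assumptions for illustration, not figures from the comment; a minimal back-of-envelope sketch in Python:

    # Back-of-envelope: probability of hitting at least one unrecoverable
    # read error (URE) while rebuilding a RAID 5 array.
    # Assumes the often-quoted consumer-drive rate of 1 URE per 1e14 bits
    # read; real drives and real workloads vary widely.

    URE_RATE = 1e-14          # probability of a URE per bit read (assumed)
    DISK_TB = 8               # size of each surviving disk, in TB (assumed)
    SURVIVING_DISKS = 3       # disks that must be read in full to rebuild

    bits_to_read = SURVIVING_DISKS * DISK_TB * 1e12 * 8   # TB -> bits
    p_clean_rebuild = (1 - URE_RATE) ** bits_to_read
    p_at_least_one_ure = 1 - p_clean_rebuild

    print(f"Bits read during rebuild: {bits_to_read:.2e}")
    print(f"Chance of at least one URE during rebuild: {p_at_least_one_ure:.0%}")

Under these assumptions the rebuild reads about 1.9e14 bits, so the chance of at least one URE somewhere in the array comes out around 85%, which is why a single extra read error during a rebuild is not the rare event it feels like it should be.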


Agree. Having a script that does a full read of all the data every couple of months and sends a report by email (one you would notice if it didn't arrive) is pretty much a must-have.


I am still looking for a good way of doing this.

Weekly integrity check of all backed-up data. An email that reports the result. A web interface that shows an overview of historical check results. An external service that sends an email if the integrity check failed to run (e.g. https://deadmanssnitch.com/).
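
The wish list above is mostly glue. Purely as a sketch, here is what the cron side might look like in Python; the check command, log path and check-in URL are all placeholders, and the pass/fail email itself is left out:

    #!/usr/bin/env python3
    """Sketch of the wish list above: run a weekly integrity check from
    cron, log the result, and check in with an external dead-man's-switch
    service. The check command, log path and check-in URL are placeholders."""
    import subprocess
    import urllib.request
    from datetime import datetime, timezone

    CHECK_CMD = ["/usr/local/bin/verify-backups.sh"]           # placeholder check
    LOG_FILE = "/var/log/backup-checks.log"                    # feeds the history view
    SNITCH_URL = "https://example.invalid/checkin/your-token"  # placeholder URL

    def main() -> None:
        result = subprocess.run(CHECK_CMD, capture_output=True, text=True)
        ok = result.returncode == 0

        # Append to a local log; a tiny web page over this file would give
        # the "overview of historical checks".
        stamp = datetime.now(timezone.utc).isoformat()
        with open(LOG_FILE, "a") as log:
            log.write(f"{stamp} {'OK' if ok else 'FAILED'}\n")

        # Check in with the external service whenever the check actually ran;
        # it emails you only when this ping stops arriving. The email that
        # reports the pass/fail result is a separate step, omitted here.
        urllib.request.urlopen(SNITCH_URL, timeout=30)

    if __name__ == "__main__":
        main()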


FreeNAS gets fairly close to this out of the box. My home server runs ZFS with RAIDZ2. By default, I think, there's a weekly cron job that scrubs the ZFS pool (an integrity check of everything, as I understand it), and the results of that scrub are emailed to me.

I don't believe it has a web interface with historical checks, although I could be wrong. That said, it might be stored in a log file somewhere.

I also don't have an external service that would send me an email if the check failed to run. That said, I would get an email if the cron job hit a mysterious error, and if the server itself were dead, my data would not be accessible on my home network, so I'd notice.

If the home server dies tragically, well, I hope Google Cloud Storage is doing similar integrity checks -- that's where my offsite backups are.
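
FreeNAS wires all of this up itself; purely to show the moving parts on a plain ZFS box, here is a rough Python sketch of the reporting half. The pool name, mail addresses and the local MTA are assumptions, and the scrub itself would be a separate scheduled `zpool scrub tank` run before this report:

    #!/usr/bin/env python3
    """Rough sketch of a weekly ZFS scrub report for a plain ZFS machine.
    FreeNAS/TrueNAS already ships the equivalent; pool name and mail
    settings are placeholders, and a local MTA on localhost is assumed."""
    import smtplib
    import subprocess
    from email.message import EmailMessage

    POOL = "tank"                    # placeholder pool name
    MAIL_TO = "you@example.com"      # placeholder recipient
    MAIL_FROM = "nas@example.com"

    def zpool(*args: str) -> str:
        """Run a zpool subcommand and return its stdout."""
        return subprocess.run(["zpool", *args], capture_output=True,
                              text=True, check=True).stdout.strip()

    def main() -> None:
        health = zpool("list", "-H", "-o", "health", POOL)  # e.g. ONLINE / DEGRADED
        status = zpool("status", POOL)                      # includes last scrub result

        msg = EmailMessage()
        msg["Subject"] = f"ZFS pool {POOL}: {health}"
        msg["From"] = MAIL_FROM
        msg["To"] = MAIL_TO
        msg.set_content(status)

        with smtplib.SMTP("localhost") as smtp:             # assumes a local MTA
            smtp.send_message(msg)

    if __name__ == "__main__":
        main()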


Integrity checking is harder because you have to maintain a signature of every file on the side. What I do is just have a script that reads all the data. If some data is unreadable, an exception is thrown and a text message is sent to me; otherwise a confirmation email is sent.

In .NET it's only a few lines of code. I hadn't thought of Dead Man's Snitch, but it's a good idea; it would just take one more line of code.
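
For comparison, here is roughly the same idea sketched in Python rather than .NET; the backup path and the two notify functions are placeholders standing in for the text message and the confirmation email:

    #!/usr/bin/env python3
    """Sketch of the 'just read all the data' check described above.
    BACKUP_ROOT and the notify functions are placeholders."""
    import os

    BACKUP_ROOT = "/mnt/backup"    # placeholder path to the backup volume
    CHUNK = 1024 * 1024            # read 1 MiB at a time

    def notify_failure(paths):
        # Placeholder: send a text message in the real version.
        print(f"READ ERRORS on {len(paths)} file(s):", *paths, sep="\n  ")

    def notify_success():
        # Placeholder: send a confirmation email in the real version.
        print("All files read back without errors.")

    def main():
        failed = []
        for dirpath, _dirs, filenames in os.walk(BACKUP_ROOT):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        while f.read(CHUNK):   # force every block off the disk
                            pass
                except OSError:
                    failed.append(path)
        if failed:
            notify_failure(failed)
        else:
            notify_success()

    if __name__ == "__main__":
        main()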


git-annex has an fsck command that can test the data, and an rclone remote, so you can use any combination of "cloud" storage services to hold the chunks.


I would argue the opposite. When you reach for your backup, you are in a dire situation. If it fails you then, things will be very gloomy.


The problem is sometimes you only find out your backup is corrupted when you need it.


If you don't test your backups, then they're not backups.


Yup, the assertion that both failing is highly unlikely rests on the assumption that you are checking the status of each copy regularly.


In fact, for any disk it is important to have a script that reads all the data at least every couple of months. It forces bad sectors to be identified, or lets you find out you have a bad disk, before things get worse.


Yeah, agree. And I am not talking about enterprise-level reliability. More like personal / small office.

But for anything above a few TB, running your own hardware will be way cheaper over, say, a five-year horizon. Disks are so large today that you don't need a big config. In fact you might even keep two copies to reduce the risk.

Of course there is the occasional trip to the datacentre. I made the mistake of picking one far away, and now I pay more in Uber fares than I saved.


Do you need co-location, though? I dropped an HP MicroServer at a friend's place, and it backs up snapshots of my main backup server at home daily. It seems to work well so far, at quite limited cost. Since my main backup server also backs up his NAS, we both win. And since all backups are encrypted, there is little privacy risk from theft (or from snooping on one another).


With 1 Gbps symmetrical connections slowly becoming more common, this would be a much cheaper alternative (assuming you only run it for backups; I also run a mail server and some websites). But in London it is not very practical. I know very few people who have really fast broadband.


My backup contains files, and old versions of files, that I no longer have on the primary storage. It is much more than a 1:1 copy, and cannot be recreated.

I need to feel safe that I can reasonably undo my changes on the primary storage, even if it takes me years to realise what I did.

This is why I keep two backups. Until now, the secondary offsite copy was on ACD...


That's not a backup. That's an archive. You need to back up archives the same way you do any other data you care about.


If it is a backup, then it needs to be scrubbed regularly to detect and repair bit rot.



