If it is a backup, it doesn't need to be super safe. The risk that you lose both your primary data, (optionally your local backup) and your online backup at the same time is pretty insignificant, given that the online backup will be uncorrelated as in a different physical location (primary and local backup very correlated I agree).
> The risk that you lose both your primary data, (optionally your local backup) and your online backup at the same time is pretty insignificant, given that the online backup will be uncorrelated as in a different physical location (primary and local backup very correlated I agree).
Actually, that reasoning is exactly what I wanted to talk about. When the primary is lost, there is a much higher chance than you'd expect that a secondary contains unrecoverable errors. This is why RAID 5 arrays fail to rebuild after a disk failure so often—they're supposed to be able to tolerate a single disk failing, but they can't tolerate a disk failing and any other IO error at the same time. Part of this is due to how short the timeout is for failed reads in RAID setups, but I've still seen a lot of RAID 5 arrays fail, and I've seen a few RAID 6 arrays fail too.
On top of that, there's the high chance of configuration errors in DIY systems.
Agreed. A script that does a full read of all the data every couple of months and sends a report by email (one whose absence you would notice) is pretty much a must-have.
Weekly integrity check of all backed-up data.
Email that reports the result.
Web interface that shows an overview of the results of historical checks.
External service that sends an email if the integrity check failed to run (e.g. https://deadmanssnitch.com/).
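A rough sketch of how the email report and the "failed to run" alarm from that list could fit together, in Python. Everything here is an assumption for illustration: `run_monitored`, `check`, `notify` and `checkin_url` are made-up names, `notify` stands in for whatever sends your email, and the check-in URL is whatever your dead man's switch service gives you.

```python
import urllib.request
from typing import Callable, List, Optional


def run_monitored(check: Callable[[], List[str]],
                  notify: Callable[[str, str], None],
                  checkin_url: Optional[str] = None) -> bool:
    """Run an integrity check, report the result, then ping a dead man's switch.

    check       -- hypothetical callable returning a list of problem descriptions
    notify      -- hypothetical callable(subject, body), e.g. an email sender
    checkin_url -- hypothetical check-in URL issued by the monitoring service
    """
    try:
        problems = check()
    except Exception as exc:
        # A check that crashes is itself a failure worth reporting.
        problems = [f"check crashed: {exc!r}"]

    if problems:
        notify("Backup integrity check FAILED", "\n".join(problems))
    else:
        notify("Backup integrity check OK", "No problems found.")

    # Check in whenever the job got this far, pass or fail. The external
    # service only has to alert when no check-in arrives at all, i.e. when
    # the job did not run.
    if checkin_url:
        urllib.request.urlopen(checkin_url, timeout=30).close()

    return not problems
```

The point of the split is that the script reports bad results itself, while the external service only covers the case where the script never ran (dead cron, dead server, dead network).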
FreeNAS gets fairly close to this out of the box. My home server runs ZFS with RAIDZ2. By default, I think, there's a weekly cronjob to scrub the ZFS pool (integrity check everything, as I understand it), and then the results of that scrub are emailed to me.
I don't believe it has a web interface with historical checks, although I could be wrong. That said, it might be stored in a log file somewhere.
I also don't have an external service that would send me an email if it failed to run. That said, I would get an email if cronjobs had a mysterious error; otherwise, if the server itself was dead, my data would not be accessible on my home network, so I'd notice.
If the home server dies tragically, well, I hope Google Cloud Storage is doing similar integrity checks -- that's where my offsite backups are.
Integrity is harder because you have to sort of maintain a signature of every file on the side. What I do is just have a script that reads all the data. If some data is unreadable, an exception is thrown and a text message is sent to me. Otherwise a confirmation email is sent.
In .NET it's only a few lines of code. Hadn't thought of Dead Man's Snitch but it's a good idea. Would just take one more line of code.
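For comparison, the "signature of every file on the side" approach could look roughly like this in Python. This is only a sketch: the function names are made up, SHA-256 is an arbitrary choice of hash, and it assumes the manifest is stored somewhere outside the directory being checked (for example as a JSON file next to it).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a description of every file that is missing, unreadable or changed."""
    problems = []
    for rel, expected in manifest.items():
        try:
            actual = sha256_of(root / rel)
        except OSError as exc:  # missing files and read errors both land here
            problems.append(f"{rel}: unreadable ({exc})")
            continue
        if actual != expected:
            problems.append(f"{rel}: checksum mismatch")
    return problems
```

Note that a file you changed on purpose also shows up as a mismatch, so this fits best for backup archives that are written once and never modified.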
In fact, for any disk, it is important to have a script that reads all the data at least every couple of months. That forces bad sectors to surface, so you find out you have a bad disk before things get worse.
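A full-read script like that really is short. Here is a minimal Python sketch (needs Python 3.8+); `read_everything` and its parameters are made-up names, and what you do with the returned error list (text message, email) is left out.

```python
import os
from typing import List, Tuple


def read_everything(root: str, chunk_size: int = 1024 * 1024) -> Tuple[int, int, List[str]]:
    """Read every file under root from start to end.

    Returns (files_read, bytes_read, errors). The data is discarded; the only
    goal is to make the storage hand back every byte so read errors show up.
    """
    files_read = 0
    bytes_read = 0
    errors: List[str] = []

    def on_walk_error(exc: OSError) -> None:
        # Directories that cannot be listed are recorded, not silently skipped.
        errors.append(f"{exc.filename}: {exc}")

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    while chunk := handle.read(chunk_size):
                        bytes_read += len(chunk)
                files_read += 1
            except OSError as exc:
                errors.append(f"{path}: {exc}")

    return files_read, bytes_read, errors
```

One caveat: recently accessed files may be served from the operating system's cache instead of the disk, so this is most meaningful when the data set is much larger than RAM or the run happens after a reboot.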
Yeah, agreed. And I am not talking about enterprise-level reliability. More like personal / small office.
But for anything above a few TB, running your own hardware will be way cheaper if you take, say, a 5-year horizon. Disks are so large today that you don't need a big config. In fact you might even keep two copies to reduce the risk.
Of course there is the occasional trip to the datacentre. I made that mistake. I pay more in Uber than I saved by picking a datacentre far away.
Do you need co-location though?
I dropped an HP Microserver at a friend's place, which backs up snapshots of my main backup server at home daily. Seems to work well so far, at quite limited costs. Since my main backup server also backs up his NAS we both win. Since all backups are encrypted there is also little privacy risk from theft (or snooping on one another).
With 1Gbps symmetrical connections slowly becoming more common, this would be a much cheaper alternative (assuming you only run it for backups; I also run a mail server and some websites). But in London it is not very practical. I know very few people who have really fast broadband.