Tarsnap outage postmortem (tarsnap.com)
553 points by anderiv on July 27, 2023 | 319 comments




Ok, I really wasn't expecting this to land at the top of HN. I'd love to stick around to answer any questions people have, but it's 10PM and my toddler decided to go to bed at 5PM... so if I'm lucky I can get about 4 hours of sleep before she decides that it's time to get up. I'll check in and answer questions in the morning.


Why would I use your service over restic?

God bless you Colin, but reading this, it appears you're the only one in charge of the infrastructure for this service. I'm glad you're clear about no SLA, but this seems like a big liability between me and my backups.


It's been a pretty well-known fact for years that Tarsnap is basically a one-man show, and yet Colin has managed to provide fantastic service so far. Sometimes having the people who built the service also run it is actually a big plus, compared to other services where you first have to fight through outsourced & underpaid support that's limited to template answers, only to finally get some "engineer" who got the job 2 months ago and knows less about their system than I do...


And to be frank, I've seen plenty of mission-critical services at $bigco which may have had a team of engineers working on them, but the core functionality was maintained, understood, and supported by effectively one senior engineer. If anything went wrong, the supporting junior staff might have been able to fix reasonably simple stuff, but there was essentially one person who understood the system deeply enough to handle problems of any real significance.


Absolutely.

Early in my career, I became the second person able to support and operate a system that was public facing and responsible for billions of dollars of activity that mattered to many individuals and stakeholders. The entire team retired over a period of six months, after giving the folks in charge a year or more of notice. After about 12 weeks, I was the sole guy, training 4-5 new people.

We’re all probably using a service like this. As demonstrated by Twitter, well engineered systems can persist, even without proper care and feeding, until they don’t.


I hate to bring this up, but what about the bus factor? If Colin is physically unable to continue maintaining the service and something like this happens again, how will anyone be able to get their data out? It's not really a concern about the service Tarsnap provides today.


There's an old Sys Admin saying (perhaps from Allan Jude of ScaleEngine) that goes something like "if your data doesn't exist in at least three places, it doesn't actually exist at all..."

That is to say, if Tarsnap is the only place you're keeping sensitive/important data, then you're "not doing it right" as a backup. Things happen... your hard drive can die suddenly and a data center can burst into flames all on the same day.


I feel like OVH will never stop hearing about this. This has been, frankly, a traumatic event for many sysadmins I believe, and one that was shared by many from the same source, which is quite different from the standard variation of "that time when I erased the production database" (looking at you GitLab, but also at myself!). I mean, at this point it's somewhere between a legend and a cautionary tale and I don't know what else to call it. A bad Wednesday probably.


> I feel like OVH will never stop hearing about this.

To be fair, they deserve it a bit as they went up in flames twice.

Indeed, after the first fire, the geniuses over there collected all the UPSes and batteries they could find from the DC and stored them all in a pile in a closed container... where they predictably bulged, failed, sparked and eventually triggered another fire after a couple of days.


Why the scare quotes? I would expect any well-experienced power user to know a complicated system better than a fresh engineer two months into working on it, with no previous experience on the system. Especially if the power user is an engineer themself.


You really shouldn't, if that's a major concern for you, and it is a valid concern. For the same reason I'll never use PurelyMail, even though it's otherwise perfect.

I know you didn’t ask me — but I don’t think Colin can answer differently other than saying that he is training a family member or friend to take over if needed.

Here's more: https://news.ycombinator.com/item?id=7514753 and this is also linked there: http://mail.tarsnap.com/tarsnap-users/msg00846.html

Very old threads, but I am not sure much has changed there: https://www.tarsnap.com/contact.html

Why would you use it instead of restic? Well, for pricing in picodollars ;-)

And because it has a functional GUI with a tiny system footprint, and there really aren't many such solutions out there.


> God bless you Colin, but reading this, it appears you're the only one in charge of the infrastructure for this service

Hence the toddler.


I am really confused by this comment thread; I'm reading the toddler somehow being in charge of running the infrastructure as a joke, yet I can't see it as either clearly a joke or clearly serious.

I’m a native English speaker but sometimes I swear I’m losing grasp on communication in the Internet age and am sincerely trying to understand this all.


The joke is that the toddler is for future maintenance, not now.


My toddler runs https://rangerovers.pub and it mostly holds up okay. He's not great at yaml because he can't really read so the significant whitespace is a problem, but he knows how to run the backups and ensure the mail handler isn't choking on all the Russian spammers. We try to limit his screen time though so he's only on for the 15-minute maintenance window. The Aprilscherz frontend for Docker is a big help.


Are you suggesting that those who build enterprises don't have time for kids? Seems plausible, but is the difference in lifestyle so consistently prevalent as to be stereotypical? Elon has 10!


Raising the toddler to have some help running the business.


Might take a while. Tarsnap has never had an employee without a doctorate. She's a very bright girl but I'll be surprised if she gets her doctorate before 2040.


So you're the one in charge of the unix epoch rollover?


Not just help: there is now a clear heir to take over if (the gods forbid) cperciva ever succumbs to illness or is defeated in battle.


Tarsnap natively protects against inadvertent or malicious deletion or corruption: old Tarsnap backups are immutable. The low-cost competitors (restic, borg, etc.) seem to have this feature as an afterthought, and they make it surprisingly difficult.

(FWIW, S3 can be somewhat straightforwardly configured so that old data is effectively immutable. Google Cloud Storage’s similarly named versioning feature appears to be far weaker.)
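For example, something along these lines with the AWS CLI (bucket name is a placeholder, and Object Lock has to be enabled when the bucket is created):

    # keep old versions of objects when they are overwritten or deleted
    aws s3api put-bucket-versioning --bucket my-backup-bucket \
        --versioning-configuration Status=Enabled

    # with Object Lock enabled, enforce a retention window during which
    # object versions cannot be deleted
    aws s3api put-object-lock-configuration --bucket my-backup-bucket \
        --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":90}}}'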


Yep, S3 is reasonably easy to configure for immutability. I personally use restic to send (encrypted) blobs to https://www.borgbase.com which has append-only mode and monitoring to warn me if some backups didn't happen.


BorgBase is another "little" service that I use and like, just like Tarsnap and to some extent rsync.net. They also have an excellent GUI app, Vorta (it's FOSS; the BorgBase dev is the maintainer).


Even large organizations can have fairly regular availability issues. I appreciate the noted flaw of a "single point of failure", but I also see orgs where 100s of people have access to the infrastructure, make a change, and then it breaks something. I wouldn't do business with an org just because they have many people; that doesn't mean they're operationally sound, at least not to my expectations.


If the data is super important you should be using two different providers anyway for backups.


Honestly, whose data isn't "super important"? All my data is super important. Even the crap I just throw on my Google Drive. I want to keep it.

What is this mythical unimportant data that people still want to back up?


I mean, you called it crap and then said it's super important. That's what hoarders say.

Subjectively you may feel that your data is super important, but objectively it probably isn't.

When people talk about 'super important' (totally a technical term), I think of things like DB backups in software companies, backups of financial reporting for firms, etc. Not your tax return from 2008.


My nginx config is not super important. My old reports written for study are not super important. My pirated movie copies are not super important.

These are examples of data that I could easily live without. Where losing it would either be a matter of re-doing old work, or just forgetting about old and minor things.


>What is this mythical unimportant data that people still want to back up?

I have lots of stuff like this. Often it is easier to just back up an entire folder than to go through subfolders separating stuff into important and not very important. Storage costs are low enough to just back up (almost) everything. Also, one often doesn't know what may be important/useful in future. For example a couple of years ago I had this huge buildroot system (600 GB) to build firmware images for a single-board computer I spent quite a while putting together. The project I was doing it for got cancelled so I had no need to keep it. Still I wish I had, as I'd love to be able to tinker with it now, but 600 GB is not a trivial amount to store so it got deleted. Most of this data was pulled from various online resources that don't exist anymore too.

What's the moral of my story? If you have a fast internet connection (I don't), back up "everything" to the cloud. Then find the "really important stuff" like the pictures of your children etc. and back it up again to a different cloud.

If you're in the middle of nowhere on a slow LTE connection like me, building a NAS box is not a bad idea for backups.


I have stuff like that on a hard drive in my home, on a persistent storage volume from Linode, and on Dropbox.


Anything that you stashed just for convenience, but could re-download or re-generate if really needed, or simply live without... frankly, like 90% of the stuff on my disks falls into the category "I'll read/view it one day", which in reality I'll probably never have the time or patience to open ever again.


Strange, 90% of the things trapped in my flash memory are system files.


You should get another drive and reserve it for data. The cost is negligible and it really makes everything much simpler.

Optimizing your system or upgrading it just becomes a "trash boot drive and reinstall" operation, applied without a care in the world.


The stuff I don't want to fuck around searching for and re-downloading from torrents, for example.


I really used to enjoy formatting my machines about every 6 months.

Well I used to until macOS kinda went off the rails a bit. Now it’s mostly an exercise in running my arch script for my thinkpad.

Being stuck between operating systems is kinda a mess though, makes backup and file sync in general really hard. But everyone’s gotta have their own cloud, right?!

Why can’t I just put a cloud under my bed and forget about it?


> Why can’t I just put a cloud under my bed and forget about it?

Just buy a Synology NAS. Keep default settings, set up a few user accounts, tweak a few things here and there, enable encryption, install Active Backup on all your devices, done.

There are many cheaper/more open options for self-owned NAS storage, but unlike a Synology they're definitely far away from "and forget about it".


What use is an SLA? If a service goes down for too long, are you really going to hire a lawyer and sue over the SLA, or just... use another backup?


Not even then - most SLAs say that if it's breached, you pay less. Not that you get money back


It's not about suing, but defining expectations about how you can rely on a service.

For example, my team has people across the world for HW bringup, so we can't allow our code hosting or CI to be down for more than a few hours. Of course, backups have different uptime requirements, but as for everything, it's a tradeoff between features, of which an SLA is one.

Tarsnap's features are granularity of cost, reliability of storage, and encryption, but not 99.999% uptime.


> It's not about suing, but defining expectations about how you can rely on a service.

Meeeeh, my ISP cut off around 100+ fiber connections in my town and spent three weeks fixing it. My neighbor has a business line; there's an SLA on those that, among other things, requires them to reestablish his connection within 3-5 hours. It took them over 500 hours, so that SLA is useless for anything but forcing compensation.

The problem is that the SLA should give an indication of available resources, but in reality it's mostly a contractual thing for most companies; they'll pay the "fine" or refund a customer if they fail to hit their SLA and that's about it. Tarsnap most likely has better availability than many midsize competitors simply because it's just one person who really cares about it. Doesn't help if he's hit by a bus though.


SLAs can be meaningless like that. However the better ISPs have in place a backup system that doesn't use the same fiber/wires. Sure, the backup might be a radio or satellite feed and so be slower, but it will get/keep you online. This costs a lot more per month though, so if you are not paying for that service your SLA will probably just be "we give you a free month" (which hurts them enough that they will do some things to prevent downtime, but not enough that they put redundant fiber paths in the ground).


The "problem" is that no sane company will sign for any damage compensation on some cheapo few dollars a month service.


A company could... if you have N users, you pay M for storage per user, and downtime costs you X, then it could be that a discount of Y means (M - Y) * N = X


Agree. You get a discount if something breaks. But SLA really only works for larger services where the cost of fixing something is small when compared to the discounts.


Then SLO (service level objectives) should be enough.


huh, never heard of SLO before.


and now also go google what SLI means ;-)


Very roughly:

SLI - Service Level Indicator - the metrics, e.g. latency of each request/response cycle.

SLO - Service Level Objective - the threshold we are aiming for, e.g. 10 ms from request to response averaged over a 1-hour period.

SLA - Service Level Agreement - the contract with the customer about what happens if we breach it (credits given, put the CTO in stocks and throw eggs at him, etc.)


I know it's a joke, but I think if an SLA involved putting a CTO in stocks and throwing eggs at him then that'd encourage me to sign up for the service. Especially if the video of it were posted after every incident.

Instead we get refunded some pitiful amount when our business is seriously disrupted for an extended period of time.


:-)

My youngest once found some sort of chocolate drops called "unicorn poo" - which seems a more ironic thing to chuck at CTOs!


Don't let the CTO be a scapegoat. Entire executive leadership, all board members and the 5 largest shareholders.


We have just written the Sarbanes-Oxley for the tech regulation industry. All we need now is a congresswoman and a senator and a good acronym

Secure

Technology

Oversight for

Corporate

Software

STOCS Act here we come !

Edit : yeah I could not get the K in ... that's hard


Korporate.


I'm curious how the prices shake out against services like Wasabi, since it's just dumping to an AWS S3 bucket.

Wasabi does $7/TB with no ingress/egress fees. My NAS is set up to rclone to it about once a day and I've yet to have any problems.
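Such a daily sync can be roughly a cron entry like this (remote name, bucket, and paths are placeholders, assuming a Wasabi S3 remote already set up with rclone config):

    # mirror the NAS share to Wasabi every night at 03:15
    15 3 * * * rclone sync /volume1/data wasabi:my-nas-backup --log-file /var/log/rclone-wasabi.log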


I haven't checked the pricing in a long time, but you can use Tarsnap even if you have to back up only 7.3 KB (okay, I might be exaggerating here, but you get the drift) and pay for only that much. You can't do that with Wasabi et al.
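At the published $0.25/GB-month (250 picodollars per byte-month), that hypothetical 7.3 KB works out to roughly:

    7,300 bytes x 250 picodollars/byte-month ≈ 1,800,000 picodollars ≈ $0.0000018/month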

Also it’s really simple and does what it says it does, nothing more, nothing less. In today’s everything convoluted and bloated world this is a luxury imho. The GUI app is also quite good and functional. Support is prompt (that is if you need it).

You don’t have to worry about file being deleted just because your machine didn’t connect or backup for some time even if you keep paying (hello Backblaze) etc. I mean there’s no circus, melodrama , and cliffhangers involved.

I personally would never use it backup my entire laptop, due to price alone. But I have a subset of VVI files and Tarsnap is one of more than one backups for those files. So for that use-case Tarsnap is perfect for me, so far.


Backblaze has kept my ‘shutdown two years ago’ machine data without issue. What problems did you have with them (or did others have)?


Backblaze has a policy of allowing backups of external disks, but the disks have to be connected at least once every 30 days, or they'll delete the backups. I understand they want to avoid abuse, but the lack of any grace period, or ability of support to add an override, really soured the service for me.


You can just pay extra for extended or infinite retention. https://www.backblaze.com/cloud-backup/features/extended-ver...


I did this; it's come in useful a few times.


Huh, I had to start paying them $2 more for my nonexistent PC I think, but otherwise was fine. I have only 1 TB of total storage on that PC though, so maybe that’s the reason.


Uptime isn't an important property of a backup solution, so I'm not sure where the expectation comes from?


It sure should be up when you need it, exactly at the time you need it.


In future postmortems (of which I hope there will be very few or even none) you may want to spell out your 'lessons learned' to show why particular items will never recur.


It always amuses me how people want reassurance that the next crisis will be a fresh, new problem, and not one the person can demonstrably solve.

A lot of 'lessons learned' analysis boils down to this: in order to prevent a recurrence of X, we introduced complex subsystem Y, the unexpected effects of which you can read about in our next post-mortem.


That's an overly cynical take, post-mortems are not for anyone's reassurance, they are a learning opportunity.

The airline industry is as safe as it is because every accident gets thoroughly investigated with detailed reports ("post-mortems") including what to do differently going forward. These are taken as gospel among all players in the industry and as a result, you very rarely see two different accidents caused by the same thing anymore.


That was entirely not what I was getting at and is a cheap shot that is well beneath you, especially because I suspect that you know that that wasn't what I was getting at.


My comment wasn't intended personally; your words about "will never recur" just reminded me of this peculiarity of software systems, where it's often error handling/monitoring/backups/etc. that cause cascading failures in the systems they're intended to safeguard.

I'm sorry if I misconstrued your meaning, but I am flattered that you think there are things beneath me!


Fair enough. I see the whole function of a postmortem in a very simple way: to avoid recurrence of the same fault. Yes, there will be plenty of new ones to make. But if you don't change your processes as the result of a failure you are almost certainly going to see a repeat because the bulk of the conditions are still the same. All it takes then is a minor regression and you're back to where you were before. This I've seen many times in practice and I suspect that Colin isn't immune to it. And yes, I look up to you, your writing is usually sharp and on point and has both amused me and educated me. So you have an image to live up to ;)


You'd love my team's recent postmortem, featuring the comment "action items have been copied from the previous postmortem".


Could be as simple as "test restore a new server every 1-2 years"


You should consider this possible lesson:

"Our simple model that fails gracefully did so and was simple to recover"

Redundancies and failsafes are not free - they add complexity.

99.9% availability fails in boring ways.

99.999% availability fails in fascinating ways.


Yeah, I was going to do that but it was getting late, I wanted to get some sleep, and the post-mortem had already been waiting far too long to be sent out.

The main lesson learned was "rehearse this process at least once a year".


Agreed, that's the big one. But also: when sleep deprived: take a nap!


The infrastructure page* says,

> at the present time it is possible — but quite unlikely — that a hardware failure would result in the Tarsnap service becoming unavailable until a new EC2 instance can be launched and the Tarsnap server code can be restarted ... So far such an outage has never occurred

I read the postmortem as saying that a hardware failure did cause it to be unavailable and the code could not simply be restarted; a new server had to be built.

If that is correct, as well as writing up lessons learned (as Jacques mentions), this page could be updated with outage information -- or even info on changes to reduce the risk of repetition.

For what it's worth, one outage of a single day in fifteen years is impressive. If my ballpark math is correct, that's about 99.98% uptime, i.e. a bit short of four nines.

* http://www.tarsnap.com/infrastructure.html


This was an extremely well written and thoughtful postmortem, but I hope to never see one from you again. :)


It was a postmortem without the mandatory "how can we prevent this in the future" steps…


In 15+ years of running this service, this is one of two (2) postmortems he's ever published, and the first in eleven (yes, 11) years.


I think that's a little unfair given what was in the postmortem. It may not be a separate section with the key points, but all the information is there about what the issues were and what the solutions are. I think it's fair to assume they're actually acting on those without them needing to be reiterated at the bottom of the page.


I agree, we don't really need a "key points/future actions" section that boils down to "The service will be geo redundant"


Well, for sure he has fixed several bugs, but he didn't say that he would be testing his disaster recovery procedure every year in the future for example.


Yes, rehearsing the process every year is the main lesson learned. Sorry, it was getting late and I wanted to get the email out so I cut it short.


Time to get your toddler providing round-the-clock support! ;)

Have been having some luck reading https://www.amazon.com/No-Cry-Sleep-Solution-Toddlers-Presch... - available everywhere libraries (blockbuster for books!) are found.


She's generally a wonderful girl. Right now she's dealing with her second molars coming in and just picked up a cold though, which is throwing off her sleep schedule.


How long do you keep the transaction logs before rewriting them?

I too had a few EC2 instances go down with signs of being severed from the EBS in the recent couple of weeks; mine were in eu-west.


There's a continual background cleaning process which depends on the amount of storage which can be reclaimed -- there's a tradeoff between cleaning too slowly (and paying for wasted storage) and cleaning too fast (and paying for lots of S3 operations). I think it averages a couple weeks right now.


Thank you for the post-mortem Colin and I hope you get some sleep!


Thanks, I did! My long suffering wife was up at 3:30 though. :-(


What I'm wondering is, I had data on Tarsnap, why am I only hearing about this now?


Some recommendations on the AWS front (not sure if some of these are already implemented since the postmortem does not go into AWS details).

- Set up nightly automatic snapshots of EBS volumes (this is supported natively now in AWS under the lifecycle manager; see the sketch after this list).

- Use EBS volumes of the new GP3 type, and perhaps use provisioned IOPS.

- Set up an auto-scaling group with automatic failover. Of course this increases cost, but it should be able to automatically fail over to a standby EC2 instance (assuming all the code works automatically, which the blog post indicates is not currently the case).
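For that first point, a rough sketch with the AWS CLI (account ID, role, and tag values are placeholders): nightly snapshots of every volume tagged Backup=true, keeping a week of them.

    aws dlm create-lifecycle-policy \
        --execution-role-arn arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole \
        --description "nightly EBS snapshots" \
        --state ENABLED \
        --policy-details '{
            "ResourceTypes": ["VOLUME"],
            "TargetTags": [{"Key": "Backup", "Value": "true"}],
            "Schedules": [{
                "Name": "nightly",
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": 7},
                "CopyTags": true
            }]
        }'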


Can you say a bit more about the log-structured S3 filesystem? I wrote something very similar recently (https://github.com/isaackhor/objectfs) and I'm curious what made you settle on that architecture. The closest thing I know of that's similar is Nvidia's ProxyFS (https://github.com/NVIDIA/proxyfs)


> the central Tarsnap server (hosted in Amazon's EC2 us-east-1 region)

What prevents you from distributing load among other regions?

(Also: did you ever think about abandoning AWS?)


Nice write up. A couple questions:

- The use of "I" raises the question: what's the "bus factor" of Tarsnap? If you were unavailable, temporarily or permanently, what are the contingency plans?

- Will you be making any other changes to improve the recovery time, or did the system mostly function as designed? For example having a hot spare central server?


Are you gonna switch to us-east-2?


> Following my ill-defined "Tarsnap doesn't have an SLA but I'll give people credits for outages when it seems fair" policy, on 2023-07-13 (after some dust settled and I caught up on some sleep) I credited everyone's Tarsnap accounts with 50% of a month's storage costs.

This speaks volumes to me about what kind of person Percival is; that credit would appear to be generously on the "make customer whole" side of the fence, and unlike the major cloud providers, he didn't make each customer come and individually grovel for it. And a clearly written, technical, detailed PM, too. This is how it ought to be done, and done everywhere. Thanks for being a beacon of light in the dark.


"Thanks for being a beacon of light in the dark."

That's well put.

It makes me very happy to live in a world where tarsnap exists and is priced in picodollars.


For the record, I'm happy to live in a world where rsync.net exists. I've pointed quite a few customers in your direction over the years, when tarsnap hasn't been suitable for their needs for a variety of reasons.


They make a good pairing. I backup my ZFS NAS to rsync.net for all my media and Tarsnap for all my documents/critical things.


I am only using rsync.net at the moment, more specifically with the discounted "borg" mode without an explicit full shell.

Your comment sounds like tarsnap is more secure (in terms of longevity) than rsync.net. Is this true? If yes, why?

Genuine question, because I'm using rsync.net for my critical stuff and would gladly move to tarsnap if appropriate.


For me it's threefold.

First is that my usage of Tarsnap pre-dates my usage of rsync.net, so it's been the primary backup of my home directory since ~2010. I haven't felt a need to change it and on the 2 occasions I needed to restore from it everything worked perfectly. i.e. don't fix what isn't broken.

The second is that while I can in theory restore from rsync.net, I actually never have... this is more a testament to the reliability of ZFS though, I guess, and local snapshots have always been enough. That said, the convenience of send/recv is sort of awesome.

Lastly, I don't use ZFS on my client machines. If I did I would probably consider rsync.net for everything.

So it's not really that I explicitly think one is more secure or durable than the other; it's that Tarsnap has fulfilled my DR needs successfully for a long time and I have come to trust it to do so in the future.


The downtime could have been much shorter if you had properly set up and _tested_ disaster recovery steps. Create a full-fledged separate staging system which you can bring down and recreate, periodically test various failure modes, and document all the detailed steps of a system restore, etc.

Also, I would suggest thinking about the business long term and seeing if you can increase the revenue enough to enable you to hire a part-timer who can be of great help in case a similar event happens.

We are also a small cloud solution provider (we focus on ML APIs) and over the years it has become clear to us that when you use cloud hardware (either dedicated or virtual), outages periodically happen. RAM, HDDs or other parts of the hardware can just malfunction at any time. So this is something which 100% needs to be taken into consideration when running any high-availability online service over the long term.


Hats off to you for an honest postmortem and your capable handling of a difficult situation. The only remark I would offer is with respect to sleep deprivation—when you're the only person who can fix a problem, there's no shame in trading some additional outage time for a fresh mind. Though it feels weird to go nap when all the klaxons are blaring, problems are too easy to compound under the combination of adrenaline and inadequate sleep.


Don't worry, I had a couple naps in there. "This seems to be running smoothly but it will take several more hours; I'll set my alarm to wake me up in two hours and have a nap" is part of why I didn't notice the second step was unnecessarily I/O bound.


IIUC the process had a few steps where you only had to wait while data was transferred or processed for long periods. They were probably useful for taking a nap or eating or just drinking more coffee.


Based on the description it sounds like it should be relatively easy to test this recovery process on a regular basis, to catch any lingering bugs and evaluate the recovery time. As they say, the only backups are the ones you have tested.


As someone who just discovered my DR process does not work by testing it, 100% this. The only plan that is likely to work is a repeatable tested one.


Ideally, the thing you do in an emergency is largely routine, so that it happens by instinct rather than being a special case you need to remember. It should not be different in arbitrary ways.

For example, in both trains and cars, thanks to anti-lock braking, the correct way to stop the vehicle ASAP is to brake just like normal but as hard as you can; the computers will automatically solve the much trickier problem of turning your input into maximum deliverable braking force by periodically releasing brakes on sticking wheels.

If you run a fire drill, it's surprisingly difficult to get employees to use fire doors that they're used to finding alarmed and unusable. Even though intellectually they know that, say, the door at the bottom of the stairwell is a fire door, with crash bars and leads directly to the outside world, and this is a fire drill, they are likely to (for example) exit on a higher floor and go through a chokepoint lobby, as they would normally, instead of following this safer path that is emergency only. Sadly it is hard to fix buildings after construction if they were designed with such "unused" emergency exits.

For a backup process, having restoring machine images be a service that is sometimes, though not constantly, used anyway for some other reason is a good way to be comfortable with how it works, that it works, etc. At work for example we routinely test upgrades on test servers restored from a recent backup. Restore serviceA to testA, apply the upgrade, discover the upgrade completely ruins the service, throw testA away and report that this upgrade is garbage. But in the process we gained confidence in the restore process; when things go badly wrong, infrastructure people aren't trying to recall something they only ever did in a drill, they are very used to this procedure because they do it "all the time".


This is terrible. Instinct cannot be trusted. Write it down.


There are two types of emergencies - checklist ones, and panic ones. You need to have both, but realize that in the panic ones people do NOT operate rationally.

This is why house doors open in but business doors have to open out - if there’s a crush against a fire door it opens.

You even see this in aviation, where everything is checklisted; the pilots will first stabilize the plane in an emergency and then run the checklist. And small planes that operate unexpectedly always have higher crash rates.


Pilots are a little special, their panic mode is also a checklist, known as the memory items.

This doesn't work for normal people because normal people don't drill non-normal events until the response is instinctive.


Normal people should drill certain non-normal events (for example, all drivers should know how to decelerate and get off the road quickly).

But you should NEVER design a system that requires normal people to drill non-normal events; even planes have been redesigned to "fix" problems where the pilot had to do something unintuitive or unexpected, because eventually it WILL catch up to you.


Note that in airplanes (unlike cars) you normally cannot just get in a new one and fly. You first get training on that particular plane. If everything goes perfectly any pilot can get in any plane and fly it, but if any little thing goes wrong they had better know how the plane flies very well so they can get it stable enough to run the checklist.


You probably shouldn't just get in a new car and drive it, but people do. I remember at a hire-car place once the team I worked with were given an automatic; the guy driving had never driven an automatic transmission before, but his license authorises it (UK licenses allow everybody to drive an automatic, but you need to test in a manual to drive manual), and so they just lent him a car with a completely different driving style. He had to get them to show him how to even drive it away out of their car park.

I learned in the small car from the same brand as my father's larger car, so that the controls were in the same place and the symbols on stuff were identical; all that was different once I had a license and borrowed dad's car was that it's longer and has more power.

It also probably shouldn't be legal for me to drive today, but it is. I learned 25 years ago, and I haven't driven anything in over a decade, so a rational system would say nah, you're too rusty, get a refresher course, but there's no mandate for that.


It is kind of mind-bogglingly insane that you can be 25 years old (or younger in some states/places), having only ever driven a Smart car (so you have your license), and you can walk into U-Haul and rent a 26-foot box truck with a trailer, and the most they do is tell you not to go under low overpasses or into drive-thrus.


Yep! I've been meaning to do it for a while but there was always something higher priority... I didn't realize until this outage that it had been almost a decade since I had tested it.

Rehearsing this annually is definitely going to be a high priority.


I always appreciate seeing a professional, courteous, and honest postmortem like this one.


(caveat: I may be running on old tarsnap company info but) I must say, the ONLY thing that has ever made me shy away from seriously using tarsnap was the prospect of an unexpected Colin Percival outage. i.e. key person risk. I'm guessing I'm not alone in this.


It's an MTBF-like calculation: do you trust the one-person company that has a well-engineered solution with few moving parts over the much larger company with a probably less well-engineered solution with far more moving parts?

I personally would go with the simpler solution because in my experience you need an awful lot of extra complexity before you get to the same level of reliability that you have with the simpler system. Most complexity is just making things worse.

You can see this clearly when it comes to clustering servers. A single server with a solid power supply and network hookup will be more reliable than any attempt at making that service redundant until you get to something like 5x more costly and complex. Then maybe you'll have the same MTBF as you had with the single server. Beyond that you can have actual improvements. YMMV and you may be able to get better reliability at the same level of performance in some cases, but on average it always first gets far more complex, costly and fragile before you see any real improvements.

I strongly believe that the best path to real reliability is simplicity (which is: as simple as possible) and good backups. For stuff that needs to be available 24x7 and 365 days per year this limits your choices in available technologies considerably.


While I get this as a risk, I'm not convinced it's any more risky than a larger corporate entity.

This is Colin's job. Colin has his name attached to it. It's really important to Colin.

You're not going to get the same kind of service from BigBackupCorp. Their employees are replaceable, their management is replaceable, and to be honest, you as a customer are replaceable, if they decide to move in a different direction and become BigFlowerArrangementShippingCorp.

The neat thing about a small business is that it runs entirely on its own profits. There are no stock price games or VC jiggery-pokery or anything like that. If it's a profitable business, there will be somebody to come along and take it over and make it their job with their name attached to it. I think the open Internet benefits a lot from this sort of thing.


It isn’t necessarily about Colin quitting. Key person gets hit by bus is also always a concern. You can say someone will pick it up, but I know nothing of whether such plans are in place. Does the person who would inherit the business have the know how to sell it? Is there enough documentation in place for a transfer of assets to be successful?


This is how that scenario shakes out:

  1. Key person gets hit by bus
  2. You see the black bar on Hacker News and learn the sad news
  3. You go download all your data from the service, which is still up because there is no bus access to data centers.
  4. You feel like a jerk for all your creepy "hit by bus" talk.
  5. A few weeks later, some VC-funded operation with multiple employees you depended on disappears overnight without a trace.


> You go download all your data from the service

Just about this step... you are supposed to have it already. You just have to find another service and start using it.


I wonder if Tarsnap could stand up to everyone downloading their data from it at once, especially without anyone to help keep it alive.


Companies and corporations get "killed" often too, even if the people in them are alive.


Make a list of the competitors tarsnap has outlived and maybe it will change your calculus a bit. The risk you need to evaluate is not "what if something happens to the proprietor" (which I've always found pretty macabre), but "what if something happens to him and then the service goes down and also I never backed up my backups". This is a risk you can make as small as you want with judicious planning.


I mean, if you are on HN, you will probably learn of a Colin outage within 24 hours, so practically speaking you would really only have a problem if your primary data storage, Tarsnap, and Colin all failed in the same 24 hour window or so before you had time to switch to a new backup provider.


Pretty sure his brother works on tarsnap too.

They should take separate buses to ______.


Pretty sure his brother works on tarsnap too.

Yes, I hired him in 2015 IIRC. If you look at tarsnap's GitHub you'll see a lot of commits from gperciva.


Oh! Do say Hi to Graham for me.

He mentored me so that I was able to contribute and eventually help maintain and manage LilyPond's Documentation and Patch Testing in a meaningful and rewarding way - all without any programming experience.


Nice. Being able to work with your family is great.


I would never consider a backup provider to be more reliable than that, because if you depend on it, it will fail you at the hardest time.

Better to have multiple layers of backup, of which tarsnap and friends are only one, and verify regularly.


> The second step failed almost immediately, with an error telling me that a replayed log entry was recording data belonging to a machine which didn't exist. This provoked some head-scratching until I realized that this was introduced by some code I wrote in 2014: Occasionally Tarsnap users need to move a machine between accounts, and I handle this storing a new "machine registration" log entry and deleting the previous one

Recommend writing a TLA+ model to catch stuff like this


What would be the benefit of tarsnap over using something like restic+backblaze at order(s) of magnitude lower cost? What specific need would motivate you to pay $3000 per TB-year?


Some of us have lots of extra money and like an excuse to give some of it to cperciva so he doesn't have to work a shit job and can apply his skills and talents to bigger, better things?

(People here asking about the low Bus Factor: you don't keep your backups in one service/location, eh? You use Tarsnap and Restic with Backblaze, Rsync.net, S3, etc. right? "Backups are a tax you pay for the luxury of restore.")


Extremely good deduplication means that for the core set of very important data I backup to Tarsnap the costs are negligible. I imagine the math is probably different if your data is changing more frequently. I for instance use other services to manage my video and photo libraries but my accounting databases, critical documents, etc are backed up to Tarsnap.

I have been using Tarsnap for a decade and not only have there been minimal availability issues, there have been almost no issues of any kind that I can recall.


It sounds like most of the 26h downtime was spent restoring backups. Incidentally, this is exactly the reason why Tarsnap is unusable for me for production environments. Backup restoration (as a user) is excruciatingly slow. When my systems are offline, I have no patience to wait for hours for my backup service. Maybe things are better now; last I tried was a few years ago, when Tarsnap took on the order of an hour to restore a backup of a few GBs.


Unfortunately, looks like https://www.tarsnap.com/infrastructure.html will have to be updated.

>> So far such an outage has never occurred; but over time Tarsnap will become more tolerant of failures in order to minimize the probability that such an outage occurs in the future.


Unrelated to the outage, but I'm curious nonetheless: would it be possible to hook up Tarsnap's encryption software to a Dropbox folder? I'm not sure if it even makes sense to use Tarsnap for this, but I'd love to have an easy setup that allows me to use Dropbox's servers but only let them see encrypted data so they can't snoop.


You probably want something like https://cryptomator.org/


Doesn't plain old Duplicity (https://duplicity.us/) do that already? (except for de-duplication)


Tarsnap is undoubtedly expensive, but it also donates to various efforts!

Neglecting the pricing, does Tarsnap have any advantage over Restic?

Restic also deduplicates, using little data.


The deduping in restic is just on the edge of acceptable for me, making me think I'd have trouble with a lot more data. Basically the once-a-month "prune" operation takes about 36h (to B2). I feel I could be tuning something, but also it works and I don't want to touch it.


I back up around 2 TB with Restic, and also tried locally with Borg. The size is nearly the same. Sadly, I can't even test with Tarsnap! (absurd pricing for 2 TB).


> absurd pricing for 2TB

Well, it can't be that ba..

    $0.25 x 2000 = $500
Yikes. And this is without BW costs.

At $500/M you can just rent a dedicated physical server with a lot of HDDs and still have money left for your favourite pumpkin latte.

For comparison rsync.net says it's $0.015 per GB/Mo, for 2TBs that's $30/m and no BW costs.


Not in any way affiliated but I'm a happy user of Scaleway's Object Storage [0] together with S3QL [1]. It's not the fastest but they give you 75GB of storage for free so that's a fair trade [2].

[0] https://www.scaleway.com/en/object-storage

[1] https://github.com/s3ql/s3ql

[2] https://www.scaleway.com/en/pricing/?tags=storage


I'm renting a $15/mo 2 TB Atom machine from OVH/Kimsufi as a second target for backups.

Now that I think about it... some kind of micro-distributed backup server (throw it on a few of your machines, auto-replicate between them) would be a neat project...


It's not even that neat.

Just slap rsync/syncthing to the backup dir.


I do use syncthing on a NAS + remote cheapo server for my day-to-day stuff, and Bareos for the rest.

It's just a PITA to add another instance.


Curious how much you back up, which version of restic you're running, and why you think the deduplication is borderline unacceptable. There were several major (orders of magnitude) improvements made to pruning within the past ~1 year, that's why I'm interested.


A straight upgrade, that I can do :) It's been running for years without one.

I was only edgy about it because when it takes 36h it blocks the next daily backup, and I wondered whether that was going to get worse (it hasn't).


The max-unused percentage feature is well worth it to 80/20 the prune process and only prune the data which is easiest to prune away (i.e. not try to remove small files from big packs but focus on packs which have lots of garbage).

In general, there's an unavoidable trade-off between creating many small packs (harder on metadata throughout the system, inside restic and on the backing store but more efficient to prune) versus creating big packs which are more easy on the metadata but might create big repack cost.

I guess a bit more intelligent repacking could avoid some of that cost by packing stuff together that might be more likely to get pruned together.
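For example, something like this (repository URL is a placeholder):

    # tolerate up to 10% unused data in packs instead of repacking aggressively
    restic -r s3:s3.amazonaws.com/my-restic-bucket prune --max-unused 10%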


Tarsnap is undoubtedly expensive, but it also donates to various efforts!

I mean.. you could purchase a cheaper service and also donate to various efforts. Bonus: Then you'd also be able to pick those efforts.


How do you compare the two, price-wise? With Restic, you have to provide your own storage.


Aren't these storage prices absurd? Please let me know if I'm misunderstanding.


The prices are absurdly high if your use-case is storage of large volumes of data that regularly change. It wouldn't be sensible to use Tarsnap for that, and you probably want to use one of the bulk backup services instead.

Tarsnap makes a lot of sense when you benefit from the encryption and (especially) de-duplication features that it offers. For me, all of my most important personal and business data, from multiple decades, compresses-and-deduplicates down to around 6GiB. Considering the high value of the data I store in it, tarsnap's pricing actually feels absurdly low.


> Tarsnap makes a lot of sense when you benefit from the encryption and (especially) de-duplication features that it offers.

Can you provide more detail why you think so? I don't believe there is any use case in which tarsnap makes sense, other than maybe some Plan-C backup solution which you fall back on in the highly unlikely event that neither Plan-A nor Plan-B worked.

Concretely, what benefits does tarsnap offer over restic or borg in combination with rsync.net, to make up for the substantial downsides (such as insanely slow restore, complete lack of wetware redundancy or being written in C[1])?

[1] https://www.tarsnap.com/bounty-winners.html


I use tarsnap because the asymmetric crypto means I can give my cron job authorization to create backups, but it doesn't have authorization to read or delete(!) backups.

This ability is critical to prevent a compromised system from having its data wiped and having all backups wiped as well.

I haven't been able to figure out how to do this in any other system. But if someone has a tutorial, I am all ears.
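For reference, the Tarsnap side of this looks roughly like the following (key file paths are placeholders):

    # derive a write-only key from the master key (done once; keep the master key offline)
    tarsnap-keymgmt --outkeyfile /root/tarsnap-write.key -w /root/tarsnap-master.key

    # the cron job uses the write-only key: it can create archives,
    # but it cannot list, read, or delete them
    tarsnap --keyfile /root/tarsnap-write.key -c -f "home-$(date +%Y%m%d)" /home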


Replying to myself:

I've been musing on this subject all afternoon. I'm a user of Tarsnap, and I do find it expensive, in the sense that I would prefer to back up larger amounts of data for less money. At the moment I back up photos separately from Tarsnap and in an ad hoc way.

But I still cannot figure out a way to get all the benefits I get from Tarsnap from any other software solution.

* Must be usable under Nixos.

* Backups must be asymmetrically encrypted so that backups can be automated, yet a compromise of the system cannot immediately gain read authorization to archived data.

* Backups must be append-only without further credentials, or otherwise prevent a compromised system from being able to delete existing archives.

* Deduplication between archives while still allowing archives to independently be deleted.

Using the ZFS snapshot functionality with rsync.net, for example, with Duplicity comes close. However, as I recall, duplicity wants to do regular (typically monthly) full backups and then incremental backups from there. You cannot remove these full backups without deleting the entire month's worth of backups, and because the full backups are independently encrypted, there is (of course) no deduplication between full snapshots, even though the data is still likely largely the same. And because the snapshots are encrypted, it is impossible for the rsync.net storage to see or even know that large parts of the encrypted data are identical.

AFAICT there is really nothing else that does what Tarsnap does.


IMO AWS S3 works fine for this:

* Create an S3 bucket and enable versioning

* Create a new user and give it only s3:PutObject on your new bucket

* Create an auth keypair for that user and put it on your server
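A rough sketch of those steps with the AWS CLI (bucket and user names are placeholders):

    aws s3api create-bucket --bucket my-backup-bucket
    aws s3api put-bucket-versioning --bucket my-backup-bucket \
        --versioning-configuration Status=Enabled

    # a user that can only add new objects; with versioning on, old versions survive overwrites
    aws iam create-user --user-name backup-writer
    aws iam put-user-policy --user-name backup-writer --policy-name put-only \
        --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::my-backup-bucket/*"}]}'
    aws iam create-access-key --user-name backup-writer   # these keys go on the server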

Now any server compromise that gets those keys can only add new data to your backup bucket, and can't read, overwrite, or delete any previous backup.

There's no dedup, so that could be a deal-breaker.

There's also no real encryption (though that shouldn't be too hard to add I guess). I don't really see the gain though. Anyone who compromises the server keys is blocked from reading by AWS permissions. Granted, that's not quite as reliable as good crypto for blocking reading, but on the deleting side, there's never going to be anything but the auth system of whatever solution you're using to block that.

I get that there's some applications out there where preventing data exfiltration is important enough to need strong crypto (though is that really important when we're talking about full compromise of your server, which gets the attacker direct access to the data anyways?), but I decided that the risk of failing to implement properly or full data loss due to losing the keys or them being corrupted wasn't worth the risk of blocking somebody who somehow compromised the AWS account security from being able to read backup data.


The lack of deduplication isn't necessarily a deal breaker. Let's see.

My main machine is currently storing 1.6 TB (compressed) of total archives with tarsnap, but only 33 GB (compressed) of unique data within those archives. So if S3 is 50x cheaper, then not having deduplication would be a wash.

However other comments here suggest that S3 is only 10x cheaper.
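Spelling that out with the S3 price quoted elsewhere in the thread:

    1.6 TB / 33 GB ≈ 48x deduplication ratio
      33 GB x $0.25/GB-month  (Tarsnap, after dedup) ≈  $8.25/month
    1600 GB x $0.023/GB-month (S3, no dedup)         ≈ $36.80/month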


Restic's rest-server does this too, or afaik you can configure restic to use S3 with object locks or whatever it's called
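Roughly, the rest-server route looks like this (paths and host are placeholders):

    # on the backup host: serve the repo and refuse deletes/overwrites of existing data
    rest-server --path /srv/restic --listen :8000 --append-only

    # on the client: back up over the REST backend
    restic -r rest:http://backup-host:8000/ backup /home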

Edit: just saw your sibling / reply-to-self comment. This setup would fulfill the requirements you posted, or at least I would assume that restic runs under (or compiles for) your nix OS. It doesn't use asymmetric encryption for this but the goal of append-only is there

> because the snapshots are encrypted, it is impossible for the rsync.net storage to see or even know that large parts of the encrypted data is identical

If they don't see a large amount of data incoming, they'll know large parts of the data are identical (or removed, I suppose). Hiding traffic volumes is fundamentally only possible by introducing dummy data


Rsync.net can do similar, because of their snapshot system. By default there's no way to delete a snapshot except through the schedule set up (you can write to them and ask if it's necessary for some reason). It doesn't use asymmetric crypto to do this, but that is neither necessary nor sufficient for the purpose of preventing accidental or malicious deletion of backups.

https://www.rsync.net/resources/howto/snapshots.html


Right I've considered that. It is however limited to like 7 snapshots.

The thing is that tarsnap deduplicates over arbitrarily long time periods, letting me make arbitrarily long staggered sequences of retained archives.

Perhaps I should really reconsider if I really need such long lived archives, but it is hard to bring myself to drop them.


You can do the same with rsync.net, they just charge you for the extra space (the differential space, like with tarsnap) instead of providing it for free (from what I can see the limits in the web UI are like 1000 daily and weekly snapshots, 200 monthly snapshots, 100 quarterly snapshots, and 10 yearly snapshots, which I suspect are arbitrary 'good enough for most' numbers, not some hard limit based on what they can profitably provide). I personally use the direct ZFS option so I can set up the snapshots exactly how I want, but it is extra effort and doesn't provide quite as good a guarantee that they won't be overwritten (it's resilient against a compromise of the server uploading the backups because I've set up scripts that way, but it doesn't protect against compromise of the login credentials for the VM in the same way).


Oh thanks. I did not know that. That does seem good enough.

I just now need a deduplicating asymmetrically encrypted backup program.

I've tried duplicity in the past, and maybe I should try it again. But my recollection is that duplicity will just fail to do backups at the slightest hint of any problem. Like maybe if the last backup was interrupted then no more backups for you until you attend to it.

Edit: More memories returning of having to dig out my decryption key to resync the metadata when duplicity gets unhappy, and then, since my target server was append-only, duplicity was upset when it wasn't allowed to overwrite any of its incomplete metadata files. I guess the ZFS snapshot technique would alleviate the latter issue.

To be fair, if tarsnap gets confused it needs the keys to do its fsck command, but I recall this sort of thing happening regularly with duplicity and almost never with tarsnap.


No, not limited.

An rsync.net account can have any arbitrary schedule of snapshots - including days, weeks, months, quarters and years.


My post was specifically addressing the comments around cost.


No, you specifically claimed that "tarsnap makes a lot of sense" in certain situations. I think that's incorrect and that there are basically no situations where tarsnap makes sense as a primary or even secondary backup solution. Even when completely ignoring costs. This is a strong claim, so it should be easy to provide a counter-example if one exists.


That would be true if there were zero other options that offer encrypted backups, but other software offers that too. Many also offer deduplication. And deduplication is less of a needed feature if you don't pay through the nose per GB.


It's insane. Not sure how anyone can accept such rip-off pricing.

Tarsnap : $0.25 / GB storage, $0.25 / GB bandwidth cost

rsync.net : $0.015 / GB storage, no bandwidth cost

s3 : $0.023 / GB storage, some complicated bandwidth pricing

If tarsnap is built on top of S3, they're charging 10 times the storage cost. Easy money from the uninformed?


How's the saying in every HN thread go? "Don't set your prices based on your costs, set your prices based on the value you deliver." or something like that.

Tarsnap is a wonderful piece of software. You're paying for that.

That said, is the value of "Tarsnap" worth the price difference from "Borg+rsync.net"? (Or Restic, I've been meaning to look into Restic). I'm not so sure. These days I'm a customer of rsync.net, not of Tarsnap.

But I still firmly disagree with the "Colin's just exploiting the uninformed" angle.


Completely ignoring costs, can you name a single use case for which tarsnap would be better than Borg or restic on rsync.net?


As a consultant on a DR project, you can increase your billables by 10x due to the extremely slow backup restore speeds.


Backing up anything Windows with more granularity than top level directories?

Ugh.

Try picking and choosing specific file types or file extensions from filesystems holding thousands of files.

I ended up having to cobble together some god-awful pre-process powershell with multiple pipes just because restic fails to be able to grep using Windows reliably.

:(


> restic fails to be able to grep using Windows reliably

That is news to me. I backup almost a million files spread across 4 Windows devices, with heavy use of --files-from and --iexclude and it seems to work. What am I missing?

I agree that restic filtering options are pretty limited. Too limited, really. But what's there seems to work?


Compared with restic, Tarsnap performs asymmetric encryption, which lets you run automated backups without needing to enter any passwords (or otherwise storing your encryption passwords in plain text).

Compared with duplicity, Tarsnap does full deduplication across all backups for any given "machine", while still letting you independently remove any snapshots you like. i.e. no special "full snapshot" that must always be kept around, and no need for multiple full snapshots that have no deduplication between them.
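
A minimal sketch of what that looks like in practice (archive names and paths are just illustrative):

  # Each run creates a self-contained archive, but only blocks not
  # already stored for this machine are actually uploaded.
  tarsnap -c -f home-2023-07-27 /home/me
  tarsnap -c -f home-2023-07-28 /home/me
  # Any archive can be deleted on its own; the remaining archives
  # still restore in full.
  tarsnap -d -f home-2023-07-27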


rsync.net is also overpriced for strictly backup purposes. Make sure you do check out restic; it can use S3 or Backblaze B2 (I actually use both) as backends instead of something expensive like rsync.net. The value of these boutique storage services evaporates when you start using restic.
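
A rough sketch of that setup (bucket names and paths are placeholders; credentials and the repository password come from environment variables):

  # Backblaze B2 backend: restic reads B2_ACCOUNT_ID, B2_ACCOUNT_KEY and
  # RESTIC_PASSWORD from the environment.
  restic -r b2:my-backup-bucket:host1 init
  restic -r b2:my-backup-bucket:host1 backup /home/me
  # Same idea with S3 (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), after
  # an equivalent init of that repository:
  restic -r s3:s3.amazonaws.com/my-backup-bucket backup /home/me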


The complexity with S3 or Backblaze B2 is the granular pricing for various operations on the service. It’s difficult to optimize costs compared to a fixed “$X per GB” price, which is easier for people to understand and forecast. Other than that, rsync.net provides the same pricing for its different data center locations, which means there is some subsidization going on.

There are services like rsync.net that support borg at a lower price. Borgbase is one of them. I haven’t used either of these.


> There are services like rsync.net that support borg at a lower price.

And rsync.net is even one of them!

"Special "borg accounts" are available at a very deep discount for technically proficient users." -- https://www.rsync.net/products/borg.html

...hrm, it seems they didn't update that page with last year's price drop. https://web.archive.org/web/20220319135035/https://www.rsync... It used to be a deep discount, now it's the same for <100TB. I wonder if they did drop the Borg prices too and just forgot to update that page?


I consider zfs far more reliable than restic, which was still rough around the edges the last time I tried it a few years ago.

Running rsync to the target and forgetting about it is quite easy, though I admit rsync.net's deal is getting worse these days, imposing minimum usage requirements here and there.


Yes. That's the thing about Tarsnap, a service with a TikZ diagram on its front page, built around a Unix utility, that meters in picodollars. It's meant to bilk money from uninformed mom-and-pop backup users.


I'm having a hard time believing that anyone remotely interested in Tarsnap's value prop is also an "uninformed mom-and-pop".

This "uninformed mom-and-pop" is potentially compiling the client application from source, but can't do basic math to compare tarsnap's pricing to the top 20 or so competitors that rank above tarsnap in SEO?


That was sarcasm. :)


Should have looked at the username, massive egg on face.


I tried Tarsnap briefly once and was charged billions of picodollars. It definitely preys on the ignorant.


S3: Upload bandwidth is free, download is what I'd consider to be astronomical at $0.09/GB ($90/TB).

Geez, that's really not improving the comparison with Tarsnap.


If you are primarily cost-driven, you missed one:

Backblaze: $0.005 / GB storage, $0.01 / GB download.


You can send your backups to backblaze and S3 and it still would be cheaper


There is also the cost of development and ongoing support. You haven't factored that into your "10 times the storage cost" calculation.


The 10x doesn't seem like it's enough to pay for more than a single EC2 server though.


Sounds like a very poor excuse when competitors are way cheaper.


You don’t really need an excuse when people are paying regardless.


>Easy money from the uninformed?

I don't think so. Anyone who can use this software I'm sure knows what other options exist.


Then I'd like to know what they think of the benefit of spending $25/mo for just 100GB.


This is why I use s3 sync, versioning and lifecycles for mine on a Standard-IA bucket. My 120GB costs $1.80 a month. No way would I pay tarsnap prices.

The 120GB is the contents of my OneDrive and local repository trees. This is everything I've ever done that I want to keep, and it's approximately 115GB of photos and not a lot else!
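
A rough sketch of that setup with the AWS CLI (the bucket name, paths, and lifecycle policy file are placeholders):

  # One-time: turn on versioning so overwritten/deleted files are kept.
  aws s3api put-bucket-versioning --bucket my-backup-bucket \
    --versioning-configuration Status=Enabled
  # Add a lifecycle rule (defined in lifecycle.json) to expire old
  # noncurrent versions after some number of days.
  aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
    --lifecycle-configuration file://lifecycle.json
  # Regular sync, writing objects straight to Standard-IA.
  aws s3 sync ~/OneDrive s3://my-backup-bucket/onedrive \
    --storage-class STANDARD_IA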


> If tarsnap is built on top of s3, they're charging 10 times for the storage cost. Easy money from the uninformed?

That's pretty much any SaaS... look at the various log or metrics gathering solutions, where you pay serious multiples of what it would cost to run the same software on your own instance.


You can get a HN reader’s discount on rsync.net (email them to ask for it or search on HN), bringing the price down to $0.12 / GB, and everything else remains the same.


The HN reader’s discount is lower than that these days (we probably "normalized"[1] the parent's account to reflect that).

We also have .edu / student / nonprofit discounts. Email us.

Finally, Debian and FreeBSD project members get free accounts. See the committers handbook, etc., for details.

[1] Whenever we lower our prices, we increase quota on existing customers to "normalize" them to the new price/GB. If you do nothing, your rsync.net account just grows over time due to this.


I'm not here to defend it but I only use it as a secondary backup for my mail server (so basically append-only) and for a low amount of gigabytes it's fine and just works. No, I wouldn't want to write changing 100GB there.


in that case I'm curious about all the downvotes for asking a reasonable question


Well, while I use Tarsnap for a very small amount of data (due to pricing) and I quite like it for my use case, your question might have been downvoted for one of two reasons:

- your comment was a very valid question but rather quip-like and offhanded, it just seemed a bit off… something like that

- Tarsnap is an hn darling

If I have to pick one I think it’s the latter :)


For anyone not familiar with the (now 16 year old) comment that Colin is perhaps best remembered here for - https://news.ycombinator.com/item?id=35079

He’s brought far more value to the community than that, of course.


There’s also the fact that he has quite publicly been repeatedly nagged to raise his prices by another “HN darling” and he has been resistant to it. It’s actually quite an interesting read that covers a lot of the things brought up in this discussion:

https://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/


To be fair, any backup service should probably have a model of "pay per length of backup stored".

If I want to store my 100GB of data now, and I want to have it stored for a year, I want to pay for that year's worth of storage of 100GB of data now and not worry about any money or account problems for that bit of data.


But basically what he suggests is turning Tarsnap into yet another SaaS offering I wouldn't give a second glance to. The screenshot of his proposal for the new entry page says... nothing.

Which is probably okay if you want to pivot from a geek-ish service to one that geeks don't use, of course. Does the owner want that?


If it's the latter, then the quality of readers in here is quite screwed up, trying to make people endorse crazy pricing just because he's some known guy around here.


> because he's some known guy around here

Oh dear. It’s an HN thing. I have had brushes with it only once or twice across various accounts across years but it’s very much an HN thing.

Whenever you see an utterly useless quip (or sometimes even name calling or offensive words) being heavily upvoted you should know that some alpha HNer has arrived on the scene :)

But to be honest I have never seen the author of Tarsnap engage in such privileged gentlemanly d-baggery. He is quite cool, as they say.

Anyway I just ignore it and move on. But again OP could have worded the question better. I mean no matter how good or bad you want to feel about it — it’s just a vc run anon forum and just another forum.


I definitely could have worded it better, but it's a little crazy how different the discourse is when the subject of a discussion is something HN loves versus something it hates.


I didn't mean for it to sound like the first one but I understand ;)


Don't want to speak for Colin, but every time this is brought up, it's explained that Tarsnap uses very little data due to its design. Probably much less than rsyncing your data every hour to a cheaper provider.


Since pricing is purely based on storage used, it's very cost efficient for certain use cases.

I've been using Tarsnap for 10+ years. There's some Linux stuff getting backed up, configs and such. It costs next to nothing for this kind of usage.


It’s meant for people who have a lot of duplicate data and store small files. Anyone who has data that cannot be deduplicated much would be paying tons of money.

While on the price, patio11 (Patrick) has written an article about tarsnap’s issues more than nine years ago (April 2014). One of the suggestions was to raise prices, IIRC. It’s a long post, but you can read it [1] and the HN post [2] from that time.

[1]: https://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/

[2]: https://news.ycombinator.com/item?id=7523953


yes, people have been saying they should "charge more" for over a decade


Can you not be snide and please help me understand? It seems 50 times more expensive than B2. I'm genuinely curious about the product.


“What I Would Do If I Ran Tarsnap” goes into a lot of detail:

https://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/


Roughly: The number of hours of my time that would be required to get something with even theoretically equivalent features would be sufficient to make the cost - and opportunity cost - involved seem far more reasonable.

Plus "written by cperciva and heavily battle tested by Serious Sysadmins" is a feature I couldn't recreate myself - notice that while there was an outage, part of the reason for it taking a while was a conscious choice to take a much longer path to resolution than bringing up the previous server in the name of paranoia. Paranoia about data corruption is a nice thing to have in a backup system and something I'm happily willing to trade-off uptime for.

However: For backups of bulk data then, yes, it's going to be relatively expensive. I wouldn't put e.g. my media backups on tarsnap, but "use tarsnap for your git repositories and other high value data, and something else for the rest" is both perfectly doable and an approach I suspect cperciva himself would endorse.


> notice that while there was an outage, part of the reason for it taking a while was a conscious choice to take a much longer path to resolution than bringing up the previous server in the name of paranoia. Paranoia about data corruption is a nice thing to have in a backup system and something I'm happily willing to trade-off uptime for.

As an Actual Serious Sysadmin that Actually Manages Big Systems for a Living, that screams lack of preparation more than anything else.

Yes, you should be careful, but you should also have procedures in place and know the system well enough to trust it. And the fact is that the "boring" architecture of an RDS DB, instead of that S3 database abomination thing, would just start right up if the master DB server failed.

It honestly looks like a trap many intelligent people fall into, where they turn their cool-but-ultimately-flawed mental exercise into the bedrock of the product. I don't want to use a baby's-first-database on my production servers (I'm looking at you, Lennart Poettering and journald) and I don't want my data/metadata stored in some experimental one.


Your comments seem to me to be about how long it took to bring up a new system.

Without agreeing or disagreeing with those, "I'm not going to trust the filesystem on the existing machine" was the choice I was talking about.


You don't need to build it yourself. You can use restic or rclone with S3/B2 and both support encryption.


I am aware there are existing tools that handle encryption.

The tarsnap architecture still does more things.

You're welcome to feel that you don't need those things, but that wasn't my point.


The service is a layer on top of S3 if that helps answer things.

https://www.tarsnap.com/faq.html#is-tarsnap-reliable


Then the obvious question is "why would I use this instead of something else over S3" (ex. rclone), to which I think the answer is ease of use (don't need to deal with AWS yourself, encryption/deduplication/compression handled for you, nice interface), which isn't everything to everyone but is certainly useful.


You need a remote service that keeps backup readonly. You’re not covering attack scenarios if you just use raw object storage from your client machine.

I have written about this some time ago if you’re interested: https://www.franzoni.eu/ransomware-resistant-backups/


I'd classify that under "ease of use" - you can do it with S3 yourself (your post is a pretty good explanation of the how, from a quick skim), or you can just use tarsnap and not worry about it.


You can see from my post that doing that _properly_ is quite convoluted and requires a good deal of technical skills.

So it's not just ease of use. It's actual _functionality_ to me - getting from raw object storage to a fully working, attack-resistant backup strategy, is not trivial; hence, comparing tarsnap (or rsync.net, or borgbase, or whatever) to B2 or S3 makes little to no sense.

You _could_ compare it to crashplan or backblaze personal backup if you like, but IIRC those don't work for *nix systems, only for Win and Mac.


How does tarsnap keep backups read-only? Just having the service act as a barrier is not enough.


It supports distinct authentication keys with read, write, and/or delete permissions for the same data protected by a given encryption key.

Those restrictions are enforced by the service.
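
Roughly, via tarsnap-keymgmt (the file paths, email address, and machine name below are placeholders):

  # Register the machine once; this produces a key with read, write and
  # delete permissions.
  tarsnap-keygen --keyfile /root/tarsnap.key \
    --user you@example.com --machine myserver
  # Derive a write-only key for the automated backup job; a compromised
  # server holding only this key cannot read or delete existing archives.
  tarsnap-keymgmt --outkeyfile /root/tarsnap-write.key -w /root/tarsnap.key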


So one bug and it is gone.

I thought it used the read-only features of S3/Glacier or something…


Learning to use some backup tool that does the same things sounds way better than paying 10 times more in ongoing storage costs.

There's also a service like rsync.net where you can just rsync to the destination and they do the versioning and so on, for less than a tenth of the cost of tarsnap.


Eh, cost/benefit; some people are backing up 100MB of documents and don't care, some people are backing up terabytes of media and have the time.


Thank you.


i'm only being snide if you consider the prices high; i don't have to agree with that


Everything about tarsnap is absurd. It's basically the world's most absurd backup service (insanely expensive, poor UX, bus factor of ~1, restoring moderate amounts of data appears to take days (!)[1]), brought about by an absurdly bad allocation of human capital (it's run by a double Putnam challenge winner, with several other impressive accomplishments), and as such, absurdly beloved by HN.

[1] In case of an emergency, you will always be able to get back your data from tarsnap at a blazing rate of 50kB/s https://github.com/Tarsnap/tarsnap/issues/333.


If tarsnap has even a modicum of popularity, thanks to these prices it would be bringing in bank. If he's making bank, that means he's providing value (even if it's just to "uninformed"). And it seems this system mostly runs itself, so it's a side gig. It's probably a far more effective allocation than many other possible allocations of human capital.

How many of the world's best and brightest are doing all sorts of busywork? At least Colin has some time to do whatever he wants to do while running tarsnap.


> If tarsnap has even a modicum of popularity

It won't, though, because of the points mentioned by the post you're replying to. It's been 15 years; tarsnap is as popular as it's going to get.


> If he's making bank, that means he's providing value

I don't find that that logically follows from making bank. Not everyone who makes bank is a positive influence.

Tarsnap does provide value, even if I think it's less than its cost: I'm just commenting on the general case that making money would mean you're providing good value



Not to be that guy, but it’s unreadable on iOS, whether zoomed in or in reader mode, in either portrait or landscape.

Colin, could the website be updated to the 2010s? :P


"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

https://news.ycombinator.com/newsguidelines.html


Shame. It actually spawned an interesting discussion about reader mode, ASCII emails, browser rendering engines and the meeting of old and new. And I learnt things.


I believe you and yes those are upsides - but we have to moderate for how these things work in general, and in general the downside of such digressions is much greater.


It's not Colin's fault that you're using a browser that can't render an html rendition of an email which has been widely in use since before iOS existed.

This is entirely Safari's fault for not having good compatibility with a common existing webpage format.

Anyway, if you're the intended audience (someone using tarsnap), you also received a copy to your email address, where you can read the text with your email reader of choice.


Is <pre> really the correct kind of HTML to use for the body of a plain-text email? It looks like paragraphs of text to me.

<p> is far more appropriate

That isn’t apple’s problem, nor mine.


pre is the only correct element to use since in many emails, the exact formatting and linebreaks and such are important.

For example, a code review on a mailing list can only make sense with the linebreaks and spacing preserved.

However, as you knew to try, there is "reader mode", which is meant to heuristically ignore the exact html in order to display textual content.

Firefox's reader mode has no trouble figuring out that this is a block of text that can be reflowed.

Safari's heuristics clearly fall short on one of the more common kinds of textual blurbs you might want to reader-view-ize.

Seems like a safari problem to me.


> Firefox's reader mode has no trouble figuring out that this is a block of text that can be reflowed.

It was absolutely unreadable on mobile Firefox (Android), initially didn't even think to use the reader mode, which indeed did help make it readable!

I think this is the first time I've ever actually needed that functionality, never quite got it beforehand. Thanks for the suggestion!


Ok you changed my view


<pre> is the correct HTML. People writing plain-text email expect to be able to do things like add ASCII diagrams:

  -------        -------
  | foo |  --->  | bar |
  -------        -------
It's an older technology but it checks out.


it's a plain text email, which is exactly the sort of thing <pre> is for - pre-formatted text


The HTML rendition in this case did its best to be hard to read.

It’s not impossible, and it's likely just a fault of whatever mailing list software is used, but it could be better, and it’s nice if people let you know, right?


How can simple black-on-white text with a legible font be hard to read?


It doesn’t adjust to my screen size. So in portrait mode I have sentences with 80 characters per line that are 5 points wide.

That’s barely possible to read on a high dpi screen, but fairly uncomfortable.


I understand some websites are doing lots of funky stuff, but at this point the issue is with your particular client if it cannot zoom bare text correctly and smoothly. Any decent client should let you zoom so that those 80-character lines fill the width of your phone screen in portrait mode, regardless of font size and dpi.


Try it if you don't have perfect 100% vision (20/20, as it's called in the USA, I think?) and have a phone that fits in a normal pocket.

Optimal reading width for speed/comprehension is also fewer than 80 characters afaik. I think different sources I read years ago were undecided whether it's closer to 60- or 70-character lines. Either way, rescaling when there is no ASCII art or position-dependent characters seems rather basic and <pre> disallows that


I am using prescription glasses. That is what they are for. If you are using yours and still can't read small text, you need to visit your optometrist and get new ones.


What about us poor slobs whose vision is less than perfect where the 80-character line is too small even when it takes up the entire width of the display? The way zooming is implemented in the mobile browsers, we must horizontally scroll back and forth for every line we read.


I am using prescription glasses. That is what they are for. Maybe you should use yours?


I don't think my glasses can correct my vision well enough to be able to read 80-char-wide text on a smartphone in portrait mode. I can't say for sure because I don't own a smartphone--partly because I worry about being able to read web pages on it without regularly resorting to the aforementioned tedious horizontal scrolling.

I always wear my glasses when I'm using my iPad, which is in landscape mode almost all the time (the exceptions being the rare apps like Uber's that won't do landscape mode).


How should a browser differentiate between a <pre> of hard-wrapped prose (that it could reflow in reader mode) and a <pre> of code?


In an email you can't of course since an email could contain one, the other, or a mix. And, in fact, most mailing list emails I read these days do contain code and code reviews.

However, if the user clicked the "reader mode" button, that's a good sign the user thinks this is reflowable text. Firefox's reader mode figures this out. Safari's doesn't.


Or you might use Reader Mode on a programming blog, which has both prose and code, and you wouldn’t want to change the code in that case.


Just FYI, Firefox Reader mode works great with it.


"Great" is a bit of a stretch as there are random short lines where they were originally hard-wrapped. But it certainly does make it readable.


It's off-the-shelf MHonArc[1]. If implementing a decent mailing list archive were a prerequisite to launching a business, no business would ever be launched.

[1]: https://www.mhonarc.org/


edit:

i assumed the parent did not know how to do that, i tried locally and it seemed to work, but i did not pay attention to the text

original:

on the left side of the url input field you'll find "AA" (the first A smaller than the second), tap that

then, near the bottom of the pop-up menu you'll have "Show Reader", tap that

if you're not happy with the text as displayed then, you can go back to the "AA" menu and change the options


[flagged]


oh shit, i'm sorry


This should work in reader mode: https://pastebin.com/raw/hanm8mgG


Thanks!


Works fine for me, on iOS, in safari.


It's a mailing list archive. Use a real computer.


How dare I use my phone before I get to my computer.


>The process of recovering the EC2 instance state consists of two steps: First, reading all of the metadata headers from S3; and second, "replaying" all of those operations locally. (These cannot be performed at the same time, since the use of log-structured storage means that log entries are "rewritten" to free up storage when data is deleted; log entries contain sequence numbers to allow them to be replayed in the correct order, but they must be sorted into the correct order after being retrieved before they can be replayed.)

Far be it from me to tell anyone how to write software, but why build a database on top of S3 when you can just chuck the metadata into RDS with however much replication you want?

The backups themselves should be in S3, but using S3 as a NoSQL append-only database seems unwise.

This would benefit from being further from the metal.


Forget about software or data architecture. S3 is the most reliable data storage mechanism in the world, and insanely simpler than a relational database. There is no operational failure mode to S3, other than "region went down". There is no instance to go down, no replication to fail, no worry about whether there's enough capacity for writes or too many connections, no thought to a schema, no migrations to manage, no storage to grow, no logs to rotate, no software to upgrade on a maintenance window. Plus S3 is versioned, has multiple kinds of access control built in, is a protocol supported by many vendors and open source projects, and is (on AWS) strongly read-after-write consistent. I would also argue (though I don't have figures) that it's faster than RDS. Almost guaranteed it's cheaper. And it's integrated into many services in AWS making it easier to build more functionality into the system without programming.

On a less technical note: Always avoid the fancy option when it makes sense. (From a veteran of building and maintaining large scale high performance high availability systems)


The "fancy" option here is trying to act S3 to act like database instead of simple blob storage...


Fancy would be writing an application to talk to RDS, creating an RDS instance, creating a database, creating whatever IAM link is needed for auth into the db so you don't need a second set of credentials, creating a schema, creating columns with different data types, and then modifying the application to handle edge cases for the different data types, logic to insert, update, delete rows, select items, yadda yadda yadda.

Dumb is `aws s3 cp` and being done in 5 minutes.


I envy a job where you can think of a database as something fancy and hard to do...


[flagged]


This comment is unnecessarily antagonistic, and you are consistently and willfully missing all the points in your replies.

He could have set up a more complex architecture and paid much more in hosting costs over the years to overengineer the solution. What would the benefits be? It might have avoided this one outage or saved a few hours restoring the data. The drawbacks? Much more time developing and maintaining the solution and higher subscription costs for users.

The solutions you are familiar with and comfortable with are perfectly valid. But you are falling into the trap of thinking "what I'm familiar with and comfortable with is the only valid answer and everyone else is wrong and stupid".


Please see https://news.ycombinator.com/item?id=36897868 and please stop posting in the flamewar style to HN. Regardless of how right you are / how much smarter you are or feel you are, it's exactly what the rules here ask you not to do. We're trying for a very different quality of conversation here.

https://news.ycombinator.com/newsguidelines.html


You're right, that was uncalled for. I apologize for lowering the quality of the site.


Thanks for the kind reply.


I'm only wildly guessing here, but most likely the "cloud storage" backing all those managed databases is actually S3-like blob storage under the hood.


The database engine implements ACID and crash recovery so you don't have to. This is precisely what failed in this outage.


>S3 is the most reliable data storage mechanism in the world

S3 is not the problem here. The problem is building a database on top of S3, and having to reimplement all the consistency, atomicity, transactions etc. on top.

>no thought to a schema, no migrations to manage

There is, in fact, always a schema. Some people choose to ignore it's there, to their detriment.

>Always avoid the fancy option when it makes sense.

It's not the 1980s. Postgres is not fancy, and Greenspunning it is a mistake.

>Almost guaranteed it's cheaper.

Cheaper than a 26-hour outage?


> having to reimplement all the consistency, atomicity, transactions etc. on top.

Most of those problems are moot if you're only ever writing from a single head node. If all your data is strictly ordered and you have no meaningful concurrency, this is a far, far simpler system.


Did I fall into a timewarp into the 70's? How on Earth, by what sane standard, is a Postgres instance too complex? If you're `fopen`ing files as a "database" you are wasting your time and lowering the world's economic productivity.

Complex is Greenspunning a database and having it blow up in your face and cause a twenty-six hour outage. You never hear about such things with Postgres because Postgres is rock-solid.


I'm not defending the choice to use S3. I probably wouldn't have made the same choice. But I am pointing out that it's empirically wrong to say that storing data in flat files necessitates the considerations of an ACID-capable RDBMS.

But to your point, if your system requires less than a thousand lines of code to open a file, do basic parsing and processing (which no data storage system is going to do anyway), and write the output to another file, I personally can't say that Postgres or MySQL or any other solution is really worth the effort/cost to build and maintain. In the system being discussed, the benefits of an RDBMS simply don't matter: any strongly consistent key-value store would work.

> Complex is Greenspunning a database and having it blow up in your face and cause a twenty-six hour outage.

S3 didn't cause the outage, and from the look of it, neither did the code that processes the files. It was an application logic problem which caused issues during the restore process, and this would have been an issue regardless.

You could make an argument that the recovery being slower than it could have been was a problem, but it's wild to say that and imply that traditional databases have no performance cliffs. Especially when dealing with corruption or data recovery. Raw file storage will never have a Postgres transaction id wraparound incident (see: Sentry outage for most of a day in 2015, MailChimp/Mandrill for over a day in 2019) or have to rebuild a critical index.

In this case there was a hard coded concurrency limit with S3 of 250 outstanding requests. Bumping that up to 3000 would have been easy and reasonable (S3 rate limits at 5000). How confident would you be that your database can performantly handle a backfill during recovery? Have you provisioned enough iops? Are you running an RDS instance with only a burstable vCPU limit? To say Postgres is "rock-solid" (and make no mistake, I am a Postgres fanboy) dismisses the many and varying ways that it can fail in unusual and surprising ways.


[flagged]


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


> having to reimplement all the consistency, atomicity, transactions etc. on top.

Did you miss where I said it's read-after-write strongly consistent?


Having those things at the base layer is not the same as having them in higher-level behaviour. This is why God invented SQL transactions.


I never knew Donald Chamberlin and Raymond Boyce were God, it doesn't sound right but I don't know enough about theology to refute it.


>Cheaper than a 26-hour outage?

AFAICT, going by HN there's been roughly 30 hours of non-availability over 15 years. RDS didn't even support Postgres when Tarsnap was released.

EDIT: Tarsnap predates RDS.


>Far be it from me to tell anyone how to write software, but why build a database on top of S3 when you can just chuck the metadata into RDS with however much replication you want?

Cost and reliability?

* Using S3 as a simple database is generally going to be much cheaper than RDS.

* If you turn on point in time restore, then losing data stored in S3 is not a possibility worth worrying about on a practical level for most people. RDS replication is easy enough to use, but adds more cost and a little bit of extra infra complexity.


>Cost

It's a bad trade. Thousands of hours of a high human capital computer scientist vs. a few tens of dollars a month for RDS.

>Reliability

Empirically false: none of this would have happened if Tarsnap used Postgres instead of a home-spun database.


>> Cost

> It's a bad trade.

Maybe. But that's the reason. You never acknowledged that advantage in your question so it needed to be emphasized


It never occurred to me that anyone would need it explained to them that RDS is cheaper than the time of any software engineer.

The opportunity cost of building your own database is 10,000x the cost of running RDS for a year.


That sort of logic doesn't really apply here because:

* RDS costs obviously scale linearly with ongoing time and probably scale linearly with the total amount of data being backed up. So depending on the revenue of the business, these extra costs could easily end up outweighing the (notional) cost of the time saved, which is mostly a one-off expense.

* The cost of a software engineer's time is notional in the context of a one-person business. The author of Tarsnap isn't going to be able to employ fewer than zero additional software engineers to maintain Tarsnap because of the time saved by using RDS.


[flagged]


Perhaps I'm biased, but I read every part like:

> I realized that this was introduced by some code I wrote in 2014: Occasionally Tarsnap users need to move a machine between accounts,

with an implicit "and I fixed that code now" or "and I will fix that tomorrow when I get enough sleep". Let's hope he writes later a follow up explaining the details.

> [a] postmortem on the front page of HN

Is that bad? I upvoted this almost instantly, then went to the comment section to upvote cperciva if he was here, then I read the full post and verified my first upvote was correct.


The outage was caused by a hardware failure and (I assume) the lack of any redundancy. Using RDS wouldn't have made a difference as far as I can see.


RDS can have replication.

But more than that: servers should be stateless! A server going down should never take down your business.

If you use Postgres, and stateless servers, then if a server goes down it's no problem, it gets rebooted and there may be other servers and a load balancer to pick up the load. If Postgres goes down, you have a replica, or it gets rebooted, and Postgres always recovers from crashes (in my experience), and if it doesn't you have PITR.

AWS has everything under the sun to prevent this kind of thing happening. This is a 1990's outage. This didn't have to happen.


The hardware failure was on the server running the application code, so RDS replication wouldn’t have helped. You’re right of course that this failure points to a lack of redundancy – but that’s a separate issue from choosing S3 vs. RDS as the data layer.

By the way, S3 is insanely reliable and in fact more reliable than a replicated RDS setup. So switching from S3 to RDS would almost certainly reduce the basic reliability of the data layer, however many conveniences it might bring.


Once again, the problem is not S3, it is reinventing a database on top of S3, the logic of which runs on EC2.


Once again, no, that’s not the problem.

PostgreSQL and RDS are quite a bit more than just a log-structured data store, and are not prima facie the correct solution for this problem domain, regardless of how much arrogant ignorance you bring to bear on the debate.


You broke the site guidelines badly in more than one place in this thread. I realize you're trying to defend someone's work against what you feel is unfair criticism, but breaking the site guidelines yourself, with swipes and name-calling and flamewar, is exactly the wrong way to do this.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.


You’re not wrong, and thank you for holding me to account.

In retrospect, I’d delete and/or edit the comments if I could.


No need to delete - the only thing we care about is fixing things going forward. Thanks for the kind reply!


[flagged]


Would you please stop posting flamewar comments and breaking the site guidelines? You've been doing that repeatedly and badly in this thread. We end up having to ban such accounts, and I don't want to ban you.

Fortunately it doesn't look from your recent comments that you've been in the habit of posting this way, so it should be easy to fix.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


Just … stop. This is the first outage in 11 years.

You’re being unnecessarily arrogant and antagonistic up and down this thread

You’re not smarter than everyone else here, you don’t have better or more perfect knowledge, and you almost certainly wouldn’t have built a better or more reliable system.


There are some established patterns for S3 as a database. It's extremely common in "data lakes" (throw data of various schemas in and use a tool that can parse at query time).

There's client libraries like Delta Lake that implement ACID on S3.

Much of the Grafana stack uses S3 for storage (Mimir/metrics, Loki/logs, Tempo/traces).

That said, I'm not sure about the implementation Tarsnap uses--whether it's completely ad hoc or based on other patterns/libraries.


> This would benefit from being further from the metal.

How, exactly, is that a good thing?


How is not rolling your own database a good thing? Mainly because the business of tarsnap is 1) encrypted 2) backups, not building a database storage engine.


Implementing client-side encrypted, deduplicated, snapshot-enabled backups with server-mediated access control inherently requires building a minimal storage engine to represent your opaque log-structured data.

Embedding the log-structured representation of user data in Postgres would increase complexity and overhead without offering significant resiliency or recoverability advantages — in fact, quite the opposite.


It’s cute that you think implementing client-side encrypted, deduplicated backups doesn’t involve building a database storage engine.


FWIW Tarsnap was launched in 2008, the initial RDS for MySQL was launched in 2009.


You can always self-host Postgres.


The VM crashed, corrupting the file system. This could have made a Postgres database unrecoverable. For rock solid reliability you need more than a database instance.


Make two of them.


Keeping some kind of Postgres cluster running for over a decade seems like a lot of work. tarsnap seems to require roughly no maintenance.


[flagged]


What you see is someone who is actually willing and able to learn from any mistakes during this outage, no matter how small. That degree of attention to detail is exactly what I would expect from Colin. Novelty accounts created with the express purpose of slinging crap however are the equivalent of heckling in a theater, they don't contribute and in this case seem to be motivated by malice.

I do tech DD for a living and pretty much every company could do better if and when something goes wrong, but rarely do companies extract the maximum of learnings from an outage. That is what should impress you rather than to perceive it as a negative.

Note that most companies don't make any information about outages public and note that if and when they do it is usually heavily manipulated to make them look good. Colin could have easily done the same thing and the fact that he didn't deserves your respect, not your scorn. Consider the fact that even the best make mistakes. I'm aware of a very big name company that lost a ton of customer data through an interesting series of mishaps that all started with a routine test and not a peep on their website or in the media. Tens of thousands of people and hundreds of customers affected. And yet, you probably would trust them with your data precisely because they are not as honest as Tarsnap.


There are multiple comments in the post-mortem about what should - in hindsight - have been done instead and I think it's fair to expect that those things -will- get done reasonably soon.

Pretty much all ops problems come down to the interaction of multiple mistakes that hadn't previously been an issue - GCP and AWS post-mortems tend to show exactly that, although usually with somewhat less detail.

So I'd expect that any equivalent service has a similar number of gremlins hiding in their infrastructure and procedures, and I'd suggest to anybody reading this that a 43 minute old account that was created just to post the comment I'm replying to is perhaps not the most reliable judge of competency or otherwise on the part of M. Percival.


> This post-mortem just lists mistake after mistake, but gives no indication as to what the maintainer will do to prevent this in the future.

Each to their own - I myself wouldn't expect that from a comprehensive "what didn't go smoothly" list such as this.

Clearly Colin is aware of every point listed and no doubt is already mentally dot pointing procedural changes and additional guard rails to ease recovery in future outages and to ensure no data is lost (which appears to be the primary goal here).


> It was an honest post-mortem that revealed far too much incompetency to trust this service.

That's why post-mortems are heavily sanitized. Or not posted publicly.



