Tarsnap: Our costs just went down, so we're lowering our pricing for customers too!
Comcast: Our costs just went down. Effective immediately, we are adding a new cash management fee to your bill to cover our costs of handling all this new cash.
Cash management actually does cost money. You expect bills you get from them to be genuine US currency, and bills you deposit with them not to disappear, right? Those are not naturally occurring properties of green paper.
Below scale, cash management is thrown in just to get people's business, but at scale, it costs what it costs. Walmart, which has substantially more problems than you on that score, reportedly spends millions of dollars every year just dealing with pennies.
The "cost of doing business" all gets passed along to the consumer, inevitably. That's how businesses work. The consumer has to cover the "cost[s] of doing business" and then some, or companies don't make any money.
I'm using the phrase to mean something that's inconvenient for a company but ultimately necessary to make more money. Not just any task that's part of everyday business. But maybe I'm mistaken.
They complained that they had to handle too much physical cash and that there was a limit to how much I could deposit in a given month (into a business account, mind you).
I can't think of TOO many more first world problems than that, to be honest...
You realize that they have to pay people to move that cash around, count it, double-check it, etc., and then they have to pay to store it and maintain it? People don't work for nothing, and storage doesn't cost nothing.
It might be. My (Canadian) bank's "small business banking" plan charges $2.25 per $1000 of cash deposits beyond the first few thousand/month. (And $2.25 per $100 of coins.)
Of course, if I ever deposited cash I would be on a different banking plan; but it's not entirely unheard-of.
The one thing that has put me off Tarsnap until now is, and not to be intentionally morbid, the bus factor [1] of 1. As far as I know it's just Colin who runs it. In the unlikely event that we need to access our backups while Tarsnap is down and Colin is no longer available to maintain it, we could be stuffed.
Colin, do you have a contingency plan in place if you are not available?
What put me off Tarsnap was not realizing that the price per GB is for compressed storage. So while it looks high compared to Amazon's per-GB pricing, it's a lot cheaper than it appears.
I don't recall if this was on Tarsnap's website when I first checked, but it might be worth reporting the average compression ratio, to make it more visible to prospective customers what the Amazon S3 storage portion of the pricing actually is.
Yeah, I almost wrote to them suggesting exactly this, but then I figured what the heck, I'll just sign up. The tool would definitely be helpful, though.
I'd assume the average compression rate to be very low, no? Don't modern cryptosystems have to encrypt then compress to avoid some forms of attack, thus making the compression not very effective?
In the recent web attack instances, it's not just known plaintext, it's using compression to discover unknown plaintext by repeatedly guessing across multiple requests and watching for the request size to change. So an attacker would need the ability to alter your filesystem and make backup requests, repeatedly, for that attack to matter to Tarsnap.
The whole point to tarsnap is to be able to safely encrypt any and all of your data, including compressed data. We wouldn't expect that a .tar.gz being backed up is somehow "less secure" than the .tar file, we'd demand the same security for both.
Compression becomes a problem when it can be used as an "oracle" into the key used for a given stream of ciphertext. The reason TLS is susceptible is because the attacker can MITM and control at least some aspects of the plaintext or shared client-server state, and iterate repeatedly to refine their guesses.
These issues simply don't apply in the same way to backing up files.
I'm sure there are still theoretical issues that would need to be worked through in deciding how you'd do something like this, issues which could be better explained by any of the many crypto types who hang out here. But don't cargo-cult your treatment of crypto.
A strong construction should be impossible to compress, because encryption maximizes bit entropy; the ciphertext should be indistinguishable from the output of a PRF (i.e., a random noise source).
MAC, decrypt, decompress.... always^9 in that order.
Right, encrypt-then-compress is useless. And compress-then-encrypt is unsafe in some cases[1]. Apparently backup isn't one of them, since the attack requires being able to choose the plaintext.
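For illustration, compress-then-encrypt in a shell pipeline might look like the sketch below (file names and passphrase handling are placeholders, not anything from tarsnap itself). Reversing the order would give gzip nothing to work with, since good ciphertext is incompressible:

    # compress first, then encrypt; encrypting first would make gzip useless
    gzip -c backup.tar | openssl enc -aes-256-cbc -salt \
        -pass file:passfile -out backup.tar.gz.enc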
Not really. If you compressed your data on Amazon, you'd get a lower per-GB price too. I think tarsnap is good, and you're paying him for the service of interfacing to Amazon for you. It's not accurate to say his price of compressed storage is comparable to uncompressed storage.
Very true, and in fact that is what I do. I use my VPS provider's backup system for snapshotting my servers and an offsite file store for additional backups.
However, if my choice for the latter is between a sole trader and an organisation with 20-30 employees and multiple directors, the latter is obviously the more sensible choice. No matter how much more you like the tools or respect the owner, with something as important as backups you need to go with the most reliable option.
Sure, I could use tarsnap and another file store for three backup locations (which may be something I should do) but then the argument is why not go with two companies that have a bus factor of more than 1?
For my part, it's a closely related reason: I'd sign up for the service today if the client were Open Source, rather than "look but don't touch" source.
If you're using Tarsnap, and you hear it fails, then quickly make a new backup somewhere else. The chance of your primary storage failing during that unlikely and limited window seems low.
Yes, that could work; however, you have now lost your historical backups and still exposed yourself to an unnecessary risk.
Again, sorry Colin for this, but there is a 1/500,000 chance he will be struck by lightning this year [1]. Amazon S3 (the backend of Tarsnap) has a durability of 99.999999999% [2], so you are about 200 thousand times more likely to lose your backup because of a lightning strike than due to corruption on S3 (2×10⁻⁶ ÷ 10⁻¹¹ = 2×10⁵).
I think the fact that Colin handles everything himself increases my trust in tarsnap, because he clearly knows what he is doing. I trust it in a way that I would not trust a faceless corporation like crashplan, even though they also advertise client-side encryption.
I have a secondary emergency backup in a bank vault, so even in the unlikely event that tarsnap were to crash and burn at the same time my disk fails, I still have another backup, it's just 1-2 months old.
Honestly, I'm not sure I would be a tarsnap customer if it were a larger operation where I could not gauge the trustworthiness and skill of all the involved parties.
If you get hit by a bus, who alerts your users and how? If your users aren't alerted, they may not realise that anything is amiss until AWS freeze your account for non-payment (for example).
I can't say that I spend a lot of time keeping up-to-date on the project leaders' obituaries for services I use. I don't imagine I'm unusual in this regard :)
Edit: This is just a rhetorical question, not a demand for explanation :) - just saying that being hit by a bus does not automatically mean that users are shifting off your service in 24 hours.
Remember when Colin Percival announced his Tarsnap logo competition on HN?
There'd be a cash prize and everything. I am sure several people worked on logos - some excellent, some less so. I'm frankly surprised to see that this is the result:
Really? Did some of us just waste our time so that Colin could have his 8-year-old nephew whip up a blurry logo? It's of course his right to do so, but it does make me wonder.
Why are you surprised by the result? It's distinctive, easily recognisable, and the keyhole makes it clear that the product's focus is on security.
"During this time, 83 people submitted over 100 designs . . . The winner, as promised, received $500; I decided to create a second prize of $100 for the designer of the other logo, in part as an apology for the time I took up asking him for revisions"
http://www.daemonology.net/blog/2013-12-16-tarsnap-logo-cont...
eh... Without having any skin in the game, I would say this logo design adequately conveys the meaning: disks, a lock, an "authoritative"-looking font, a succinct tagline.
I agree the blurriness could be improved, though.
As for people entering a contest... they can't all be winners.
The blurriness is the result of antialiasing. I start with an EPS; can you tell me how to convert this without running into that problem? I'm afraid I'm not very good at anything graphical.
Often it requires a bit of manual hinting. For example, [1] was just resizing the vector and allowing antialiasing to do its thing, and [2] is the result of resizing it and then dragging vertices to pixel or half-pixel boundaries.
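One hedged suggestion, assuming ImageMagick with Ghostscript is installed: rasterize the EPS at a high density first and then downscale, which usually softens the blur even if it can't substitute for manual hinting:

    # render the vector at high resolution, then scale down to the target size
    convert -density 600 tarsnap-logo.eps -resize 200x tarsnap-logo.png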
So, I guess it's about time I start using Tarsnap :)
No, seriously. What do most people use it for? Simply creating a daily backup of their hard drives? Also, are there any business users who use it to backup an entire organisation's systems?
Btw, more on-topic, I'm reminded of this quote, which I love, from Jeff Bezos: "There are two kinds of companies: Those that work to try to charge more and those that work to charge less. We will be the second."
I've always thought that was a profound sentiment, and I've always wondered if, in the long run, that's the way to make a business survive.
What do most people use [Tarsnap] for? Simply creating a daily backup of their hard drives?
This is all anecdotal, but I think most people are at least somewhat selective in what they back up. In my case, I have some servers where I back up /, but on my laptop I only back up my home directory because I know if my laptop dies I'll be reinstalling FreeBSD from scratch anyway.
Also, are there any business users who use it to backup an entire organisation's systems?
I think so, but I'm not going to name names. Maybe some Tarsnap customers will reply here.
this quote, which I love, from Jeff Bezos: "There are two kinds of companies: Those that work to try to charge more and those that work to charge less. We will be the second."
Yes, I found that quite inspiring too. And it has certainly worked well for Amazon.
> are there any business users who use it to backup an entire organisation's systems?
Stripe has been a happy Tarsnap customer for several years. It's robust and thoughtfully designed, Colin has provided great support, and "backups for the truly paranoid" is a pitch that very much resonates with us.
Definitely vault physical backup copies of private key material in a safe deposit box at a bank. Preferably two copies in two banks in two different timezones.
I keep all of the company registers and records, receipts, contracts, invoices, etc in Google Drive. We need to use Google Drive as it is the best of the options open to us for securely sharing files with accountants and lawyers.
I use https://github.com/Grive/grive to sync Google Drive to an encrypted local directory on my workstation at home. This is done daily.
Then I use TarSnap to backup that local directory.
Effectively I create an on-site copy of important docs from Google Drive, and then use TarSnap to create a secure and trusted off-site backup of those important docs.
Oh, and we also use TarSnap to backup our company (product) databases once a day too (though interim backups are stored closer to the servers too).
Our entire organisation is handled thus:
1) Systems and code via Github, and pulled often to one place (that local workstation) and then backed up.
2) Data dumps pulled locally and sent over to TarSnap of all product/customer data and all company files.
And the only on-faith thing is file attachments in the customer data:
3) Files via Amazon S3, trusting the durability of S3 and security of our interface to it.
This is all disaster recovery stuff. Secure, trusted backups.
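As a rough sketch, the daily job described above could look something like this (paths and archive names are illustrative, not taken from the actual setup):

    #!/bin/sh
    # pull Google Drive into the local directory, then archive it offsite
    cd /home/me/drive && grive
    tarsnap -c -f "docs-$(date +%Y%m%d)" /home/me/drive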
"There are two kinds of companies: Those that work to try to charge more and those that work to charge less. We will be the second."
That works well when your prices are based on third-party prices and you can optimize your business to keep your margin. But, for instance, a service-oriented company, or a freelancer, will usually try to improve their skills and experience in order to charge more, because the end result is worth it.
In the case of a freelancer, you are optimising your prices so that you can charge more, while optimising your skills/performance so people need less of your time.
You can make money by working more effectively and efficiently while still charging less for the same "product".
But that wouldn't change my bottom line at all; I would just work faster/more and need more clients.
My clients usually are fine with paying more because they feel that my experience has value vs the next guy who just cobbles together their application without thinking much about maintenance, architecture, testing etc. I might even be slower than that guy.
Well, I'm not convinced. For one, better skills/more experience doesn't necessarily make you faster; it might even make you slower, because you put more effort into things others simply ignore, but the quality will be higher. Anyway, I see your point, I just don't think it translates into reality all that well. Not for long-term projects at least, which are hard to estimate anyway.
How does Tarsnap compare to Arq when using a Mac? Both do local encryption and incremental backups with deduplication. However, I pay $0.03 per GB-month on S3 with Arq and $0.25 on Tarsnap, almost an order of magnitude more expensive. And yet I constantly hear people saying that Tarsnap should be more expensive when I simply compare it to Arq and think that it already is very expensive.
You paid $40 for Arq, though. Assuming you intend to keep it updated, that's $30/year¹, which buys you 120 GB-months per year on Tarsnap. It also works everywhere there's a Unix shell (including Windows with Cygwin), not just Mac OS X.
¹ (assuming 16 months between major versions, as in 3 → 4)
Sorry, I wasn't clear. I didn't mean you could get 120GB of storage for a year; I meant you could each year buy 120 units of "GB-month" (like kWh, but for storage).
You could distribute that equally throughout the year, using 10GB-month per month, yes. Or you could use 1GB-month in the first month, 2GB-month in the second, etc, as your storage needs grew.
Ugh. This works, but it's ugly. There are a lot of automated tarsnap assistants on github; I made one myself.[1] They can do helpful things like allow for daily, weekly, and monthly backups, keeping a certain number of each. (And personally I like giving each folder its own archive, and I dislike having slashes in the archive names.)
[1] https://github.com/pronoiac/tarsnap-cron - which might be helpful to someone else. It splits up the archiving and the pruning of old archives, so you can do those with different keys. This part is as yet untested. It probably requires a tarsnap fsck if you do those on different systems. I was going to forgo the plug, but it might be helpful, and I'm about to get some sleep, so I'll probably miss the best time to contribute to the discussion.
I've always thought that was a profound sentiment, and I've always wondered if, in the long run, that's the way to make a business survive.
That depends on your customer and what your product positioning is. If, for example, you run a discount store, a la Amazon or Wal-Mart, then that advice will work well for you. If, however, you are running a company that sells products that are positioned as premium, lowering your prices can hurt you.
Cutting costs on premium products immediately changes customer perception, such as when Starter was purchased by Wal-Mart or Rock & Republic was purchased by Kohl's.
I only use it to backup configs and game saves. So far, I've used less than $0.05, backing up 73M with 93 days of history (I haven't deleted anything).
That is a daily snapshot of my /etc, important dotfiles, and game saves. I haven't needed it yet, but I'm sure I will be glad when I do.
EDIT: I just have a simple script in my cron.daily that backs up files and folders that I tell it to.
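A minimal sketch of what such a cron.daily script might look like (the file list is illustrative):

    #!/bin/sh
    # daily snapshot of configs, dotfiles, and game saves
    tarsnap -c -f "daily-$(date +%Y%m%d)" \
        /etc /home/me/.vimrc /home/me/.config /home/me/saves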
I have something close to 100G of document scans, database backups and /etc for all my servers in there. Use it to back up my office file server as well.
I find that the customers who are paying $0.01/month often provide the most enthusiastic word-of-mouth advertising. You're not useless at all, no matter how little you have stored. ;-)
[I haven't looked up your account, so I don't know if you're over or under $0.01/month, but the precise number really doesn't matter.]
It's only about 2MB per day to upload.
I have a daily cron that does the backup and deletes all but the latest two images.
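Hypothetically, that cron job could be as simple as the following (archive naming is an assumption, and head -n -2 needs GNU coreutils):

    #!/bin/sh
    # create today's backup, then delete all but the two newest archives
    tarsnap -c -f "img-$(date +%Y%m%d)" /data/images
    tarsnap --list-archives | sort | head -n -2 | while read -r name; do
        tarsnap -d -f "$name"
    done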
Quality of service, along with Colin's amazing pricing, make it an easy win for me. I never feel like I need to stop and waste time thinking about other options.
Maybe nothing. I didn't look into other options because Tarsnap is already cheaper than a cup of coffee per month and it's run by someone I trust and want to support.
I'm with you on that it's cheap and also run by someone trustworthy. However, what happens if he decides to retire or something similar? At least S3 is almost surely going to be around.
I would like to use it to backup my photos and videos offsite (they are currently on a ZFS RAIDZ machine with snapshots), but, unfortunately, the cost is prohibitive. I currently use SpiderOak, which is overkill for my purposes, but comes out to about $5/mo for 100 GB.
Something like Popcorn Time is the future, both as a client and as a move toward a shared backend where people can more easily split hosting fees... Because how many copies of The Big Lebowski need to be duplicated all over the place if there's plenty of bandwidth to serve it? A hosting service like Mega with the security of Tarsnap that could host and de-duplicate torrents would be very interesting. Imagine being able to instantly watch a movie if someone else on the same service happens to have already got it.
>A hosting service like Mega with the security of Tarsnap that could host and de-duplicate torrents would be very interesting.
Tarsnap's security is based on client-side encryption. They cannot read the plaintext. Therefore, if more than one person uploads the same file, there will be two different encrypted versions of the same file, which cannot be deduplicated.
You could use convergent encryption, but then users can query each other's files.
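To illustrate what convergent encryption means, here is a toy sketch using openssl (a real system would be more careful; deriving the key from the plaintext is exactly what enables both the deduplication and the confirmation-of-file query mentioned above):

    # key comes from the plaintext itself, with a fixed IV, so identical
    # files always produce identical ciphertext
    key=$(sha256sum photo.jpg | cut -c1-64)
    openssl enc -aes-256-cbc -K "$key" \
        -iv 00000000000000000000000000000000 -in photo.jpg -out photo.jpg.enc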
If I wanted to store, say, a 100 GB of photographs (jpeg and raw formats), roughly how much would it cost me per month just for the storage? Let's say I would upload one big archive of photos in the beginning, and then re-archive the photo collection whenever I add significantly more photos to the collection (after a vacation or a birthday).
To go just by the stated pricing of Tarsnap, this would cost me $25 per month for the storage. But then I also see mention of users with terabytes worth of archives who pay less than $10 a month. I read the FAQ entry which explains how this happens, but that does not really tell me whether I can hope for such savings when it comes to photos. Do photo collections (raw/jpeg/both) "shrink" significantly from the deduplication and compression of Tarsnap? I think they don't (and that the savings apply to incremental backups), but it would be great if you could make this clear. Thank you!
Tarsnap won't be able to shrink your photos using deduplication and/or compression; JPEG and most RAW formats are already compressed. I'd guess the users paying $10/mo with terabytes of data stored benefit massively from both, but it all depends on your usage.
If you are going to store 100GB of photos with Tarsnap, I'd guess it would be close to $25/mo, as you said. If you just want your photo collection for disaster recovery, you could check out Glacier instead, which is a lot cheaper.
I'll start with my disclaimer that I'm founder of Trovebox and this isn't a sales pitch because we've focused Trovebox for business use.
Great, that's out of the way. Storing and archiving photos & videos is near and dear to my heart [1] and I believe that cloud storage is one piece of answering yes to the question of "will I have my photos in 50 years?". My entire Shuttleworth Fellowship is based on this.
Here's my $.02.
Cost - we haven't yet, but we have all the pieces to use Amazon Glacier for storage of high-resolution originals. That means storing 100 GB will cost you $1/month. There are additional costs to keep thumbnails in S3 for immediate access -- my estimate is <$3 all-inclusive for that 100GB.
Ownership / Portability - a big part of my fellowship is to see how a hosted service (ala Flickr) can provide 100% ownership and portability. The solution is to let users (optionally) bring their own storage. So you can use the Trovebox software connected to your own Glacier and S3 bucket.
Functionality - I think organization, viewing, sharing and archiving should be merged together. Instead of having your sharable photos in one place and your archives in another --- why aren't they combined?
I work with wedding & portrait photographers who shoot 100GB or more in a single weekend. So, within a couple years they could be looking at $100/mo+. And it just grows from there.
This is the fundamental problem I've always seen with cloud storage for photo/video pros (or hobbyists): they need long-term storage but the bill just keeps growing. Should they be expected to pay $1k/mo after they've been in business 10 years?
---
On a separate note, how are you doing RAW <-> JPEG conversion on the server?
The bill would keep going up, yes. But that's the nature of any growing collection - even if you're using a drobo at home.
The only hope is that cloud storage goes down over those same 10 years at a rate which makes it continually affordable. But it won't always be a fixed cost since the number of photos keeps going up.
The RAW -> JPEG conversion is done using ufraw [1]. We originally tried extracting the thumbnails so the JPEGs would use conversion settings from the camera but most of the thumbnails are too small to do anything useful with.
That also assumes we've reached "peak megapixel" ... it seems to have plateaued recently but I'm not convinced a RAW file in 5 years will be about the same size as now.
Question about Tarsnap backup strategies and worst case scenarios.
How would one go about making sure that when a server is compromised, the malicious attacker wouldn't be able to delete all the tarsnap archives for that machine? The tarsnap.key is stored on the server itself, and that's all you need to delete archives as well. Of course, you're already properly effed when an attacker has root access to the machine, but offsite backups should still be safe, imho.
That's why on some of my servers I have a 'pull' backup strategy in place, where a remote server connects to the machine being backed up and pulls a backup, so in the event that the server is compromised, no backups can be deleted. Is this something that can be achieved with Tarsnap as well?
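One option worth noting: tarsnap ships a tarsnap-keymgmt utility that can derive a key with only a subset of permissions, so the key left on the server can create archives but not list or delete them. A sketch (paths are illustrative):

    # derive a write-only key from the master key; keep the master offline
    tarsnap-keymgmt --outkeyfile /root/writeonly.key -w /root/tarsnap.key
    # back up with the restricted key; a compromised server can't delete
    tarsnap --keyfile /root/writeonly.key -c -f "daily-$(date +%Y%m%d)" /etc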
I don't have much insight into what's profit-maximizing in this market, but rather than "public utility pricing" (though I can also see that analogy) I think of it as more like classic small-business pricing, especially in markets where developing some kind of reputation for fairness is deemed important by the owner. I can't think of a good representative example, but it's so well established I'm pretty sure I've run across examples in 19th-century American novels of this sort of "fair price with a modest profit" ethos.
developing some kind of reputation for fairness is deemed important by the owner
That's certainly part of it. Let's face it, a lot of people use Tarsnap because of my reputation; I'd like to have Tarsnap contribute to my reputation rather than merely taking advantage of it.
I'm pretty sure I've run across examples in 19th-century American novels of this sort of "fair price with a modest profit" ethos.
He could still pay himself a fair salary but he'd avoid any tax liability for the business and I guess he could accept donations too?
I don't know much about it, but if your only goal is a fair salary, I've always assumed a non-profit business structure would make sense? Can someone more knowledgeable chime in?
There's no real reason to turn Tarsnap into a non-profit. At least for now, the work is easily handled by him, he still has a passion for the work, and the project isn't large enough to require any "management".
But does CrashPlan do client-side decryption? That is the one weak spot in my current backup provider, Backblaze. They encrypt and protect your data, but when you need to restore, it is out in the wild.
Or use duplicity. It even directly supports many cloud services as backends, including Google Drive ($2 for 100GB, $10 for 1TB), Dropbox, OneDrive, etc. It encrypts, deduplicates, stores old versions, etc.
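A hedged sketch of duplicity usage over sftp (host and paths are placeholders):

    # incremental, encrypted backup of the home directory to a remote host
    duplicity /home/me sftp://backup@host.example.com//srv/backups/me
    # restore the most recent version into a scratch directory
    duplicity restore sftp://backup@host.example.com//srv/backups/me /tmp/restore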
I'd love to use duplicity, but in the 3rd paragraph of their website is the sentence "Duplicity is still in Beta.".
I'd much rather trust my backups to tarsnap, which is not in beta. Furthermore, and perhaps even more importantly, tarsnap offers support which duplicity does not. When my hard drives decide to die, I really want someone who I can contact if there are any issues restoring from backups.
Since CrashPlan uses blowfish-448-cbc-sha1 (a weird choice) and may (I'm uncertain on this) have the ability to push configuration changes from remote, I've considered that possibility, because of CrashPlan's "unlimited" offer.
Unfortunately, it's barely usable due to recovery issues. You can't mount CrashPlan as a filesystem (well, not without a ton of reverse engineering), so that's not an option unless you're satisfied with all-or-nothing recovery, without the possibility to pick and restore just certain files of interest.
A middle ground there that would work for restore of particular files is to use ecryptfs/encfs without encrypting filenames. I think ecryptfs at least knows how to do this. Then you can download the file with the proper name, but the contents are encrypted and decrypt them locally.
You could probably also hack up something to figure out locally what encrypted filename corresponds to what regular filename and go fish for the encrypted filename in CrashPlan. It will probably be clunky though.
That's if you only rely on CrashPlan-provided encryption, which is certainly not the cream of the crop.
We were talking about an eCryptfs/encFS-encrypted copy, where filenames are (usually) encrypted. That means navigating around names like `l00Dqf,A49VqDd8AveLMrbBE` or `qR,bmE-73cA2H6wOxZxlKSwD`.
And if you don't want some govt to be able to take down a single provider, like what happened with Lavabit email vis-a-vis backup, go with a distributed (multi-provider) solution like Tahoe-LAFS.
Deduplication across users isn't possible due to encryption, but it is done locally on your own data, so it is true that you only pay for unique and compressed data.
> This will no doubt annoy my friends Patrick McKenzie and Thomas Ptacek, who for years have been telling me that I should raise Tarsnap's prices. But while Thomas accuses me of running Tarsnap like a public utility rather than a business, and thinks this is a characteristic to be avoided, I see this as a high compliment (...)
That's really nice. I'd rather have a fair public utility than a business for the sake of concentrating money.
I won't think twice about it the day I need to sign-up for tarsnap.
I wonder how sustainable it is though. cperciva could make a boatload of money doing something else for someone else. Right now maybe that's not an attractive option, but perhaps at some point he'll, say, marry, have kids, and want some more cash and financial security.
So to a first approximation you're valuing the business at 0. You're generating enough free cash flow to pay the salary needed to keep it running but no more. If you wanted to delegate that responsibility you, as the owner, couldn't get anything out of the business as the free cash flow would be 0.
Nothing wrong with that really, but if you really want to strictly price it as a utility there needs to be some form of return on equity. Even Amazon generates some free cash flow; they've just been reinvesting it all, so not generating any income.
He said he's in a good financial situation. He did not say he paid himself the lowest possible amount of salary he could live off, and let the business run without any safety buffer of money.
I'm just going to assume that Colin is smart enough that he knows how to price his service. After all, he's the only one with the data to really know Tarsnap's financial situation.
Maybe a bit OT, but is there any way to create an archive locally and see the size of it before paying, to figure out how much space I'd need?
I've been looking at tarsnap for a while but never gotten around to actually trying it, and my quick naive calculations for backing up my ~ make it sound too expensive for me, even though it's probably the best alternative I've seen so far.
Using the --dry-run and --print-stats options would probably give you this information:
''Don't really create an archive; just simulate doing so. The list of paths added to an archive (if the -v option is used) and statistics printed (if the --print-stats option is used) will be identical to if tarsnap is run without the --dry-run option.''
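That is, something like the following (the archive name is arbitrary, and this assumes you already have a registered key):

    # simulate creating the archive and print stats, uploading nothing
    tarsnap -c --dry-run --print-stats -f sizetest /home/me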
I missed the dry-run option, but it still seems to require a key, which requires you to pay. I talked with some people on IRC and it seems it's not possible at the moment. Someone mentioned gzip providing a good-enough estimate of the compression, without taking the deduplication into account.
Gzip would provide a fine enough approximation, since both it and tarsnap use zlib. Tarsnap also has deduplication, so equal (or almost-equal) files will take up much less space.
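For a quick back-of-the-envelope number, something like this gives an upper bound (it ignores deduplication entirely):

    # approximate compressed size of the home directory, in bytes
    tar -cf - "$HOME" 2>/dev/null | gzip | wc -c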
We love tarsnap: it's the easiest and most secure way to offsite our database dumps. No frills, no beautiful design, just solid software written by cperciva, who is a crypto wizard.
That's a bit too much like a loyalty penalty for my taste: the default option makes things better for new users than for established customers.
(Of course loyalty penalties are commonplace -- from phone companies, car insurers, etc. -- but my mental model of cperciva says he probably doesn't like them.)
my mental model of cperciva says he probably doesn't like them
Exactly true. I was looking at that comment trying to put my finger on why I hated the idea so much, and you're right: It's because it would be a loyalty penalty. I want to be fair to everyone.
As a customer of about 2 years now, it is much appreciated. I feel much better doing business with corporations when I get the feeling that they do their business fairly. Tarsnap succeeds at this.
Even excluding all the details of Tarsnap's design, using Glacier for backups has always seemed totally wrong-headed to me for a simple reason: it disincentivizes checking up on your backups, which is a key part of doing backups. No point waiting until you lose your data to realize that your backups had developed a glitch.
Add any minimal restore cost and Glacier costs about the same as regular storage -- with a lot less convenience. I honestly can't think of a use case for something like Glacier, where storage is cheap but reading/writing is expensive.
I would personally be totally fine with a backup solution which is orders of magnitude slower and several times as expensive in the rare restore case if it meant that the typical "write only for long periods of time" scenario was significantly cheaper. My personal backups are not something I'd ever need to restore in a hurry.
The one big downside to the "cheap backup, expensive restore" approach is that it discourages testing your restore.
Thanks for that. I'm using tarsnap on a FreeBSD box and a couple of RPi clients; works great! Always nice to get a price cut!
PS. Are there any future plans to make restore a little bit faster? It's been a while since I restored some files, but the process was so slow that I thought there was a connection problem; then I googled and found out that it's okay for it to be slow :-)
Are there any better tools out there to schedule backups and purge old backups? I'd be happy with a simple tool which just keeps 7 daily backups and that's it.
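For what it's worth, a minimal seven-day rotation can be scripted directly against the tarsnap CLI; a sketch (archive naming is an assumption, and head -n -7 is GNU-specific):

    #!/bin/sh
    # create today's backup, then keep only the seven newest daily archives
    tarsnap -c -f "daily-$(date +%Y%m%d)" /home /etc
    tarsnap --list-archives | grep '^daily-' | sort | head -n -7 |
    while read -r name; do
        tarsnap -d -f "$name"
    done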
There was a bug in Tarsnap 1.0.20 (February 2009) which had a ~ 0.2% chance of causing silent data corruption if people were using the new --checkpoint-bytes option. (The bug actually had two possible consequences, silent data corruption and exiting with an error, and my analysis showed that the second was about 500x more likely to occur first.)
Fortunately this was a new feature and I was able to identify which people had used it (since it relies on server-side functionality and I had good logs) so I could email all the potentially-affected users to warn them.
Uhh... let me look back at that code. My memory isn't what it used to be... right, now I remember. Sort of.
So, Tarsnap uses a "chunkification cache" to speed up archiving; when it chews through a file and splits it into chunks, it records "file X was N bytes, had inode #I, and was last modified at time T, and here's the chunks it was split into". The next time it sees the file, it starts by stat()ing the file and if those parameters match it reuses the chunk list rather than reading the file from disk and splitting it into chunks again.
With checkpointing enabled, if a checkpoint occurred in the middle of a file, a truncated entry (to be specific, the chunk list for the portion of the file which had been processed) would be stored in the chunkification cache. If a later tarsnap process read that entry, it was possible that it would archive the file incorrectly (I think it would be truncated, but I'm not absolutely certain right now). The fix was simply to not use an entry from the chunkification cache if it wasn't internally consistent.