> Wouldn't it make sense to also write the backup catalog to the tape though? Seems like a very obvious thing to do to me.
The catalog would be written to tape regularly: this is what would gets e-mailed out. But it wouldn't necessarily be written to every tape.
Remember that the catalog changes every day: you'd have Version 3142 of the catalog at the beginning of Monday, but then you'd back a bunch of clients, so that catalog would now be out-of-date, so Version 3143 would have to be written out for disaster recovery purposes (and you'd get an e-mail telling you about the tape labels and offsets for it).
In a DR situation you'd go through your e-mails and restore the catalog listed in the most recent e-mail.
50GB was an enormous amount of space in the late 90s. Why wouldn't each file on the tape have something like begin/end sentinels and metadata about the file so that the current catalogue could be rebuilt in a DR scenario by just spinning through the whole tape?
I'm with the OP - depending 100% on a file that's not on the tape to restore the tape is bonkers. It's fine as optimization, but there should have always been a way to restore from the tape alone in an emergency.
Isn't there an saying about limiting criticism before we thoroughly understand the trade-offs that had to be decided.
One potential trade-off is being able to write a continuous datastream relatively unencumbered vs having to insert data to delineate files, which is going to be time consuming for some times of files.
There are trade-offs, but as someone who's been working in technology since the mid-90s and spent 10+ years as a systems engineer for a large corporation, "we have all of the backup tapes, and all of the production data is on them, but we can't restore some of it because we only have an older catalogue for those tapes" seems like a pretty unarguably huge downside.
I'm also having real trouble imagining any significant impediments to making the tape data capable of automatically regenerating the most recent catalogue in a disaster scenario, given the massive amount of storage that 50GB represented in that era. This sounds like a case where the industry had hit a local maximum plateau that worked well enough most of the time that no vendor felt compelled to spend the time and money to make something better.
I've written software to handle low-level binary data, and I can think of at least three independent methods for doing it. Either of the first and second options could even be combined with the third to provide multiple fallback options.
1 - Sentinels + metadata header, as I originally described. The obvious challenge here is how to differentiate between "actual sentinel" and "file being backed up contains data that looks like a sentinel", but that seems solvable in several ways.
2 - Data is divided into blocks of a fixed size, like a typical filesystem. The block size is written to the beginning of the tape, but can be manually specified in a DR scenario if the tape is damaged. Use the block before each file's data to store metadata about the file. In a DR scenario, scan through the tape looking for the metadata blocks. In the corner case where the backed-up data contains the marker for a metadata block, provide the operator with the list of possible interpretations for the data, ranked by consistency of the output that would result from using it. This would sacrifice some space at the end of every file due to the unused space in the last block it occupies, but that's a minor tradeoff IMO.
3 - Optionally, write the most recent catalogue at the end of every tape, like a zip file.
The catalog would be written to tape regularly: this is what would gets e-mailed out. But it wouldn't necessarily be written to every tape.
Remember that the catalog changes every day: you'd have Version 3142 of the catalog at the beginning of Monday, but then you'd back a bunch of clients, so that catalog would now be out-of-date, so Version 3143 would have to be written out for disaster recovery purposes (and you'd get an e-mail telling you about the tape labels and offsets for it).
In a DR situation you'd go through your e-mails and restore the catalog listed in the most recent e-mail.