There is a way around this: You allocate enough space at the beginning (or the end, or both) of the tape for a catalog. There are gigabytes on these tapes; they could have reserved enough space to store millions of filenames and indices.
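A rough sketch of the scheme, simulated on an ordinary file (the names, sizes, and format here are all illustrative, not any real tape layout): reserve a fixed-size catalog region up front, append file data after it, and rewrite the catalog in place each time a file is added.

    import json, os

    CATALOG_SIZE = 1 << 20  # reserved catalog region; size chosen arbitrarily

    def create_tape(path):
        with open(path, "wb") as t:
            t.write(b"\0" * CATALOG_SIZE)  # placeholder for the catalog

    def add_file(path, name, payload, catalog):
        with open(path, "r+b") as t:
            t.seek(0, os.SEEK_END)             # append the data itself
            catalog[name] = (t.tell(), len(payload))
            t.write(payload)
            entry = json.dumps(catalog).encode()
            assert len(entry) <= CATALOG_SIZE, "catalog region exhausted"
            t.seek(0)                          # go back and rewrite the catalog
            t.write(entry)

The catch, as the replies point out, is that seek(0) is nearly free on disk but is a full rewind on tape.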



Then you would have to rewind the tape at the end, which is not what you want. You want to write the tape, leave it at that position, and be done.

If you write the catalog at the end, you have to rewind and read the whole tape to find it and read it, which is no improvement over reading the tape and reconstructing the catalog.

This is all either impossible or very difficult to fix, and there is actually no problem to solve: if there is a disaster and the database is lost, you just read the tapes to reconstruct it.


If the catalog was at the start of the tape, how would you expand it when adding more files to the tape?

And if the catalog was at the end of the tape, how would you add more files at all?


> And if the catalog was at the end of the tape, how would you add more files at all?

Modern zip software just removes the whole index at the end, adds the file, then reconstructs the index and appends it again.
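Python's zipfile does exactly this in append mode, for instance: it seeks back to where the old central directory (the index) starts, writes the new member over it, and emits a fresh central directory at the new end of the archive on close.

    import zipfile

    # Mode "a" positions the writer at the start of the old central
    # directory; the new member overwrites it, and a new index is
    # written at the end of the file when the archive is closed.
    with zipfile.ZipFile("backup.zip", mode="a") as zf:
        zf.write("newfile.txt")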


I'm not sure about modern implementations, but it's not actually required to remove the old index at the end. It's perfectly legitimate (and somewhat safer, though the index should be reconstructable via a linear scan of record headers within the archive) to just append, and the old indexes in the middle of the archive will be ignored.
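The reader side of this is easy to see with Python's zipfile (the writer below is simplified: it writes a complete second archive after the old bytes rather than doing a true incremental append). Readers search backward from the end of the file for the end-of-central-directory record, so the stale index in the middle is never consulted.

    import io, zipfile

    old = io.BytesIO()
    with zipfile.ZipFile(old, "w") as zf:
        zf.writestr("a.txt", "version 1")

    # Keep the old archive, stale index and all, and write a new
    # archive with its own index after it.
    combined = io.BytesIO(old.getvalue())
    combined.seek(0, io.SEEK_END)
    with zipfile.ZipFile(combined, "w") as zf:
        zf.writestr("a.txt", "version 2")
        zf.writestr("b.txt", "added later")

    with zipfile.ZipFile(combined) as zf:
        print(zf.read("a.txt"))  # b'version 2': only the final index is read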


But you had multiple backups on these tapes. If you rewrite the index, how do you restore from a certain day?


Your end-of-stream index would remain in place with a backup number / id.

Your entire index would be the logical sum of all such indices. Think of the end-of-stream index as a write-ahead log of the index.
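As a hypothetical sketch (names are made up): each backup session ends with its own small index mapping filenames to tape positions, and the full catalog is just those per-session indices replayed in order, newest entries shadowing older ones. Restoring from a certain day means stopping the replay at that session.

    # session_indices: one dict of filename -> tape position per backup
    # session, ordered oldest to newest.
    def merge_indices(session_indices):
        catalog = {}
        for index in session_indices:
            catalog.update(index)  # later sessions shadow earlier entries
        return catalog

    # The catalog as of backup n is the merge of the first n indices.
    def catalog_as_of(session_indices, n):
        return merge_indices(session_indices[:n])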


Put it in the middle and work your way in from the ends!


There are lots of append-only data structures that would support this, but would also require scanning the tape to reconstruct the catalog.


> but would also require scanning the tape to reconstruct the catalog.

If the index consists of a b+-tree/b*-tree interleaved with the data (with the root of the index the last record in the archive), a single backward pass across the tape (including rapid seeks to skip irrelevant data) is sufficient to efficiently restore everything. This should be very close in throughput and latency to restoring with an index on random-access storage. (Though, if you're restoring to a filesystem that doesn't support sparse files, writing the data back in reverse order is going to involve twice as much write traffic. On a side note, I've heard HFS+ supports efficient prepending of data to files.)

In other words, yes, you need to scan the tape to reconstruct the catalog, but since the tape isn't random access, you need to scan/seek through the entire tape anyway (even if you have a separate index on random-access media). If you're smart about your data structures, it can all be done in a single backward pass over the tape (with no forward pass). Keeping a second b+-tree/b*-tree interleaved with the data (keyed by archive write time) makes point-in-time snapshot backups just as easy, all with a single reverse pass over the tape and efficient seeks across unneeded sections of tape.
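Here's a toy simulation of the single-backward-pass restore (self-contained, with invented names; a Python list stands in for the tape, and index records hold the tape positions of their children). Since every child lies earlier on the tape than the index record that points to it, the loop visits positions in strictly decreasing order: one reverse pass, seeks included, with no change of direction.

    from dataclasses import dataclass

    @dataclass
    class Data:
        name: str
        payload: bytes

    @dataclass
    class Index:
        children: list  # tape positions of data records or older index records

    def write_backup(tape, files):
        # Append the data records, then an index record pointing at them
        # and at the previous root, so the new root is always last on tape.
        prev_root = [len(tape) - 1] if tape else []
        positions = []
        for name, payload in files:
            positions.append(len(tape))
            tape.append(Data(name, payload))
        tape.append(Index(positions + prev_root))

    def restore(tape):
        restored = {}
        pending = [len(tape) - 1]   # start at the root, i.e. the end of tape
        while pending:
            pos = max(pending)      # always step toward the beginning
            pending.remove(pos)
            record = tape[pos]
            if isinstance(record, Index):
                pending.extend(record.children)
            else:
                restored.setdefault(record.name, record.payload)  # newest wins
        return restored

    tape = []
    write_backup(tape, [("a.txt", b"old"), ("b.txt", b"data")])
    write_backup(tape, [("a.txt", b"new")])
    print(restore(tape))  # {'a.txt': b'new', 'b.txt': b'data'}

The point-in-time variant is the same walk started from an older root instead of the last record.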


Beginning: write more entries into the allocated space I mentioned. End: write more entries into the allocated space I mentioned.



