Backing up 18 years in 8 hours (chromakode.com)
186 points by chaosmachine on April 15, 2016 | 83 comments



I still have a disk image of the Mac Plus I used until I was 10, when I upgraded to a new Blueberry iBook. The Mac Plus experience can be recreated with Mini vMac, an emulator.

Files since 1999 are all still in a folder on my local drive. I've made an effort not to lose the photos and chat logs. My iTunes database dates back to 2002, and I'm disappointed that I lost my old SoundJam MP database that I used for 2 years before that.

Unfortunately, my archives continuously come under threat from forced upgrades. Upgrading iPhoto to the Photos app will lose a lot of metadata. The latest versions of iTunes removed USB sync for contacts & calendars, which really bothers me when I want to maintain my files without the cloud.

Many old documents are now unreadable (e.g. Clarisworks). I've realised that simplicity is essential for keeping long-term records. Keeping a copy as plain text is important for preservation. Just because "there's an app for that" today doesn't mean that 10 years from now, the app will still work. This is (especially) true for companies that should know better, such as budget tracking tools.

I recently went through my old records to compile a list of every rock concert I've been to, along with the location and price when possible. With heavy use of archive.org, my iCal, and my iPhoto library, I figured out all the dates, but it was a major effort. Most of them were in the last 10 years.

My dad still has his PhD thesis on a magnetic tape. Anybody who knows how to read that onto a PDP-8 should get in touch!


My files from the 1970's were on magtape that was written to by a long-gone drive that was way out of spec. I threw it away. My next oldest files were from 1982 or so, and were on 8" PDP-11 floppies. My 11 was discarded long ago. I contacted an old friend, and he said he still had his, though it hadn't been powered up in years. Amazingly, it powered up, and there wasn't a single bad byte on my 30 year old floppies (!). He was able to recover it all for me.

I have most of the stuff since, though floppy backups were erratic. I copied the 5.25" floppies to CD, and then to hard drives, before discarding the old computers.

I try to save most files as jpg, pdf, mp3, mp4, or plain text, figuring those are the most future proof.


PostScript is better than PDF, if you have the choice.

(Adobe basically introduced PDF because PostScript was too open and other companies started beating them at it.)


PDF/A is an ISO standard, with an EU-backed preservation project writing conformance testing tools and such.

https://en.wikipedia.org/wiki/PDF/A

http://www.preforma-project.eu/media-type-and-standards.html


The trick is finding an editor that will save in strict PDF/A.


Exactly! I've been mildly involved as my family's archivist for years. PDF/A is the standard that should be used for archiving docs. In fact, it's what the Library of Congress itself recommends for long-term storage of documents. But as you said, I can't think of a single app that offers strict PDF/A support. I may have come across one at some point, but I can't remember what it was. I store everything in PDF, and images as TIFF.

The LOC has good guidelines for archiving all sorts of formats.


I feel your pain. Especially with tricky formats like Clarisworks (or in my case, old Cakewalk and Hash Animation Studio) reviving the data might require emulating the entire OS and environment they existed in. At least for full disk images, the backups contain copies of the software used to create the files.

It's really disappointing when software uses unnecessarily complex file formats. Chat logs are an important example: I was dismayed to discover that my Yahoo Messenger chat logs on Windows are stored XOR-obfuscated! [1]

There are a lot of compromises to be made between the convenience of making backups and their long-term viability. My hope is that by greedily saving all of the bits, there will be enough context to make sense of the data if I really need to, even if it's an intensive process.

A good alternate strategy would be to scoop up the most likely interesting files when processing the backups and re-encode them in more future-ready formats, as derefr suggested.

[1] https://cryptome.org/isp-spy/yahoo-chat-spy.pdf


I've got quite a few old ClarisWorks files as well. At some point I realized all that data would become unreadable (or at least not easily readable) once the file format becomes unsupported. So I went through every ClarisWorks file (with a script) and saved it out into multiple formats (e.g. RTF, PDF). The hope was that at least one of those would open...

Batch re-encoding those files at the cusp of losing easy access to them turned out to be a lot of potentially error-prone work. So now, to future-proof important documents, I do that re-encoding continuously up front, whenever the files are saved.

But for long term viability, on top of backups, the files themselves have to be in a format usable by programs that will be around for a long time on platforms that won't disappear.

Practically, that means using simple file formats (TXT). Or else using a program (LibreOffice) that creates files in an open format that can be easily re-encoded up front into multiple formats like DOC and PDF. The MultiFormatSave extension for LibreOffice makes it easy to save into multiple formats for that purpose.
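
If you'd rather drive the conversion from the command line than from an extension, LibreOffice's headless mode can batch it. A rough, untested sketch (paths are hypothetical, and it assumes your build ships the ClarisWorks import filter):

    # convert every ClarisWorks file to PDF and RTF (hypothetical paths)
    soffice --headless --convert-to pdf --outdir ~/converted/pdf ~/old-docs/*.cwk
    soffice --headless --convert-to rtf --outdir ~/converted/rtf ~/old-docs/*.cwk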

After all that work though, I just found out LibreOffice can open ClarisWorks files. lol...


> Many old documents are now unreadable

This is exactly why some people are so vocal about open formats.

Regarding your father's PhD thesis, take a look at http://www.pdp8.net/ - I'd be surprised if they cannot help or point you to some people who can.


This whole thread has just triggered me into finding out why current versions of MS Office are failing to open some 20+ year old documents that I have. Turns out, MS Office no longer supports Word for DOS/WordPerfect/AbiWord files - which is fine I guess, but to rub salt into the wound, there used to be converter plug-ins available, and MS no longer hosts them for download (I guess the 400kb archive was taking up too much space or something). I just had to resort to hunting down the archive and downloading it from some unknown/untrusted file directory. Anyway, it works and I can now open all my ancient documents in Office 2013 - which is not bad considering the converter plug-ins were written for Office 97.

Regarding the article - I'm not sure I understand the concept behind storing a set of the recovered files in a cloud service. Surely a second HDD with the same contents stored at a second location would be cheaper in the long run. The HDD would last 20+ years if stored correctly - you can't guarantee that your consumer-level cloud service will still be around even in 10 years' time.


"Container" is simply a place where you can read the content today. That can be a HDD, it can be AWS, it can be a USB drive - doesn't matter as long as you have a system that can read it. Once that "container" starts to get near it's EOL you transfer it to a new "container".

I wouldn't trust an HDD for 20 years. Keep a backup mirror locally and a backup offsite (cloud or at a relative's); what matters is that the transfer runs automatically on a periodic schedule.


Old file formats becoming unreadable just because they are "old" and newer programs don't support them anymore is a much more serious issue than many people think.

I hazily remember keeping Outlook PST files as my mail archives and then later not being able to recover most of the metadata of the mails, not even programmatically, as PST was designed even before "internet standards" were something they'd worry about.

GPG also removed support for old encrypted files and old keys as of 2.1: https://www.gnupg.org/faq/whats-new-in-2.1.html#nopgp2 which shows the dangers of using it in archival contexts. Disk encryption tools can be even more problematic, being dependent on particular OS versions.


Good points!

”Upgrading iPhoto to the Photos app will lose a lot of metadata.”

I’m still on Mavericks but will upgrade at some point. This makes me nervous though. Would you care to elaborate on this issue?


Apple have a guide on it here:

https://support.apple.com/en-us/HT204478

One thing to bear in mind is Photos will create a new photo library, leaving the old one untouched. The metadata will still be there, but good luck getting it out.


> Many old documents are now unreadable

There is/will be a niche market for folks writing software to convert legacy formats into open/current formats.

Remember how all of a sudden shops offered to convert your VHS-tapes to DVD? I bet they were making a killing.


Just went into a CVS to pay for one VHS conversion for $25.99. CVS is using a service called http://www.yesvideo.com


Is there a good reason to back up raw disk images with block-level deduplication, rather than running file recovery to get filesets, and then doing file-level deduplication?

On the one hand, I can imagine "cryonically" preserving a disk image for a later, better filesystem recovery program to come around. (This "cryonic" approach would give even better results by preserving bit-level analogue flux recordings of the disk platters, rather than relying on the output of digital reads from the disk heads.)

On the other hand, the longer you leave these disk images as dead blobs of data, the more layers of legacy container+encoding formats you'll have to try to get your system emulating when you finally do want to pull the files off. One day your OS won't have drivers for reading e.g. FAT16, or zfs2, or ReiserFS2.11, nor will it be able to parse out the meaning of an MBR-partitioned disk. Reaching back through Linux kernel archives for something old enough to understand those things, will result in a kernel that won't boot your PC. You'll end up having to do something convoluted with qemu just to get your disk read.

Personally, I'd much rather throw out all the intermediate containers I can, as soon as I can: not just extracting files from the disk's filesystem, but further extracting files from any proprietary archive formats on the disk (using the extractor tools probably installed on the same disk), and even canonicalizing containers like AVI by remuxing them into modern extensible formats like MKV. The goal being to give a file-level deduplication process the best possible inputs to work with, most likely to match: not just for space-saving, but because reducing "junk duplicates" helps greatly in actually finding anything in all that mess, let alone organizing it.
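
For the remuxing step nothing needs to be re-encoded, so it's cheap; roughly, with hypothetical filenames:

    # copy the existing audio/video streams into a Matroska container, no re-encoding
    ffmpeg -i old-video.avi -c copy old-video.mkv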


Interesting point! To be honest, I didn't consider that it might be more difficult to interpret the underlying filesystems and encodings in the future. My goal was to get a lossless (well, as much as possible) archive of the disks since my time and physical access was limited. It is hard to predict what I will want to access in these images over the scale of decades, so my thinking was to leave them as untouched as possible, since I don't know what information will be important.

That being said, a good guess would be that the most interesting data will be media files (especially old photos) and documents. For that data, your advice of collecting and re-encoding the files is wise. For the purpose of discovering the media files in these backups, I found my favored approach to be a brute force recursive search for file types. Exploring the original structure of the filesystems was interesting, but my intuition for where the valuable data was usually proved wrong.
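
The brute-force search was nothing fancy; roughly along these lines, with a hypothetical mount point:

    # walk the recovered tree and keep anything that looks like media
    find /mnt/recovered -type f -exec file --mime-type {} + | grep -E ': (image|video|audio)/'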


You want to convert the data to an open format and keep that around in modern containers, where you can easily transfer it to a different container should that be necessary. BUT for many applications you also want to preserve the environment. For instance, if you write your thesis in LaTeX, whatever installation you have is unlikely to be replicable in 10 years, so keeping a minimal VM that can compile it is preferable. You would need to keep this VM runnable and upgrade it to newer formats as things progress.


I'm not sure I buy the absolute supremacy of open formats. They are good for emergency recovery (no depending on a company that just went kaput) but they often lack momentum. MP3 is not an open format, but I would argue it's one of the best choices for storing your music because it is everywhere. MP4 hasn't gained quite the same traction yet but we're headed that way.

Now, if you are a savvy programmer and you archive the (open) OGG spec, you could argue that you can write your own custom OGG -> MP2073 encoder at any time to recover your ancient music. But that's not most of us.

Another example that comes to mind is the many MS Office replacements I've used over the years. I have a small trove of old documents in old open formats used by StarOffice, OpenOffice, & others that are a pain in the butt to access and don't always render correctly, while my ancient .doc files from twenty years ago still open in two clicks.


I'd consider MP3 an open format, and Office documents are an exception - MS has gone above and beyond for them to keep working. There are thousands of dead formats out there, unfortunately.


LaTeX is plain text markup so old LaTeX documents can usually just be read without processing. (The major exception is that graphics are hard to visualize as a sequence of draw commands!). Further, ordinary LaTeX from the last few decades can still be processed without difficulty. The older versions of LaTeX are still available online.

Of course, things are never as simple as they should be. LaTeX is a markup language that through macro expansion ends up expanding into TeX. Since TeX 3.0 in 1989, Knuth has attempted to keep the TeX system stable; since then, TeX documents should produce the same output, pixel for pixel, as they do today running the current version, 3.14159265 (yes, the version number is converging to pi; there won't ever be a TeX 4.0).

Few people, however, produce documents in plain TeX--the LaTeX markup is so much more convenient than the lower level TeX. LaTeX has been slowly evolving and there are some backward compatibility issues, but they are minor. The first release of LaTeX seeing general use was described by Leslie Lamport (the creator of LaTeX) in his 1985 book[1]. That version of LaTeX, 2.09, can still be processed by today's LaTeX 2e in compatibility mode. LaTeX 3 is supposed to supersede LaTeX 2e someday, but it's not clear how many more years that will be.

Since LaTeX is open source, the distributions from TUG (the TeX Users Group) are easily obtained, and they have all the historical versions of LaTeX and TeX available.

This all makes LaTeX/TeX seem like one of the best ways to maintain a document's source for the future. A few tips:

- Fonts can be a problem because fonts evolve. Either use something like TeX's extensive collection of "built in" Computer Modern fonts or save the font files along with the source of anything that you might want to work on in twenty years.

- LaTeX has a wide range of very sophisticated third-party extensions. Along with the document source, it would be a good idea to keep the contemporaneous versions of the extensions used by the document. (These extensions are just files of additional macros.)

- If one is only interested in the typeset results of a LaTeX document, use a LaTeX extension like pdfx (or Adobe Acrobat) to generate a PDF/A version of the document (LaTeX programs normally generate pdfs). PDF/A is a pdf specification from Adobe that is intended for archival use and rendering far in the future (fonts are embedded in the output, etc.).

- Pandoc can convert LaTeX to a wide range of alternative formats (HTML etc.) with some success, depending on the target language and the document complexity.
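
For the PDF/A tip, a minimal sketch with the pdfx package looks something like this (document metadata goes in a separate .xmpdata file; the option and filenames here are just an example):

    % thesis.tex -- compile with pdflatex; expects a thesis.xmpdata file alongside it
    \documentclass{article}
    \usepackage[a-1b]{pdfx}   % request PDF/A-1b output (fonts get embedded)
    \begin{document}
    Hello, archival future.
    \end{document}

    % thesis.xmpdata -- metadata that gets embedded into the PDF/A
    \Title{My Thesis}
    \Author{A. Student}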

[1] http://www.amazon.com/Latex-Document-Preparation-System-User...


Actually

    PDF/A is a pdf specification from Adobe that is intended
    for archival use and rendering far in the future (fonts are    
    embedded in the output, etc.).
Maybe they state that; however, as long as there is no guarantee of a reader that works on the newest hardware, this is just wrong.

And the specification for PDF/A-1 through PDF/A-3 (a, b, u) is really, really long and hard to implement correctly. I suspect that 70% of the solutions (even paid ones) parse the spec incorrectly. In fact, I suspect that only Adobe Acrobat (Reader) would actually render a 100% correct PDF/A-3a file to the screen.

Actually, this is the (unofficial/official) spec:

    http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
It should be really close to the iso one (http://www.iso.org/iso/home/store/catalogue_tc/catalogue_det...)


Unfortunately, that has not been my experience. Between various Linux distros' LaTeX environments and third-party packages, I've yet to be able to easily recreate the environment to produce similar output. If I struggle with it now, I have little hope that I can do it in 10 years - and I have more faith in VM containers.


I agree; you're right. It isn't easy. I wouldn't want to go back and figure out my TikZ/PGF diagrams if that package lost backwards compatibility. But at least there is hope that a straightforward LaTeX document can be read in the future. You can always run the LaTeX source through Pandoc or just read it as it stands as text with markup. It's better than having a WordPerfect or XYWrite document (I liked both of those better than Word way back when).

Your idea of a VM container is interesting--I hadn't thought of that, but what about the software to run the VM? Does VirtualBox stay backward compatible over the next 10 or 20 years? Maybe the way to go is Markdown, which is so easy to read even without processing it.


I don't expect it to be 20 years backwards compatible, but since I use VMs daily, I have a good idea of when to start looking for a way to convert the machine to a different more modern format.

Re: Markdown or similar formats like reStructuredText - I don't have any experience with advanced markup in them, but I think you'll have the same plugin problem as with LaTeX. The readability of the source isn't that important, since you'll probably have the rendered output as well, and that is, or should be, copy/pastable if needed.


Disk is cheap in online storage services, so it's "cheaper" in terms of time to dump the image and pull out whatever you need in the future.

> One day your OS won't have drivers for reading e.g. FAT16, or zfs2, or ReiserFS2.11, nor will it be able to parse out the meaning of an MBR-partitioned disk.

Maybe in 10-20 years.


Couldn't you just load them up in a VM anyway?


Yes, for reading these disk images, I've typically resorted to using VMs anyway, so I can experiment with old code in a sandbox. Actually, one project I've had in the back of my mind is attempting to boot some of those old Linux installs in a VM. Can you imagine re-opening a carbon copy of your old workspace from over a decade ago, right where you left it? I think the exercise would be thought-provoking.
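
If anyone wants to try the same: booting a raw dd image is usually as simple as something like this (image name hypothetical; really old 32-bit installs want the i386 system emulator):

    # boot the raw disk image in an emulated 32-bit PC with 256MB of RAM
    qemu-system-i386 -m 256 -hda old-linux.img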


Damn that would be interesting. I should make a backup of my current one and see how it changes year after year. Good idea!


Since I’m young, a decade ago I probably hadn’t discovered proper version control. I expect I’d find folders of neatly organised, timestamped ZIP files of my code.

Perhaps some things are best left in the past.


A few weeks ago I tried to copy a Windows 7 VM from VirtualBox to VMWare Player. I expected it to be easy, but it was actually difficult enough that I gave up on it after a few days.

If your backup plan involves mounting a disk image on a VM, I strongly advise you to test it before you need it :)


Emphatically agreed. Migrating VMs (IME especially Windows VMs) is a frustrating proposition.

Creating new Linux VMs expressly for the purpose of reading particular file formats is a much safer bet. It is unlikely that the ability to conveniently emulate a basic i386 system with block storage will go away in the next few decades. My assumption is that any formats I can read on Linux today will be readable in the future, as long as I have a copy of the source / binaries. This is why I included a copy of lrzip's git tree with my backups -- everything else is in a standard Ubuntu install.
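
And when I only need to pull files out rather than boot anything, loop-mounting the image read-only on a Linux host is usually enough; roughly (the loop device name is whatever losetup hands back):

    # expose the partitions inside the raw image as loop devices (e.g. /dev/loop0p1)
    sudo losetup -P -f --show disk.img
    # then mount the partition you want, read-only
    sudo mount -o ro /dev/loop0p1 /mnt/old-disk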


Just curious, why would you choose MKV over MP4? MP4 has native HTML5 support, so I figured that would have more long-term viability. And as far as I know, they're both open formats (maybe MP4 isn't and is just free to use, I'm not sure).


> maybe MP4 isn't and is just free to use, I'm not sure

Correct. MP4 is not an open format.

From [1]:

> MPEG-4 contains patented technologies, the use of which requires licensing in countries that acknowledge software algorithm patents. Over two dozen companies claim to have patents covering MPEG-4. MPEG LA licenses patents required for MPEG-4 Part 2 Visual from a wide range of companies (audio is licensed separately) and lists all of its licensors and licensees on the site. New licenses for MPEG-4 System patents are under development and no new licenses are being offered while holders of its old MPEG-4 Systems license are still covered under the terms of that license for the patents listed (MPEG LA – Patent List).

> AT&T is trying to sue companies such as Apple Inc. over alleged MPEG-4 patent infringement. The terms of Apple's QuickTime 7 license for users describes in paragraph 14 the terms under Apple's existing MPEG-4 System Patent Portfolio license from MPEG LA.

[1]: https://en.wikipedia.org/wiki/MPEG-4#Licensing


Do you know of any good resources on storing/maintaining video files? I recently ripped over a TB of uncompressed video from old 8mm tapes and I'd like to preserve these memories digitally for as long as possible.


Lossless x264 (with FLAC audio) in MKV with a recent version of FFMPEG.
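
As a concrete starting point, something like this (filenames hypothetical; expect the output to stay huge):

    # mathematically lossless H.264 (-qp 0) plus FLAC audio, muxed into MKV
    ffmpeg -i tape-capture.avi -c:v libx264 -preset veryslow -qp 0 -c:a flac tape-capture.mkv

The preset only trades encode time for file size; the decoded output is identical either way.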


Better to store the source code of the exact version used for transcoding/playback and back it up properly too. You might need to re-create the binaries decades from now.


I got my first PC in 1990 at age 11, having used only Commodore computers before that. I wish I had had the foresight to preserve everything I've ever created.

Cheap Internet access (i.e. a local dial-up number) didn't come to the area where I grew up (a small, midwestern town) until about 1996. From 1990-6, I was heavily involved in the BBS "scene". Being a teenager with lots of time but little money, I ran a BBS but couldn't afford to pay for all those door games. Instead, I learned to reverse engineer them and write "cracks" (binary patches, basically) to "unlock" those games. In the middle of the night, I'd dial into far away BBSes and upload them everywhere that I could. I'd love to have that source code to look at again today.

I did find some of my old paper notebooks in my mother's garage a while back. These are the ones that I took to school with me. Instead of doing schoolwork, however, I'd spend my time in class writing code (on paper, by hand) and then I'd type it all in to the computer when I got home after school. It was very neat to find those and look back at code I wrote ~25 years ago.

For the last few years, now that we all have a camera in our pocket at all times, I have kept backups of every picture I've taken but there are probably thousands of photos that I took before then that I'll never see again.

With storage as cheap as it is, I've resolved to never again "lose" any of the photos or videos I've taken. It'll be amazing to look back at them in another ~25 years or so.


Will he remember the password in another ten years?

Happened to me. I encrypted an archive, and the funny thing is I did save the password into a password manager, but I didn't label it descriptively enough, so when I tried decrypting the archive I couldn't locate the right password. I could have just tried them all, but I had a copy of the files from the archive in another place, so I just deleted the encrypted archive.


I’ve been a happy Arq user for a few weeks now, but this scenario scares me. I have even considered using a weak password such as ”abc123” in order to prevent this issue.


Print out your key on paper, label it descriptively and store it in a safe place (like in a safe), using something like Paperkey.

[0] http://www.jabberwocky.com/software/paperkey/
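
The flow is roughly (key ID and filenames hypothetical):

    # reduce the secret key to its minimal secret bits as print-friendly text
    gpg --export-secret-key 0xDEADBEEF | paperkey --output secret-key.txt
    # later: reconstruct the secret key from the printout/OCR plus the exported public key
    paperkey --pubring pubkey.gpg --secrets secret-key.txt --output secret-key.gpg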


Do you think you'll remember that weak password in 10 years time? Or are you thinking you'll brute force it when the time comes? In which case there's not much reason to encrypt in the first place.


I don’t know, but I definitely think it’s easier to remember that string than a 30-character-long random string. I could even use my given name to make it more memorable.

To clarify: I didn’t primarily choose my backup application because it encrypts the backups but because it seemed like a reliable application with a solid user interface. (In Arq, encryption is not optional.)


You can repeat that string, or use five easy words to get to that 30 character string.


Then you have an accident and get amnesia. How about that scenario?


Easy, tattoo your public and private keys on your body somewhere, maybe as QR codes.

Joking aside, I do use some things like hashes of works of literature and other important things to me as keys and archive hashes as their own passwords. I also send emails to myself explaining my thinking when I make certain decisions. It's like commenting my life.

And if I lost my memories, would I really selectively miss the data that belonged to former me? That's a scenario to think about.


I'd always regretted not keeping my old hard drives when my family left Moscow in 1999. There was the Robotron 1715, which was my first true taste of computing at home. A Compaq-produced 386-SX box was solely responsible for me having learned GW Basic and Pascal before I was a teen. My daughter, who's retracing my steps somewhat, would have loved to see how I had done it. I sense that seeing her dad's attempts at learning to hack might help her understand how harmful current iPad-esque walled gardens are.

Not surprisingly, when I visited Moscow 15 years later, my old hardware was gone, and not a shred of knowledge remained of where it might have gone. My relatives conceded that it might have been stolen. I hope the thieving computer obsolescence club or history museum will enjoy my pieces.


Offtopic: Just a reminder to always remove GPS info from personal photos that are to be uploaded to the internet. The photo in the post gives me GPS coordinates on Westview Dr in Lake Oswego, OR.

If we are so concerned about our privacy or encryption, we should not make such mistakes.


Oops! I appreciate your bringing this to my attention. This is an unexpected downside to my custom git-based publishing scheme -- easy to leave out EXIF wiping in the asset pipeline.
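
For anyone else patching the same hole: stripping location data before publishing is a one-liner with exiftool, e.g.

    # drop only the GPS tags, keep the rest of the EXIF intact
    exiftool -gps:all= -overwrite_original photo.jpg

(or -all= to wipe the metadata entirely).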

In the future, it's really important to attempt such notices privately first. Even if the information is out there, making that knowledge public in a popular spot greatly increases the exposure. I'm sure that with your skillset you could have a direct line to contact me in less than a minute.


Thanks for posting his home address.

Great job

(To make a point you don't have to show full coordinates)


Even if I didn't, anyone can still download the image and get it. Anyway, I have removed it.


Mandatory XKCD reference: https://xkcd.com/1360/

My point is: the technical aspect of moving data around is actually not that challenging compared to the "business logic" of organizing your old files.


I've lost so much data over the years (at least 10-15 years' worth of digital photos, collected documents, everything), and multiple times. And still... I only have plans for organized and scheduled backups, and I've never really executed on them. It's a bit like a big part of my life never existed (except in my mind, and, when I'm lucky, in lower resolution on Facebook).


You should get CrashPlan or something similar that is more or less a "fire and forget" kind of backup, where you can recover most of what you lose.


Oh, if only CrashPlan existed back then. I've also lost numerous files due to stupid mistakes and failing hard drives. Can't find any photos from before 2006. Now I use CrashPlan, Dropbox w/packrat, Arq and local backups to ZFS w/snapshots.


Oh, this looks interesting, thanks a lot!


“Treat any object that is elevated from the floor like it will fall.”

I find this an excellent metaphor for programming. In particular: low-level debugging of high-level code is inevitable, so choose your abstractions carefully.


I want to apply that to politics/History as well.

That said, this is the one sentence my 1st grade teacher always kept repeating, and it has stuck with me ever since. (It was, at the time, about pens falling from desks.)


For the truly paranoid, I advise putting an additional copy in cold storage in a bank vault at bank A. Print the GPG private key and passphrase out and put it in a bank vault at bank B. Handle permissions to access these vaults within your family as you like.

Bank A shouldn't be in the same region/area as bank B, in case a disaster strikes and floods the vault(s).

Then play that scenario through whenever you update your keys/data.


Oh yeah, ddrescue rescued my ass once.

My MacBook disk crashed, and my Time Machine backup hadn't been working for some time. It was 200GB of data.

What I did was connect the disk to a linux machine, and use ddrescue for a month 24/7 to try and read the bad disk.

I lost 0 bytes.
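
For anyone facing the same thing, the usual pattern is two passes sharing one map file, roughly like this (device and filenames hypothetical):

    # first pass: grab everything that reads easily, skip scraping the bad areas
    ddrescue -n /dev/sdb rescued.img rescued.map
    # second pass: retry the bad areas with direct disc access
    ddrescue -d -r3 /dev/sdb rescued.img rescued.map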


I recently brought an old P3 box back into service (laptops don't have floppy-disk ports) to access an HP Colorado backup tape. Only 400MB, but it had some interesting stuff: a Word HTML page, some old games, the much-nicer-than-I-remember QBASIC.EXE[1], and most impressively TURBO.EXE (the Turbo Pascal 7 IDE), which clocks in at a whopping 600KB.

[1] that thing had pseudo modules, a visual tracer, live indent and lint (limited but still).



cd Desktop/old/old/old/old/old/old/old/old/old/


Some of the images I recovered actually contained directory structures like that, which included some of the oldest files I was able to recover. This approach can work out if the filesystem eventually gets preserved!


This is strange to me. I've been using computers since I can remember, but I only started doing anything productive with them very late. I've never had anything I've wanted to keep - I mean, sure, I now have some scripts that are valuable, but nothing old like that.

It's also so strange when co-workers talk about how their kids are not just playing games but also creating mods for their games. When I was kid I just drew pictures, played with lego and played video games.


Modding games is extremely easy these days. I had to walk uphill both ways through two feet of snow just to get some games to work in the early 90s. Buckets of blood, sweat and tears went into my MajorMUD scripts.


I have no idea how hard or easy it is, but it still seems like they are doing way more advanced shit than I ever did


I wonder how much of this is selection bias by media. There are now twice as many human beings on this planet as when we were kids (totally making assumptions about you here, if you don't mind). That means more gifted kids, in addition to our ever more invasive media. Our various feeds are (unfortunately) full of page after page of people talking about the things they think their kids are doing. However I don't think kids today are especially special snowflakes compared to when we were kids. I'm not saying there aren't gifted kids. I just mean it's probably not right or true to say that gifted kids these days are more amazing than gifted kids of yesteryear. And it doesn't help that every parent on the planet sells their kid as a genius. Even the parents of stupid kids.


No comment from him trying to do a restore?


Thanks for mentioning this. I did test the restore flow (download from Nearline, decrypt, unlrzip, and verify), but didn't note it in the blog. It's critical to actually test the backups!

Edit: updated the post to mention this as well.
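
Roughly, the restore flow looks like this (bucket, object and file names here are hypothetical, and substitute your own encryption tool if you don't use GPG):

    # pull the archive down from the Nearline bucket
    gsutil cp gs://my-backup-bucket/disk-2016.img.lrz.gpg .
    # decrypt, then decompress the lrzip archive
    gpg --decrypt disk-2016.img.lrz.gpg > disk-2016.img.lrz
    lrunzip disk-2016.img.lrz
    # verify against a checksum recorded at backup time
    sha256sum -c disk-2016.img.sha256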


Am I the only one blown away by Google Nearline?

From https://cloud.google.com/storage/docs/nearline:

> Data is available in seconds, not hours or days. With ~3 second response times and 1 cent per GB/month pricing

This is in contrast to 3 hours of Amazon Glacier [1].

[1]: https://aws.amazon.com/glacier/


I went through a similar exercise but had to deal with long deprecated floppy interface tape drives. It took a number of days to source parts and trial and error to get an OS set up to read from them, but in the end it worked... and my BBS lived again inside a DOS emulator!

https://twitter.com/apaprocki/status/550432891201941504

In case anyone else has a stack of long-forgotten tapes in various QIC formats -- don't lose hope, have patience and restore them :)


Did you consider using OneDrive or Google Drive? Something like http://www.duplicati.com/


I'd say the only thing more volatile than a forgotten hard drive is a cloud service. They will come and go, go bust or get shut down, get deprecated and replaced. And if you forget to pay, don't count on finding your old forgotten data in 15 years.


Glacier, Nearline or B2 are "cloud" services as well, and while they're more stable than ordinary consumer-grade services (like Google Drive or OneDrive), it's not like their future is set in stone (affordability, service policies, etc.)

I believe the way to get the best reliability is to keep multiple replicas of the archive (git-annex is great here) across storage options from as many different vendors as possible, and hope the chance of all of them going down at nearly the same time is low enough. And probably schedule some periodic fully-automated checks to minimize the chance that some service has degraded and silently lost the data.
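
With git-annex specifically, that looks roughly like this (remote names hypothetical):

    # require at least two copies of every file across the remotes
    git annex numcopies 2
    # push content out to two independent storage providers
    git annex copy . --to s3-archive
    git annex copy . --to b2-archive
    # periodic integrity check of a remote's copies (a good cron job)
    git annex fsck --from s3-archive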


In that respect my recommendation is to buy some of these massive Seagate archive disks. I don't know how durable they are, but given you can have 8TB for less than $200, I would keep one or two in independent places. I would encrypt the data, though. But then where do you store the key long term (the brain is a bad choice!)?

And as technologies evolve, in 5 years there will probably be cheap 30TB HDDs, so it will make sense to consolidate, etc.


> buy some of these massive Seagate archive disks

Be careful. There's a tendency that if one drive fails, the other will follow very soon, especially if the drives are from the same batch (I was bitten by this), but not necessarily. I'm not an expert on the topic, but I believe that when planning for long-term HDD reliability, buying drives from multiple different vendors is a must.


I guess it's a good thing that Western Digital and HGST (which should now be thought of as a WD variant) have introduced their own 8TB drives too. Seagate also has "Surveillance" and "Enterprise" variants of 8TB size.


> where do you store long term the key

I know this is clunky, but I'd print out the key (or possibly laser-cut the key!) onto durable media and store it in a safe / safety deposit box etc.

It's going to be annoying to manually type it in, but it's a reliable method of last-resort.


> manually type it in

Laser-cut the key represented as a QR code, paired with a classic textual hexadecimal representation (not even base64, so you don't have to worry about "o"s and zeroes). Problem solved.


I'm aware of at least two old PCs of mine that are still in my mother's garage. One is a Pentium 100 that was primarily used as a router/firewall for my local network so there probably isn't much "data" on it. The other was slightly newer and may actually contain some of my old files. I think I'll bring them home the next time I'm there and see if I can find anything significant on them.


For me, the most important take away from this article is the importance of keeping files in formats that are open. On my Macbook Air, all of my personal files are in text, jpeg/png, pdf, or mp3 formats. Of course, even open formats can fall out of favor, but that doesn't seem to happen as often.

I also like that you encrypted your backups, although in doing so you probably made Senator Dianne Feinstein cry.



