The more complex your system grows, the more often it will fail and shoot you in the foot. I'd advise against systems like HashiCorp Vault - they add complexity, and while they have their merits in complex setups, you seem too small to be able to operate such a system.
Have an offline backup printed along with the disaster recovery checklist and documentation and put them in a safe in your company - the checklist should be dumb enough that your drunk self can use it at four in the morning, because you were the nearest employee when everything went down.
Ensure that you have stupid manual processes in place for rotating the safe's PIN and encryption keys in general, including a sanity check that the newly generated keys actually work (e.g. if they are used for your backup storage, actually back something up and restore it). Ensure that the safe's PIN is available to at least one other person and used regularly (e.g. because you store your backup tapes there).
If you feel that you need to change from this very simple system to a more complex one, ask yourself why. What does your change actually add in terms of security, and what risks does it add?
In the end, you want your system available to customers, and the security you add is there not only to protect the data but also to let you know who can access it (the auditing part).
Great answer, thank you. I agree with your point about testing the restore process; right now I'm trying to think of a way to automate it.
As a side note: for example, we have had some backups that are probably useless because they are way too small. Catching this would mean more regular manual checks, or some automated rules, at which point it quickly becomes more complex again.
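A rule for the "way too small" case does not have to be elaborate. Here is a minimal sketch, assuming you keep a list of the sizes of earlier backups; the function name and the 50% threshold are made up for illustration and would need tuning for real data.

```python
from statistics import median


def is_suspiciously_small(previous_sizes, new_size, min_ratio=0.5):
    """Flag a backup whose size is far below the median of earlier backups.

    previous_sizes: sizes in bytes of earlier, trusted backups.
    min_ratio: hypothetical threshold; 0.5 means "less than half the median".
    """
    if not previous_sizes:
        # No history to compare against: only an empty backup is flagged.
        return new_size == 0
    return new_size < min_ratio * median(previous_sizes)
```

This only catches gross failures such as an empty or truncated archive; it says nothing about whether the contents can actually be restored.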
The backup part is quite easy - generate a well-known file with 1 kB of data and include it in your backups. After the backup has completed, validate that you can restore the well-known file and compare the content of the original with the restored copy. Easy to automate if you run a well-customizable backup system.
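The canary check described above could look roughly like this. It is a sketch only: `tarfile` stands in for whatever backup system you actually run, all function names are invented for the example, and writing fresh random content on every run (so that a stale restored copy cannot pass) is my own assumption, not part of the original suggestion.

```python
import hashlib
import os
import tarfile
import tempfile
from pathlib import Path


def write_canary(path, size=1024):
    """Write `size` random bytes to `path` and return their SHA-256 digest."""
    data = os.urandom(size)
    Path(path).write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def canary_matches(restored_path, expected_digest):
    """Return True if the restored file hashes to the digest recorded earlier."""
    restored = Path(restored_path).read_bytes()
    return hashlib.sha256(restored).hexdigest() == expected_digest


def demo_round_trip():
    """Back up and restore the canary, with a plain tar archive as the 'backup'."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source = tmp / "source"
        restore = tmp / "restore"
        source.mkdir()
        restore.mkdir()

        digest = write_canary(source / "canary.bin")

        # Stand-in for the real backup job.
        archive = tmp / "backup.tar"
        with tarfile.open(archive, "w") as tar:
            tar.add(source / "canary.bin", arcname="canary.bin")

        # Stand-in for the real restore job.
        with tarfile.open(archive, "r") as tar:
            member = tar.extractfile("canary.bin")
            (restore / "canary.bin").write_bytes(member.read())

        return canary_matches(restore / "canary.bin", digest)
```

In a real setup the two tar steps would be replaced by calls into your backup tool, and a `False` result should fail loudly (alert, non-zero exit code) rather than just being logged.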
Storytime: some time ago I did not check that my backups could be restored, then had a faulty hard drive and needed to restore from backup. The backup was also as dumb as possible - essentially a tar archive encrypted with openssl. So I reinstalled the server, tried to decrypt it - and got an error that the key was wrong. It took me a good weekend to find out that openssl had changed the default hash algorithm between openssl 1.0 and 1.1. This would not have been caught by the proposed system, but now I really pin all default options in my scripts as well.
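To illustrate what pinning the defaults can mean, here is a sketch that builds an `openssl enc` command line with the cipher, digest and key derivation spelled out instead of left to the installed version. The function name and the iteration count are arbitrary choices for the example, and `-pbkdf2`/`-iter` assume OpenSSL 1.1.1 or later.

```python
def openssl_enc_command(infile, outfile, keyfile, decrypt=False):
    """Build an `openssl enc` argv with every relevant option stated explicitly.

    Nothing here depends on the defaults of the installed openssl, so the same
    options are used for encryption today and decryption years later.
    """
    cmd = [
        "openssl", "enc",
        "-aes-256-cbc",        # cipher, pinned
        "-md", "sha256",       # digest used for key derivation, pinned
        "-pbkdf2",             # key derivation function (OpenSSL 1.1.1+)
        "-iter", "100000",     # iteration count; arbitrary example value
        "-salt",
        "-in", str(infile),
        "-out", str(outfile),
        "-pass", f"file:{keyfile}",
    ]
    if decrypt:
        cmd.insert(2, "-d")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Keeping the exact command on the printed recovery checklist is probably just as important as having it in the script.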
While you're right, I'd recommend against both for this specific use case. You just added another layer. The extra software needs to be available, maybe it's not developed anymore and won't compile on your system, maybe they changed the alphabet from which the words are generated, ...
OpenSSH private keys are armored by default, gpg keys can be exported and imported in an armored format - and everything else can just be printed as a hex representation with whatever tool (e.g. `od -Ax -tx1 <file>` or any other).
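If you would rather not depend on a particular dump tool being installed when typing a key back in, the hex round trip is small enough to sketch. The line layout below (hex offset, two spaces, two-byte groups) is made up for this example and is not the output format of `od` or any other tool.

```python
def to_printable_hex(data, bytes_per_line=16):
    """Format bytes as lines of '<hex offset>  <hex groups>' for printing."""
    lines = []
    for offset in range(0, len(data), bytes_per_line):
        chunk = data[offset:offset + bytes_per_line]
        groups = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        lines.append(f"{offset:06x}  {groups}")
    return "\n".join(lines)


def from_printable_hex(text):
    """Parse text produced by to_printable_hex back into bytes."""
    out = bytearray()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the offset column; everything after the double space is payload.
        _offset, _, payload = line.partition("  ")
        out += bytes.fromhex(payload.replace(" ", ""))
    return bytes(out)
```

A 32-byte key comes out as two short lines, which is tedious but feasible to type by hand; whatever you print, run it through the restore path once before it goes into the safe.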
z-base-32 isn't going to magically disappear off the face of the earth. Anyway, here's a 32-byte secret key as hex. Still easier to type in than to drive to a data center. GPG is just horribly verbose, and the old-school RSA keys are huge in comparison.
It's a 50-line CLI I wrote (just like `entropy` on the other line). It's just a simple interface for https://gitlab.com/NebulousLabs/entropy-mnemonics which is one of the many different "encode binary as words" things out there. It's the idea that matters more.
For a PKI, you can also afford relatively more complexity with respect to the root CA key if you use intermediate CAs for most things and are careful with their expiration.