
6. For any critical assets, verify that you can restore them and get them working offsite with zero access to the current network.



7. Have a backup strategy

7.5 Test your backup strategy every now and then (nothing is more painful than data loss and dysfunctional backups)

8. "All users lie, there is no exception". Sad but true, get used to it.

9. Automate your infrastructure. If something goes down, just recreate it with Chef/Ansible/Terraform/etc.

[edit: added no.9]


10. Rotate all the passwords immediately, since the other guy has left the organization, and have a password policy in place.

11. Understand the reason the other guy was fired, and make sure you are not the one left holding the bag.

12. Have an issue tracking system in place; you will thank yourself later.

13. Have a change management policy in place.

14. Before you make any change, fully understand the consequences and have a tested plan to get back to a known good state.


I would add 15: start keeping a physical log book to document changes.

Depending on what you have to look after, you might want more than one book.


Also, backups are kind of thankless: no one ever puts backup maintenance on your schedule, but if a machine goes down everyone expects you to have a backup from three weeks ago, and expects that they can retrieve all the data from it. You also have to poll people as to what needs to be backed up. You need to do test recoveries. Some backups should be offsite, and the policy around that should be communicated to managers.
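
For the test recoveries, here is a rough sketch of one way to check a restore: hash every file in the original tree and in the restored copy, then report what is missing, extra, or different. The function names (file_digest, tree_digests, compare_restore) are made up for illustration, and it assumes you can restore into a scratch directory and still read the source data. It only compares file contents, not permissions, ownership, or timestamps.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Compare a restored tree against the source it was backed up from.

    Hypothetical helper: returns relative paths that are missing from the
    restore, present only in the restore, or present in both with
    different contents.
    """
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Run it after restoring last night's backup into an empty directory, e.g. compare_restore(Path("/srv/data"), Path("/tmp/restore-test")) (both paths are placeholders). Files that changed since the backup ran will show up as "changed", so compare against a snapshot or a quiet dataset if you want a clean result.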



