
I agree about the part of not accessing information from production.

But I am wondering how could we debug or test something which happens only on production? I ask this because there are some bugs that can appear at the intersection of code and data.

So far my strategy is to do the following:

1. Only one person can access the production DB. This person makes a backup copy, encrypts it, and stores it on internal storage.

2. Another person takes the backup and runs an anonymizer script on the data. What the anonymizer should do beyond the obvious cleaning of personal data from user accounts is still up for debate. One important (and hard) step is regenerating the UUIDs while keeping foreign key integrity.

At the end, this person creates a new internal DB with the anonymized data.

3. Someone reviews the new DB and marks it as ready to be used.

Then a dev can request access to this fresh copy.
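
To make the UUID step in point 2 concrete, here is a minimal sketch, not the actual script. It assumes the tables are already loaded as lists of dicts and that you can list which columns hold UUIDs (primary or foreign keys); `remap_uuids` and the `uuid_columns` layout are hypothetical names for illustration. The idea is one shared old-to-new mapping, so a value that appears as a primary key in one table and as a foreign key in another gets the same replacement.

```python
import uuid


def remap_uuids(tables, uuid_columns):
    """Replace every UUID with a fresh one while keeping references intact.

    tables:       {table_name: [row_dict, ...]}         (assumed in-memory layout)
    uuid_columns: {table_name: {column_name, ...}}      (hypothetical config)
    """
    mapping = {}  # old value -> new UUID, shared across ALL tables

    def fresh(old):
        if old is None:  # keep NULL foreign keys as NULL
            return None
        if old not in mapping:
            mapping[old] = str(uuid.uuid4())
        return mapping[old]

    return {
        name: [
            {
                col: fresh(val) if col in uuid_columns.get(name, ()) else val
                for col, val in row.items()
            }
            for row in rows
        ]
        for name, rows in tables.items()
    }
```

The mapping itself links old IDs to new ones, so it should be thrown away together with the unanonymized backup.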

In some teams I experimented with fully automating this process up to the review step. But then, if the scripts have bugs, we suddenly have a live internal DB with customer data, which is not what we want.

As an alternative, but only for small projects, I once wrote a script that analyzes the DB data and tries to create from scratch a similar data structure but with fake data.
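
Roughly, the idea looks like the sketch below. This is an illustration under simplifying assumptions, not the original script: rows are non-empty lists of dicts, only `int`, `float`, and `str` columns are handled, and `profile` and `generate` are made-up names. The profile keeps only shape information (type, null ratio, value range or string length), and the generator produces random rows that fit it.

```python
import random
import string


def profile(rows):
    """Collect per-column shape info only: type, null ratio, range or length."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        kind = type(present[0]) if present else type(None)
        info = {"kind": kind, "null_ratio": 1 - len(present) / len(values)}
        if kind in (int, float):
            info["lo"], info["hi"] = min(present), max(present)
        elif kind is str:
            info["min_len"] = min(len(v) for v in present)
            info["max_len"] = max(len(v) for v in present)
        stats[col] = info
    return stats


def generate(stats, n, seed=0):
    """Generate n fake rows that match the profile but contain no real values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, info in stats.items():
            if info["kind"] is type(None) or rng.random() < info["null_ratio"]:
                row[col] = None
            elif info["kind"] is int:
                row[col] = rng.randint(info["lo"], info["hi"])
            elif info["kind"] is float:
                row[col] = rng.uniform(info["lo"], info["hi"])
            elif info["kind"] is str:
                length = rng.randint(info["min_len"], info["max_len"])
                row[col] = "".join(
                    rng.choice(string.ascii_lowercase) for _ in range(length)
                )
            else:
                row[col] = None  # other types are not handled in this sketch
        rows.append(row)
    return rows
```

Note that the profile still carries the real minimum and maximum of each column, so it is not completely free of production information, and it loses correlations between columns, which is often exactly where the bug lives.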




> But I am wondering how could we debug or test something which happens only on production? I ask this because there are some bugs that can appear at the intersection of code and data.

I've found that your strategy depends greatly on the kind of bug and what kind of service:

* If you're implementing a DNS server, you can copy live queries and compare good-to-bad. Then you can notify when something bad crops up. But odds are you aren't implementing a DNS server.

* If you're working on something whose behavior potentially changes under load, you need to find a way to replicate load. Some companies have entire production environments where release candidates are sent without being less secure. Cloudflare has some of these - I implemented one of the early versions.

* If you're dealing with weird logic tied to edge cases in the database, you need to work to identify those. Having live data often makes it only marginally easier.

There are products out there that will synthesize large amounts of production-like data based on the patterns in your database. I've used tonic.ai, and I know there are others. As you say, this is a touchy process with nasty error cases. Having someone else implementing it might be desirable.


Use a copy of production (perhaps anonymized) for debugging, and delete the copy afterwards.

> But then if there are bugs suddenly we have a live internal DB with customer data which is not wanted.

Don't let the production-copy touch your normal development environment. Make sure it's deleted in time.


> Use a copy of production (perhaps anonymized) for debugging, and delete the copy afterwards.

This way of debugging assumes a lot of things:

- You're assuming that your anonymization script works. What if some data isn't removed?

- What if the system you're using for debugging sends an email or connects to a webhook or attaches to a remote volume or pushes to a cloud service etc etc? Did your anonymization step really work?

- What if someone has connected the system you're debugging on to a production service by mistake? That would mean you're not even using the anonymized database. You're really on production.

- What if you forget to delete the database afterwards? Or forget to purge a cache? Or you fail to delete a container? Or you do delete the container, but not the container volumes? That production data is still there. Oops.
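
To illustrate the first point: you can scan the anonymized copy for leftovers, but a scan only finds the patterns it knows about. The sketch below is a hypothetical check, assuming tables loaded as lists of dicts and an anonymizer that rewrites all addresses to a placeholder domain such as `example.com`; `find_leftover_emails` is a made-up name. It flags any string field that still contains an email address at another domain.

```python
import re

# Deliberately simple pattern; real addresses can be stranger than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leftover_emails(tables, allowed_domains=("example.com",)):
    """Return (table, row_index, column, value) for each string field that
    still contains an email address outside the allowed placeholder domains."""
    hits = []
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            for col, val in row.items():
                if not isinstance(val, str):
                    continue
                for match in EMAIL_RE.findall(val):
                    domain = match.rsplit("@", 1)[1].lower()
                    if domain not in allowed_domains:
                        hits.append((name, i, col, val))
                        break  # one hit per field is enough
    return hits
```

An empty result here does not show the data is clean. Names in free-text fields, phone numbers, or an address buried in a JSON blob would all pass this check.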

It's much simpler to just not use production data for debugging. It makes debugging harder, which is annoying, but you can't go wrong and accidentally leak your users' data. I'd prefer to just spend more time on debugging than have my users' data be put at risk.


Yes, obviously you'd try to debug as much as possible without touching production data.

Of course, different businesses also have different requirements on how sensitive production data is.



