Storing encrypted secrets in SCM is pretty standard practice. Usually I recommen...

lmeyerov · on March 4, 2022

Standard doesn't mean good, and the sophistication to do this well is beyond most orgs and individuals. You can come up with specific scenarios where it works, but they'll get garbled through the telephone game over time.

Much better to treat data, including secrets, as radioactive, and thus follow least privilege + defense in depth. If it's never there, or better, they never got it in the first place, there's nothing to worry about.

Ex: Before PII hits logs, encrypt it and don't give data scientists the key.
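A rough sketch of what that could look like in Python's `logging`, with loud caveats: the stdlib has no encryption primitive, so this swaps in keyed HMAC pseudonymization (one-way, not reversible encryption), and it only catches email addresses. `pseudonymize`, `PiiFilter`, and `EMAIL_RE` are made-up names for illustration, not from any tool mentioned here.

```python
import hashlib
import hmac
import logging
import re

# Assumption: emails are the only PII we look for; real rules would be broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text: str, key: bytes) -> str:
    """Replace each email with a keyed HMAC-SHA256 token (not reversible)."""
    def _token(match):
        digest = hmac.new(key, match.group(0).encode("utf-8"), hashlib.sha256)
        return "pii:" + digest.hexdigest()[:16]
    return EMAIL_RE.sub(_token, text)


class PiiFilter(logging.Filter):
    """Hypothetical filter that rewrites a record before any handler sees it."""

    def __init__(self, key: bytes):
        super().__init__()
        self._key = key

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so PII passed via %-args is covered too.
        record.msg = pseudonymize(record.getMessage(), self._key)
        record.args = None
        return True
```

The key would live with whoever owns the logging pipeline, not with the people reading the logs. Attach it with `handler.addFilter(PiiFilter(key))`.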

Ex: Orgs we work with that version notebooks won't allow notebook output to be saved, and people (+ software) know to reject baked-in creds. Likewise, auth generally isn't baked in; it goes through SSO instead.
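The software side of that can be fairly small. A minimal sketch, assuming the notebook is already loaded as a dict in the nbformat v4 JSON layout (`cells`, `cell_type`, `source`, `outputs`); the function names and the two patterns are illustrative guesses, not anyone's production rules:

```python
import re

# Assumption: two illustrative patterns (AWS-style access key IDs and
# quoted password/token assignments). Real scanners use far more rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)(password|passwd|secret|token|api_key)\s*=\s*['\"][^'\"]+['\"]"),
]


def strip_outputs(nb: dict) -> dict:
    """Clear saved outputs and execution counts from code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def find_baked_creds(nb: dict) -> list:
    """Return indexes of cells whose source matches a credential pattern."""
    hits = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        if any(p.search(source) for p in SECRET_PATTERNS):
            hits.append(index)
    return hits
```

Run from a pre-commit hook or CI step, a non-empty result from `find_baked_creds` would fail the commit, and `strip_outputs` would run before the notebook is written back.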

If it's never there, at every stage, everything is so much easier.

morelisp · on March 4, 2022

No, storing encrypted secrets in repositories is an incredibly common and safe practice in DevOps/GitOps environments. Virtually every orchestration tool has some way to support it.

E.g. k8s sealed secrets https://github.com/bitnami-labs/sealed-secrets

E.g. Salt's GPG filters https://docs.saltproject.io/en/latest/ref/renderers/all/salt...

E.g. Ansible Vault https://docs.ansible.com/ansible/latest/user_guide/vault.htm...

E.g. Puppet hiera-eyaml https://puppet.com/docs/puppet/6/securing-sensitive-data.htm...

PD/PII is a completely separate issue. First because even encrypting doesn't remove legal obligations concerning processing, and second because your DS/BI teams probably need access to the unencrypted data to, like, do their actual jobs. You need completely orthogonal forms of access control for that (like SSO, as you allude to).

lmeyerov · on March 4, 2022

Wrong thread? I use tools like that, but they're not the environments, credentials, & threat models (I think) we're talking about.

I'd be curious if/how any data science teams (google/facebook/netflix/...) bake user credentials & API secrets into data science notebooks. I've never seen it, but I don't get to see everyone's notebook environments. I have seen 1-2 projects attempting to do DB auth plugins/libs for jupyter notebooks, but not high-grade production ones. Instead, it gets baked into the deeper env (think Tableau, Databricks, ...), vs being part of the ipynb.

Notebook security feels like 1990s/2000s browser security and the literal decades of unsafe web APIs. DLP and all that is at the forefront of the risks, yet most tools just do system auth and maybe auth for a few special connectors. The real threat model is outside of their system, so it's no surprise analysts & their data orgs fail in practice :(

morelisp · on March 4, 2022

Arguably kahrl is in the wrong thread repeatedly posting about "never ever ever ever ever ever storing secrets in version control". As I said, I agree PII/PD is different.

Sealed secrets can be used in various k8s-aware notebooks.