Securing Your PostgreSQL DB with Roles and Privileges

hn_throwaway_99 · on Aug 12, 2023

The article starts at the top by saying "To become SOC2 compliant, we needed to remove global access and fine-tune who has access to what schemas and tables."

I've had to go through this SOC2 certification process as well, and I think a much better approach (with a lot of other benefits) is to use client side encryption to encode sensitive data like PII or PHI (personal health info) before you insert it into the DB. That way it's easy to give all of your developers read-only access to essentially the entire DB for things like debugging support while still maintaining SOC2 and other compliance (e.g. HIPAA).

Not saying there isn't also good use cases for roles and privileges (and it's a lot harder to add client-side encryption after the fact), but using client side encryption/decryption is a better approach to this issue IMO (you get more security benefits, and the compliance benefits really just are a consequence of that).

lastofus · on Aug 12, 2023

Doesn't encrypting your data before insertion make your data unable to be indexed/searched easily?

hn_throwaway_99 · on Aug 12, 2023

For indexing/searching on encrypted fields we use a blind index (lots of good resources if you search for that term).

On the other hand, sorting on encrypted fields has proven to be a difficult challenge. There are some possible approaches but they lower the security of your encryption.

GauntletWizard · on Aug 12, 2023

Blind indexes are useless when working with limited address spaces like Social Security Numbers, and even US Addresses[1]. It would take under an hour to reverse these on my current home PC.

Your advice isn't simply security theater - It's wrong and dangerous. It leads to companies treating this data, which is still sensitive, as nonsensitive and storing it insecurely, particularly when data teams export it to third-party tools.

[1] https://www.transportation.gov/gis/national-address-database

hn_throwaway_99 · on Aug 13, 2023

> It would take under an hour to reverse these on my current home PC.

The indexes are created with a secure salt. They're only crackable if you know the salt.

webstrand · on Aug 12, 2023

For some kinds of data and queries, it doesn't matter if the data in the index is encrypted. For other kinds of data, you could build the index on an expression that produces decrypted or anonymized values. Sadly postgres doesn't have per-index permissions, so you can't prevent a user with access to the table from using all of it's indexes.

_lqaf · on Aug 12, 2023

That's fine if you want a bucket of bits instead of a database. You can even make it easier by making one big table with an ID and blob, and just serialize | encrypt state to the DB. Easy-peasy.

If you want to use the "R" in RDBMS, though, or report on your data, or use indexes, or anything else that makes it worth running complex DBs instead of a file system, you're stuck using a database as a database.

hn_throwaway_99 · on Aug 12, 2023

This is wrong and unnecessarily snarky. I don't pre-encrypt all data, just PII/PHI. Doing this, or tokenization with a vaulting servjce, is pretty much standard recommended practice for storing sensitive data.

TheNewsIsHere · on Aug 13, 2023

I agree.

I do think there’s an argument to be made for the idea that, to put it colloquially, “somewhere the Social Security Administration needs a database that just has every SSN in plaintext”, but that’s not _exactly_ an honest everyday use case.

snagg · on Aug 12, 2023

We are big proponent of app-layer encryption as well. We wrote extensively about how we do it for our specific use case: https://www.slashid.dev/blog/app-layer-encryption/

singron · on Aug 13, 2023

We've done SOC2 type 1 and 2, and with a few exceptions, you only have to do what you say you do. First you claim you have controls on X, Y, Z, and then your audiors check that. You can just not claim X if you don't want to implement it. If the claim is vague, you have a lot of flexibility for implementation too.

This is a huge reason why SOC2 isn't a very useful certification. Your SOC2 and my SOC2 can be wildly different.

TheNewsIsHere · on Aug 13, 2023

I would qualify this statement. For a competent auditing firm there are non-negotiables to attesting to your own firms compliance, and a discerning (prospective) customer who pays attention and knows how to read those reports can spot places where “you’re trying to get away with it”.

I’d much sooner agree that the flexibility is in implementation. As long as you can hit a control in a reasonable and articulable manner that can be measured and evidenced, you have much flexibility. I see that as the benefit of SOC2. Others see it as an issue.

To your point, last time I led a company through a SOC2 Type 1-2 engagement, we had some standards sourced from NIST that were ahead of industry for the time, and published NIST standards were an authority that the auditing firm was comfortable accepting as compensatory for a control that otherwise would have been absent or out of compliance. So that control was ultimately accepted as “No exceptions during the audit period, but see our notes annex”.

rudasn · on Aug 12, 2023

How can you do client side encryption with web apps though? While keeping the key on the client, I assume, and allowing multiple browser sessions for the same user?

hn_throwaway_99 · on Aug 12, 2023

When I say "client side encryption", I'm referring to the database client, which in most web apps is actually code running on a private server (i.e. browser code makes API calls to a server running something like Python, Node or Java, and that server code makes calls to the DB - it's on the server where PII is encrypted).

That said, you can also use the SubtleCrypto API in the browser to encrypt data before it is even sent to the server.

candiddevmike · on Aug 12, 2023

Envelope encryption, where you encrypt a data encryption key (typically symmetric with AES) with other keys (typically asymmetric with RSA). This is how most password safes like bitwarden work.

TheNewsIsHere · on Aug 13, 2023

This model reminds me of sealed boxes, so I wanted to add that to this discussion.

Send a public key to the client (say in a secrets input page), your browser encrypts field content with that key, and you receive the ciphertext on the server. You can then decrypt it, discard the sealed box keys, and persist the data however you need. (Presumably something that sensitive would get encrypted with a different key before going into the database, but you could keep the keys around and have each piece of data protected by a different key. This has pros and cons.)

Github Actions secrets are protected in transit to Github using sealed boxes.

patmorgan23 · on Aug 12, 2023

I believe by client side they mean the database client, which would be the application backend/server.

louwrentius · on Aug 12, 2023

Maybe I’m totally out of it, but creating an actual database user for each account of your application sounds like you can rely on database security and don’t run the risk of application bugs causing security vulnerabilities.

This means a more complex database level of roles and privileges, which may be it’s own can of worms, but if you have to choose between problems to have, what would you select?

_lqaf · on Aug 12, 2023

We are heavy postgres users, and we do both SOC2 and a more stringent audit demanded by our corporate parent. Lots of roles, lots of RLS, encrypted media, additional column-level encryption for some specific things for not great reasons).

The postgres roles in particular were a recurring mess until we built provisioning that stuff in to our onboarding automation. If you are going to have individual named users, I highly recommend taking humans out of the process, we make too many mistakes.

And yes, there are multiple advantages to enforcing access control as close to the data as possible. I tend to think of databases as "data structures that can defend themselves" - not just security, but type and data validation, relational references, etc. all can both save you from bugs and help you find them earlier.

louwrentius · on Aug 12, 2023

Yes, as an armchair "expert" that's what I was thinking. But managing swats of SQL: a developer friend said that this is difficult too. I don't know. But if I had to choose, I would make the SQL / DB side rock solid, even if it required some business logic in SQL.

warent · on Aug 12, 2023

I don't think this literally means each user of your app gets their own DB user, rather that you create different db users for different aspects of your app.

What you're describing is what RLS (row-level security) is for, where you log into a generic global "app_user" user with certain permissions that don't include things like admin tables etc, and then define the specific user that is using the session via session variables.

louwrentius · on Aug 12, 2023

I've seen blog posts decades ago from DB admins that actually advocated for a DB account per user account. They stated the DB was totally build for that, but people just don't know. They use what they understand, as in: app people may know too little about database security and privileges and just decide to solve it in code.

patmorgan23 · on Aug 12, 2023

There are applications that are built to where literally every user gets a database login. For example Dynamics SL (formerly Solomon) is built this way with MS SQL server.

TheNewsIsHere · on Aug 13, 2023

We use an application in my company that’s designed to run in the vendor cloud or on-prem. They’re pretty transparent that every cloud customer is running in their own specific database, and they’re just clustering all of those databases.

The SaaS build of the app even exposes this in the URLs. Everyone is on the same codebase, and there’s just some additional database logic to connect the right deployments to the right databases.

pphysch · on Aug 12, 2023

Has anyone run into issues with too many roles? Like if you want to use RLS and have a role per application-user, with millions of users.

aeonsky · on Aug 12, 2023

Yes, my team had a direct issue with this on Aurora Postgres, at least. This is PG9 but then kept happening all the way into PG12 until we got rid of all but like 5 roles. Above like 4000 roles we experienced a significant lag on every query, sometimes on the order of seconds. At scaled somewhat linearly. I even wrote to Tom Lane and he said that area of Postgres is poorly optimized.

pphysch · on Aug 12, 2023

Interesting. Why so many roles initially, and how did you safely consolidate to 5?

paulryanrogers · on Aug 12, 2023

Roles/users are backed by PG tables, so I imagine storing them should be no more difficult. (Assuming partitioning and other native features are available.) Yet I'm not sure how well it would scale all the row checks for read and write access, especially if you also use column level security.

Proxying connections for so many different users would be awkward too, though some proxies apparently can take on a different role for the session and revert when client-side disconnects.

anarazel · on Aug 12, 2023

FWIW, you don't need to use database roles if you want to use RLS. You can instead have some other context indicating the current "application user" and use that in your RLS policies.

pphysch · on Aug 12, 2023

Do I have to add that context to every query, or is it something I can set per cursor/transaction?

anarazel · on Aug 12, 2023

Either. What the best approach is depends a bit on your needs / security model.

You can e.g. something like storing the session "application user" in a configuration variable (SET myapp.rls_user =...). But if the user can influence the SQL and that's part of the threat model, you need to do more, because that could be changed by further SQL.

Another solution is to just have a session level temp table indicating the current application user.

pphysch · on Aug 12, 2023

Oh sweet. That approach makes a lot more sense. Access would be through a server-side ORM so users would not be able to run arbitrary SQL. Thanks!

edmundsauto · on Aug 12, 2023

Supabase has pretty good docs and a nice Ui to play around with this, btw.

programmarchy · on Aug 12, 2023

Why have a role per user instead of just defining the row policy with the user directly?

hn_throwaway_99 · on Aug 12, 2023

In Postgres a "role" really is equivalent to a user. A user in Postgres is just a role with the ability to log in.

programmarchy · on Aug 12, 2023

Yes but I’m presuming the person I was responding to meant “role” in the context you’re talking about, but by “user” they meant a row in some “users” or “customers” table corresponding to their application. Questioning the need to “create role” for every application user.

pphysch · on Aug 12, 2023

How does that work? With per "user" roles, I can SET ROLE "user-1000" and enter their authz context without changing any of my queries. How would this work without per-user roles?