Securing PostgreSQL [pdf]

floatboth · on Nov 3, 2016

> Use a well-known secure algorithm (AES256 is considered the standard).

> Never roll your own crypto.

Directly using a primitive like AES is pretty much rolling your own crypto. (Creating an actual cipher and using it is just beyond ridiculous.)

What you should use is secretbox from NaCl (libsodium or tweetnacl). Or Keyczar, I guess that's still maintained (last commit on github is 21 days ago). Or TripleSec for long term things that you're really really paranoid about.

The point is, use a library that does authenticated encryption for you.

tptacek · on Nov 3, 2016

It's exactly rolling your own crypto. This is the second thread today on which someone has had to point this out (thanks!).

Virtually nobody writes their own ciphers. But "Don't roll your own crypto" is still important and still very common advice. Why? Because the stuff you do to actually apply "AES256" to your data is the stuff that actually causes crypto vulnerabilities.

In fact: if you do some reading on modern stream ciphers and design your own stream cipher, and then plug that cipher into libsodium, you are probably more secure than you would be if you tried to use AES256 in your own library without libsodium!

Cyph0n · on Nov 3, 2016

Why is using AES-256 directly considered "bad"? Is the issue key management? RNG? Padding? Or what exactly? Genuine question.

Or do you mean that re-implementing AES-256 in your own library instead of using a well-tested implementation is "bad"?

technion · on Nov 4, 2016

You can get a good example of this by simply hitting this link and reviewing the first few hits:

https://github.com/search?utf8=%E2%9C%93&q=aes+encryption

I'm not going to call out any project by name, but on the first few pages, I see:

- A project which implements unauthenticated CBC mode AES

- A project that does not, in any way, document its mode or implementation beyond "using AES 256"

- A project with <10 commits, more than two years ago, yet over 50 stars

- A project that uses mcrypt

Over all, it is easier to find a project with horrible deficiencies than anything that didn't immediately look concerning. And every one of these boasts "AES 256 encryption".

Cyph0n · on Nov 4, 2016

I see, thank you for explaining. I'll definitely stick to the "known" libraries if I need AES.

tptacek · on Nov 4, 2016

Be a little more careful than that. Unless you're doing TLS or PGP, you essentially want to stick to NaCL/libsodium, and nothing else.

Cyph0n · on Nov 4, 2016

I will keep that in mind, thanks!

rascul · on Nov 4, 2016

What is wrong with mcrypt?

technion · on Nov 4, 2016

I started writing an answer, but this blog is more comprehensive than anything I could put together:

https://paragonie.com/blog/2015/05/if-you-re-typing-word-mcr...

Problem #2 is something I've hit in multiple real-world codebases.

tptacek · on Nov 4, 2016

It's a low-level crypto library that leaves avoidance of virtually all the exploitable crypto mistakes as an exercise for the programmer.

greendragon · on Nov 3, 2016

https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...

Cyph0n · on Nov 3, 2016

The context of the long-winded conversation you linked to is user auth and session cookies. That's a VERY specific use case for AES.

I'm talking about AES in general.

But I still skimmed through it.

TL;DR

* AES CBC instead of ECB or (gasp) Triple DES

* SHA-1 instead of MD5

* Use MACs

* Be careful with padding

tptacek · on Nov 3, 2016

* ECB is not an alternative to 3DES

* If you use 3DES or any other 8-byte block cipher you import several additional security concerns you have to code around

* If you use CBC you also have to get the IV right, which tons of carefully designed crypto has failed to do

* SHA-1 is also insecure

* Neither MD5 nor SHA-1 is a MAC

* Your choice of MACs brings with it new security pitfalls

* How you apply the MAC also has pitfalls; see, for instance, all the systems that have managed to leave CBC IVs out, because they were specified as separate arguments

* Padding is a pitfall if you use CBC... or a few other modes --- guess which!

* If you use something other than CBC you get other pitfalls

* "Be careful with padding" is a vague description of like 6 different padding vulnerabilities you have to know about

* You still haven't even generated a key yet

* In the unlikely event you get all of this right, all you've managed to do is write a very basic symmetric key seal/unseal --- you're still only 40% of the way (in functional terms) to something as simple as NaCL

So, no, I would not say that was the TL;DR of that piece. I think the TL;DR of that piece is right there in the title.

Cyph0n · on Nov 3, 2016

* Never said AES ECB was an alternative to DES.

* I would never use 3DES.

* I am aware of that, just TLDRing. SHA-256 is my cup of tea.

* Did I say they were a MAC? I would use CBC HMAC + SHA-256 for that.

* No idea, since I'm not an expert at AES.

I feel like you're offended that I didn't like the article. It's just an opinion, don't take it personally.

CiPHPerCoder · on Nov 3, 2016

> * Never said AES ECB was an alternative to DES.

> AES CBC instead of ECB or (gasp) Triple DES

What? This entire vein of conversation is nothing but miscommunication.

These are two choices a developer should use:

  * crypto_secretbox() and crypto_secretbox_open()
    from NaCl or libsodium
  * rm -rf code/ && shutdown

Yes, a privileged few are actually capable of cobbling together a secure cryptosystem out of AES, HMAC, SHA2-family hash functions, and maybe Ed25519 and X25519 if they have a sane implementation available. The general public should just use whatever AEAD mode they're provided and not build their own disaster.

> * I would never use 3DES.

Good.

> * I am aware of that, just TLDRing. SHA-256 is my cup of tea.

SHA-384 is mine.

https://blog.skullsecurity.org/2012/everything-you-need-to-k...

Except for passwords. You don't use simple hash functions for passwords.

> * Did I say they were a MAC? I would use CBC HMAC + SHA-256 for that.

I'm assuming you meant AES-CBC + HMAC-SHA-256 here, in an encrypt-then-authenticate mode.

By "assuming" I of course meant "hoping".

> * No idea, since I'm not an expert at AES.

AES expertise isn't the issue here. Composing a secure cryptography protocol out of standard implementations is a rare skill set among software engineers.

Cyph0n · on Nov 4, 2016

And I thought I couldn't have gotten a worse response! Good job :)

tptacek · on Nov 3, 2016

I'm not offended that you didn't like the article. I'm saying you failed to summarize it. I'm pretty sure I know what the point of this particular article was. :)

Cyph0n · on Nov 3, 2016

OK, fair enough.

tonyhb · on Nov 3, 2016

Yep. I've used NaCl for edwardian curves using djb's boxes and it's pretty much ideal as far as I'm concerned.

tptacek · on Nov 3, 2016

Edwards curves, not Edwardian. They didn't have those curves in 1905.

Curve25519 isn't an Edwards curve. Ed25519 is the equivalent Edwards curve. Curve25519 is a Montgomery curve. NaCL uses C25519 for DH, and Ed25519 for signing.

I'm just responding to the nerd snipe (and eagerly awaiting my comeuppance from 'pbsd). You're right (in advance) that the point of using NaCL is that you don't need to know any of this trivia.

srd · on Nov 4, 2016

I have a question about storing the encryption keys. How would one actually securely store them and distribute them among the application servers in a cloud environment.

I don't buy into the 12-factor-application way of storing sensitive data in an environment variable ("ps ae" and a local intruder has the data).

Storing it in a secured file on the server requires the file to be distributed by the provisioning service, thus implying the key being stored in a repository (that just isn't your application code, but a repository none the less).

Using an api service to with client certificates doesn't really help either, because if the application code can access the required configuration and certificates, so can an intruder with shell access (since the intruder most likely has the user permissions of the running application before doing a privilege escalation attack).

I haven't seen an answer to that question that really satisfied me in the past. Does anyone have a battle tested method for storing database encryption keys?

koffiezet · on Nov 4, 2016

> Using an api service to with client certificates doesn't really help either, because if the application code can access the required configuration and certificates, so can an intruder with shell access (since the intruder most likely has the user permissions of the running application before doing a privilege escalation attack).

The problem here is that if your application requires the use of encryption keys, and the user which runs this application gets compromised, you have a problem anyway - since like most applications, yours probably also doesn't care about the key's security once it's in memory. If they get that far, the only thing you can do is replace/revoke those keys and take the hit.

In more secure non-cloud setups, encryption/decryption is being done by something like a HSM, a box with only one interface (usually PKCS#11) which can be used to encrypt, decrypt and sign stuff, and you never see the keys. You have software HSM's - and you could try to apply the same principal in the cloud, where you have an isolated box with only the very well protected and audited soft HSM running.

But not sure the cost of learning and maintaining such a system is worth it for most situations, where I would use an API service like Hashicorp's Vault. Most of the compromising of keys and secrets doesn't happen on your servers, but on your or some developer/user's work machine. How many crap is exchanged over email, dropbox links, slack, skype, ...? Keeping keys out of the hand of the user and eliminating the need for your users to have them at all is higher on my priority list.

altendo · on Nov 3, 2016

While there is some PostgreSQL-specific things in this, the vast majority of it is pretty solid advice regardless of the database you have.

knorker · on Nov 3, 2016

The advice on FDE is misleading if not flat out wrong. It's NOT just to protect against media theft. It's also about "shit, the drive crashed. How do I destroy the data on it before throwing it in the trash?"

Same with replacing a smaller drive with a bigger drive. Yes, DBAN it, but that won't take care of remapped sectors.

It's easier to destroy an encryption key than it is to destroy the data.

rzzzt · on Nov 3, 2016

To the point that more recent hard drives already come with hardware encryption enabled, and the secure erase/sanitize command can finish in mere seconds, because it only needs to throw away the existing key: https://en.wikipedia.org/wiki/Hardware-based_full_disk_encry...

fulafel · on Nov 4, 2016

That requires you to trust the disk to implement many finicky crypto details correctly in its proprietary black box, you can't get any assurance about that with reasonable cost. And SSDs have a horrible track record in the area, manyimplementing SECURE ERASE so it leaves your data on the flash chips. And if you choose a disk model, make your investigations and find it passes your sniff test, disks are no longer a commodity part for you.

tptacek · on Nov 3, 2016

The point is that FDE protects you principally from loss of physical custody, and the reason the slide is there is because lots of developers think that FDE somehow helps in the (vastly more likely) scenario where your machine gets owned up over the Internet.

knorker · on Nov 4, 2016

Yeah, but I'd rather not fight misunderstandings with misinformation.

It's more true to say, like you do, that it's about loss of physical custody. But the presentation said theft.

You are more correct in your phrasing, but even that doesn't cover my example of swapping hard drives around. If someone hacks the low security server and finds that it still has customer data in the unwritten blocks of the FS, or in remapped sectors, then that's not loss of physical custody, but it would have been prevented by FDE.

And honestly, who would not scan for deleted data after hacking a machine? It's not a tiny detail.

tptacek · on Nov 4, 2016

If you have PII on a disk, and then you hand that disk to another organization not prepared to protect PII, you have surrendered physical custody of the PII. The distinction you're making is basically the same as the distinction between a thief cinderblocking your car window to steal your laptop, and you leaving your laptop in the back seat of a cab. Yes: FDE helps both if your drive is physically stolen, and if you physically lose it.

knorker · on Nov 5, 2016

If you don't have FDE then you cannot erase PII, is my point.

That hard drive is then forever tainted by PII.

With FDE, that is not the case.

tzaman · on Nov 3, 2016

So true, which is why I find it kind of awkward when I say to young developers/business owners they should pay for a professional PostgreSQL hosting and they argue it's open source and they can host their DBs on a $25 (or less) Digital Ocean instance.

Until it's too late, I guess.

hinkley · on Nov 3, 2016

Precautionary principle.

Multiply the odds of the problem by the costs of the problem. If a DO outage would cost you millions of dollars, then it will be much, much cheaper to spend $250,000 a year to fix a problem than it will be to deal with the consequences.

The problem is that most software developers are gamblers, and most businesses encourage that behavior. Some even cultivate it. We take stupid risks all the time and when we don't get caught we think that means that none of our actions have consequences.

And mostly those people are right. Stock options pay off so infrequently and so inconsistently that they are a poor incentive for long term thinking. In fact the only time I ever got any money out of options was because of a pump and dump by the founders (aka an acquisition).

Even the worst developers I've worked with would take 3-5 years to drive a company into the ground, unless nobody else was paying any attention at all. And when the company folds, they've still got years of take-home pay and they just have to find a new job.

pjlegato · on Nov 3, 2016

We (databaselabs.io) are the first Postgres as a Service on DigitalOcean. We fight this sales battle on a daily basis.

The problem is that younger devs/owners are much less likely to have ever experienced any significant outage, so it's a non-issue for them. They can't see why they should pay good money to prevent it.

We've had much better results with slightly older and more experienced people, who realize that our service is an absolute bargain compared to the cost and risk of attempting to run continuous backups, etc. themselves.

aa_memon · on Nov 4, 2016

i am curious, you are commenting on a post about secure postgres but after signing up for your service i see that the database i created is publicly available, on the standard port 5432, with the default postgres username. the only thing stopping me is a password, no certificate required. i am not a postgres expert but how is this not extremely insecure?

pjlegato · on Nov 5, 2016

"Extremely" insecure is a bit of an exaggeration. It's not as though the password is "password" on all the databases.

We do support certificate-based logins and firewalls with IP whitelists. They're just not on by default. Frankly, the reason is because almost all customers don't care about having a very high level of security if that implies doing more work (setting up certificates, whitelisting IPs.) Moreover, they actively prefer to have "simpler and less secure" over "more complex and more secure."

We've done experiments with certificates and firewall IP whitelists and so on. Almost all customers and potential customers reject these things. They say they want a simple password that just works from anywhere.

We had to choose between being slightly less secure and having customers, versus being highly secure and having no customers. Since the business can't survive without any customers, that choice was easy.

That said, if you do want any of those things turned on for your database, just write support@databaselabs.io.

jlgaddis · on Nov 5, 2016

Out of curiosity, can one change these settings for their own databases automatically via some "control panel" or such, or does it require intervention from your staff?

Fortunately, I think there have been fewer "unauthenticated remote access" vulnerabilities with PostgreSQL than MySQL so this (being accessible from 0/0) probably isn't a huge deal. That said, I'd look for ways to restrict who can actually connect to 5432/TCP that won't negatively affect the majority of your customers (e.g., if your databases are running on DigitalOcean, can you restrict connections to that particular DO datacenter by default and provide an option to loosen those restrictions in increments -- "this datacenter", "all DO datacenters", "the world", etc.?).

pjlegato · on Nov 6, 2016

These settings are currently manually operated by staff, via support@databaselabs.io.

That will get added to the control panel eventually, but right now as approximately zero percent of customers want those things, it's not a good use of our limited engineering time to even automate that, versus other things that engineers could be doing with their time.

While it would be nice in theory to restrict them by default, in practice there's just no restriction that's close enough to universally applicable to be workable (i.e. one that won't disrupt a large number of users' use of the database if it's applied everywhere.)

And you are correct, there are essentially zero unauthenticated remote access vulnerabilities that come out in Postgres. Combine that with:

* All connections require SSL * The password is a long string of randomly generated characters * We actively monitor the network for unusual traffic patterns

and it's actually not so bad. Not ideal of course, but very much not "extremely" insecure, as the above post said.

amirmansour · on Nov 4, 2016

How are you guaranteeing 100% uptime if your service relies on third-party infrastructures?

pjlegato · on Nov 5, 2016

100% uptime is a financial guarantee: if it's up for less than 100% of the time (other than scheduled maintenance), we refund 30x that amount, up to 1 months' fees.

We used to have a more engineery 99.95% uptime guarantee, but customers empirically prefer the financial guarantee over that.

sandGorgon · on Nov 3, 2016

here's a question - other than RDS, which other PG hosting would you recommend ?

pjlegato · on Nov 3, 2016

Try us! https://www.databaselabs.io/ . Professional PG hosting in Google, AWS, and DigitalOcean.

Happy to discuss more details -- pjlegato at databaselabs.io.

sandGorgon · on Nov 4, 2016

good to know. i just checked out your website on my phone . i have to do a google login auth before i can see pricing?

i wanted to see what you offer... replication strategy, etc

EDIT: I just went to your desktop website and can see pricing. Do you do failover ? AWS RDS does failover and high availability - which makes it worth it.

If you can do high availability on digitalocean. That will be killer.

pjlegato · on Nov 5, 2016

Doh. It looks like pricing is broken on mobile. Sorry about that! We're on it.

Yes, we set up and run autofailover upon request. It's getting rolled into the UI this month. In the meantime, write support@databaselabs.io and we'll turn that on for you manually.

tzaman · on Nov 4, 2016

You can do HA on DigitalOcean, the question is whether you should do it yourself, and in most cases, the answer is a big NO.

why-el · on Nov 3, 2016

PG has an good list in their website: https://www.postgresql.org/support/professional_hosting/

whitepoplar · on Nov 3, 2016

What companies do you think offer the most competent Postgres hosting?

rch · on Nov 3, 2016

I've been happy with Amazon RDS.

cygned · on Nov 3, 2016

Awesome! Is the actual talk somewhere as a video?

craigkerstiens · on Nov 3, 2016

The same talk is actually being given at PGConf Silicon Valley in just under 2 weeks (http://www.pgconfsv.com/program). We will be recording the audio and slides there and that video will be online some weeks after. You could also always come and see it live if you're in the bay area :)

cygned · on Nov 3, 2016

Too sad, Germany is a bit too far away. However, thanks in advance for recording, looking forward to watching it!

smnplk · on Nov 3, 2016

Me too. I would like to see the talk. Thanks for recording it.

brian_cloutier · on Nov 3, 2016

These slides are from a talk given today at pgconfeu, unfortunately no videos were taken.

jtchang · on Nov 3, 2016

This is solid advice. While you may not be able to do everything in the slides there are some really good practices.

jve · on Nov 4, 2016

"For critical passwords, use split passwords with dual custody."

Could anyone comment on the practice? Does it mean that I have the first half of password and other guy has the other? We can only log in by combining password? How do we type the password in by not sharing a physical computer?

rwilsonperkin · on Nov 4, 2016

Yep! Dual custody can really just be as simple as providing two halves of the password from different owners. I've implemented similar before. Both owners having physical access tends to be the safest way, as whenever one is remote you have considerations about the password in transit. Something like tmate could help.

esseti · on Nov 3, 2016

The problem with pgcrypto is the fact that in the logs there's the password? that's it? I would like to see the talk to get a better understainding of the whole message, is it available somewhere?

mmerickel · on Nov 3, 2016

The fundamental issue is that you should not be exposing the encryption keys to the database. If you're using pgcrypto then you're issuing SQL statements in the database with the key. You should do your encryption client-side so that the key is never passed over the wire at all.

okket · on Nov 3, 2016

That said, pgcrypto is still useful, you can compute hashes or generate UUIDs with it.

dom0 · on Nov 3, 2016

The point was that you probably shouldn't use it for data encryption in the DB and think the server doesn't have the keys.

CodeWriter23 · on Nov 3, 2016

No the issue is statement logging can be turned on at any time and when it is on, everything you are encrypting is revealed in plaintext in the log.

tptacek · on Nov 3, 2016

The bigger problem is that pgcrypto is 1990s cryptography: it supports Blowfish, has optional authentication, uses all-zeroes IVs, falls back to insecure RNGs, uses old cipher modes, has ambiguous padding...

Don't use pgcrypto. Use libsodium to encrypt in your application.

smkdtr · on Nov 3, 2016

One of the constraints I am working with is that a human is not necessarily in the loop all the time. What is the best way then bootstrap the encryption process?

tptacek · on Nov 3, 2016

You basically don't. You're adding a lot of complexity chasing after a very marginal (say, 5%†) fraction of the security application-layer crypto gets you.

You can get that fraction closer to 50% without a human in the loop by segregating your crypto code from the application in a virtual HSM, using something like a TLS client certificate to authenticate your application to the HSM, have humans in the loop for restarts/bringups of the HSM itself, and doing pretty aggressive monitoring on the request patterns between the app and the HSM.

The threat model here is that someone owns up your app server --- if that wasn't in your model, you wouldn't need application layer crypto. The idea is that owning up the app server will get an attacker enough access to make requests to the HSM, but not control of the HSM itself. If the HSM can detect abnormal volume of requests, you can use it as a circuit breaker to prevent bulk exfiltration of your secure data. You also get accountability, so that when your server is compromised you might know exactly what records were accessed.

Typically, the app server itself will handle all of the database operations, and the virtual HSM just provides a "seal" and "unseal" operation --- convert plaintext to ciphertext, convert ciphertext to plaintext.

† The 5% you're getting in the dumb online crypto scenario is that if you store the root crypto secrets in a file, an attacker can't necessarily recover it from an SQL injection attack --- but in reality the percentage is probably lower, since most of the time SQLI will equate to RCE.

smkdtr · on Nov 4, 2016

Thanks for the response. Although not ideal, I am stuck with using crypto at the DB level. Based on what you said; running something like vault (running as a separate user from the DB) and accessing it using the HTTP API + TLS client cert seems like the way to go.

zie · on Nov 4, 2016

You can use something like vault, they have "crypto as a service" https://www.vaultproject.io/docs/secrets/transit/index.html

You still have to bootstrap vault, but it's a pretty robust and easy to implement HA system.

rietta · on Nov 3, 2016

68 slides! This must have been an epic or fast paced conference talk. Good points. I think the comments on FDE are spot on; it's almost always about compliance. And I'm reasonably comfortable with SSH access if, and only if, it uses SSH key authentication (no password auth) and that PermitRootLogin is turned off. A Bastion host is a good idea, but "never" is a bit strong.

Puts · on Nov 3, 2016

I think this talk misses one of the most important security patterns of PostgreSQL and SQL-databases in general. If you for example have a table with hashed passwords. Why would any user except admin need to be able to make a select on that table? Make a function to validate the user and only grant permission to run this function.

tptacek · on Nov 3, 2016

That's a bad idea, because it implies that your password hash has to be expressible inside of Postgres.

If you're worried about a SQL dump exposing password hashes, segregate password validation into its own microservice. This comes with other benefits: for instance, you can ratchet up the work factor on your password hash, because the service will very easily scale horizontally.

Puts · on Nov 3, 2016

First of all, this pattern is not exclusive to password hashes. There are lots of situation when handling customer data where you simply don't need the ability for the client to query the whole data-set, and if that's the case, allowing it is just bad hygiene.

Now if you make a good set of prepared statements as an interface for your database, this could be viewed as a "micro service" in it self.

tptacek · on Nov 3, 2016

Yes, and that's a good pattern. Just not for password hashes.

agentgt · on Nov 3, 2016

Perhaps to help others extract info: for me most of this I know or is sort of common sense at this point with the one exception of page 39: "do the encryption in your app".

I'm not saying I have ever used pgcrypto but it is important to be mindful that pg is logging stuff.