From the datatype wish list, these look like a bad idea actually:
- url - If all schemes are allowed, this is almost equivalent to "does it include :// after the first word". I'm not sure why it should be included.
- phone - Around the world this is equivalent to: does it contain digits. And maybe also + # and star. And maybe also letters for internal voip phones. And....
- zip - Every country has its own. There is no common format. It can have letters, numbers, symbols, ... anything.
- email - This actually could work if it allowed all possible formats.
PostgreSQL handles UUIDs, IP addresses and booleans quite intelligently. Where appropriate they include specialised functions that you can use in queries, and the formats for input are generous so long as your input is unambiguous.
For example, the boolean type in PostgreSQL will accept TRUE, 'true', 't', '1', 'on', 'y' and 'yes' (case insensitive) for truthful values and the equivalents for false values. It's hard to overstate how much better this is than Oracle, to take my work environment as an example.
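A quick sketch you can run in psql:

    SELECT 'yes'::boolean, 'on'::boolean, '1'::boolean, 't'::boolean, 'TRUE'::boolean;
    -- all five come back true; 'no', 'off', '0', 'f' and 'FALSE' work for false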
My point is that I imagine that when the PostgreSQL team get around to these types, they will address them with the same care and forethought as they have for everything else. I trust them because their feature plans are not based on maximising shininess.
> For example, the boolean type in PostgreSQL will accept TRUE, 'true', 't', '1', 'on', 'y' and 'yes' (case insensitive) for truthful values and the equivalents for false values.
I can see most of those being useful so that you can easily extract components. For instance, from a URL you might want to extract the host or path; or from a phone number you might want to extract the country code.
If done right, I can see those being useful. I agree that it can be overdone, though.
I don't like how timetz was highlighted. From the PostgreSQL docs[1]: "Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries."
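To illustrate the difference (a quick sketch you can run in psql):

    -- a timestamptz is a full point in time, so DST rules can be applied:
    SELECT '2013-07-01 12:00+02'::timestamptz;
    -- a timetz is just a clock time with an offset and no date, so there is
    -- no way to know whether daylight-saving time was in effect:
    SELECT '12:00+02'::timetz;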
Also, I quote: "It's Postgres or Postgres-Que-Ell, not Postgre-Es-Que-Ell". And then a slide shows "PostgresQL" while http://postgresql.org shows "PostgreSQL". That pronunciation makes no sense to me if the slide is correct.
I'm waiting for the day when databases figure out their own damn indexes. They are sitting there with billions of CPU cycles per second, trillions of bytes of RAM and trillions of bytes of storage. Yet someone has to go in and babysit the thing to tell it which indexes to create.
(Yes I know the perfect dbadmin can make the perfect indexes. But I expect the database to figure it out itself and rarely be wrong.)
Sounds like you need strong AI. :) But really, isn't that like wishing the compiler would find out if you need a priority queue or a stack? Or rewriting your n² sort automatically to n log n?
No, it is more like the compiler working out, when you say x=0, whether it should be done by loading a literal, using a zero register, or XORing the register with itself.
Essentially it could be dumb about it: try multiple query plans concurrently, figure out which was most efficient, use that in the future, monitor its expected performance, and try all the candidates again when it deviates.
With modern systems having such a surplus of CPU, RAM and storage, a dumb system could try multiple indexing strategies and work out which worked best.
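You can approximate this loop by hand today; a rough sketch, using a hypothetical orders table and query:

    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- baseline plan
    CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- compare cost
    DROP INDEX CONCURRENTLY orders_customer_idx;                  -- if it didn't help

The wish is for the database to run this measure-and-decide loop itself.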
I also miss this feature; however, there are options:
a) for small and mid-sized tables, a SELECT followed by a conditional INSERT or UPDATE works, as does INSERT-on-exception-UPDATE (see the sketch below)
b) write-heavy tables are worth rearchitecting to be INSERT-only, as this helps avoid locks
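For (a), a minimal sketch of the classic retry loop from the PostgreSQL docs, assuming a counters(name, value) table with a unique constraint on name:

    CREATE FUNCTION upsert_counter(k text, v int) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE counters SET value = v WHERE name = k;
            IF found THEN
                RETURN;
            END IF;
            -- no row yet, so try to insert; another session may beat us to it
            BEGIN
                INSERT INTO counters (name, value) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- someone inserted concurrently: loop and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;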
I like MERGE too, in the proper place. I'm not sure that the absence of MERGE constitutes "this terrible state".
It would be like a commercial RDBMS lacking boolean and serial types, necessitating reams and reams of repetitive, error-prone boilerplate constraints and insert/update triggers.
One feature I've missed while using Postgres is the built-in REPLACE INTO that MySQL provides. I know it's possible to create a function to accomplish this in Postgres for specific cases, but is there some general variant which could be applied to multiple tables?
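For what it's worth, the naive per-table emulation is just delete-then-insert in one transaction (a sketch, assuming a table t(id integer PRIMARY KEY, val text)):

    BEGIN;
    DELETE FROM t WHERE id = 1;
    INSERT INTO t (id, val) VALUES (1, 'new');
    COMMIT;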
That is not enough to solve the general case due to locking considerations.
If one transaction runs the DELETE and then the INSERT concurrently with another transaction doing a normal UPDATE, the UPDATE will find zero rows to update if it runs after the DELETE+INSERT. The inserted row will be invisible to the UPDATE while the old row will be locked. This means the UPDATE will wait until the other transaction has committed and then find neither the old row nor the new one.
A correct REPLACE implementation would make sure the UPDATE would UPDATE the replaced row.
The above is assuming READ COMMITTED isolation level.
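Concretely, with the delete+insert sketch above and a concurrent UPDATE, the interleaving looks something like this:

    -- Session A                        -- Session B
    BEGIN;
    DELETE FROM t WHERE id = 1;
    INSERT INTO t (id, val)
        VALUES (1, 'new');
                                        BEGIN;
                                        UPDATE t SET val = 'x' WHERE id = 1;
                                        -- blocks on A's row lock
    COMMIT;
                                        -- UPDATE 0: the old row is gone and the
                                        -- new row is not in B's snapshot
                                        COMMIT;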
Yep, and even if we do account for this, say by taking a table lock or escalating the isolation level, we can still run into problems in the case that there are triggers defined.
There is no direct substitute in PostgreSQL, but there are case-by-case alternatives, and for any use case I can dream up right now, they have better semantics (as in: they are easier to debug, easier to observe and easier to tune; MySQL necessarily hides the update operation from you).
However, MySQL doesn't have this problem. I've just tried it, and MySQL blocks the UPDATE until the first transaction (DELETE, INSERT) is finished (like PostgreSQL), but after the commit, the UPDATE statement updates the newly inserted row.
It seems MySQL first searches the table for the row and then attempts to acquire a lock on it (and thus waits for the DELETE, INSERT to finish, as that has locked the row being DELETEd); after the lock has been acquired, it searches the table again for rows matching the query in order to perform the update.
It's amazing how easily newcomers to tech ignore 'proven' databases like PostgreSQL that have actually stood the test of time and, when done right, have the real capacity to scale enormously on demand. What is more disappointing is how the newcomers trade them in for beta-quality NoSQL databases that promise better reliability and scalability out of the box, which is simply not true[1]. For instance, databases like MongoDB promise developers that they need not worry about scalability and can focus just on their code, while doing some things under the hood in a terrible way [2].
PostgreSQL is one of the few databases that support SQL and can actually be less painful to scale than MySQL. Also, the recent versions bring enormous performance boosts that can silence the SQL critics, IMO.
If you are interested in testing out PostgreSQL for your project, you might want to develop on the WAPP stack and give it a shot:
I was a MongoDB developer too and I loved it so much. I loved the fact that I could skip running DB migrations on my Rails app and focus on just coding, instead of worrying about scaling. But it was a terrible trade-off. First of all, you can't use NoSQL for everything; that was a very painful lesson that I learned the hard way. 99% of the time you want to use SQL, because most use cases can be executed perfectly well with SQL databases. Next, some databases have terrible architecture and design decisions under the hood, mainly to improve their benchmarks (seriously!). For example, turning off safe writes by default... :cough: :cough:
Last but not least, NoSQL for the wrong use case complicates things and ends up adding redundancy to the database (like storing user details inside multiple collections, which is unavoidable, especially when you are against embedding documents). The worst trade-off using a NoSQL database is the analytics part - "Hey DB, show me the list of users who are from the United States and who have subscribed to Plan X and who are the highest paying customers and who love chocolate pie"
SQL - "Here you go." NoSQL - "Sorry sir, not possible. Possible, but possibly a nightmare."
If you want to build a 'scalable' app for your next big thing and don't want the scaling hassles of MySQL, which actually scales extremely well already[3], you can use PostgreSQL.
Cheers
[1] Scaling is hard. Don't let anyone else tell you otherwise!
My rule of thumb is, use PostgreSQL unless you have a well-defined need for something else. There are few things you can't do with it, and many things most other free databases lack.
Also, if you're not aware of it, you'll probably be interested in Fabian Pascal's Database Debunkings site [1].
> SQL databases start off difficult but pay dividends on the back end. NoSQL databases start off easy but begin to extract their costs later.
Funny - I feel it is the other way around: NoSQL yields scalability at the high end, but I find SQL incredibly more convenient because the database does most of the work for me... We all have our biases, and mine is thinking in a relational way.
My view is that a single PostgreSQL box can reliably and efficiently handle more queries than a comparable MySQL box. This is not to say that one is better than the other (it depends on a lot of things), but I am addressing only the pain points. PostgreSQL's enterprise brother, powered by EnterpriseDB, has a cloud solution that takes care of scaling while you focus on your code.[1] I had this in mind when I was writing that sentence.
I've used Postgres for the last 5 years, and mostly MySQL before that. But while I love a lot of features in Postgres, my big caveat with it is that Postgres seems to have an attitude of "if we can't do it perfectly, we won't do it yet", and so we only got "proper" replication recently (or had to deal with third party hacks on top of triggers or the like) and still can't do multi-master replication without third party hacks.
Meanwhile, MySQL has had multi-master replication "forever" because they took the easy way out: "simply" logging statements, streaming those to slaves, and adding a mechanism for applying offsets and step sizes to sequences. It's a bit of a hack, and vulnerable (if you do clashing updates, replication will fail), but in real-life usage it gave MySQL a massive advantage for some types of scenarios for a long time (e.g. back in 2006 I ran a cluster of 16 MySQL servers spread over two sites; configuration was trivial, while doing the same with Postgres at the time was a massive PITA).
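The sequence trick boils down to two real MySQL variables (normally set in my.cnf; a sketch for a two-master setup):

    -- on master 1:
    SET GLOBAL auto_increment_increment = 2;  -- step size
    SET GLOBAL auto_increment_offset = 1;     -- this server hands out 1, 3, 5, ...
    -- on master 2:
    SET GLOBAL auto_increment_increment = 2;
    SET GLOBAL auto_increment_offset = 2;     -- this server hands out 2, 4, 6, ...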
It's a philosophical difference that means that while I'd be more comfortable about trusting our payroll data to Postgres, for a web app where major scalability is a concern, I'd be prepared to consider MySQL in situations where I'd consider Postgres a no-go because of the hassle.
Postgres feels like it has a more theoretically sound foundation, and is catching up feature-wise, though.
A few more iterations on the replication support, and extensions like https://code.google.com/p/plv8js/wiki/PLV8 (run JavaScript in Postgres via V8, which also instantly gives nice JSON support to rival many of the NoSQL solutions), and it's eating its way into both the NoSQL space and the space held by MySQL.
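To give a flavour of PLV8 (a sketch, assuming the extension is installed; json_get is a made-up name):

    CREATE EXTENSION plv8;
    -- pull one key out of a JSON document, in JavaScript:
    CREATE FUNCTION json_get(doc text, key text) RETURNS text AS $$
        return JSON.parse(doc)[key];
    $$ LANGUAGE plv8;

    SELECT json_get('{"name": "ada", "id": 1}', 'name');  -- => ada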
"a single PostgreSQL box can handle reliably and efficiently enough queries than a comparable MySQL box"
Sorry, but citation needed. Most of the (relevant) stack overflow results are pointing to articles citing 4.1 and earlier MySQL versions, or pure FUD opinion pieces.
MySQL and PostgreSQL are within the same order of magnitude (depending on workload and how much time you're willing to spend optimizing queries) for a single server, so to say that PostgreSQL is better requires some backing up with actual numbers or techniques.
The other commenter is right - When you know what data you want to store, I'm pretty sure you can work out the storage options. For example, most apps that require a typical user registration and login can be done with a basic MySQL box or a PgSQL box.
You can use (and should use) a NoSQL database when you have data that benefits from being non-relational and storing it in a relational db will actually cause you trouble.
Let's say your specific use case requires that for well-defined inputs of the function f(x), you want the outputs immediately, and processing this function for each request will cost you a LOT of resources (think CPU, RAM, etc.). So, in such cases, if and only if it makes business sense, you can store the inputs ranging from 0 to x and the corresponding computed outputs in a NoSQL database and access them when needed, instead of actually computing them every time.
Actually my example is bad because you can store the same thing on SQL db's too, but unless there is a very specific advantage[1] that these NoSQL db's have (multitudes faster read speed, lower memory consumption, etc.) you want to stay away from them.
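To make that concrete, the same precomputed-f(x) cache works in plain SQL (a sketch with a hypothetical table):

    CREATE TABLE f_cache (
        x  bigint PRIMARY KEY,
        fx double precision NOT NULL  -- precomputed f(x)
    );
    -- each lookup is then a single indexed read:
    SELECT fx FROM f_cache WHERE x = 42;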
If you are evaluating a good NoSQL db for your project, then I suggest you check out Voldemort DB[2] by LinkedIn, which is actually pretty good and has some positive feedback from people running it at scale[3] without much of the marketing layer that Mongo has.
I guess that, at the time you have the data in hand, you do know what kind of data you have and also have a rough idea about what variations will exist.
If you don't, apparently, the NoSQL databases will work fine, but so would SQL databases that you tell "here's a binary blob". AFAIK, neither would support querying those binary blobs, though.
It's true that MySQL is ahead on the tickbox features here.
But the PostgreSQL team are doing their usual slow, stepwise refinement approach to implementing these features from the primitives and moving up. I expect that in a few versions they'll be at sufficient feature parity with MySQL on this front that anyone who cares enough is the sort of person who decides between Oracle RAC and Teradata.
While that will be good and I look forward to it, I will miss seeing you turn up in these threads like a bad penny.
And I will definitely miss the weekly "PostgreSQL is all you need and you're stupid to use anything else" threads.
My position is that there are a lot of different products out there that cater for different needs, and there is no "one size fits all" solution. I can scale Cassandra out to a hundred nodes in minutes on EC2 with no configuration changes. I also have queries in MongoDB that are literally 50x faster than on a SQL database.
> And I will definitely miss the weekly "PostgreSQL is all you need and you're stupid to use anything else" threads.
I think that for any case where you might be building a new system based on a relational backend, it's basically true (modulo local constraints like "we're an Oracle shop"). The chance that you will need to run a website that needs 100 Cassandra servers any time soon is... well, it's unlikely.
PostgreSQL is a stable, proven workhorse. That's why I like it. My point of view is that you should start with high safety and features and relax those constraints as circumstances demand.
Jacques, Taligent is a well-known troll here - there are two things you never talk against when Taligent is around, Apple and MongoDB, or he will start poisoning the threads. Just ignore him.
"In the beginning of Spotify, when load was lower, PostgreSQL was definitely the right tool for the job."
"Later came Postgres 9 and with it the excellent streaming replication and hot standby functionality. One of the most important database clusters at Spotify, the cluster that stores user credentials (for login), is a Postgres 9 cluster."
Hum, your link kind of defeats your point. That, or I did not understand the point you were trying to make.
Does Heroku Postgres support PLV8 now? In the questions section of the presentation, he doesn't mention it as unsupported, but as recently as Sept 2012, it wasn't [1] - has that changed?
If this was meant to demystify something, then I completely missed it. Since I'm new to PostgreSQL, I was hoping to get a bird's-eye view from someone who knows his stuff, but what I got was keyword clouds. Could that be anything but mystifying?
That's more a presentation of some not so well-known features and extensions found in Postgres than something for newcomers. In any case, watch the video linked in a comment above instead of reading the slides.
I have spent quite a few years working with MS SQL Server, but have only dabbled with Postgres. Can somebody recommend a good Postgres book or tutorial for people like me?
One of the best resources for PostgreSQL is the online documentation - I've hardly seen a F/OSS project with better-written documentation. The only drawback is that it is quite verbose and it's easy to get lost. You might want to start with the official tutorial [1] and might also check the dated "MS-SQL to PostgreSQL Migration" guide (from 2001) [2], which still provides some nice transition tips.
Make sure to check the "Postgres Guide" [3] which is a recent addition and is excellent to get you started quickly.
The quick intro by Packt Publishing [4] looks useful and concise as well, but is not as good as the Postgres Guide.
Since you're familiar with the RDBMS concepts I would suggest you take a look at the nice "PostgreSQL 9 Admin Cookbook"[5] which has a solid collection of recipes that you can use to learn the specifics of the DB and a bunch of nice features.