Hacker News

>MySQL outperformed Postgres.

For my part, I thought Postgres was slow until 5 months ago, when I ported an application over from SQL Server.

Then I learned something! The default install of Postgres is slow (at least on Windows); in benchmarks against my baseline SQL Server port it was about 2x slower. BUT there were a number of things I could do to make it faster:

a) Use the EnterpriseDB wizard for tuning Postgres --> this alone gave a 2x to 3x performance improvement.

b) Adjust the way I did updates (there is a way to get Postgres updates to be fast; it just takes some massaging).

c) Get rid of views where possible -- I found that certain query optimizations I had taken for granted on SQL Server weren't available.

Doing all this resulted in a system that performed faster and with less RAM than SQL Server (and I had spent a bunch of time tuning the SQL Server perf in the first place!)




Out of curiosity, what were the things you did to make updates fast?


Here's a blog-like post (really a Facebook Note) I wrote a while back about the differences:

Using the right syntax for each database is critical. Here's the moderately optimized SQL for the same update in two DB systems:

SQL Server

UPDATE tok SET spid = est.sid FROM @Tok tok INNER JOIN @eST est ON tok.cs = est.cs

For SQL Server, using the table alias in the UPDATE clause is critical for good performance. Another SQL Server trick (recommended by the SQL Server team, and it yields excellent speed improvements) is to wrap the source table in a derived table:

UPDATE tok SET spid = est.sid FROM (SELECT * FROM @Tok) tok INNER JOIN @eST est ON tok.cs = est.cs

Postgres

UPDATE _Tok SET spid = est.sid FROM _Tok tok INNER JOIN _eST est ON tok.cs = est.cs WHERE tok.cs = _Tok.cs

For Postgres, the performance-critical part is the addition of the WHERE clause to bind the updated table to the FROM table. It makes sense once you understand that, from the SQL Server perspective, it is equivalent to using the alias in the UPDATE clause.
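For reference, the idiomatic Postgres form (a sketch, using the same hypothetical table and column names as above) references the target table only once and puts the join condition in the WHERE clause:

```sql
-- Sketch: _Tok and _eST stand in for the real tables.
-- The target table appears only after UPDATE; the FROM clause
-- supplies the join partner, and WHERE binds the two together.
UPDATE _Tok
SET    spid = est.sid
FROM   _eST est
WHERE  _Tok.cs = est.cs;
```

This avoids joining the target table to itself, which can produce a second, redundant scan.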


I don't understand your recommendation to get rid of views when possible.

There is nothing particularly slow about views in Postgres, nor is there anything particularly optimized about them. They act mostly like any other query, except they are available to you like a table is.

A view is only as good as the query you use to define it.


It could be that my understanding of what was going on was incorrect, but I'll tell you what I observed -- please let me know if there was a better way to fix it. I found the easiest fix was simply to avoid using views. And frankly, I wasn't surprised by this: using UDFs with SQL Server also had really bad perf, so I didn't question that a particular feature might perform badly.

OK, let's say you have two large tables, Sentences and Annotations, and you make a view that joins them -- let's call it vwSentencesDetailed. Then you do a query like:

select * from vwSentencesDetailed where srcID = _srcID

where srcID is an indexed field in the Sentences table that lets you pull out a narrow subset of the data -- say, the particular file it came from.

The performance characteristic I observed was that it behaved (loosely) like:

select * from (Select * from Sentences s Inner Join Annotations a on a.sentID = s.sentID) vw where srcID = _srcID

rather than like:

select * from (Select * from Sentences s Inner Join Annotations a on a.sentID = s.sentID where srcID = _srcID) vw

I believe that SQL Server makes such a performance optimization, but Postgres does not.
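If anyone wants to reproduce this, a minimal sketch (the schema and names are hypothetical, following the example above) is to define the view and compare the two plans with EXPLAIN ANALYZE:

```sql
-- Hypothetical schema; USING (sentID) avoids a duplicate column in the view.
CREATE VIEW vwSentencesDetailed AS
  SELECT *
  FROM Sentences s
  INNER JOIN Annotations a USING (sentID);

-- If the planner pushes the predicate into the view, both plans should
-- show an index scan on Sentences via the srcID index.
EXPLAIN ANALYZE SELECT * FROM vwSentencesDetailed WHERE srcID = 42;

EXPLAIN ANALYZE
  SELECT *
  FROM Sentences s
  INNER JOIN Annotations a USING (sentID)
  WHERE srcID = 42;
```

If the two plans differ, that difference pinpoints exactly where the view is losing the optimization.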


It's hard to say what the problem was without seeing the actual definition of the tables and the output of EXPLAIN ANALYZE.

There are times where you may have to structure your view query slightly differently than you would a standard query in order to get the planner to optimize things the same way, particularly if the view uses subqueries.
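One concrete case (a sketch with hypothetical names): a subquery containing LIMIT or OFFSET acts as an optimization fence, so an outer WHERE clause can't be pushed down into it:

```sql
-- The LIMIT turns the subquery into an optimization fence: a filter on
-- srcID applied against this view is evaluated after the subquery runs,
-- rather than being pushed into the scan of Sentences.
CREATE VIEW vwRecentSentences AS
  SELECT *
  FROM (SELECT * FROM Sentences ORDER BY sentID DESC LIMIT 100000) s
  INNER JOIN Annotations a USING (sentID);
```

Restructuring the view to avoid the fence usually restores the plan you'd get from writing the query directly.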

Ultimately it is possible to have a view run just as fast as a regular query, but you might (though not always) have to do a little extra work with EXPLAIN ANALYZE to make sure things are working as you expect.


I like the new trend of making databases "fast" by default rather than shipping "safe" configurations. People who care about safety should take the time to make it so; people who want it fast, well, give it to them faster! So maybe PostgreSQL should offer faster defaults?


Not sure what you mean by "safe" by default. Speed and performance have nothing to do with security, if that's what you mean.

The reason that the default Postgres configuration does not perform well is that Postgres runs on a number of operating systems, and it is not possible to make assumptions about the hardware and configuration of the host. For example, increasing configuration parameters like shared_buffers to any useful value will require you to tweak the operating system's kernel resources. The Postgres documentation does a great job of describing the process on a number of operating systems:

http://www.postgresql.org/docs/current/interactive/kernel-re...
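For example, a typical first pass at postgresql.conf tuning might look like this -- the values are purely illustrative and depend entirely on the host's RAM and OS limits:

```
# postgresql.conf -- illustrative values for a machine with ~4GB of RAM
shared_buffers = 512MB        # default is very small; raising it may require
                              # increasing the kernel's SHMMAX/SHMALL first
effective_cache_size = 2GB    # planner hint only; allocates no memory
work_mem = 16MB               # per-sort / per-hash-table working memory
```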


Plenty of software manages to share information between processes without using SysV shared memory, and thus without being subject to these kernel resource limits.

I'm not saying this should necessarily be fixed - I'm sure the Postgres team has better things to be working on.

But it's not a natural, unavoidable limitation, either.


It's a design decision. MySQL chose "fast defaults". PostgreSQL chose "safe defaults". If you want the other choice, use the other database.



