PostgreSQL performance considerations (thoughtbot.com)
76 points by mattyb on Jan 7, 2011 | hide | past | favorite | 32 comments



Here are a few extra tips:

- "select relname, indexrelname, idx_scan from pg_stat_user_indexes" will show how many times an index is used, so you can remove unused indexes

- you can create indexes without locking tables (ie, while your site is up) by using "create index concurrently"

- you should use the autovacuum daemon instead of manually running vacuum analyze in most cases
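The first and second tips, sketched together (the table and index names here are hypothetical):

```sql
-- Indexes that have never been scanned since stats were last reset
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;

-- Drop one that turns out to be dead weight
DROP INDEX users_legacy_email_idx;

-- Rebuild a needed index without taking an exclusive lock on the table
CREATE INDEX CONCURRENTLY users_email_idx ON users (email);
```

Note that CREATE INDEX CONCURRENTLY cannot be run inside a transaction block.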


This is particularly important if you are working on Heroku. I just made the switch from MySQL to Postgres, and while they both speak SQL, there are numerous syntactic and performance differences that need to be addressed so you can transition to the Heroku cloud more easily.

It's definitely worth considering for 9 and up; at least when I was in grad school for databases (~2008), MySQL outperformed Postgres.


>MySQL outperformed Postgres.

For my part, I thought Postgres was slow until 5 months ago (when I did a port from SQL Server).

Then I learned something! The default install of Postgres is slow (at least on Windows); on benchmarks against the baseline SQL Server port I had, it was about 2x slower. BUT there were a number of things I could do to make it faster:

a) Use the EnterpriseDB wizard for tuning Postgres --> resulted in a 2x to 3x performance improvement.

b) Adjust the way I did updates (there is a way to get Postgres updates to be fast, it just takes some massaging).

c) Get rid of views when possible -- I found that certain query compilation optimizations I had taken for granted on SQL Server weren't available.

Doing all this resulted in a system that performed faster and with less RAM than SQL Server (and I had spent a bunch of time tuning the SQL Server perf in the first place!)


Out of curiosity, what were the things you did to make updates fast?


Here's a blog-like post (really a Facebook Note) I made about the differences a while back:

Having the right syntax for databases is critical. Here's the moderately optimized SQL for two DB systems for two different updates:

SQL Server

Update tok Set spid = est.sid FROM @Tok tok INNER JOIN @eST est ON tok.cs = est.cs

For SQL Server -- using the table alias for the update is critical for good performance. Another trick for SQL Server is to do something like the following (which is recommended by the SQL Server team -- and it yields excellent speed improvements):

Update tok Set spid = est.sid FROM (SELECT * FROM @Tok) tok INNER JOIN @eST est ON tok.cs = est.cs

Postgres

Update _Tok Set spid = est.sid FROM _Tok tok INNER JOIN _eST est ON tok.cs = est.cs WHERE tok.cs = _Tok.cs

For Postgres, the performance-critical line is the addition of the WHERE clause to bind the Update table to the FROM table. It makes sense once you understand (from the SQL Server perspective) that it is equivalent to using the alias in the Update clause.
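For what it's worth, the more usual Postgres idiom skips the self-join entirely and lets the WHERE clause bind the update target to the FROM table directly (same hypothetical tables as above):

```sql
UPDATE _Tok
SET spid = est.sid
FROM _eST est
WHERE _Tok.cs = est.cs;
```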


I don't understand your recommendation to get rid of views when possible.

There is nothing particularly slow about views in Postgres, nor is there anything particularly optimized about them. They act mostly like any other query, except they are available to you like a table is.

A view is only as good as the query you use to define it.


It could be that my understanding of what was going on was incorrect, but I'll tell you what I observed -- please let me know if there was a better way to fix it. I found the easiest way was just to avoid using views. And frankly, I wasn't surprised by this: using UDFs with SQL Server also had really bad perf, so I didn't question that a particular feature performed badly.

OK, let's say you have two large tables, Sentences and Annotations, and you make a view that joins them; let's call it vwSentencesDetailed. Then you do a query like:

select * from vwSentencesDetailed where srcID = _srcID

Where srcID is an indexed field in the Sentences table that will let you get a narrow subset of the data -- say, the particular file it came from.

The performance characteristic I observed was that it behaved (loosely) like:

select * from (Select * from Sentences s Inner Join Annotations a on a.sentID = s.sentID) vw where srcID = _srcID

rather than like:

select * from (Select * from Sentences s Inner Join Annotations a on a.sentID = s.sentID where srcID = _srcID) vw

I believe that SQL Server makes such a performance optimization, but Postgres does not.


It's hard to say what the problem was without seeing the actual definition of the tables and the output of EXPLAIN ANALYZE.

There are times where you may have to structure your view query slightly different than you would a standard query in order to get the planner to optimize things in the same way, particularly if the view is using subqueries.

Ultimately it is possible to have a view run just as fast as a regular query, but you might (though not always) have to do a little bit of extra work with EXPLAIN ANALYZE to make sure things are working as you expect.
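For example (view and column names mirror the grandparent comment; the plan text is illustrative):

```sql
EXPLAIN ANALYZE
SELECT * FROM vwSentencesDetailed WHERE srcID = 42;

-- Pushed down: the plan shows something like
--   Index Scan using sentences_srcid_idx on sentences
-- Not pushed down: a Seq Scan over the full join with a Filter on srcID
```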


I'm liking the new trend of making databases "fast" by default rather than "safe" by default. People who care about safety should take the time to make it so; people who want it fast, well, give it to them faster! So maybe PostgreSQL should offer faster defaults?


Not sure what you mean by "safe" by default. Speed and performance have nothing to do with security, if that's what you mean.

The reason the default Postgres configuration does not perform well is that Postgres runs on a number of operating systems, and it cannot make assumptions about the hardware and configuration of the host. For example, increasing configuration parameters like shared_buffers to any useful value will require you to tweak the operating system's kernel resources. The Postgres documentation does a great job of describing the process on a number of operating systems:

http://www.postgresql.org/docs/current/interactive/kernel-re...
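As a sketch, the Linux flavor of that dance (values purely illustrative):

```
# /etc/sysctl.conf -- allow ~1GB of SysV shared memory
kernel.shmmax = 1073741824
kernel.shmall = 262144

# postgresql.conf
shared_buffers = 512MB
```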


Plenty of software manages to share information between processes without using SysV shared memory, and therefore being subject to these kernel resource limits.

I'm not saying this should necessarily be fixed - I'm sure the Postgres team has better things to be working on.

But it's not a natural, unavoidable limitation, either.


It's a design decision. MySQL chose "fast defaults". PostgreSQL chose "safe defaults". If you want the other choice, use the other database.


> MySQL outperformed Postgres.

Although a statement like this is too general to mean much of anything, I doubt that MySQL with innodb (which should be used for most cases) ever bested Postgres in overall performance.


Indeed, (My)ISAM isn't really comparable to Postgres with regard to features anyway (transactions, for one); even if it is faster, that's not a fair comparison. Only InnoDB is...


DBMSs are complex things. Any DBA proficient in both systems can show you a query (and table layout) in PostgreSQL that outperforms InnoDB and vice versa.


And with MySQL you throw out ACID or any type of consistency ...


InnoDB is a high-reliability and high-performance storage engine for MySQL. Starting with MySQL 5.5, it is the default MySQL storage engine. Key advantages of InnoDB include:

Its design follows the ACID model, with transactions featuring commit, rollback, and crash-recovery capabilities to protect user data.

http://dev.mysql.com/doc/refman/5.5/en/innodb-storage-engine...

Edit: swapped out Wikipedia for MySQL's official docs.


It may claim to be ACID compliant, however:

If I have a varchar(2) and I insert 3 characters into that field, I expect it to error, not warn and silently truncate (which isn't picked up if you are doing inserts at the application level). I know there are options to turn it into an error -- but that is not the default.

Also, if I screw up a CREATE TABLE inside a transaction, I expect that CREATE TABLE to roll back too.

Both are things Postgres does perfectly. Not to mention, PL/pgSQL is way easier to write, with less stress, than whatever horrible extensions to SQL MySQL has for procs and triggers.
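The truncation difference is easy to demonstrate (hypothetical table; the MySQL behavior shown is its historical default, without a strict sql_mode):

```sql
CREATE TABLE t (code varchar(2));
INSERT INTO t VALUES ('abc');
-- Postgres: ERROR:  value too long for type character varying(2)
-- MySQL (non-strict default): warning 1265, and the row is stored as 'ab'
```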


So, if I'm reading this right, a release of MySQL that came out only a year ago and is still of development quality finally has ACID compliance by default? Or has there been a production-grade release since then?

EDIT: My mistake. Apparently I'm not sufficiently familiar with MySQL's revision number practices. According to this: http://dev.mysql.com/doc/refman/5.5/en/news-5-5-x.html#news-... 5.5.x is a production release; I saw an odd minor version and assumed it was dev, since that's what I'm used to everywhere else. My other questions still stand, though.


That's not really the point though - the different engines provided are for different tasks.

MyISAM, for example, supports fulltext indexes by default, which was (and is) probably one of the most useful features for web development -- which explains why MySQL is so popular in that arena.

Whatever default was selected by MySQL is irrelevant - do your research and pick the right tool for the job.


"the different engines provided are for different tasks"

You should think of them more like separate DBMSs that are tied together in one system. The semantics change depending on the storage engine, so they aren't just drop-in replacements.

"MyISAM for example supports fulltext indexes..."

PostgreSQL has supported full text search for a long time.
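A minimal sketch of the built-in search (table and column names here are hypothetical):

```sql
-- Match documents containing both words, using the built-in parser
SELECT title
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & index');

-- A GIN index keeps this fast on large tables
CREATE INDEX docs_body_fts ON docs USING gin (to_tsvector('english', body));
```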


I didn't claim they are drop in replacements, just like I wouldn't claim you can just jump from MySQL to PostgreSQL.

What I said was you need to do your research and pick the right tool for the job.

I haven't used PostgreSQL for quite a few years but I think it got built-in support for fulltext indexes in v8.3 (April 2008) whereas MySQL had them in v3.23 (April 2001).

A lot of shared hosting providers and package management systems wouldn't have supported or installed the extra extensions needed for PostgreSQL fulltext indexing before it was built in, I suspect.

To be clear, I don't care what you or anyone else uses - I was simply pointing out that just because it was the default engine it didn't mean you had to use it!


For what it's worth, AFAIK there is no reason other than development effort, for InnoDB not to have full text search. I am sure it will get it eventually.


MySQL had InnoDB at least in the 4.x branch. 5.5 just made it the default storage engine for when you don't specify an engine type. All sensible developers have been using InnoDB for a long time. Others suffered (I am looking at you, WordPress).


2008 ...


I guess "MySQL is not ACID-compliant", http://www.xtranormal.com/watch/7091415/ is not relevant anymore?


I can vouch for the inner joins blowing outer joins out of the water. Setting up your postgresql.conf appropriate to its environment is helpful as well, http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Serve... has a good deal of explanation.


But they have quite different meanings; one can't just replace OUTER with INNER. Say:

  Table A has m rows
  Table B has n rows

If all rows match (i.e. join on 1=1):

  OUTER JOIN will return m*n rows
  INNER JOIN will return m*n rows

If no rows match (i.e. join on 1=0):

  OUTER JOIN will return m+n rows
  INNER JOIN will return 0 rows

INNER join enforces that a match has to exist; OUTER join doesn't.
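The cardinalities above can be checked with a quick sketch (plain Python standing in for the two join types; tables are just lists of key values):

```python
# Simulate INNER and FULL OUTER join row counts for two tables.

def inner_join_count(a, b, match):
    # Only matched pairs are returned.
    return sum(1 for x in a for y in b if match(x, y))

def full_outer_join_count(a, b, match):
    # Matched pairs, plus one row for each unmatched row on either side.
    pairs = sum(1 for x in a for y in b if match(x, y))
    unmatched_a = sum(1 for x in a if not any(match(x, y) for y in b))
    unmatched_b = sum(1 for y in b if not any(match(x, y) for x in a))
    return pairs + unmatched_a + unmatched_b

A = [1, 2, 3]   # m = 3 rows
B = [10, 20]    # n = 2 rows

# join on 1=1: every row matches
assert inner_join_count(A, B, lambda x, y: True) == 6       # m*n
assert full_outer_join_count(A, B, lambda x, y: True) == 6  # m*n

# join on 1=0: nothing matches
assert inner_join_count(A, B, lambda x, y: False) == 0      # 0
assert full_outer_join_count(A, B, lambda x, y: False) == 5 # m+n
```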


In my case it's about the SQLAlchemy ORM, which by default uses OUTER JOIN to load a set of parent objects and their related objects. If the foreign key on the parent is NOT NULL, you can replace the OUTER JOIN with INNER. So yes, clearly OUTER and INNER are different, but sometimes the use of OUTER is unnecessary, and other times a query can be restructured to not require OUTER (such as using an EXISTS to check for "no parent rows found" instead of an OUTER JOIN and a NULL check).
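Sketched in plain SQL (hypothetical parent/related tables; the INNER form assumes parent.related_id is NOT NULL):

```sql
-- Eager load: when parent.related_id is NOT NULL, OUTER adds nothing
SELECT p.*, r.*
FROM parent p
INNER JOIN related r ON r.id = p.related_id;

-- Instead of an OUTER JOIN plus a NULL check to find rows with no parent:
SELECT r.*
FROM related r
WHERE NOT EXISTS (SELECT 1 FROM parent p WHERE p.related_id = r.id);
```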


Put your write-ahead logs on a mirrored set of SSDs, as long as your write patterns can handle a buildup of logs before archiving to slower storage. This gets the fsync out of the way, so the application will block for a shorter amount of time.

Postgres relies heavily on the disk cache, so be sure to tune shared memory small enough to let the kernel cache commonly used disk pages. ZFS on FreeBSD and OpenSolaris has the ARC as a second level of caching between the disk cache and slow disk IO. The Adaptive Replacement Cache can take advantage of some SSDs to speed up access to your hotspots.
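A sketch of the corresponding knobs (paths and values illustrative, not recommendations):

```
# Move WAL to the mirrored SSD pair (server stopped), then symlink:
#   mv $PGDATA/pg_xlog /ssd/pg_xlog
#   ln -s /ssd/pg_xlog $PGDATA/pg_xlog

# postgresql.conf
shared_buffers = 2GB           # modest, so the OS page cache / ARC gets the rest
effective_cache_size = 12GB    # planner hint for how much the OS can cache
wal_buffers = 16MB
```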


Answering questions on the iPad before I go to sleep is a bad idea. My apologies on the misspellings.


A bunch of these performance tips are equally applicable to any SQL database. MySQL doesn't have partial indexes or GiST, for instance, and it uses ANALYZE TABLE instead of VACUUM ANALYZE, but the rest of the tips apply to it too.
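For reference, a partial index in Postgres is just an index with a predicate (hypothetical table):

```sql
-- Index only the rows the hot queries actually touch
CREATE INDEX orders_open_idx ON orders (created_at) WHERE status = 'open';
```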



