Hacker News

>MySQL outperformed Postgres.

For my part, I thought Postgres was slow until 5 months ago, when I ported an application over from SQL Server.

Then I learned something! The default install of Postgres is slow (at least on Windows); in benchmarks against my baseline SQL Server port it was about 2x slower. BUT there were a number of things I could do to make it faster:

a) Use the EnterpriseDB wizard for tuning Postgres --> this alone gave a 2x to 3x performance improvement.

b) Adjust the way I did updates (there is a way to get Postgres updates to be fast; it just takes some massaging).

c) Get rid of views where possible -- I found that certain query optimizations I had taken for granted on SQL Server weren't available.

Doing all this resulted in a system that performed faster and with less RAM than SQL Server (and I had spent a bunch of time tuning the SQL Server perf in the first place!)




Out of curiosity, what were the things you did to make updates fast?


Here's a blog-like post (really a Facebook Note) I wrote a while back about the differences:

Using the right syntax for each database is critical. Here's the moderately optimized SQL for the same update in two DB systems:

SQL Server

UPDATE tok SET spid = est.sid FROM @Tok tok INNER JOIN @eST est ON tok.cs = est.cs

For SQL Server, using the table alias in the UPDATE clause is critical for good performance. Another SQL Server trick (recommended by the SQL Server team, and it yields excellent speed improvements) is to wrap the source table in a derived table:

UPDATE tok SET spid = est.sid FROM (SELECT * FROM @Tok) tok INNER JOIN @eST est ON tok.cs = est.cs

Postgres

UPDATE _Tok SET spid = est.sid FROM _Tok tok INNER JOIN _eST est ON tok.cs = est.cs WHERE tok.cs = _Tok.cs

For Postgres, the performance-critical part is the addition of the WHERE clause to bind the updated table to the FROM table. It makes sense once you understand that, from the SQL Server perspective, it is equivalent to using the alias in the UPDATE clause.
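For reference, the idiomatic Postgres form (a sketch, using the same hypothetical table and column names as above) references the target table only once and puts the join condition in the WHERE clause:

```sql
-- Sketch: _Tok and _eST stand in for the real tables.
-- The target table appears only after UPDATE; the FROM clause
-- supplies the join partner, and WHERE binds the two together.
UPDATE _Tok
SET    spid = est.sid
FROM   _eST est
WHERE  _Tok.cs = est.cs;
```

This avoids joining the target table to itself, which can produce a second, redundant scan.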


I don't understand your recommendation to get rid of views when possible.

There is nothing particularly slow about views in Postgres, nor is there anything particularly optimized about them. They act mostly like any other query, except they are available to you like a table is.

A view is only as good as the query you use to define it.


It could be that my understanding of what was going on was incorrect, but I'll tell you what I observed -- please let me know if there was a better way to fix it. I found the easiest fix was simply to avoid using views. And frankly, I wasn't surprised by this: using UDFs with SQL Server also had really bad perf, so I didn't question that a particular feature might perform badly.

OK, let's say you have two large tables, Sentences and Annotations, and you make a view that joins them -- let's call it vwSentencesDetailed. Then you do a query like:

select * from vwSentencesDetailed where srcID = _srcID

where srcID is an indexed field in the Sentences table that lets you pull out a narrow subset of the data -- say, the particular file it came from.

The performance characteristic I observed was that it behaved (loosely) like:

select * from (Select * from Sentences s Inner Join Annotations a on a.sentID = s.sentID) vw where srcID = _srcID

rather than like:

select * from (Select * from Sentences s Inner Join Annotations a on a.sentID = s.sentID where srcID = _srcID) vw

I believe that SQL Server makes such a performance optimization, but Postgres does not.
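If anyone wants to reproduce this, a minimal sketch (the schema and names are hypothetical, following the example above) is to define the view and compare the two plans with EXPLAIN ANALYZE:

```sql
-- Hypothetical schema; USING (sentID) avoids a duplicate column in the view.
CREATE VIEW vwSentencesDetailed AS
  SELECT *
  FROM Sentences s
  INNER JOIN Annotations a USING (sentID);

-- If the planner pushes the predicate into the view, both plans should
-- show an index scan on Sentences via the srcID index.
EXPLAIN ANALYZE SELECT * FROM vwSentencesDetailed WHERE srcID = 42;

EXPLAIN ANALYZE
  SELECT *
  FROM Sentences s
  INNER JOIN Annotations a USING (sentID)
  WHERE srcID = 42;
```

If the two plans differ, that difference pinpoints exactly where the view is losing the optimization.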


It's hard to say what the problem was without seeing the actual definition of the tables and the output of EXPLAIN ANALYZE.

There are times where you may have to structure your view query slightly differently than you would a standard query in order to get the planner to optimize things the same way, particularly if the view uses subqueries.
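One concrete case (a sketch with hypothetical names): a subquery containing LIMIT or OFFSET acts as an optimization fence, so an outer WHERE clause can't be pushed down into it:

```sql
-- The LIMIT turns the subquery into an optimization fence: a filter on
-- srcID applied against this view is evaluated after the subquery runs,
-- rather than being pushed into the scan of Sentences.
CREATE VIEW vwRecentSentences AS
  SELECT *
  FROM (SELECT * FROM Sentences ORDER BY sentID DESC LIMIT 100000) s
  INNER JOIN Annotations a USING (sentID);
```

Restructuring the view to avoid the fence usually restores the plan you'd get from writing the query directly.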

Ultimately it is possible to have a view run just as fast as a regular query, but you might (though not always) have to do a little extra work with EXPLAIN ANALYZE to make sure things are working as you expect.


I like the new trend of making databases "fast" by default rather than shipping "safe" configurations. People who care about safety should take the time to make it so; people who want it fast, well, give it to them faster! So maybe PostgreSQL should offer faster defaults?


Not sure what you mean by "safe" by default. Speed and performance have nothing to do with security, if that's what you mean.

The reason that the default Postgres configuration does not perform well is that Postgres runs on a number of operating systems, and it is not possible to make assumptions about the hardware and configuration of the host. For example, increasing configuration parameters like shared_buffers to any useful value will require you to tweak the operating system's kernel resources. The Postgres documentation does a great job of describing the process on a number of operating systems:

http://www.postgresql.org/docs/current/interactive/kernel-re...
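For example, a typical first pass at postgresql.conf tuning might look like this -- the values are purely illustrative and depend entirely on the host's RAM and OS limits:

```
# postgresql.conf -- illustrative values for a machine with ~4GB of RAM
shared_buffers = 512MB        # default is very small; raising it may require
                              # increasing the kernel's SHMMAX/SHMALL first
effective_cache_size = 2GB    # planner hint only; allocates no memory
work_mem = 16MB               # per-sort / per-hash-table working memory
```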


Plenty of software manages to share information between processes without using SysV shared memory, and thus without being subject to these kernel resource limits.

I'm not saying this should necessarily be fixed - I'm sure the Postgres team has better things to be working on.

But it's not a natural, unavoidable limitation, either.


It's a design decision. MySQL chose "fast defaults". PostgreSQL chose "safe defaults". If you want the other choice, use the other database.



