Hacker News new | past | comments | ask | show | jobs | submit login
Secrets of PostgreSQL Performance (DjangoCon) (blip.tv)
111 points by mace on Sept 21, 2011 | hide | past | favorite | 27 comments



I wish this wasn't a video. When someone gives a talk, rather than just posting the video, won't someone at least post the slides? Or better yet, a transcript?


We have the video + links to slides and a liveblog on the Lanyrd page: http://lanyrd.com/2011/djangocon-us/shbqg/


Future feature for jtv? ;-)


Your wish is my command sir. I am hard of hearing so I try to transcribe talks. This one is more abbreviated than my last one.

Secrets of Postgres Performance

Measure Everything Before and After

Do not do Dumb Things

Cache Everything

Watch your query counts

Tuning Parameters

shared_buffers: set to 25% of available ram, move up/down 5% to find sweet spot

effective_cache_size: planning hint that tells PG how much RAM it can expect for OS disk cache

work_mem: per process amount of ORDER BY space

Other Parameters

wal_buffers: Set to 16MB and forget it

checkpoint_segments: Increase to at least 10

maintenance_work_mem: 50MB for every GB of RAM

synchronous_commit: turn off with data loss risks

Disks. Lots of disks.

Not all disks are created equal. Faster disks make a huge difference.

Configuration of disks matters a lot. RAID-5 is bad. RAID-1+0 is good.

Separate out the WAL onto its own disk. Can 4x write performance.

Worry about CPU after your disk/RAID/filesystem configurations are more or less optimal.

Use pgbouncer to pool connections

Tablespaces

Consider putting archive or legacy data onto slower drives/volumes

Separate your data and indices onto different volumes if possible

This applies even on SANs

Best of both worlds with Rackspace and Cloud Connect

EBS volumes and Software RAID is best but scary on AWS.

Choice of filesystem matters. No journaled filesystems for your WAL, it is a journal.

Use XFS for maximum performance.

USE noatime!

Avoid queries in loops.

Count() is slow


Awesome - thanks!

And look at that - a 35 minute video is compressed into 1 minute of reading and I bet, thanks to the summary, I obtained 75% of the relevant info that I would have got if I had watched the video in it's entirety.


You're welcome, hope it helped.

I really despise non-textual media.


I can hear and see perfectly. And I love concise text data. Long videos, audio or long documents with a lot of blabla sucks. Thank you.


Man you are awesome, thank you. I hate video for these kinds of things, it's like watching paint dry when I could have speed-read through a transcript in a tenth the time it takes to watch a video.

I don't know if I'm an edge case here, but if not there might be some kind of opportunity there.


I also hate videos; they are too slow to get to the point.

With text, at least I can speed read and ignore stuff that is not important to me.


That occurred to me earlier. There's a lot of people like us (I think) who really just want to accumulate as much "80/20" information as possible and don't want multi-media.

Call it, "The wiki for getting to the fucking point".


Tragically, gettothefuckingpoint.com is taken by a squatter. What a pity.


Workin' on it.


Very nice, thank you. Mostly all of this is good advice and correlates nicely with my experience.

One thing though:

"EBS volumes and Software RAID is best but scary on AWS"

I've managed an EBS RAID10 database for a few years now. I wouldn't touch this with a 10 foot pole.

Do yourself a favor, set up an m1.xlarge (or bigger) instance, put the ephemeral drives in a RAID0 and mirror across multiple machines using hot-standby, slony, londiste, or some other tool. You'll be much happier, your system will perform much better, and you'll have a failover strategy in place.


I dont understand this - is there a difference between an explicitly allocated EBS volume and the ephemeral volumes of an EBS-backed instance ?

Or are you focusing on the RAID10 part - but then everywhere RAID10 is touted to be the best RAID solution (right balance between performance and safety)


EBS is a shared networked SAN. The performance characteristics of it are not that great and even worse, highly variable. The last thing you want to be running your database on is a system where the performance characteristics vary greatly throughout the day and you have no control over it.

The ephemeral drives are drives directly attached to the server and to the best of my knowledge are not a shared resource. Their performance characteristics are highly consistent, but if your server goes down all data on those drivers are lost.

EBS sounds nice in theory, but by going to EBS RAID you throw away most of its benefits (such as snapshotting) and take on it's worst aspects.


Ephemeral volumes are shared, just not nearly as variable.


Yes, there is a difference between EBS volumes and ephemeral volumes of an EBS-backed instance.


And what happens when the AZ goes down? Don't you lose data then? Or do you mean multiple machines across AZ/region?


You'll want the mirror in a different AZ. Or you could also put it in a different region.

Use Wal-E to push wal logs to s3.


Absolutely. You should always be spreading your data across multiple availability zones and where feasible across multiple data centers and S3 is a great place to store your wal logs. We do the same thing.



I'm having a really hard time refraining from writing a really snarky "Switch to MySQL" type response, not because I think that MySQL outperforms PostgreSQL, but because there is always somebody on every MySQL article discussion who does this. Ok, that's off my chest, sorry to rant.

I enjoyed this video, even though I'm not a PostgreSQL guy, as it has a lot of good generic info. The advice about XFS and noatime was right on the money, although I don't think he strongly stated enough why you don't want to run EBS volumes over software RAID-0. In my experience this is a really bad idea, because just one of those EBS volumes getting picking up a laggy await (happens pretty frequently with volumes I've seen in the wild) will drag down the performance of the entire array. Also I'm told that the old "RAID-5 is always bad" notion is considerably more nuanced nowadays, and that many of the earlier OLTP write performance problems with RAID-5 have been largely mitigated with modern RAID controllers.


  " Also I'm told that the old "RAID-5 is always bad" notion is considerably more nuanced nowadays, and that many of the earlier OLTP write performance problems with RAID-5 have been largely mitigated with modern RAID controllers."
Not to be snarky, but do you have a source for that? RAID5 has a pretty big intrinsic write penalty.


"I'm having a really hard time refraining from writing a really snarky "Switch to MySQL" type response, not because I think that MySQL outperforms PostgreSQL, but because there is always somebody on every MySQL article discussion who does this. Ok, that's off my chest, sorry to rant."

I can think of good reasons to switch from mysql to postgres though, and I've a hard time coming up with reasons to switch from postgres to mysql (which is something else all together than choosing mysql over postgres in the first place - although even there, for serious deployments, the only real reasons I can see are in the "it's what we know" department).

Long time Postgres user so I may be a little biased ;-), but still.


This gave me the idea to try http://pgfouine.projects.postgresql.org he mentioned on my logs, and I'm already finding a bunch of stuff to fix after a quick run.


Anyone know of an available transcript?


Yes, see my response to emmett.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: