I wish this wasn't a video. When someone gives a talk, rather than just posting the video, won't someone at least post the slides? Or better yet, a transcript?
And look at that: a 35-minute video compressed into 1 minute of reading, and I bet, thanks to the summary, I got 75% of the relevant info that I would have gotten if I had watched the video in its entirety.
Man, you are awesome, thank you. I hate video for these kinds of things. It's like watching paint dry when I could have speed-read a transcript in a tenth of the time it takes to watch the video.
I don't know if I'm an edge case here, but if not, there might be some kind of opportunity there.
That occurred to me earlier. There are a lot of people like us (I think) who really just want to accumulate as much "80/20" information as possible and don't want multimedia.
Call it, "The wiki for getting to the fucking point".
Very nice, thank you. Almost all of this is good advice and correlates nicely with my experience.
One thing though:
"EBS volumes and Software RAID is best but scary on AWS"
I've managed an EBS RAID10 database for a few years now, and I wouldn't touch that setup with a 10-foot pole.
Do yourself a favor: set up an m1.xlarge (or bigger) instance, put the ephemeral drives in a RAID0, and mirror across multiple machines using hot standby, Slony, Londiste, or some other replication tool. You'll be much happier, your system will perform much better, and you'll have a failover strategy in place.
I don't understand this. Is there a difference between an explicitly allocated EBS volume and the ephemeral volumes of an EBS-backed instance?
Or are you objecting to the RAID10 part? Everywhere I look, RAID10 is touted as the best RAID level (the right balance between performance and safety).
EBS is a shared, networked SAN. Its performance is not that great and, even worse, highly variable. The last thing you want to run your database on is a system whose performance varies greatly throughout the day with no control on your part.
The ephemeral drives are directly attached to the host and, to the best of my knowledge, are not a shared resource. Their performance is highly consistent, but if your server goes down, all data on those drives is lost.
EBS sounds nice in theory, but by going to EBS RAID you throw away most of its benefits (such as easy snapshotting, since every member of the array now has to be snapshotted consistently) and take on its worst aspects.
Absolutely. You should always spread your data across multiple availability zones and, where feasible, across multiple data centers, and S3 is a great place to store your WAL files. We do the same thing.
I'm having a really hard time refraining from writing a really snarky "Switch to MySQL" type response, not because I think that MySQL outperforms PostgreSQL, but because there is always somebody on every MySQL article discussion who does this. OK, that's off my chest, sorry to rant.
I enjoyed this video even though I'm not a PostgreSQL guy, as it has a lot of good generic info. The advice about XFS and noatime was right on the money, although I don't think he stated strongly enough why you don't want to run software RAID-0 over EBS volumes. In my experience this is a really bad idea, because just one of those EBS volumes picking up a laggy await (which happens pretty frequently with volumes I've seen in the wild) will drag down the performance of the entire array. Also I'm told that the old "RAID-5 is always bad" notion is considerably more nuanced nowadays, and that many of the earlier OLTP write performance problems with RAID-5 have been largely mitigated with modern RAID controllers.
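To make the laggy-volume point concrete, here's a toy Python model. The latencies and the lag probability are made-up assumptions, not measured EBS numbers. It also assumes each request is big enough to touch every stripe member, so the request completes only when the slowest volume answers. Under that assumption, the chance that a given request is slow grows as 1 - (1 - p)^n with n volumes.

```python
import random
import statistics


def striped_request_latency(member_latencies_ms):
    # A request that spans every stripe member finishes only when the slowest member does.
    return max(member_latencies_ms)


def prob_request_slow(n_volumes, p_laggy):
    # Probability that at least one of n independent members is laggy for a given request.
    return 1 - (1 - p_laggy) ** n_volumes


def simulate_mean_latency(n_volumes, n_requests=10_000, p_laggy=0.02,
                          base_ms=5.0, lag_ms=100.0, seed=42):
    # Toy model (assumed numbers): each member answers in base_ms, or in lag_ms
    # with probability p_laggy, independently of the other members.
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_requests):
        members = [lag_ms if rng.random() < p_laggy else base_ms
                   for _ in range(n_volumes)]
        latencies.append(striped_request_latency(members))
    return statistics.mean(latencies)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} volume(s): P(slow request) = {prob_request_slow(n, 0.02):.3f}, "
              f"mean latency = {simulate_mean_latency(n):.1f} ms")
```

With p = 0.02, a single volume is slow 2% of the time, but an 8-volume stripe is slow roughly 15% of the time (1 - 0.98^8 ≈ 0.149). Small random I/O that lands on only one member suffers less, so treat this as the worst case.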
" Also I'm told that the old "RAID-5 is always bad" notion is considerably more nuanced nowadays, and that many of the earlier OLTP write performance problems with RAID-5 have been largely mitigated with modern RAID controllers."
Not to be snarky, but do you have a source for that? RAID5 has a pretty big intrinsic write penalty.
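For what it's worth, here's the usual back-of-the-envelope arithmetic behind that penalty, as a small Python sketch. A small random write on RAID5 costs four back-end I/Os (read old data, read old parity, write new data, write new parity), versus two on RAID10 (one write per mirror). The penalties below are the common rule-of-thumb values. The disk count and per-disk IOPS in the example are made up. The model deliberately ignores controller write-back caches and full-stripe writes, which is presumably where a modern controller would claw some of the penalty back.

```python
# Rule-of-thumb back-end I/Os per logical small random write.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disks, iops_per_disk, write_fraction, level):
    """Front-end IOPS the array can sustain for a given read/write mix.

    Each read costs one back-end I/O and each write costs WRITE_PENALTY[level],
    so we solve F * (reads + writes * penalty) = raw back-end IOPS for F.
    Ignores caching, full-stripe writes, and sequential I/O.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = disks * iops_per_disk
    read_fraction = 1.0 - write_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])
```

With 8 disks at 150 IOPS each and 30% writes, that works out to roughly 923 IOPS for RAID10 versus about 632 for RAID5. At 100% writes it's 600 versus 300.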
"I'm having a really hard time refraining from writing a really snarky "Switch to MySQL" type response, not because I think that MySQL outperforms PostgreSQL, but because there is always somebody on every MySQL article discussion who does this. Ok, that's off my chest, sorry to rant."
I can think of good reasons to switch from MySQL to Postgres, though, and I have a hard time coming up with reasons to switch from Postgres to MySQL (which is something else altogether from choosing MySQL over Postgres in the first place; although even there, for serious deployments, the only real reasons I can see are in the "it's what we know" department).
Long time Postgres user so I may be a little biased ;-), but still.
This gave me the idea to run pgfouine (http://pgfouine.projects.postgresql.org), which he mentioned, on my logs, and I'm already finding a bunch of stuff to fix after a quick run.
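If you can't install pgfouine right away, the core idea behind this kind of log report is easy to sketch. Have PostgreSQL log slow statements (for example with log_min_duration_statement), strip literals so similar queries group together, and rank them by total time. The Python below is a rough illustration under those assumptions, not how pgfouine actually works. It assumes each statement sits on one log line in the "duration: ... ms  statement: ..." form and will miss multi-line statements.

```python
import re
from collections import defaultdict

# Matches lines like: "LOG:  duration: 12.345 ms  statement: SELECT ..."
DURATION_RE = re.compile(r"duration: ([0-9.]+) ms\s+statement: (.*)")


def normalize(sql):
    # Replace string and numeric literals with '?' so similar queries group together.
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)
    return " ".join(sql.split())


def top_queries(lines, limit=10):
    """Return (normalized_query, count, total_ms) tuples, largest total time first."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration line (connections, errors, etc.)
        entry = stats[normalize(match.group(2))]
        entry[0] += 1
        entry[1] += float(match.group(1))
    rows = [(query, count, total) for query, (count, total) in stats.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]
```

Usage would be something like top_queries(open("postgresql.log")), where the log filename is just a placeholder.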