Internals of PostgreSQL (interdb.jp)
464 points by billwashere on Jan 20, 2019 | 25 comments



Following the link to the author's blog, I found a post titled “postgres cluster management system in GoCardless”

http://www.interdb.jp/blog/pgsql/pg_pacemaker_01/

Which in turn led me to

https://github.com/gocardless/our-postgresql-setup

And in turn to a fifteen minute talk titled “Zero-downtime Postgres upgrades”

https://www.youtube.com/watch?v=SAkNBiZzEX8

Which was interesting and informative IMO.

After watching that one, the YouTube suggested videos led me further to a 45-minute talk titled “Tuning PostgreSQL for High Write Loads”

https://www.youtube.com/watch?v=xrMbzHdPLKM

Which I liked a lot as well.

I think anyone who came here because they are interested in the subject of the OP link might be interested in watching these videos.

Though disclaimer of course: These videos told me things I didn’t know about scaling PostgreSQL, because there is a lot I don’t know about that. But if you already know a lot about that then of course these videos might not be so interesting to you.


Interesting stuff. If anyone wants to make a PR to add this stuff to https://github.com/dhamaniasad/awesome-postgres I'll be happy to accept.


This is the thing I love about PostgreSQL: you can find out everything about the database. Oracle is ridiculous - they have somewhere in the range of a thousand hidden parameters, tuning queries for their CBO is something of a black art that can change between point releases, and it leads to articles like the following:

http://www.dba-oracle.com/art_so_undoc_parms_p2.htm

PostgreSQL, however, has none of these limitations.


Your comment reminded me of oraguy's comment about Oracle DB [1]

[1] https://news.ycombinator.com/item?id=18442637


Oracle also has serious quality issues. The last patch set I applied had the wrong binaries for Oracle Text. With what they charge, you wouldn't expect that. I've never had such a thing happen with my PostgreSQL instances. I just don't see how to justify starting out with Oracle today.


Don’t take anything seriously on dba-oracle.com - Burleson is well known in the Oracle community as an expert in self-promotion but his technical skill is mediocre at best.

Jonathan Lewis, Guy Harrison and Craig “the Hammer” Shallahammer are infinitely more experienced and insightful.


Right. There are many Oracle internals experts presenting at conferences, and they are also quite approachable. I'd especially recommend Tanel Poder; several of his troubleshooting sessions are available on YouTube.


IIRC MySQL too can change between point releases in surprising ways. (Though I'm not sure if that practice began before or after Oracle's purchase of it.)


That's sometimes unavoidable. If there's a clear bug somewhere, fixing it will sometimes also affect queries/plans that weren't visibly affected by the bug. That's especially the case around costing issues - some queries might have regressed noticeably due to a bug since the last major version, but fixing it in a minor version might affect other queries negatively. Postgres/we try to avoid that, but sometimes it's the most sensible way forward (and I assume the same is true for MySQL etc.).


Reminded me that "Postgres is easy to install, administer, maintain, and use... with just a little bit of orientation. This is that orientation."

Christophe Pettus: PostgreSQL Proficiency for Python People - PyCon 2014. https://www.youtube.com/watch?v=0uCxLCmzaG4


Does anyone know if something like this exists for other popular DBs?


This only covers one feature of SQLite, but I recommend it nonetheless: https://www.sqlite.org/queryplanner.html


The Inside SQL Server series by Microsoft Press is pretty impressive.


This series is now called Microsoft SQL Server Internals [1]. You can get the latest version, Microsoft SQL Server 2012 Internals (Developer Reference), from Amazon [2].

[1] https://www.sqlserverinternals.com/publications/

[2] https://www.amazon.com/gp/product/0735658560/ref=as_li_tl?ie...


As someone who started out in SQL Server, I really like the succinct clarity of PostgreSQL's functions and its lightweight installation on the server. Thanks for sharing, this could really help me in the future.


Aside: do you still need to use a cache service when using Postgres? Our Django web app forgoes any caching because “Postgres is fast enough” and “has its own cache”.


Assuming you mean an application-level cache, you can't really generalize. In general you want to avoid caching anything if you can, because cache invalidation is a really hard problem.
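To make the tradeoff concrete, here's a minimal in-process sketch of the usual compromise - a TTL cache with explicit invalidation. The class and names are my own illustration, not from any particular library; real apps would more likely use memcached/Redis or functools.lru_cache:

```python
import time

class TTLCache:
    """Tiny in-process cache: entries expire after ttl seconds.
    The TTL is the escape hatch for the invalidation problem -
    stale data is served until the entry ages out or someone
    remembers to invalidate it explicitly."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # fresh cached value, skip the DB
        value = compute()          # e.g. run the real query
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        # Must be called from *every* code path that changes the
        # underlying data - forgetting one is how stale-data bugs
        # happen, which is why invalidation is the hard part.
        self._store.pop(key, None)
```

The hard part isn't the cache itself, it's guaranteeing every write path calls invalidate() - which is the GP's point about avoiding caching when Postgres is fast enough.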


I read a comment like this and just desperately want to look at the query log to see what's being run, followed by running EXPLAIN on a few offenders to see what the planner is working with. You can get really deep into optimising your DB, but often there is a lot of low-hanging fruit you can make “good enough” in no time at all.


That really depends on what you're caching. Sometimes e.g. you might want to have local caches on application servers, not because the database would be too slow or couldn't take the load, but because the roundtrip time to the database makes page loads too slow if you hit the DB every time.

Or you have some complex ranking/relationship/aggregation calculations - it might be computationally infeasible for the database, as hard as people have worked on its efficiency, to calculate such a query on each request...


Depends on what you're using the cache for. Postgres is definitely fast enough for reads of a K/V pair at whatever scale your master node can support for most use cases, but there are cases you can't cover with Postgres alone at scale, e.g.:

1. Needing a consistent (meaning reading off replicas is not fresh enough) data source for reads and writes that happen on every request to a very high-traffic site. Think sessions, API rate limiting.

2. Needing a fast store for denormalized data.
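For case 1, the usual pattern is a shared counter store. As an illustration only (names and structure are mine, not any particular library's), here's a fixed-window rate limiter sketched in-process; in production the counter would live in something like Redis so every app server sees the same count:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` hits per `window`-second window per key.
    The in-process dict stands in for a shared store; the point of
    the GP's case 1 is that this state must be both fast and
    consistent across all app servers on every request."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self._counts = defaultdict(int)  # (key, window_no) -> hits

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # All requests in the same window-sized interval share a bucket.
        bucket = (key, int(now // self.window))
        self._counts[bucket] += 1
        return self._counts[bucket] <= self.limit
```

You *can* model this in Postgres with an UPDATE per request, but at very high request rates the write traffic and round trips are exactly what pushes people to a dedicated cache.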


Generally true

But if some query takes 5 minutes to run and you can’t speed it up, you need to do something.

I cache a select few queries that don't need to be 100% accurate and scan a ton of rows.


Postgres is fast enough, but Django isn't.


Thank you for posting. Really great resource.


Awesome resource, does anyone know how to convert it into a mobi? I'd like to send it to my Kindle.


You can get an epub with pandoc. A very crude script to do it:

  wget http://www.interdb.jp/pg/pgsql{01..11}.html http://www.interdb.jp/pg/img/fig-{1..11}-{01..34}.png http://www.interdb.jp/pg/img/fig-4-fdw-{1..7}.png http://www.interdb.jp/pg/img/udc1.jpg
  mkdir img
  mv *.jpg *.png img/
  pandoc -s pgsql{01..11}.html -o internals_pgsql.epub
From there, you should be able to convert it to mobi with Calibre or a similar tool.



