The internals of PostgreSQL (interdb.jp)
477 points by adamnemecek on Jan 26, 2017 | 53 comments



A really cool paper to read is The Design of POSTGRES [1] by Stonebraker et al. It's dated by its mentions of things like POSTQUEL, but it's still really interesting to read about the early design of seminal features like the extensible ADT system, from a time when it was still innovative.

1: http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf


I would like to read through this when I'm not so tired. Thank you!

From what I can tell, PostgreSQL is sensibly structured. At least, it seems better than MySQL. The hackiest thing I have heard so far about MySQL is that if you run the exact same SQL text more than once, it will fetch the result from cache. If you run a differently worded query, it will skip the cache, even if the query were to bring back the exact same data.

This is different from what I remember about PostgreSQL (correct me if I'm wrong). I remember reading in some book that PostgreSQL just caches table pages in memory, so if a query needed data that was already in memory it read it from memory, and otherwise it went to disk. No need to make sure you didn't insert a space or something into your query the second time.
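If that's right, the relevant knobs and views would be something like this (a sketch; the pg_buffercache part needs the contrib extension installed):

    -- Size of Postgres's own page cache (it also leans on the OS page cache)
    SHOW shared_buffers;

    -- With the contrib extension you can see whose pages are currently resident
    CREATE EXTENSION pg_buffercache;
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;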


While that is true, and the mysql query cache definitely became a scaling issue as machines got bigger (lock contention etc.), postgres doesn't even have a similar feature, period.

Suppose you have a query doing a group by on a low-cardinality field (e.g. state) over a hundred-million-row table. Large amount of data in, small amount out. postgres has to actually look at all the rows to re-run the query; the mysql query cache only has to pull the tiny number of results out of the cache for that specific query.
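A sketch of the difference (table and column names made up):

    -- MySQL (pre-8.0, query cache enabled): the first run scans the table;
    -- a byte-identical re-run is answered straight from the result cache.
    SELECT state, COUNT(*) FROM customers GROUP BY state;

    -- postgres has no result cache, so every execution re-aggregates all the
    -- rows (from shared buffers if the pages are hot, from disk otherwise).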

This is a dangerous feature because there is a super slippery slope as traffic increases... any write to any of the tables invalidates the cache right when you need caching most, and hits all queries against those tables at the same time. But used well, in an environment with many copies of the data (replicated or otherwise) where you can manage when updates happen, it can be a lifesaver.

InnoDB also has its own buffer pool that caches rows and is in general terms similar to the postgres buffer cache.

It is fair to say that mysql has a lot more features like this, "sometimes life-saving hacks that can bite you hard if you don't understand them," while postgres is much more restrained, ensuring its features are a little more general and thought through.


MySQL needs the cache more often, though, given its limit of one index per query. Postgres can more often combine a few indexes and turn what would scan 100k rows in MySQL into bitmap-ANDing two indexes and pulling back 10 rows.
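Roughly what that looks like in a Postgres plan (hypothetical table and indexes, plan output abbreviated):

    EXPLAIN SELECT * FROM orders WHERE status = 'open' AND customer_id = 42;

    -- Bitmap Heap Scan on orders
    --   ->  BitmapAnd
    --         ->  Bitmap Index Scan on orders_status_idx
    --         ->  Bitmap Index Scan on orders_customer_id_idx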

Seems sensible, each has features to support the base design.


The limit of one index per query was lifted in 2005, with MySQL 5.0.


It's now one per table. Poor word choice by me, sorry.



True point (disclaimer: don't know how mysql has evolved in the past 5 years).

Fits with my general feeling that "if you want to optimize for a super specific use case, mysql can sometimes provide cool tricks for that use case that can be amazing," but if you want a more general solution, need more flexibility, don't have the knowledge, or don't know your future needs, postgres shines.


A few years ago, PostgreSQL added a feature where if (1) you ran a query (with no "order by" clause); and (2) before it finished, you started an identical query, the second query would simply piggyback off of the first until the first finished. After the original query finished, the second query would retrieve any rows it missed.

When they added this optimization, they issued a warning about it, because identical queries without "order by" clauses will return the same results, but without a guarantee of how they're sorted.
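If I'm remembering the same thing, this is the "synchronized sequential scans" behavior from 8.3, and it's controlled by a setting (a sketch, assuming a default config):

    -- On by default; later scans of the same large table can join an
    -- in-progress sequential scan partway through and wrap around at the end.
    SHOW synchronize_seqscans;

    -- Turn it off per session if you rely on repeatable row order without ORDER BY
    SET synchronize_seqscans = off;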


"While that is true and the mysql query cache definitely became a scaling issue as machines got bigger with lock contention etc., postgres doesn't even have a similar feature period."

I'm sorry, but I'd rather not have a feature than have a feature that is questionably implemented.


SQL Server does all of these things (for comparison).

It compiles queries and stores the plans in a cache. In many cases it can parameterise simple ones, so even if the exact text changes it will pull back the matching plan. It also stores all this data so you can analyse later what plans were compiled, what was reused, which ones were similar, and which were recompiled later - though hardly anyone deep-dives into it, because it's a tar pit of info and why bother instead of just focusing on the worst cases.

Also, when a query pulls data into memory it stays there in the buffer pool. If the rows change, the buffers aren't discarded; the dirty pages are flushed to disk (along with the log records) while staying valid in memory. So if your query pulls back the exact same data, then sure, it comes from memory, but it's not stale or anything.
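For anyone curious, the plan cache side of that is queryable through the DMVs; a rough sketch:

    -- How often each cached plan has been reused, with the statement text
    SELECT cp.usecounts, cp.objtype, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    ORDER BY cp.usecounts DESC;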

I don't consider any of these to be problems, so I'm surprised they could be interpreted as bad in other products. I don't understand enough about them to say, but it's fun to discuss.


From reading what you wrote, I think you are talking about statement/plan caching, where it can cache the "how do I run this" but still has to read each row (or index entry) from the buffer cache.

The mysql query cache actually caches the results of the query (see https://dev.mysql.com/doc/refman/8.0/en/query-cache.html), so it doesn't even need to look at the rows at all.
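To make the distinction concrete, the result cache has its own server variables and per-statement hints (sketch against a pre-8.0 server; the table is made up):

    -- Server-side switches for the result cache
    SHOW VARIABLES LIKE 'query_cache%';

    -- Per-statement hints to force caching on or off for one query
    SELECT SQL_CACHE    state, COUNT(*) FROM customers GROUP BY state;
    SELECT SQL_NO_CACHE state, COUNT(*) FROM customers GROUP BY state;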

SQL Server may have a similar thing now, I'm not up to date on it, but wanted to call out the distinction based on how you described it.

No question, SQL Server has a lot of awesome qualities for the right use case and has come a long long way since the Sybase days.


SQL Server does some great things for the data that's in it, but if you ever want to get the data out (Elasticsearch / Hadoop / Kafka / etc.) you are in for a world of pain (and bills). The lock-in alone is reason enough to steer clear of it for virtually all use cases.

As you start running into scaling issues and need solutions that are faster to implement, you'll reach for Enterprise and then for more cores, and before you know it you're stuck with a $16,000/core scaling cost that you can't avoid, because you can't easily get your data out to offload things.

Dealing with SQL Server on a customer facing system is the stuff of nightmares.


It's unclear what you mean about the difficulties of getting data out of SQL Server, since it's really no harder or easier than getting data out of any other RDBMS. Scaling SQL Server, just like scaling any other RDBMS, can be challenging, since traditional RDBMSs don't distribute well. While some workloads are CPU-limited, that is not true for all workloads, so the per-core licensing model of SQL Server (which is ~$7k per core for 2016 Enterprise edition, not $16k) is not always prohibitive if your workload is more disk- or RAM-intensive.

There are points of differentiation between SQL Server, PostgreSQL, and MySQL that might make one better suited to a particular workload than the others, but those situations depend on the details of a particular workload, and something as general as "getting data out" or "on a customer-facing system" isn't specific enough to identify any one of them as more suitable than another.



That's the price for a 2-core license pack.


The notation shows "per core" but says it's sold as a 2-core pack. Does that make the actual price $28k for the pack, or $14k for 2 cores? The way they list it is not particularly clear.


The SQL Server licensing model is "per core", but the minimum licensing increment is 2 cores which costs (2 * $7k = $14k). It's obviously a confusing pricing scheme, but try pricing out Oracle some time!


Aside from the cost aspect of commercial software, this is false. It is no more difficult to feed your data to another system than it is with any other relational database. There is certainly no bill involved in exporting your data. In fact, SQL Server includes the tools you need to do this, such as SQL Server Integration Services. SQL Server has its own implementation of the ANSI SQL standard, but this is true of every database system. The majority of it is compatible; it's only when you use certain functions that you have to consider porting them. SQL Server is the database behind many customer-facing applications, and there is often no nightmare involved.


With a PostgreSQL foreign data wrapper, you can actively push data into other data sources at the transactional level. You also have pub/sub functionality built in, to let outside systems listen and react to data changes.
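The pub/sub part is just LISTEN/NOTIFY plus a trigger; a minimal sketch (table and channel names are made up):

    -- Fire a notification with the new row's id whenever orders changes
    CREATE OR REPLACE FUNCTION notify_order_change() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('order_changes', NEW.id::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_notify
      AFTER INSERT OR UPDATE ON orders
      FOR EACH ROW EXECUTE PROCEDURE notify_order_change();

    -- An external worker runs LISTEN order_changes; and reacts to each
    -- notification, e.g. by pushing the changed row into Elasticsearch.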

With SQL Server your options are variations of polling the database. We even tried posing the question of using Elasticsearch to developers at the PASS Summit recently and were widely met with groans of "Yeah... that's a pain to do with SQL Server. You're basically out of luck."

For an internal standalone system, it's great. For moving data into it to do analytics, great. For anything close to a real-time, web-facing system with a growth pattern... an absolute nightmare.


This is an impressive attempt to back up the original statements about SQL Server. It sounds like you may not be all that familiar with how foreign data wrappers work in Postgres, and not familiar with the equivalent in SQL Server: linked servers. Both implementations use OLEDB or ODBC drivers to provide transactional consistency to external applications.



> if you ever want to get the data out ... you are in for a world of pain

What do you mean by "get data out"? I've not had any issues getting rows out of SQL Server to be indexed externally.


This is only true for one specific kind of cache (query cache) which has been disabled by default for 4 years now.

It works well for cheating benchmarks and masking unoptimized queries. It is not something that I would recommend.


I have been reading about the Postgres architecture. The modular design enables a microkernel-like API where different languages can be integrated seamlessly.

The only thing I am missing is incremental materialized views.


Have you checked out PipelineDB yet? They build on Postgres and include continuous views, aggregations, joins, etc.

http://docs.pipelinedb.com/continuous-views.html
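The gist, as I understand it from those docs (a sketch; the stream and view names are arbitrary):

    -- Results are computed incrementally as events arrive on the stream
    CREATE CONTINUOUS VIEW pageview_counts AS
      SELECT url::text, COUNT(*) FROM page_views GROUP BY url;

    -- Writing to the stream updates the view; reading it is just SQL
    INSERT INTO page_views (url) VALUES ('/home'), ('/docs');
    SELECT * FROM pageview_counts;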


From my understanding, PipelineDB is a fork, not an extension, for PG. I liked what CitusData did: you download PG and add the Citus extension.


(Hi! I'm one of the founders of PipelineDB)

PipelineDB will be a standard PostgreSQL extension[0] by release 1.0.0. Currently we're about to release 0.9.7, and each release after that will incrementally factor out PipelineDB into a completely independent PostgreSQL extension.

That's the plan, anyways :) With https://www.stride.io being rolled out under heavy demand, we've got a lot on our plate!

[0] https://github.com/pipelinedb/pipelinedb/issues/1596


Thanks. The plan looks promising.

Just checked stride.io - is it a PipelineDB admin/user front end as a service?

I'm assuming there is a lot more going on behind the scenes? Can you talk about the architecture?


Stride is a realtime analytics API built for scale and performance. It allows you to construct networks of continuous processing nodes that either materialize results (think high-throughput aggregation) or fire webhooks ("when this aggregate exceeds this value, POST to a url"). Each node is represented by a continuous SQL query, and nodes can be chained together by "tailing" other nodes.

These processing nodes can be queried and joined on (with SQL) at any time to easily power realtime dashboards and other analytical applications. Check out the original announcement [0] and the Stride docs [1] for more detail.

And to answer your question, no, Stride is not simply a frontend for PipelineDB. While it does use PipelineDB and PipelineDB Cluster extensively, it also uses other systems to provide unique capabilities, and all of this is ultimately synergized behind a dead-simple HTTP API.

We've found that a hosted database isn't actually all that interesting for users nowadays, and near impossible to build a business around because they've essentially become a cheap commodity. So we aim to deliver maximum value to users by nailing one use case (realtime analytics) with a highly focused API that eliminates most of the complexity and decision points you'd encounter when trying to do the same with a generic database.

[0] https://www.pipelinedb.com/blog/announcing-stride-a-realtime...

[1] https://www.stride.io/docs


+1 for incremental materialized views.

There are workarounds for emulating it with triggers, though. I found the following discussion helpful for a problem I had:

https://hashrocket.com/blog/posts/materialized-view-strategi...
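The general shape of the trigger approach, in case it helps (a minimal sketch handling inserts only; the names are made up, and updates/deletes need their own triggers):

    -- Maintain a per-state count in a plain table instead of re-running
    -- REFRESH MATERIALIZED VIEW over the whole base table.
    CREATE TABLE customer_counts (state text PRIMARY KEY, n bigint NOT NULL);

    CREATE OR REPLACE FUNCTION bump_customer_count() RETURNS trigger AS $$
    BEGIN
      INSERT INTO customer_counts (state, n) VALUES (NEW.state, 1)
      ON CONFLICT (state) DO UPDATE SET n = customer_counts.n + 1;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customers_count
      AFTER INSERT ON customers
      FOR EACH ROW EXECUTE PROCEDURE bump_customer_count();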


What resources have you been using to familiarize yourself with Postgres?


I would highly recommend the talks by Bruce Momjian (https://momjian.us/); he has separate talks on pretty much everything related to PG. Also the "All The Dirt On Vacuum" talk by Jim Nasby: https://www.youtube.com/watch?v=L8nErzxPJjQ



You mean you want the materialized view to refresh automatically? Or what does "incremental" mean in this context? (Sorry, I'm no DB person.)


I'm not OP, but I assume that phrase refers to the Oracle capability to refresh a materialized view without completely re-running the underlying query.

Here's the Oracle article about materialized view refresh; there are further details within about the different schemes for incremental refresh. Briefly, it's a method of refreshing by looking at the deltas since the last refresh.

https://docs.oracle.com/database/121/DWHSG/refresh.htm#DWHSG...
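Very roughly, the Oracle version looks like this (a sketch; fast refresh has a number of additional requirements I'm glossing over, and the table/columns are made up):

    -- Record changes to the base table so the view can refresh from deltas
    CREATE MATERIALIZED VIEW LOG ON sales
      WITH ROWID (state, amount) INCLUDING NEW VALUES;

    -- Refresh incrementally whenever a transaction on sales commits
    CREATE MATERIALIZED VIEW sales_by_state
      REFRESH FAST ON COMMIT
      AS SELECT state, SUM(amount) AS total, COUNT(amount) AS cnt_amount, COUNT(*) AS cnt
         FROM sales GROUP BY state;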


Yes, view refresh with only delta changes.


At least we can use 'CONCURRENTLY' since 9.4 or so, which is already a bonus. And if the data gets too big, it's better to use Elasticsearch, where you push single objects to it. Sadly that requires way more work inside the application and of course more ops work.
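For reference, the CONCURRENTLY variant (sketch; the view name is made up, and it needs a unique index):

    -- Readers aren't blocked while the refresh runs, but it's still a full
    -- recompute of the view, not an incremental one.
    CREATE UNIQUE INDEX ON sales_by_state (state);
    REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_state;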


This is now returning a 403 Forbidden.


Obligatory Archive.org link:

http://web.archive.org/web/20170126103158/http://www.interdb...

Also a reminder to everybody to give them a few $CURRENCY if you can. Archive.org offers a valuable service and is very underfunded.


The (intentional?) pun in $CURRENCY made me laugh.


Chapters >= 5 are apparently not in archive.org but are in Google's cache.


Also returning 404s, at least for chapters 5 and 6.


I'm the author of this document. Right now, my hosting server is denying access from foreign countries because of too many requests. Please wait several hours.


Hug of death, methinks.


Yep, same here.

Guessing they're experiencing too much traffic or something, so this is their way to throttle it. ;)


It looks GREAT but...

I always see these things posted while they're still being written and then I never go back to see them when they're completed. And nobody posts when it's completed.

I'd prefer the posting not be done until it's finished, or that there be an "email me when it's finished" button.


If you can't be bothered to set a reminder, subscribe to the RSS feed, create an IFTTT trigger, or anything like that... I don't know what to tell you. There are lots of solutions to this predicament. This one's on you.


I don't think a clearly expressed opinion warrants the kind of response you gave here. There is little of substance in your comment except questioning the poster's character by implying he/she is either lazy or stupid.

A tip on how to use IFTTT quickly and easily for this particular task would probably have been a more constructive response. I for one don't know how to use IFTTT to reliably detect changes in an arbitrary web page; I thought it was mostly for different API services and the like, as those are the only use cases I've heard about.


As far as not having to read things twice, draft and final, I'd have to lean towards @sqldba's position.


There is also an interesting middle ground called Evolving Publication, where the author publishes section by section and the RSS feed informs you whenever a new section/chapter is added to the article:

https://martinfowler.com/bliki/EvolvingPublication.html

So the RSS feed is not per article, but per article part.


That is interesting, ta, although more work for the author.


I, OTOH, would rather know about it and start taking advantage of what's already available than wait for the finish, if and when it ever comes.



