The internals of PostgreSQL (interdb.jp)
477 points by adamnemecek on Jan 26, 2017 | 53 comments



A really cool paper to read is The Design of POSTGRES [1] by Stonebraker et al. It's dated by its mentions of things like POSTQUEL, but it's still really interesting to read about the early design of seminal features like the extensible ADT system, from a time when it was still innovative.

1: http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf


I would like to read through this when I'm not so tired. Thank you!

From what I can tell, PostgreSQL is sensibly structured. At least, it seems better than MySQL. The hackiest thing I have heard so far about MySQL is that if you run the exact same SQL text more than once, it will fetch the result from cache. If you run a differently worded query, it will skip the cache, even if the query were to bring back the exact same data.

This is different from what I remember about PostgreSQL (correct me if I'm wrong). I remember reading in some book that PostgreSQL just caches table pages in memory, so if a query needed data that was already in memory it read it from memory, and otherwise it went to disk. No need to make sure you didn't insert a space or something into your query the second time.
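If that's right, the relevant knobs and views would be something like this (a sketch; the pg_buffercache part needs the contrib extension installed):

    -- Size of Postgres's own page cache (it also leans on the OS page cache)
    SHOW shared_buffers;

    -- With the contrib extension you can see whose pages are currently resident
    CREATE EXTENSION pg_buffercache;
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;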


While that is true, and the mysql query cache definitely became a scaling issue as machines got bigger (lock contention etc.), postgres doesn't even have a similar feature, period.

Suppose you have a query doing a group by on a low-cardinality field (e.g. state) over a hundred-million-row table. Large amount of data in, small amount out. postgres has to actually look at all the rows to re-run the query; the mysql query cache only has to pull the tiny number of results out of the cache for that specific query.
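A sketch of the difference (table and column names made up):

    -- MySQL (pre-8.0, query cache enabled): the first run scans the table;
    -- a byte-identical re-run is answered straight from the result cache.
    SELECT state, COUNT(*) FROM customers GROUP BY state;

    -- postgres has no result cache, so every execution re-aggregates all the
    -- rows (from shared buffers if the pages are hot, from disk otherwise).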

This is a dangerous feature because there is a super slippery slope as traffic increases... any write to any of the tables invalidates the cache right when you need caching most, and hits all queries against those tables at the same time. But used well, in an environment with many copies of the data (replicated or otherwise) where you can manage when updates happen, it can be a lifesaver.

InnoDB also has its own buffer pool that caches rows and is in general terms similar to the postgres buffer cache.

It is fair to say that mysql has a lot more features like this, "sometimes life-saving hacks that can bite you hard if you don't understand them," while postgres is much more restrained, ensuring its features are a little more general and thought through.


MySQL needs the cache more often, though, given its limit of one index per query. Postgres can more often combine a few indexes and turn what would scan 100k rows in MySQL into bitmap-ANDing two indexes and pulling back 10 rows.
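Roughly what that looks like in a Postgres plan (hypothetical table and indexes, plan output abbreviated):

    EXPLAIN SELECT * FROM orders WHERE status = 'open' AND customer_id = 42;

    -- Bitmap Heap Scan on orders
    --   ->  BitmapAnd
    --         ->  Bitmap Index Scan on orders_status_idx
    --         ->  Bitmap Index Scan on orders_customer_id_idx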

Seems sensible, each has features to support the base design.


The limit of one index per query was lifted in 2005, with MySQL 5.0.


It's now one per table. Poor word choice by me, sorry.



True point (disclaimer: don't know how mysql has evolved in the past 5 years).

Fits with my general feeling that "if you want to optimize for a super specific use case, mysql can sometimes provide cool tricks for that use case that can be amazing," but if you want a more general solution, need more flexibility, don't have the knowledge, or don't know your future needs, postgres shines.


A few years ago, PostgreSQL added a feature where if (1) you ran a query (with no "order by" clause); and (2) before it finished, you started an identical query, the second query would simply piggyback off of the first until the first finished. After the original query finished, the second query would retrieve any rows it missed.

When they added this optimization, they issued a warning about it, because identical queries without "order by" clauses will return the same results, but without a guarantee of how they're sorted.
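If I'm remembering the same thing, this is the "synchronized sequential scans" behavior from 8.3, and it's controlled by a setting (a sketch, assuming a default config):

    -- On by default; later scans of the same large table can join an
    -- in-progress sequential scan partway through and wrap around at the end.
    SHOW synchronize_seqscans;

    -- Turn it off per session if you rely on repeatable row order without ORDER BY
    SET synchronize_seqscans = off;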


"While that is true and the mysql query cache definitely became a scaling issue as machines got bigger with lock contention etc., postgres doesn't even have a similar feature period."

I'm sorry, but I'd rather not have a feature than have a feature that is questionably implemented.


SQL Server does all of these things (for comparison).

It compiles queries and stores the plans in a cache. In many cases it can parameterise simple ones, so even if the exact text changes it will pull back the matching plan. It also stores all this data so you can analyse later what plans were compiled, what was reused, which ones were similar, and which were recompiled later - though hardly anyone deep-dives into it, because it's a tar pit of info and why bother instead of just focusing on the worst cases.

Also, when a query pulls data into memory it stays there in the buffer pool. If the rows change, the buffers aren't discarded; the dirty pages are flushed to disk (along with the log records) while staying valid in memory. So if your query pulls back the exact same data, then sure, it comes from memory, but it's not stale or anything.
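For anyone curious, the plan cache side of that is queryable through the DMVs; a rough sketch:

    -- How often each cached plan has been reused, with the statement text
    SELECT cp.usecounts, cp.objtype, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    ORDER BY cp.usecounts DESC;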

I don't consider any of these to be problems, so I'm surprised they could be interpreted as bad in other products. I don't understand enough about them to say, but it's fun to discuss.


From reading what you wrote, I think you are talking about statement/plan caching, where it can cache the "how do I run this" but still has to read each row (or index entry) from the buffer cache.

The mysql query cache actually caches the results of the query (see https://dev.mysql.com/doc/refman/8.0/en/query-cache.html), so it doesn't even need to look at the rows at all.
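To make the distinction concrete, the result cache has its own server variables and per-statement hints (sketch against a pre-8.0 server; the table is made up):

    -- Server-side switches for the result cache
    SHOW VARIABLES LIKE 'query_cache%';

    -- Per-statement hints to force caching on or off for one query
    SELECT SQL_CACHE    state, COUNT(*) FROM customers GROUP BY state;
    SELECT SQL_NO_CACHE state, COUNT(*) FROM customers GROUP BY state;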

SQL Server may have a similar thing now, I'm not up to date on it, but wanted to call out the distinction based on how you described it.

No question, SQL Server has a lot of awesome qualities for the right use case and has come a long long way since the Sybase days.


SQL Server does some great things for the data that's in it, but if you ever want to get the data out (Elasticsearch / Hadoop / Kafka / etc.) you are in for a world of pain (and bills). The lock-in alone is reason enough to steer clear of it for virtually all use cases.

As you start running into scaling issues and need solutions that are faster to implement, you'll reach for Enterprise and then for more cores, and before you know it you're stuck with a $16,000/core scaling cost that you can't avoid, because you can't easily get your data out to offload things.

Dealing with SQL Server on a customer facing system is the stuff of nightmares.


It's unclear what you mean about the difficulties of getting data out of SQL Server, since it's really no harder or easier than getting data out of any other RDBMS. Scaling SQL Server, just like scaling any other RDBMS, can be challenging, since traditional RDBMSs don't distribute well. While some workloads are CPU-limited, that is not true for all workloads, so the per-core licensing model of SQL Server (which is ~$7k per core for 2016 Enterprise edition, not $16k) is not always prohibitive if your workload is more disk- or RAM-intensive.

There are points of differentiation between SQL Server, PostgreSQL, and MySQL that might make one better suited to a particular workload than the others, but those situations depend on the details of a particular workload, and something as general as "getting data out" or "on a customer-facing system" isn't specific enough to identify any one of them as more suitable than another.



That's the price for a 2-core license pack.


The notation shows "per core" but says it's sold as a 2-core pack. Does that make the actual price $28k for the pack, or $14k for 2 cores? The way they list it is not particularly clear.


The SQL Server licensing model is "per core", but the minimum licensing increment is 2 cores which costs (2 * $7k = $14k). It's obviously a confusing pricing scheme, but try pricing out Oracle some time!


Aside from the cost aspect of commercial software, this is false. It is no more difficult to feed your data to another system than it is with any other relational database. There is certainly no bill involved in exporting your data. In fact, SQL Server includes the tools you need to do this, such as SQL Server Integration Services. SQL Server has its own implementation of the ANSI SQL standard, but this is true of every database system. The majority of it is compatible; it's only when you use certain functions that you have to consider porting them. SQL Server is the database behind many customer-facing applications, and there is often no nightmare involved.


With a PostgreSQL foreign data wrapper, you can actively push data into other data sources at the transactional level. You also have pub/sub functionality built in, to let outside systems listen and react to data changes.
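The pub/sub part is just LISTEN/NOTIFY plus a trigger; a minimal sketch (table and channel names are made up):

    -- Fire a notification with the new row's id whenever orders changes
    CREATE OR REPLACE FUNCTION notify_order_change() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('order_changes', NEW.id::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_notify
      AFTER INSERT OR UPDATE ON orders
      FOR EACH ROW EXECUTE PROCEDURE notify_order_change();

    -- An external worker runs LISTEN order_changes; and reacts to each
    -- notification, e.g. by pushing the changed row into Elasticsearch.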

With SQL Server your options are variations of polling the database. We even tried posing the question of using Elasticsearch to developers at the PASS Summit recently and were widely met with groans of "Yeah... that's a pain to do with SQL Server. You're basically out of luck."

For an internal standalone system, it's great. For moving data into it to do analytics, great. For anything close to a real-time, web-facing system with a growth pattern... an absolute nightmare.


This is an impressive attempt to back up the original statements about SQL Server. It sounds like you may not be all that familiar with how foreign data wrappers work in Postgres, and not familiar with the equivalent in SQL Server: linked servers. Both implementations use OLEDB or ODBC drivers to provide transactional consistency to external applications.



> if you ever want to get the data out ... you are in for a world of pain

What do you mean by "get data out"? I've not had any issues getting rows out of SQL Server to be indexed externally.


This is only true for one specific kind of cache (query cache) which has been disabled by default for 4 years now.

It works well for cheating benchmarks and masking unoptimized queries. It is not something that I would recommend.


I have been reading about the Postgres architecture. The modular design enables a microkernel-like API where different languages can be integrated seamlessly.

The only thing I am missing is incremental materialized views.


Have you checked out PipelineDB yet? They build on Postgres and include continuous views, aggregations, joins, etc.

http://docs.pipelinedb.com/continuous-views.html
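The gist, as I understand it from those docs (a sketch; the stream and view names are arbitrary):

    -- Results are computed incrementally as events arrive on the stream
    CREATE CONTINUOUS VIEW pageview_counts AS
      SELECT url::text, COUNT(*) FROM page_views GROUP BY url;

    -- Writing to the stream updates the view; reading it is just SQL
    INSERT INTO page_views (url) VALUES ('/home'), ('/docs');
    SELECT * FROM pageview_counts;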


From my understanding, PipelineDB is a fork, not an extension, for PG. I liked what CitusData did: you download PG and add the Citus extension.


(Hi! I'm one of the founders of PipelineDB)

PipelineDB will be a standard PostgreSQL extension[0] by release 1.0.0. Currently we're about to release 0.9.7, and each release after that will incrementally factor out PipelineDB into a completely independent PostgreSQL extension.

That's the plan, anyways :) With https://www.stride.io being rolled out under heavy demand, we've got a lot on our plate!

[0] https://github.com/pipelinedb/pipelinedb/issues/1596


Thanks. The plan looks promising.

Just checked stride.io - is it a PipelineDB admin/user front end as a service?

I'm assuming there is a lot more going on behind the scenes? Can you talk about the architecture?


Stride is a realtime analytics API built for scale and performance. It allows you to construct networks of continuous processing nodes that either materialize results (think high-throughput aggregation) or fire webhooks ("when this aggregate exceeds this value, POST to a url"). Each node is represented by a continuous SQL query, and nodes can be chained together by "tailing" other nodes.

These processing nodes can be queried and joined on (with SQL) at any time to easily power realtime dashboards and other analytical applications. Check out the original announcement [0] and the Stride docs [1] for more detail.

And to answer your question, no, Stride is not simply a frontend for PipelineDB. While it does use PipelineDB and PipelineDB Cluster extensively, it also uses other systems to provide unique capabilities, and all of this is ultimately synergized behind a dead-simple HTTP API.

We've found that a hosted database isn't actually all that interesting for users nowadays, and near impossible to build a business around because they've essentially become a cheap commodity. So we aim to deliver maximum value to users by nailing one use case (realtime analytics) with a highly focused API that eliminates most of the complexity and decision points you'd encounter when trying to do the same with a generic database.

[0] https://www.pipelinedb.com/blog/announcing-stride-a-realtime...

[1] https://www.stride.io/docs


+1 for incremental materialized views.

There are workarounds for emulating it with triggers, though. I found the following discussion helpful for a problem I had:

https://hashrocket.com/blog/posts/materialized-view-strategi...
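The general shape of the trigger approach, in case it helps (a minimal sketch handling inserts only; the names are made up, and updates/deletes need their own triggers):

    -- Maintain a per-state count in a plain table instead of re-running
    -- REFRESH MATERIALIZED VIEW over the whole base table.
    CREATE TABLE customer_counts (state text PRIMARY KEY, n bigint NOT NULL);

    CREATE OR REPLACE FUNCTION bump_customer_count() RETURNS trigger AS $$
    BEGIN
      INSERT INTO customer_counts (state, n) VALUES (NEW.state, 1)
      ON CONFLICT (state) DO UPDATE SET n = customer_counts.n + 1;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customers_count
      AFTER INSERT ON customers
      FOR EACH ROW EXECUTE PROCEDURE bump_customer_count();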


What resources have you been using to familiarize yourself with Postgres?


I would highly recommend the talks by Bruce Momjian (https://momjian.us/); he has separate talks on pretty much everything related to PG. Also the "All The Dirt On Vacuum" talk by Jim Nasby: https://www.youtube.com/watch?v=L8nErzxPJjQ



You mean you want the materialized view to refresh automatically? Or what does "incremental" mean in this context? (Sorry, I'm no DB person.)


I'm not OP, but I assume that phrase refers to the Oracle capability to refresh a materialized view without completely re-running the underlying query.

Here's the Oracle article about materialized view refresh; there are further details within about the different schemes for incremental refresh. Briefly, it's a method of refreshing by looking at the deltas since the last refresh.

https://docs.oracle.com/database/121/DWHSG/refresh.htm#DWHSG...
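Very roughly, the Oracle version looks like this (a sketch; fast refresh has a number of additional requirements I'm glossing over, and the table/columns are made up):

    -- Record changes to the base table so the view can refresh from deltas
    CREATE MATERIALIZED VIEW LOG ON sales
      WITH ROWID (state, amount) INCLUDING NEW VALUES;

    -- Refresh incrementally whenever a transaction on sales commits
    CREATE MATERIALIZED VIEW sales_by_state
      REFRESH FAST ON COMMIT
      AS SELECT state, SUM(amount) AS total, COUNT(amount) AS cnt_amount, COUNT(*) AS cnt
         FROM sales GROUP BY state;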


Yes, view refresh with only delta changes.


At least we can use 'CONCURRENTLY' since 9.4 or so, which is already a bonus. And if the data gets too big, it's better to use Elasticsearch, where you push single objects to it. Sadly that requires way more work inside the application and of course more ops work.
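For reference, the CONCURRENTLY variant (sketch; the view name is made up, and it needs a unique index):

    -- Readers aren't blocked while the refresh runs, but it's still a full
    -- recompute of the view, not an incremental one.
    CREATE UNIQUE INDEX ON sales_by_state (state);
    REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_state;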


This is now returning a 403 Forbidden.


Obligatory Archive.org link:

http://web.archive.org/web/20170126103158/http://www.interdb...

Also a reminder to everybody to give them a few $CURRENCY if you can. Archive.org offers a valuable service and is very underfunded.


The (intentional?) pun in $CURRENCY made me laugh.


Chapters >= 5 are apparently not in archive.org but are in Google's cache.


Also returning 404s, at least for chapters 5 and 6.


I'm the author of this document. Right now, my hosting server is denying access from foreign countries because of too many requests. Please wait several hours.


Hug of death, methinks.


Yep, same here.

Guessing they're experiencing too much traffic or something, so this is their way to throttle it. ;)


It looks GREAT but...

I always see these things posted while they're still being written and then I never go back to see them when they're completed. And nobody posts when it's completed.

I'd prefer the posting not be done until it's finished, or that there be an "email me when it's finished" button.


If you can't be bothered to set a reminder, subscribe to the RSS feed, create an IFTTT trigger, or anything like that... I don't know what to tell you. There are lots of solutions to this predicament. This one's on you.


I don't think a clearly expressed opinion warrants the kind of response you gave here. There is little of substance in your comment except questioning the poster's character by implying he/she is either lazy or stupid.

A tip on how to use IFTTT quickly and easily for this particular task would probably have been a more constructive response. I for one don't know how to use IFTTT to reliably detect changes in an arbitrary web page; I thought it was mostly for different API services and the like, as those are the only use cases I've heard about.


As far as not having to read things twice, draft and final, I'd have to lean towards @sqldba's position.


There is also an interesting middle ground called Evolving Publication, where the author publishes section by section and the RSS feed informs you whenever a new section/chapter is added to the article:

https://martinfowler.com/bliki/EvolvingPublication.html

So the RSS feed is not per article, but per article part.


That is interesting, ta, although more work for the author.


I, OTOH, would rather know about it and start taking advantage of what's already available than wait for the finish, if and when it ever comes.



