PostgreSQL 11.3 and 10.8 (postgresql.org)
281 points by oskari on May 9, 2019 | 157 comments



For those stuck on older versions of Postgres, I highly recommend paying the downtime to upgrade. Going from 9.x to 11 will get you a measurably large performance gain for free.


Out of interest (SQL Server guy mainly, so only partly keep up with what other engines are doing), what changes significantly affect performance (without making changes to your own code/configuration to make use of new features) in 10.x & 11.x?


The query planner in 10 got a lot better at enforcing row-level security constraints efficiently for some common scenarios, like 10-20x speedups. See https://github.com/postgres/postgres/commit/215b43cdc8d6b4a1... and the linked mailing list thread for more info, if you're curious.


One of the huge ones was the ability to use > 1 cpu core for big aggregations or huge select queries. That and a massively better query planner.
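
If you want to see whether a given query actually goes parallel, a quick check looks something like this (a rough sketch with a hypothetical orders table; max_parallel_workers_per_gather controls how many workers the planner may use):

  explain (analyze)
  select status, count(*)
  from orders
  group by status;
  -- a parallel plan shows Gather / Partial Aggregate nodes in the output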


There is usually a bunch of small improvements in every release, and those can add up over time. In Postgres 10 and 11 a lot of stuff happened related to parallel queries, and many more queries can be run in parallel now. 11 added a JIT compiler to the query planner, but I'm not sure whether that is enabled by default yet.


It kinda bugs me that people say "SQL Server" to mean "Microsoft SQL Server". I mean, there are other sql servers.


SQL Server is a registered trademark of Microsoft, at least in the US and I'm sure other jurisdictions. Using SQL Server to refer to Microsoft's relational database product is perfectly reasonable and pretty much everyone will know what you mean. The generic term seems to be RDBMS or some variation on that.


Referring to a RDBMS as a "SQL Server" would probably confuse people. Besides the fact that it's already the name of a specific product, the overall genre of software doesn't even serve SQL.


Same here. FWIW, I informally refer to it as "MS-SQL" to differentiate.


When I do that, the less technical people I report to get it confused with MySQL which is not what I want.


M$-SQL would be appropriate, but it might be confused with Oracle...


Good to know! Can you give an example? When I google "SQL Server" it seems like most results are related to MS SQL Server


An example of other SQL servers? PostgreSQL would be one, which is what this thread is about.


> It kinda bugs me that people say "SQL Server" to mean "Microsoft SQL Server". I mean there are other sql servers.

I don't mind that; sure, there are other RDBMS servers that support SQL as their main or exclusive language, but unless you are talking about ancient Sybase products (for which there is a very good reason for the shared branding), “SQL Server” is a clear, exclusive Microsoft product identity, and not any worse of a label than, say, FTP (yes, there are other file transfer protocols).

OTOH, what does bug me is when people say “SQL” to mean “Microsoft SQL Server”.


https://en.wikipedia.org/wiki/SQL_Server

Same as "FTP Server" doesn't mean proftpd, "HTTP Server" doesn't mean apache and "C compiler" doesn't mean gcc. SQL is a language. Server is a generic term.

Another example: I may ask my colleague the following: "Which SQL server should we use in our new project?". Does this mean I would like to know the edition and version of MS SQL Server, or maybe (and from my point of view more likely) that I am looking for PostgreSQL, MySQL or MSSQL? If I'd like to know which edition and version I should use, I'd ask "Which MS SQL Server should I use?". That's a difference.

Might be that our views differ, but I can understand the parent's points.


But httpd does mean Apache, and sh means Bourne shell.

It also helps that nobody says "The SQL server is down" when their mysql instance is down. Even when using a generic term it's "the database is down"


> ... and sh means Bourne shell.

1. Except on Android, stock Debian/Ubuntu/etc., and in the bootloader and initramfs of my embedded OS;

2. I'm guessing you actually meant 'Bourne Again Shell' instead of 'Bourne Shell'.


Eh... sh only refers to the generic specification of a bourne-family shell, usually bash or dash; almost nobody is actually using the original Bourne shell.


> It also helps that nobody says "The SQL server is down" when their mysql instance is down.

I've heard such sentences many times from many people.


sh refers to a POSIX shell. The Bourne shell (bsh) is not entirely compatible with the Bourne Again Shell (bash), and bash will switch to a compatibility mode if called as sh.

Additionally, many distros, including Ubuntu and Debian, do not use bash as their default shell. (Ubuntu/Debian use dash, Arch uses zsh, Alpine uses busybox sh)


In OpenBSD, httpd means their own web server. In Plan 9, there is also a web server called httpd.


I find that the bigger problem for me is that the number of RDBMSs that use 'SQL' in the name somewhere means that non-technical people get confused between 'knowing SQL', as in knowing how to develop SQL code, and 'knowing SQL' as in knowing how to manage a particular database product (usually SQL Server). That's how I got turned into a DBA without realizing it.


Out of interest ;) SQL Server is such an expensive beast, ~$15K per core; what are your reasons for preferring it over PG?


I quite like Postgres and use it preferentially, but your numbers are off. SQL Server Enterprise costs more like $7500/core; that $15K pack, as far as I am aware, comes with two core licenses. SQL Server Standard 2016 costs $931 for a license if you use CALs ($209 a pop), or $3700/core.

Also bear in mind that almost nobody pays list price for any of this.


$7500 per core, but only $15,000 for two cores sounds like a sweet deal though.


There are no more server CPUs with fewer than 6 cores though.

We could happily manage with just 2 dedicated to SQL Server, but you have to license all the cores you have. So $3K worth of low-end hardware ends up costing you several multiples of that in licensing.


What features in SQL Server Enterprise do you need? More than 24 cores? More than 128GB of RAM?

I ask because SQL Server Standard is significantly cheaper when you're using CALs instead of unlimited connections. Ten servers fronting a SQL Server Standard install is $3K, which at a large enterprise is often within a director-level's discretionary equipment budget.


User CALs are only cheap if you deal with a small number of users (internal app). Please note a user is not a SQL user account, but represents an actual individual person. If you have a webserver (with one db user) but a thousand visitors, then Microsoft treats it as a thousand user CALs. The following MS doc explains this in detail: http://download.microsoft.com/download/6/F/8/6F84A9FE-1E5C-4... (look for the keyword 'multiplexing').


Huh, TIL. I literally had a Microsoft sales rep tell me otherwise. Anything to get the sale, I guess.


> but you have to license all the cores you have

Is that the case? I'm sure I've seen people running on limited cores for licencing reasons rather than having to licence every core. Not sure if that was enforced by the engine refusing to use more, or via setting processor affinities.


I imagine a VM is a good solution to that.


Windows also lets you set CPU affinity per process - could that be used as a simple workaround?


I laugh when people have to deal with these problems when others are solving real problems using free tools.


For someone who grew up with decent open source solutions in the software stack, that deal simply sounds insane and it should be the majority's thinking soon enough.

And unless those paid products solve problems that the free tools can't, I doubt people would choose to use them when they can solve their problems with the tools they grew up with.


Having to spend time dealing with the MS licensing mess, rather than building software, seems wasteful and in my experience can be likened to sitting on a cactus for an extended period of time.


I agree. (My own stuff is pretty much all Postgres and the occasional MySQL.) But when you're in an organization where your needs might merit SQL Server, I'd hope you'd have somebody to deal with that for you. ;)


Day job. Never had to pay for it out of my own pocket! I've worked at MS shops for quite some time.

Also while we can do what we like with our newer SaaS offerings, getting some of our on-prem clients to use PG over SQL Server would be an uphill struggle so not considering other options isn't entirely our decision.

That $15K sounds like Enterprise Edition pricing too; IIRC Standard is significantly cheaper. Most don't need Enterprise, especially since 2016 SP1, when a lot of previously Enterprise-only features became available in all editions (including the free Express edition in many cases). Developer edition is currently free (licensed for development use only, of course) and is essentially Enterprise edition with different licensing terms.


It's about $7000 per core for enterprise - and if you do something like license a whole blade and put VMs on top of it, you could fairly easily run 10-20+ SQL VMs on one decent sized blade for < 150k. And anything pre-prod is free. It's a small cost for most larger size companies of this scale. If you have one or two servers, then no, enterprise is probably out of reach.


I use MS-SQL for my day job, and Postgres for personal projects. Pros of MS-SQL are mainly 1) easy integration with the universe of Microsoft dev tools, frameworks, and infrastructure, and 2) great database tooling in itself:

* SSMS - SQL Server Management Studio

* The MS BI stack: SSRS (reporting), SSAS (analysis), SSIS (integration)


The tooling is indeed sweet, and in a big company I could see it paying for itself in saved dev time.


Sounds like a victim of vendor lock-in. I assume you will bleed hard to get out of it.


How is this any different from the sunk cost that's a consequence of going with _any_ particular technology stack?

I mean, if you chose Python/Postgres/etc., presumably, you'd have to rewrite/retool everything if you went to a different stack.

If you're talking about open vs closed source: the C# compiler (Roslyn) is completely free and open source, as are a number of .NET components: CoreFX - Core foundational libraries, CoreCLR - runtime.

As well as PowerShell, VS Code, Typescript, ASP.NET Core, MVC, Blazor, F#, Z3 Theorem Prover, etc.

And that's from just the first few pages of their Open Source page: https://opensource.microsoft.com/


Analysis Services, Integration Services, Clustered Index


Big parallelism updates that the query planner can take advantage of.

I believe also updates to index seek or scan in that time.


I've just upgraded my hobby app from 9.6 to 11 and some of my old queries are now timing out :-|

I'll update this thread when I find out why.


    analyze
fixed pretty much everything :)


what do you mean? You looked at the analyse report and changed the query to avoid some unnecessary computation?


I maintain a couple of MySQL-based applications. I don't really use any features outside of "standard SQL"; is there a reason to switch over to Pg? I haven't used Pg before and usually default to MySQL.


One big argument: Transactional DDL. For example:

  begin;
  alter table foos add answer int not null default 42;
  alter table foos drop column plumbus;
  update foos set name = upper(name);
  create table bars (t serial);
  drop table dingbats;
  rollback;  -- Or, of course, commit
What's the benefit? Atomic migrations. You can create, alter, drop tables, update data, etc. in a single transaction, and it will either commit completely if all the changes succeed, or roll back everything.

This is not possible in MySQL, or almost any other database [1], including Oracle; DDL statements aren't usually transactional. (In MySQL, I believe a DDL statement implicitly commits the current transaction without warning, but I could be wrong.)

Beyond that, I'd mention: PostGIS, arrays, functional indexes, and window functions. You may not use these things today, but once you discover them, you're bound to.

[1] https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr...


I use transactional DDL in my tests. All the tables, triggers, etc. are set up inside a transaction, and then the actual tests run inside nested transactions. At the end of the test run, the outer transaction gets rolled back, and everything disappears.

I don't know if it accomplishes anything truly new (other than ideas that aren't very useful in practice like being able to have multiple test runs going in parallel), but it's a pretty neat way to be able to do it and works well.
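
At the SQL level the pattern is roughly this (a minimal sketch; in practice the test harness issues these statements, and the table name here is hypothetical):

  begin;                            -- outer transaction: schema + seed data
  create table widgets (id serial primary key, name text);

  savepoint test_one;               -- "nested transaction" for a single test
  insert into widgets (name) values ('a');
  -- ... run assertions ...
  rollback to savepoint test_one;   -- undo just this test's changes

  rollback;                         -- end of the run: everything disappears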


Transactional tests have some downsides, unfortunately. If your tests test transactional code, that code itself cannot create transactions; they have to use savepoints, which aren't quite the same. Transactional tests also don't work with testing anything concurrent, unless you share the session across threads/goroutines/whatever.

Lastly, if a test fails you'd typically like to leave the data behind so that you can inspect it. A transactional test that rolls back on failure won't allow that.


Save points, with proper management of them, seem to match a conceptual nested transaction as far as I've seen. We've got a test bed connection manager that mocks savepoint creation, rollback and committal into transaction management functions so doing something like beginTransaction && beginTransaction works fine.


There may be some semantic nuances. Savepoints aren't real transactions, so they "inherit" the locks of the parent transaction, for example. But it might not matter in practice in the context of tests.


That's true, thanks. None of those are relevant for me, but there definitely are cases where it wouldn't be the right approach.


I use Pg's Template Databases[1] for this. Gives me the freedom to use transactions and concurrency in tests.

Without this built-in feature, I'd have used filesystem snapshots, if I didn't mind the time it'd take to stop and start Pg.
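
For anyone unfamiliar, the template trick is roughly this (a minimal sketch, hypothetical database names; the template must have no active connections while it's being cloned):

  create database app_template;                       -- build schema/fixtures here once
  create database test_run_1 template app_template;   -- cheap per-run clone
  drop database test_run_1;                           -- discard after the run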

----

1: https://www.postgresql.org/docs/current/manage-ag-templatedb...


I don't think one would migrate if Pg's strength is 'alter table', which isn't something people do on a daily basis.

Might want to mention the downside of using MySQL as well. (Am also interested to know as a daily MySQL user.)


If Oracle DDL is not transactional, what's the point of its Edition-Based Redefinition feature?


Oracle's "editions" are more like versioning, I think. Last I checked, only a very limited set of schema objects were editionable (views, triggers, procedures, etc.), not including tables or indexes.


At my PHP-shop company, most projects are limited to MySQL 5.7 (legacy reason, dependency reason, boss-likes-MySQL reason...). They are all handicapped by MySQL featureset, and can't update to 8 yet. If they had used Postgres some years ago, they would get:

- JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)

- Window functions (available in MySQL 8x only, while this has been available since Postgres 9x)

- Materialized views: views that are stored physically like a table and can be used to hold aggregated, pre-calculated data like sums and counts (sketched below)

- Indexing on function expression

- Better query plan explanation
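
To make the materialized view point concrete, a minimal sketch (hypothetical orders table and columns):

  create materialized view daily_sales as
    select date_trunc('day', created_at) as day, sum(amount) as total
    from orders
    group by 1;

  refresh materialized view daily_sales;  -- re-run the query and store the result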


Also suffering under mysql 5.7 here and agree. Also even stuff like CTEs/WITH make queries more readable and composite field types like ARRAY are still missing (you see GROUP_CONCAT shenanigans being used instead).

For indexing on function expressions in particular, the workaround we use is to add a generated column and index that.


Be warned that in PostgreSQL, WITH is an optimization barrier, and is planned to remain that way to serve that purpose. If you can, prefer using views to enhance readability (and testability as a bonus). PostgreSQL views (unlike those in MySQL) do not prevent optimization across them.


No, CTEs are not planned to remain a barrier, this is already fixed in the next version which is in feature freeze right now.

https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-...
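
For reference, PostgreSQL 12 also gives you explicit control over the old behaviour (a quick sketch, hypothetical events table):

  -- inlined by default in 12; add MATERIALIZED to keep the barrier on purpose
  with recent as materialized (
    select * from events where created_at > now() - interval '1 day'
  )
  select count(*) from recent;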


This is my favorite feature of PostgreSQL 12, except perhaps REINDEX CONCURRENTLY, but I am very biased since I was involved in both patches (both were large projects involving many devs and reviewers). It is awesome to finally see both land.


This is the best news I've heard all week.


I have tried to express my joy at this news to my less SQL literate co-workers... that failed so I wanted to let it out here. This is the best news, I am overjoyed!


Oh wow, that is news to me! A welcome change.


Which is very often a good thing. I have tuned more than one query by moving a sub-query/derived table into a CTE.

What bothers me more is that a CTE prevents parallel execution, but I think that too is fixed in Postgres 12.


> Indexing on function expression

MySQL 5.7 fully supports this. See https://dev.mysql.com/doc/refman/5.7/en/create-table-generat... and https://dev.mysql.com/doc/refman/5.7/en/create-table-seconda...

> JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)

Actually MySQL 5.6 doesn't support this, but 5.7 does, quite well: https://dev.mysql.com/doc/refman/5.7/en/json.html


Indexing a generated/computed column is not the same as creating an index on an expression. If you want to support several different expressions you need to create a new column each time.

Additionally, an ALTER TABLE blocks access to the table. Indexes can be created concurrently while other transactions can still read and write the table.
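
For example, in Postgres the expression goes straight into the index, and the index can be built without blocking writers (a minimal sketch, hypothetical users table):

  create index concurrently idx_users_lower_email on users (lower(email));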

But MySQL doesn't support indexing the complete JSON value for arbitrary queries. You can only index specific expressions by creating a computed column with that expression and indexing that.


> If you want to support several different expressions you need to create a new column each time

Yes and no. Generated columns in MySQL can optionally be "virtual". An indexed virtual column is functionally identical to an index on an expression.
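
Something like this, if I remember the 5.7 syntax correctly (hypothetical table, column and JSON path):

  alter table orders
    add column customer_name varchar(100)
      generated always as (json_unquote(json_extract(doc, '$.customer.name'))) virtual,
    add index idx_customer_name (customer_name);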

> Additionally, an ALTER TABLE blocks access to the table.

It depends substantially on the specific ALTER and version of MySQL. Many ALTERs do not block access to the table in modern MySQL; some are even instantaneous.

> But MySQL doesn't support indexing the complete JSON value for arbitrary queries. You can only index specific expressions by creating a computed column with that expression and indexing that.

What's the difference, functionally speaking? (Asking honestly, not being snarky -- I may not understand what you are saying / what the equivalent postgres feature is?)


> What's the difference, functionally speaking? (Asking honestly, not being snarky -- I may not understand what you are saying / what the equivalent postgres feature is?)

You create a single index, e.g:

create index on the_table using gin(jsonb_column);

And that will support many different types of conditions,

e.g.: check if a specific key/value combination is contained in the JSON:

where jsonb_column @> '{"key": "value"}'

this also works with nested values:

where jsonb_column @> '{"key1" : {"key2": {"key3": 42}}}'

Or you can check if an array below a key contains one or multiple values:

where jsonb_column @> '{"tags": ["one"]}' or where jsonb_column @> '{"tags": ["one", "two"]}'

Or you can check if all keys from a list of keys are present:

where jsonb_column ?& array['key1', 'key2']

All those conditions are covered by just one index.


Interesting, thanks! Indeed, there isn't an equivalent feature in MySQL yet.

Out of curiosity, how commonly is this used at scale? I'd imagine there are significant trade-offs with write amplification, meaning it would consume a lot of space and make writes slow. (vs making indexes on specific expressions, I mean. That said, you're right -- there are definitely use-cases where making indexes on specific expressions isn't practical or is too inflexible.)


I'm stuck on 5.7 because the previous dev used the worst sprocs I've seen (no exaggeration), and until I've ripped them all out I daren't move to 8. It was on 5.5 when I started, but with much effort I got it tested enough to be reasonably confident that 5.7 would work.

It's an excruciating process though.


Actually window functions were introduced in Postgres 8.4


If there is an analytics db/replica, your data analysts will worship the ground you walk on if you migrate from MySQL to Postgres.


Interesting, can you elaborate, I'm considering a switch.


Not the parent, but even as a MySQL expert I'd agree that Postgres is often a better choice for OLAP-heavy workloads (e.g. very complex reporting and analytical queries). MySQL's query planner historically hasn't been great for these, and ditto for its feature set overall, although it is improving.

Meanwhile, for high-volume OLTP workloads, MySQL (with either InnoDB or MyRocks on the storage engine side) has some compelling advantages... this is one reason why social networks lean towards MySQL for their OLTP product data, and/or have stayed on MySQL despite having the resources to switch.

As with all things in computer science, there are trade-offs and it all depends on your workload :)


Please humor me if I sound a bit triggered in this response, as MySQL pre-v8.0 is like if the creators went to data analysts and asked: "what features could we remove, and what changes could we make to an SQL-standard-compliant DB (postgres), in order to slow you down as much as possible, hinder your ability to read and write queries, and generally make your life a living hell?"

I really get that MySQL is good for what it does, from an engineer's point of view. It is an absolute piss-poor excuse for a database, prior to v8.0.

So what's wrong with MySQL (again, prior to v8.0, but no one seems to use the damn current version)

-Not ANSI SQL compliant (unlike Postgres)

-No CTEs/WITH clause (?!)

-no WINDOW FUNCTIONS (?!?!?!?)

-"schemas are called databases" which makes for bizarre interpretation of `information_schema` queries, which behave the same across all other DBs except mySQL. What I mean to say is MySQL calls each schema it's own database. This results in having to connect the same DB multiple times to other programs/APIs/inputs which accept JDBC.

-Worse replication options than Postgres; not ACID compliant by default

-Don't know the programming term for this... but the horrendous "select col1, col2, col3... colN, count(<field>) from table group by 1" implicit group by. Meaning the system takes your INVALID query, and does things underneath the hood to return a result. Systems should enforce correct syntax (you must group by all non-aggregation columns... mysql implicitly does this under the hood).

-on a tangentially related note to the prior one, MySQL returns null instead of a divide by zero error when you divide by zero. Divide by zero errors are one of the few things that should ALWAYS RETURN AN ERROR NO MATTER WHAT

-mysql doesn't support EXCEPT clauses

-doesn't support FULL OUTER JOIN

-doesn't support generate_series,

-poor JSON support

-very limited, poor array/unnest support

-insert VALUES () (in postgres) not supported

-lack of consistent pipe operator concatenation,

-weird datatype support, and in-query it doesn't support ::cast

-doesn't support `select t1.*, t2.field1, t2.field2 from t1 join t2 on t1.id = t2.id`; that is, you cannot select * from one table, and only certain fields from the other.

-case dependence in field and table names when not escape quoted (mysql uses backtick, postgres uses double quote for escaping names). What the fuck is this? SQL is a case-insensitive language, yet the creators built in case sensitivity?

-As I mentioned above, mysql uses backticks to escape names. This is abnormal for SQL databases.

-mysql LIKE is case-insensitive (what the hell, it's case-sensitive everywhere else). Postgres has LIKE, and ILIKE (insensitive-like).

-ugly and strange support for INTERVAL syntax (intervals, despite being strings, give a syntax error in mysql). Example: in postgres or redshift etc. you would write `select current_timestamp - interval '1 week'`. In MySQL, you'd have to do `select current_timestamp - interval 1 week` (the '1 week' could be '7 month' or '2 day'... it's a string, and should be in single quotes; MySQL doesn't do this)

-mysql doesn't even support the normal SQL comment of `--`. It uses a `#` instead. No other database does that.

-probably the worst EXPLAIN/EXPLAIN ANALYZE plans I've ever seen from any database, ever

-this is encapsulated in the prior points but you can't do something simple like `select <fields>, row_number() over () as rownum from table`. Instead you have to declare variables and increment them in the query

-did I mention it's just straight up not SQL standard compliant?

At least MySQL 8.0 supports window functions and CTEs (seriously it's a death knell to a data analyst not to have these). They are the absolute #1 biggest piece of missing functionality to an analyst in my opinion.
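
For anyone wondering what the fuss is about, this is the bread-and-butter shape of analyst SQL that pre-8.0 MySQL simply can't express (Postgres syntax; hypothetical orders table):

  with daily as (
    select user_id, date_trunc('day', created_at) as day, count(*) as n_orders
    from orders
    group by 1, 2
  )
  select user_id, day, n_orders,
         sum(n_orders) over (partition by user_id order by day) as running_total
  from daily;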

This entire post focused on "mySQL have-nots", rather than "Postgres-haves" so I do think there are actually _even more_ advantages to using Postgres over MySQL. I understand MySQL is very fast for writes, but to my understanding it's not even like Postgres is slow for writes, and on the querying side of the coin, it's a universe of difference.

If you ever use MySQL in the future and there will be a data analyst existing somewhere downstream of you, I implore you to use MySQL v8.0 and nothing older, at any cost, for their sake.


I'm not sure what you mean by the "star from one table, specific columns from another" issue... if I'm understanding correctly I'm pretty sure MySQL allows just that.

Division by zero errors and non-"magical" GROUP BY have been the default mode of operation for a _little_ longer, since the 5.7 series.


Hmm, you're right. Must've been user error on that one ;)

I stand by the rest of my points, however.


Still mostly the same for MySQL 8?


MySQL 8.0 has Window Functions and CTEs. Of the entire list of things I mentioned, window functions and CTE's comprise 80% of the "weight" of my issues of that list.

In other words, while I (or another analyst) would likely still prefer Postgres over MySQL of any version, I wouldn't really have too much to complain about if I was using v8.


- A robust security model with role inheritance that supports both column and row policies (a quick sketch of a row policy follows this list)

- PLV8/PLPython/C functions/etc (with security!)

- TimescaleDB

- Better JSON query support

- Foreign Data Wrappers

- Better window function support

- A richer extension ecosystem (IMO)
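
The row policies in particular are hard to replicate elsewhere. Roughly (a minimal sketch, hypothetical table and session setting):

  alter table documents enable row level security;

  create policy tenant_isolation on documents
    using (tenant_id = current_setting('app.tenant_id')::int);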

Honestly, at this point I wouldn't use MySQL unless you only care about slightly better performance for very simple queries and simpler multi-master scaling/replication. Even saying that, if you don't need that simple multi-master scaling RIGHT NOW, improvements to the Postgres multi-master scaling story are not too far off on the roadmap, so I would still choose PG in that case.


The benefits are better defaults in terms of data reliability. Hard to say if migration is worth it to you without a lot more details, but I'd definitely recommend trying it in a new project.


We just migrated a medium size project using pgloader with great success and minimal headaches, which seems like a big step up from the last time I had to migrate a project. Highly recommended, and it might be easier than you think!


Frankly, data reliability concerns with modern MySQL / InnoDB are very outdated FUD.

Many of the largest tech companies rely on MySQL as their primary data store. They would not do so if it was unreliable with persistence.

There are many valid reasons to choose Postgres over MySQL, or vice versa -- they have different strengths and weaknesses. But there are no major differences regarding data reliability today, nor have there been for many years now.


I haven't kept up, but I remember having constraints that were accepted by the parser but ignored by the engine (using InnoDB; I don't remember which specific constraint it was).

Is it still the case?


It was the case until very recently. Check constraints are now supported in MySQL 8: https://dev.mysql.com/doc/refman/8.0/en/create-table-check-c...
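
The syntax is the standard one, for reference (a trivial sketch, hypothetical table):

  create table invoices (
    amount decimal(10,2),
    constraint chk_amount_nonnegative check (amount >= 0)
  );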

I believe MariaDB added support for them a couple years earlier, but am not certain.

More broadly, I would agree it's a very painful "gotcha" to have aspects of CREATE TABLE be accepted by the parser but ignored by the engine. However, in MySQL's defense, theoretically this type of flexibility does allow third-party storage engines to support these features if the engine's developer wishes.

Ideally, the engine should throw an error if you try using a feature it does not support, but in a few specific cases it does not (at least for InnoDB). This can be very frustrating, for sure. But at least it's documented. And no database is perfect; they all have similarly-frustrating inconsistencies somewhere.


To be clearer, it's about the defaults regarding strictness (of the acceptance) of data that help to avoid trouble up front, not that mysql randomly corrupts/deletes it later.


Sure, and MySQL 5.7, released 4 years ago, fixed those bad defaults.

It isn't fair to compare Postgres-of-today to MySQL-of-over-4-years-ago.


Ok, looks like that version is in LTS dists now. Perhaps the poster's legacy apps are taking advantage of them.

There were other deficiencies mentioned in Kleppmann's book on Designing Data apps, but I don't remember the specifics now.


Even if OP is using 5.6 or previous, strict mode has been available as an option for over 15 years, and can be enabled dynamically (no restart required).

I simply don't see any valid argument for avoiding MySQL due to "data reliability" concerns in 2019.

> There were other minor deficiencies mentioned in Kleppmann's book on Designing Data apps, but I don't remember the specifics now.

Well, I can't really respond to non-specific points from a book I haven't read. I'm happy to respond to any specifics re: data reliability concerns, if you want to cite them. FWIW, I have quite extensive expertise on the subject of massive-scale MySQL (16 years of MySQL use; led development of Facebook's internal DBaaS; rebuilt most of Tumblr's backend during its hockey-stick growth period).


It takes years to build up trust, seconds to lose it, and obsolete documentation takes years to disappear. It's an uphill battle. In the meantime there's Postgres, and it's free.


MySQL defaults to InnoDB. Is there a different metric for reliability you had in mind?

Where I work, we chose MySQL back in 2012 due to production quality async replication. I think (but am never sure) that that is now good in Postgres land.

PG has a lot of SQL features I'd love to use and can't. OTOH MySQL's query planner is predictably dumb, which means I can write queries and have good idea about how well (or not) they'll execute.


I can't remember the last time I started a project using MySQL; maybe it is catching up - but PITR, partitioned tables and document-oriented columns are some features I use a lot. PostGIS also used to be stronger than the MySQL counterpart.


I'm pretty sure you're frequently using LIMIT, which (TIL) is non-standard SQL. (PostgreSQL shares that syntax though.)
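
For the curious, the standard spelling exists too, and PostgreSQL accepts both (hypothetical users table):

  select * from users order by id fetch first 10 rows only;  -- standard SQL
  select * from users order by id limit 10;                  -- the common shorthand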


It's hard to believe that Google Cloud SQL still only has 9.6 available.

EDIT: Apparently 11.1 is available in beta as of April 9th.


Actually, 11 is now in Beta. If you create a new instance it is listed as an option.


I tested it just before I posted the comment (to confirm) and didn't see it listed. Maybe it depends on your account?

EDIT: I'll try again. Looks like it was added April 9th

https://cloud.google.com/sql/docs/postgres/create-instance


I created an 11 (beta) instance yesterday and it worked as expected.


Running 10.7 and 10.6 on two production applications with Heroku. Thinking about moving to 11 to ensure support for the long run as I rarely need to touch this and it's very stable but would like to minimize any headaches in the future.

Any complications or hiccups I need to worry about moving from 10 to 11?

Per Heroku Docs: By supporting at least 3 major versions, users are required to upgrade roughly once every three years. However, you can upgrade your database at any point to gain the benefits of the latest version.


The release notes for version 11 include a list of potentially incompatible changes: https://www.postgresql.org/docs/11/release-11.html#id-1.11.6...


Thanks, yeah I just did some testing locally and made the upgrade on Heroku (the documentation was rock solid).


Question for PG happy users.

How do you manage failover and replication? At my previous job this was done by a consultant. Is this doable on a self hosted setup?

Thank you in advance.


PostgreSQL has replication built in now. I set it up at work, and it replicates reliably, in a fraction of a second. I've never had to fail over, but it seems straightforward to do so. The only hard part was following Postgres's documentation in setting it all up. It seemed a bit scattered to me. I had to jump around to different sections before I put it all together in my mind.


What do you use? Are there some instructions/articles that you'd recommend reading? Is it anything like Galera?

I know of BDR, but there hasn't been much news about it lately, especially with more recent versions of Pg.

We like Galera for our simple needs: we use keepalived to do health checks, and if they pass the node participates in the VRRP cluster. If one node goes down/bad, another takes over.


If you want multi master in Postgres, I think BDR is going to be your best option, but the version for PG 10+ isn't open source so you'll have to pay for it. We're using the open source version on PG 9.4 currently in production, it's worked fine so far.

If you're just looking for a hot standby and don't need a multi-master setup, you can set those up just with pg. https://www.postgresql.org/docs/9.4/hot-standby.html



We've been doing replication for 3+ years with Zalando's Patroni. It works great. We run PG in Docker and Patroni too. First it was Patroni with Consul, and right now it's Patroni with the Kubernetes store (it stores the leader in an endpoint). Highly recommend. There are other popular tools for this; it's just a preference.


One option for automated health monitoring and failover is pglookout: https://github.com/aiven/pglookout


We use this, pgbouncer, and a bash script to link the two, for completely automated failover.

Queries done through pgbouncer just pause as if the query is really really slow when the db goes down, then when pglookout does the failover, the bash script switches pgbouncer's config and those pending queries are sent immediately.


Question from a Python web developer. (Django mainly, exploring Flask presently)

For a complex web-app, would you suggest an ORM (looking at SQLAlchemy) or a custom module with hand written queries and custom methods for conversion to python objects?

My app has a lot of complex queries, joins, etc. and the data-model is most likely to change quite a bit as the app nears production. I feel using an ORM is an unnecessary layer of abstraction in the thinking process. I feel comfortable with direct SQL queries, and in some cases, want to directly get JSON results from PGSQL itself.

Would that be a good idea, and more importantly, scalable?

Note : My app will be solely developed by me, not expecting to have a team or even another developer work on it.


Use both. Much of the business logic is as simple as querying by id or filtering/sorting by a couple of columns. A smart ORM will handle fetching relationships without hitting the N+1 problem.

For advanced queries, you can write raw SQL

The way I see it, an ORM has three useful features:

- A migration/seed mechanism (you will need it anyway)

- A schema definition for mapping tables to object

- A query builder

If you feel that an ORM is too heavy, you can reach for just the query builder.


I worked on a mid-sized Django app and that was basically what we did:

* for normal queries (select cols from table where id = ... etc.) we just used the plain Django ORM. Even for weird joins, the Django ORM makes it a lot easier than using raw SQL.

* when we needed raw speed, we just wrote raw SQL and delegated to Django's SQL layer -- that way we leverage everything the framework has along with raw SQL power.


Even when the ORM models start to get cumbersome I like to use sqlalchemy.sql to assemble SQL queries.

It maps pretty much 1:1 to SQL and for me it beats the alternative (using text interpolation for composing queries).


SQLAlchemy is good for fairly straightforward queries where you mainly need to do "select * from ..." and you want to pull down related rows from another table based on a foreign key in the first table as a separate query. It's also good if you have a lot of junior devs that don't know SQL and you want to encapsulate complex SQL logic for them.

If you're doing anything more complex than these basic sorts of queries and subqueries, or your developers are proficient in sql, using even a very good ORM like sqlalchemy is going to be a step down.

Since you say you're doing this all yourself, and since SQL is probably the most ubiquitous programming language (in terms of percentage of jobs requiring it, not total LOC), which makes learning opportunities there more valuable, I would go direct.


I have used Postgres with both Django and Flask quite a bit now.

Since you're probably used to dealing with and migrating your tables manually, I would keep custom SQL for all your complex operations, and use SQLAlchemy for doing basic insert/update/select. Django also has an "unmanaged" mode where you can create a model and it will avoid trying to create a migration to create the table.

Of course, you have to manually update the model if you manually change your DDL.

Watch out for differences in how you are serializing data from Django/SQLAlchemy models vs. raw dicts from psycopg2.

I like to organize my SQL by keeping each query in a separate .sql file and writing a little wrapper that fetches the file (+1 for caching it) and then executes it. I'm not a fan of lots of inline SQL mixed with Python.

Overall I think it's a great + powerful setup!


I usually put all my SQL statements in a single python file, and import the file as a module. I get to have descriptive variable names for the queries.


Psycopg2 + raw SQL inside of """ """ strings, and use %(foo)s as a parameter placeholder. cur.execute will accept a parameter dictionary like:

cur.execute(query, {'foo': bar})

Passing values directly into cur.execute is the best way to prevent SQL injection as well, since it will sanitize the input params upon running.


Fellow Python-Postgres web dev here. (Pyramid is my framework of choice, check it out!)

I'm developing a web application that uses SQLAlchemy. The ORM has been a huge boon for CRUD functionality. We also have some very complicated reporting features and use SQLAlchemy's query builder almost exclusively. I find that the query builder maps very cleanly to SQL, so I can still "think" in SQL while writing and reading it. And the query builder makes complex query composition easier to manage.


I've found that 'sqlalchemy.sql.text' works well for complex queries that don't need to be built up incrementally, and the fluent sql interface is great otherwise. Also, reflection can be really useful when working with existing databases, and for maintenance scripts that might not need to be tied directly to your model.

SQLAlchemy provides more than just the ORM... I actually wish the docs were structured differently to better emphasize that in search results, etc.


You can use SQLAlchemy Core for SQL generation and execution, without using its ORM. This lets you build queries from reusable Python objects rather than strings, and use Alembic for DB migrations, while still retaining control over the generated SQL.


> likely to change

Hard to say, but don't forget about migration support, which is quite helpful.


Totally wish we could upgrade but for some reason AWS have still not implemented any upgrade path for Aurora PostgreSQL other than dump and reimport despite apparently working on it for a year...


Does AWS Aurora actually use postgres or is it simply a postgres compatible API on top of their own technology?


As with RDS Postgres, it's Amazon's fork of Postgres.

With Aurora, the storage layer is swapped out entirely for a distributed storage engine that I believe is based upon DynamoDB.

The wire protocol and server interface are much the same as regular Postgres, though there are some additional benefits as well as caveats, as you might expect.


I'm pretty sure it's a fork of PG based on my experience.


this is a huge issue for us and I'm extremely unhappy this was not clear in the docs


What's worse is that the documentation straight up lies. It states you can perform a major version upgrade by restoring a snapshot and selecting a higher version. I mean, it's still not ideal, except if you do try this you'll find the option doesn't actually exist - either via the console or the API/CLI!

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

This has really put us off using other AWS managed products and was a major factor in us deciding against using Amazon Elasticsearch Service.


MySQL has Galera: is there a multi-master option for Pg?

I know of BDR, earlier versions of which are open source, but there hasn't been much movement with Pg 10 or 11 AFAICT.

We don't do anything complicated, but simply want two DBs (with perhaps a quorum system) that have a vIP that will fail over in case one system goes down (scheduled or otherwise).

Galera provides this in a not-too-complicated fashion.


PostgreSQL has logical replication built-in since version 10. This allows you to replicate specific tables between multiple master databases, accepting writes on each. You define a merge function in case there's conflicts.
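
The built-in flavour looks roughly like this (a minimal sketch; hypothetical table name and connection string):

  -- on the publishing server
  create publication orders_pub for table orders;

  -- on the subscribing server
  create subscription orders_sub
    connection 'host=db1 dbname=app user=replicator'
    publication orders_pub;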


Impressive, and... an upgrade on 10.x is now in progress, easily and quickly, thanks to the Postgres PGDG Debian/Ubuntu repos. BUT do not choose the meta-package postgres! Under Ubuntu at least, upgrading the meta-package postgres adds an entire new 11+ server without confirmation... why is this tolerated? Genuinely annoying.


I think you are looking for "apt-get upgrade" and not "apt-get dist-upgrade". Or, just install the version you specifically want


It's unfortunate that the official Docker images haven't been updated yet (on DockerHub).


keep in mind that the "official" Docker images are "official" in the sense of Docker Inc. marking them as official, not in the sense of "the upstream provides these". This is the repo for the Dockerfiles https://github.com/docker-library/postgres and it begins with:

> This is the Git repo of the Docker "Official Image" for postgres (not to be confused with any official postgres image provided by postgres upstream)


I'm not a database guy so have no clue, but why are there so many versions receiving support? Is there just that much legacy crap they can't get away from, like Python?


People are slow to upgrade database systems, as it can take a lot of regression testing to make absolutely sure your applications don't rely on unsupported/undocumented/undefined behaviours that might not survive the newest release (or are affected by officially acknowledged breaking changes). Especially in enterprise systems. Even if developers upgrade quickly, their clients with on-prem installations may not. That means you need to support your major and minor releases for some time to be accepted as a serious option in some arenas.

Supporting five versions is no more than MS do: currently SQL Server versions 2017, 2016sp2, 2016sp1, 2014sp3, 2014sp2, 2012sp4, 2008R2sp2 and 2008sp3. 2008sp3, 2008R2sp2, and 2016sp1 will hit their final EOL in a couple of months, taking SQL Server's supported list back down to 5 too.

I expect other significant DB maintainers have similar support life-time requirements for much the same reasons, though I'll leave researching who does[n't] as an exercise for the reader.


2008 and R2 are still in a supported phase of life. It's the "exorbitant support fee" phase. Nevertheless, you can still get Microsoft support for the two after the "EOL". It's more an end-of-public life


Aye, and by the same technicality you can still get support for 2005.

Similar with PG I assume. You could always pay someone an expensive contracting fee to support your use of an older version than is publicly supported.


With databases being often mission critical, the PostgreSQL people decided heroically to support major versions for 5 years -- and as they come out with a new major version every year, minor updates come out for 5 different branches.

Note the recent versioning change: 9.4, 9.5, and 9.6 were the previous 3 major version branches, and the latest two are 10 and 11.


1) It's stateful, so upgrades also have to upgrade the state (MBs, GBs, TBs of data)

2) It's horrifically high risk because downgrading is usually not a thing

3) It usually requires downtime.


Moving between major versions of Postgres requires downtime proportional to the size of the database. Supporting older versions allows users to go many years without having to do this.


I upgraded from 9.3 → 11.2 a few months ago using pg_upgrade[1], on a master+slave database with 150GB of data. I did a fair amount of testing, but the final procedure was very fast and smooth.

1. Test the upgrade: set up an additional secondary (9.3), break the replication link (promote it to a master). Test the upgrade on that. It was really fast, under 30 seconds to shut down the old DB, run the in-place upgrade, and start up the new DB.

2a. In production: set up an additional secondary (9.3). Make the primary read-only. Promote the new secondary to a master. Shut down, upgrade to 11.2, restart. Point applications at it.

2b. Backout plan: leave the applications pointing at the original database server, make it read-write.

There are other options, including with only seconds of downtime, but <1 minute with pg_upgrade was simple and very acceptable for us.

[1] https://www.postgresql.org/docs/current/pgupgrade.html

[2] https://www.postgresql.org/docs/current/upgrading.html


This is a nice way to do that, but you have a low volume of data, and you think 30 seconds is fast and 1 minute of downtime is acceptable. I question these assumptions.

Consider the situation when you're adding thousands of new records per seconds, and the database is being used every second (quite literally: to compute per seconds statistics).

A better solution is to have triggers on the old master, to do the same inserts on the new master (after copying the data/promoting a replica/whatever), and have similar triggers on the new master when the IP is not the old master (to be able to backout to the old server)

Then both the new and the old master run "in parallel", with the same data, and you can have the apps use the new server (on a new domain name, new ip, new port, whatever) when you want - on a app by app basis if you want. You can keep both until you decide to decommission the old master.


That's a good plan for someone with a larger or busier database, but my approach was the correct solution for us. You can be satisfied that your taxes aren't wasted by us implementing anything more complicated :-)

(Most of the writes to our PostgreSQL server are batch processes, decoupled with a message broker, and the message consumers were easily paused during the upgrade.)


Not everyone is Google, and heck, even Google and their services are down sometimes.

If there's not too much data, it seems like the safest and fastest option.


Postgres users actually generally upgrade faster than those using other databases because there are a lot of new features each year. But once your database gets huge then upgrading still becomes a pain, so that's why they keep providing security support and bug fixes for older versions as well.


pg_upgrade with the --link option is extremely fast and doesn't really depend on the size of the database.


Interesting. I just use RDS and it always seems to take 15+ min even though our database is tiny.


Well-supported older releases of the database engine, with clearly defined migration documentation and technology, are the hallmark of a successful open source software ecosystem.

Because it mirrors and supports the reality of the business world.

Every large or small organization that manages its business makes, every year, a 'Grow/Invest', 'Maintain', or 'Disinvest' decision for each of its product/service lines.

It does not matter if it is software or making kielbasa. Postgres is exceptional, and is supporting the first two.


With semantic versioning each time the major version changes it signifies a breaking change. If you have an application that breaks from one of those breaking changes you may not see it as a business opportunity to update because it “works” as it is. However, minor version changes can include anything that doesn’t break. So security patches are hopefully added to any major version that is officially supported.


PostgreSQL versioning is similar to semantic versioning, but doesn't follow it precisely. Major versions require a dump and restore (or other transform, such as an upgrade) of the on-disk data. Minor versions are fixes. Prior to PostgreSQL 10, the changes in the second numeric place are considered "major" versions. So, the past 5 major versions are 11, 10, 9.6, 9.5, and 9.4. The most recent versions of each of those are respectively 11.3, 10.8, 9.6.13, 9.5.17, and 9.4.22.


It's not "legacy crap". There just is long-term-support for versions cause it's not that easy to upgrade (both technically & others).

There's legacy crap everywhere, all langs,db,versions etc. Supported sometimes for 10+ years.


why did you have to bring Python into this? Every language used widely will have "legacy" crap.


not broken, don't fix


There are some shockingly old releases of PostgreSQL still in production for this reason.

Security updates should push the upgrade path a little harder, but there are still cases where a database can be completely isolated from the network and that might not even matter.


I inherited a production system with a PostgreSQL 8.1 database. It's one of the most reliable systems I have.



