PostgreSQL 17

kiwicopple · 2024-09-26T14:03:04 1727359384

Another amazing release, congrats to all the contributors. There are simply too many things to call out - just a few highlights:

Massive improvements to vacuum operations:

    "PostgreSQL 17 introduces a new internal memory structure for vacuum that consumes up to 20x less memory."

Much needed features for backups:

    "pg_basebackup, the backup utility included in PostgreSQL, now supports incremental backups and adds the pg_combinebackup utility to reconstruct a full backup"

I'm a huge fan of FDW's and think they are an untapped-gem in Postgres, so I love seeing these improvements:

    "The PostgreSQL foreign data wrapper (postgres_fdw), used to execute queries on remote PostgreSQL instances, can now push EXISTS and IN subqueries to the remote server for more efficient processing."

peiskos · 2024-09-28T16:00:56 1727539256

A bit off topic, can someone suggest how I can learn more about using databases(postgres specifically) in real world applications? I am familiar with SQL and common ORMs, but I feel the internet is full of beginner level tutorials which lack this depth.

Superfud · 2024-09-30T05:04:43 1727672683

For PostgreSQL, the manual is extremely well written, and is warmly recommended reading. That should give you a robust foundation.

brunoqc · 2024-09-26T20:22:13 1727382133

I batch import XMLs, CSVs and mssql data into postgresql.

I'm pretty sure I could read them when needed with fdw. Is it a good idea?

I think it can be slow but maybe I could use materialized views or something.

mind-blight · 2024-09-26T20:58:19 1727384299

I've been using duckdb to import data into postgres (especially CSVs and JSON) and it has been really effective.

Duckdb can run SQL across the different data formats and insert or update directly into postgres. I run duckdb with python and Prefect for batch jobs, but you can use whatever language or scheduler you perfer.

I can't recommend this setup enough. The only weird things I've run into is a really complex join across multiple postgres tables and parquet files had a bug reading a postgres column type. I simplified the query (which was a good idea anyways) and it hums away

brunoqc · 2024-09-26T21:16:44 1727385404

Thanks. My current pattern is to parse the files with rust, copy from stdin into a psql temp table, update the rows that have changed and delete the rows not existing anymore.

I'm hoping it's less wasteful than truncating and importing the whole table every time there is one single change.

mind-blight · 2024-09-27T00:02:09 1727395329

It probably is, but you can do that workflow with DuckDB too. It's just really flexible since it can all be done in SQL. Check out the DuckDB postgres connection (https://duckdb.org/docs/extensions/postgres.html)

To import from a Salesforce Bulk Export CSV (for example) into a postgres table is a few lines of SQL

``` INSERT INTO pgdb.accounts ( id, created_at, updated_at ) SELECT "Id", "CreatedDate"::TIMESTAMP, "UpdatedDate"::TIMESTAMP FROM read_csv('/path/to/file.csv') WHERE "Id" NOT IN (SELECT id FROM pgdb.accounts) ```

The path can be in a cloud storage provider as well, which is really nice. You can do updates which join from the postgres db or the other CSV files (or a MySQL database) as well. The data transforms (casting to timestamp, uuid, etc.) have been super handy along with all the other SQL niceties that you get

kiwicopple · 2024-09-26T20:32:59 1727382779

“it depends”. Some considerations for mssql:

- If the foreign server is close (latency) that’s great

- if your query is complex then it helps if the postgres planner can “push down” to mssql. That will usually happen if you aren’t doing joins to local data

I personally like to set up the foreign tables, then materialize the data into a local postgres table using pg_cron. It’s like a basic ETL pipeline completely built into postgres

victorbjorklund · 2024-09-27T16:48:07 1727455687

Oh. That is smart using it as a very simple ETL pipeline.

baq · 2024-09-27T08:46:45 1727426805

check out clickhouse. you might like it.

ellisv · 2024-09-26T15:43:31 1727365411

> I'm a huge fan of FDW's

Do you have any recommendations on how to manage credentials for `CREATE USER MAPPING ` within the context of cloud hosted dbs?

darth_avocado · 2024-09-26T17:48:54 1727372934

If your company doesn't have an internal tool for storing credentials, you can always store them in the cloud provider's secrets management tool. E.g. Secrets Manager or Secure String in Parameter Store on AWS. Your CI/CD pipeline can pull the secrets from there.

kiwicopple · 2024-09-26T20:24:58 1727382298

in supabase we have a “vault” utility for this (for example: https://fdw.dev/catalog/clickhouse/#connecting-to-clickhouse). Sorry I can’t make recommendations for other platforms because i don’t want to suggest anything that could be considered unsafe - hopefully others can chime in

frankramos · 2024-09-28T13:13:33 1727529213

Just upgraded Supabase to a Pro account to try FDW but there doesn’t seem to be solid wrappers for MySQL/Vitess from Planetscale. This would help a ton of people looking to migrate. Does anyone have suggestions?

netcraft · 2024-09-26T18:31:32 1727375492

I remember when the json stuff started coming out, I thought it was interesting but nothing I would ever want to rely on - boy was I wrong!

It is so nice having json functionality in a relational db - even if you never actually store json in your database, its useful in so many situations.

Being able to generate json in a query from your data is a big deal too.

Looking forward to really learning json_table

jackschultz · 2024-09-26T14:50:47 1727362247

Very cool with the JSON_TABLE. The style of putting json response (from API, created from scraping, ect.) into jsonb column and then writing a view on top to parse / flatten is something I've been doing for a few years now. I've found it really great to put the json into a table, somewhere safe, and then do the parsing rather than dealing with possible errors on the scripting language side. I haven't seen this style been used in other places before, and to see it in the docs as a feature from new postgres makes me feel a bit more sane. Will be cool to try this out and see the differences from what I was doing before!

0cf8612b2e1e · 2024-09-26T18:26:26 1727375186

A personal rule of mine is to always separate data receipt+storage from parsing. The retrieval is comparatively very expensive and has few possible failure modes. Parsing can always fail in new and exciting ways.

Disk space to store the returned data is cheap and can be periodically flushed only when you are certain the content has been properly extracted.

cjonas · 2024-09-27T05:01:23 1727413283

Did you mean "retrieval is comparatively inexpensive"? I think I'm on the same page but this threw me off.

franga2000 · 2024-09-27T13:54:15 1727445255

I read it as retrieval being the requests to the scraped site. I can parse a few thousand HTML pages in minutes, but fetching them in the first place takes hours.

0cf8612b2e1e · 2024-09-27T15:55:00 1727452500

Exactly what I intended. Scraping is slow (and may be an irreplaceable snapshot in time). Parsing is fast and repeatable so should be done in a separate process from a stored copy.

erichocean · 2024-09-26T19:14:27 1727378067

I ended up with the same design after encountering numerous exotic failure modes.

abyesilyurt · 2024-09-26T18:19:37 1727374777

> putting json response (from API, created from scraping, ect.) into jsonb column and then writing a view on top to parse

That’s a very good idea!

ellisv · 2024-09-26T15:21:08 1727364068

It is definitely an improvement on multiple `JSONB_TO_RECORDSET` and `JSONB_TO_RECORD` calls for flattening nested json.

pestaa · 2024-09-26T14:44:00 1727361840

Very impressive changelog.

Bit sad the UUIDv7 PR didn't make the cut just yet:

https://commitfest.postgresql.org/49/4388/

ellisv · 2024-09-26T15:25:49 1727364349

I've been waiting for "incremental view maintenance" (i.e. incremental updates for materialized views) but it looks like it's still a few years out.

JamesSwift · 2024-09-26T19:22:10 1727378530

I'm a huge fan of views to serve as the v1 solution for problems before we need to optimize our approach, and this is the main thing that serves as a blocker in those discussions. If only we were able to have v2 of the approach be an IVM-view, we could leverage them much more widely.

whitepoplar · 2024-09-26T17:13:43 1727370823

There's always the pg_ivm extension you can use in the meantime: https://github.com/sraoss/pg_ivm

ellisv · 2024-09-26T17:32:19 1727371939

Unfortunately we use a cloud provider to host our databases, so I can only install limited extensions.

gregwebs · 2024-09-26T19:56:39 1727380599

REFRESH CONCURRENTLY is already an incremental update of sorts although you still pay the price of running a full query.

isoprophlex · 2024-09-26T20:11:13 1727381473

Wow, brilliant! I never knew this existed. Going to try this out tomorrow, first thing!

garyclarke27 · 2024-09-27T10:16:37 1727432197

Agreed big disappointment that "incremental view maintenance" is taking so long to get into the core - despite several IVM extensions. For me this is by far the most important capability missing from Postgres.

0cf8612b2e1e · 2024-09-26T18:20:51 1727374851

I must be missing something because that feels easy to implement. A date seconds + random data in the same way as UUID4.

Where is the iceberg complexity?

lfittl · 2024-09-26T18:54:31 1727376871

In my understanding it was a timing issue with the UUIDv7 RFC not being finalized before the Postgres 17 feature freeze in early April. Shouldn't be an issue to get this in for Postgres 18, I think.

0cf8612b2e1e · 2024-09-26T21:02:08 1727384528

That would do it. I was mistakenly thinking v7 was done for months/years at this point.

Meniceses · 2024-09-27T11:24:24 1727436264

My assumption is that because you can easily do this through software when using UUID and probably a lot do it like this, the pressure of supporting it, is a lot lower than expected.

nikita · 2024-09-26T18:02:55 1727373775

A number of features stood out to me in this release:

1. Chipping away more at vacuum. Fundamentally Postgres doesn't have undo log and therefore has to have vacuum. It's a trade-off of fast recovery vs well.. having to vacuum. The unfortunate part about vacuum is that it adds load to the system exactly when the system needs all the resources. I hope one day people stop knowing that vacuum exists, we are one step closer, but not there.

2. Performance gets better and not worse. Mark Callaghan blogs about MySQL and Postgres performance changes over time and MySQL keep regressing performance while Postgres keeps improving.

https://x.com/MarkCallaghanDB https://smalldatum.blogspot.com/

3. JSON. Postgres keep improving QOL for the interop with JS and TS.

4. Logical replication is becoming a super robust way of moving data in and out. This is very useful when you move data from one instance to another especially if version numbers don't match. Recently we have been using it to move at the speed of 1Gb/s

5. Optimizer. The better the optimizer the less you think about the optimizer. According to the research community SQL Server has the best optimizer. It's very encouraging that every release PG Optimizer gets better.

sgarland · 2024-09-26T23:03:40 1727391820

MySQL can be faster in certain circumstances (mostly range selects), but only if your schema and queries are designed to exploit InnoDB’s clustering index.

But even then, in some recent tests I did, Postgres was less than 0.1 msec slower. And if the schema and queries were not designed with InnoDB in mind, Postgres had little to no performance regression, whereas MySQL had a 100x slowdown.

I love MySQL for a variety of reasons, but it’s getting harder for me to continue to defend it.

on_the_train · 2024-09-26T14:29:22 1727360962

My boss insisted on the switch from oracle to mssql. Because "you can't trust open source for business software". Oh the pain

bityard · 2024-09-26T18:30:44 1727375444

I ran into a lot of that 20 years ago, surprised to hear it's still a thing at all given how it's basically common knowledge that most of the Internet and Cloud run on open source software.

I once met an older gentleman who was doing IT work for a defense contractor. He seemed nice enough. We were making small talk and I happened to mention that I had recently installed Linux on my computer at home. He tone changed almost immediately and he started ranting about how Linux was pirated source code, stolen from Microsoft, all of it contains viruses, etc. He was talking about the SCO vs Linux lawsuits but of course got absolutely ALL of the details wrong, like which companies were even involved in the lawsuits. He was so far off the deep end that I didn't even try to correct him, I just nodded and smiled and said I was actually running late to be somewhere else...

gigatexal · 2024-09-26T14:32:09 1727361129

So from one expensive vendor to another? Your boss seems smart. ;-)

What’s the rationale? What do you gain?

systems · 2024-09-26T14:48:55 1727362135

Well, from one VERY expensive vendor, to another considerably less expensive vendor

Also, MSSQL have few things going for it, and surprisingly no one seem to be even trying to catch up

    - Their BI Stacks (PowerBI, SSAS)
    - Their Database Development (SDK) ( https://learn.microsoft.com/en-us/sql/tools/sql-database-projects/sql-database-projects?view=sql-server-ver16 )

The MSSQL BI stack is unmatched , SSAS is the top star of BI cubes and the second option is not even close

SSRS is ok, SSIS is passable , but still both are very decent

PowerBI and family is also the best option for Mid to large (not FAANG large, but just normal large) companies

And finally the GEM that is database projects, you can program your DB changes declaratively, there is nothing like this in the market and again, no one is even trying

The easiest platformt todo evolutionary DB development is MS SQL

I really wish someone will implement DB Projects (dacpac) for Postgresql

gigatexal · 2024-09-26T15:17:16 1727363836

What do you love so about PowerBI? I’ve not looked at it very closely. I’ve worked with Tableau and Looker and LookerPro. All of which seemed fine.

Is it a home run if the end users aren’t on Windows or using Excel? I’m thinking about the feature where you can use a SQLServer DB as a data source to Excel.

bradford · 2024-09-26T18:21:32 1727374892

> What do you love so about PowerBI?

For a large portion of my career, the dashboarding solutions I've worked with have followed a similiar model: they provide a presentation layer directly on top of a query of some kind (usually, but not always, a SQL query). This seems like a natural next step for organizations that have a lot of data in one spot, but no obvious way of visualizing it.

But, after implementing and maintaining dashboards/reports constructed in this way, big problems start to arise. The core of the problem is that each dashboard/report/visual is tightly coupled to the datasource that's backing it. This tight coupling leads to many problems which I won't enumerate.

Power BI is great because it can provide an abstraction layer (a semantic model) on top of the many various data sources that might be pushed into a report. You're free to combine data from Excel with msSql or random Json, and it all plays together nicely in the same report. You can do data cleaning in the import stage, and the dimension/fact-tables pattern has been able to solve the wide variety of challenges that I've thrown at it.

All this means that the PowerBI reports I've made have been far more adaptable to changes than the other tightly coupled solutions I've used. (I haven't used Tableau, but my understanding is that it provides similar modeling concepts to decouple the data input from the data origin. I'm not at all familiar with Looker).

[disclaimer, I'm a Microsoft Employee (but I don't work in Power BI)]

jaakl · 2024-09-27T09:30:03 1727429403

The problem of semantic models what I've seen in tools like Looker, Tableau and Qlik (very probably same for PowerBI) is that they are tightly coupled to the tool itself, work within them only. Now you want "modern data system" then you want them decoupled and implemented with an open semantic model which is then accessable by data consumers in Google spreadsheets, Jupyter notebooks and whatever BI/Analytics/reporting tools your stakeholder uses or prefers. There are very new solutions for this like dbt semantic models; their only issue is that they tend to be so fresh that bigger orgs (where they do make most sense) may be shy on implementing them yet. To the original topic - not sure how much PG17 can be used in these stacks, usually much better are analytical databases - BigQuery, Snowflake, maybe Redshift, future (Mother)Duck(db)

greggyb · 2024-09-27T13:16:05 1727442965

The semantic model in Power BI is not tightly coupled to the tool. It is an SSAS Tabular model. It is pretty trivial to migrate a Power BI model to Analysis Services (Microsoft's server component for semantic models, hostable on-prem or as a cloud offering).

Both Power BI and Analysis Services are is accessible via XMLA. XMLA is an old standard, like SOAP old, much older than dbt.

XMLA provides a standard interface to OLAP data and has been adopted by other vendors in the OLAP space, including SAP and SAS as founding members. Mondrian stands out in my mind as an open source tool which also allows clients to connect via XMLA.

From what I can see, dbt only supports a handful of clients and has a home-grown API. While you may argue that dbt's API is more modern and easier to write a client for (and I'd probably agree with you! XMLA is a protocol, not a REST API), the point of a standard is that clients do not have to implement support for individual tools.

And of course, if you want API-based access there is a single API for a hosted Power BI semantic model to execute arbitrary queries (not XMLA), though its rate limits leave something to be desired: https://learn.microsoft.com/en-us/rest/api/power-bi/datasets...

Note the limit on "one table" there means one resultset. The resultset can be derived from a single arbitrary query that includes data from multiple tables in the model, and presented as a single tabular result.

Note: XMLA is an old standard, and so many libraries implementing support are old. It never took off like JDBC or ODBC did. I'm not trying to oversell it. You'd probably have to implement a fair bit of support yourself if you wanted a client tool to use such a library. Nevertheless it is a protocol that offers a uniform mechanism for accessing dimensional, OLAP data from multiple semantic model providers.

With regard to using PG17, the Tabular model (as mentioned above, shared across Power BI and Analysis Services) can operate in a caching/import mode or in a passthrough mode (aka DirectQuery).

When importing, it supports a huge array of sources, including relational databases (anything with an ODBC driver), file sources, and APIs. In addition to HTTP methods, Microsoft also has a huge library of pre-built connectors that wrap APIs from lots of SaaS products, such that users do not even need to know what an API is: these connectors prompt for credentials and allow users to see whatever data is exposed by the SaaS that they have permissions to. Import supports Postgres

When using DirectQuery, no data is persisted by the semantic model, instead queries are generated on the fly and passed to the backing data store. This can be configured with SSO to allow the database security roles to control what data individual users can see, or it can be configured with security roles at the semantic model layer (such that end users need no specific DB permissions). DirectQuery supports Postgres.

With regard to security, the Tabular model supports static and dynamic low-level and object-level security. Objects may be tables, columns, or individual measures. This is supported for both import and DirectQuery models.

With regard to aggregation, dbt seems to offer sum, min, max, distinct count, median, average, percentile, and boolean sum. Or you can embed a snippet of SQL that must be specific to the source you're connecting to.

The Tabular model offers a full query language, DAX, that was designed from the ground up for expressing business logic in analytical queries and large aggregations. Again, you may argue that another query language is a bad idea, and I'm sympathetic to that: I always advise people to write as little DAX as possible and to avoid complex logic if they can. Nevertheless, it allows a uniform interface to data regardless of source, and it allows much more expressivity than dbt, from what I can see. I'll also note that it seems to have hit a sweet spot, based on the rapid growth of Power BI and the huge number of people who are not programmers by trade writing DAX to achieve business goals.

There are plenty of pain points to the Tabular model as well. I do not intend to paint a rosy picture, but I have strived to address the claims made in a way that makes a case for why I disagree with the characterization of the model being tightly coupled to the reporting layer, the model being closed, and the model being limited.

Side note: the tight coupling in Power BI goes the other way. The viz layer can only interact with a Tabular model. The model is so broad because the viz tool is so constrained.

jaakl · 2024-09-27T18:02:56 1727460176

Thanks for the writeup. There are indeed use cases, especially in the MS multiverse. Proof of the none->basic->complex “can do everything” (soap,xml,rpc)->radically simpler “do what really matters” (rest, json, markdown) path. I’m not really sure if dbt semantic layer is the final open “standard” for the future analytical models and metrics, it has own questionmarks, it is literally just a transformer with metrics as addon and there are just initial implementations, but today I’d rather give that thing a try. Simpler is so much better

bigger_cheese · 2024-09-27T01:47:45 1727401665

My Org has been heavily pushing us to use Power BI, I have found it has a lot of pain points.

I expect how useful it is depends a lot on your use cases. If you want to throw a couple of KPI's on a dashboard it fulfils that requirement pretty well but if you want to do any analytics (beyond basic aggregations like, min, max and median) or slightly complicated trending (like multiple Y axis) Power BI is painfully complicated.

systems · 2024-09-27T03:22:39 1727407359

PowerBI might be the only BI solution with a decent functional language

DAX is supper powerful, if you design your datawarehouse as a proper stat scheme, there is nothing you cannot calculate using DAX

Nothing in the market is even close, most BI tool have magic expression language, DAX is a proper language

bigger_cheese · 2024-09-27T05:50:56 1727416256

I'm sure it can work but I've found it hard to slot into our use cases.

Power BI's trending is one thing I have really struggled with the limitations it has.

For one of my use cases (industrial plant data) we often want to trend timeseries data on a common xaxis with uniquely scaled yaxis.

i.e Temperature, Pressure, Flow Rate, Fill height etc all have differently scaled yaxis but common xaxis.

Something like this example https://imgur.com/a/zd5Giom

Power BI limits you to a single primary and a single secondary Y axis only (i.e limit of two scales).

systems · 2024-09-27T06:41:10 1727419270

sorry weird question, did you keep changing this link? was one of the link on powerbi.tips

can you put that back, for some reason my browser history lost it , or something weird happened, not sure what it is

bigger_cheese · 2024-09-29T22:40:32 1727649632

I edited the link once to a graph that better illustrated what I meant both results came from a google image search.

gigatexal · 2024-09-26T15:35:36 1727364936

I got my start in a SQLserver all MS shop.

I hated SSIS. I wished I just had something akin to Python and dataframes to just parse things in a data pipeline. Instead I had some graphical tool whose error messages and deployments left a lot to be desired.

But SQLServer and Microsoft in general make for good platforms for companies: they’re ubiquitous enough that you won’t find it hard to find engineers to work on your stack, there’s support, etc

systems · 2024-09-26T16:49:21 1727369361

The key feature of SSIS, is parallel dataflow

You can so easily write (and schedule) parallel dataflows in SSIS, to do the same code using a general purpose programming language would be a lot harder

Also remember that dataflows are data pipe streams, so SSIS can be very very fast

Anyway, there is BIML, which allow you to create SSIS package by writing XML, I personally never used it, mainly because its licensing situation seemed weird to me ( i think BIML is free, but the tool support is not, and MS SSDT doesnt support BIML coding i thinkg)

gigatexal · 2024-09-26T17:12:26 1727370746

Yeah I think I never gave it a fair shake. I think — like most things — if understood and used properly it can be amazing.

wredue · 2024-09-26T15:47:41 1727365661

SSIS is for integrations, and pandas is definitely not. I’m not sure what you’re trying to do with SSIS that you’re also doing with pandas, but it’s probably wrong. SSIS is far more geared to data warehousing integrations, while pandas would be reading a data warehouse and doing stuff with it. SSIS isn’t really meant for processing incoming data, even if you can kind of hack it together to do that.

I will say that when we want “real time” integrations, SSIS is phenomenally bad. But that’s not entirely unexpected for what it is.

gigatexal · 2024-09-26T16:06:52 1727366812

We don't need to be so pedantic. Python -- as it often is -- would be the glue, it would be the connecting part, and pandas (polars, duckdb, anything really) would be the processing part. Then once processed the outputs would be placed somewhere be it an update to a db table or some other downstream thing.

wredue · 2024-09-27T20:27:29 1727468849

I was just saying you should most likely not be doing data processing with SSIS. That’s not what it’s for even if it can be cobbled in to doing some.

greggyb · 2024-09-26T19:28:57 1727378937

Power BI embeds a quite good columnstore database engine. It is exactly the same database engine as in the SQL Server Analysis Services Tabular model.

For the types of queries typical in BI (primarily aggregations that involve lots of table scans), this yields great performance. Non-technically savvy users can import millions of rows of data and explore this with interactive visuals. The performance characteristics of the database engine allow moderately complex logic to remain high-performance into 10s or 100s of millions of records. This is on laptop-spec hardware. 100Ms of records will need enough supporting RAM, but the data is highly compressed, so this fits.

The performance scaling allows simple aggregates of single-digit-billions-of-records to remain fast: usually just a couple of seconds. Performance tuning for more complex logic or larger datasets becomes complex.

When I talk about the performance characteristics above, I mean for the in-memory database engine. This performance comes out of the box. There are no indices to create or typical database administration. Users set up an import of data and define relationships among tables, and define aggregation logic. The query language is DAX, a functional, relational language built with the goal of making business logic and aggregations simple to write.

For simple use cases, users do not even need to write their logic in DAX. The viz layer automatically generated basic aggregates and provides a sophisticated filtering and interactive viz canvas experience.

There is also an ETL layer, called Power Query. This has very different performance characteristics, and in general you should try to minimize your work in this tool and its M language, as it has very different semantics. The sweet spot is intra-row logic and basic filtering of data that will be imported into the columnstore database I mentioned above.

The same underpinning technology, VertiPaq, supports columnar compression and storage/low level query in other Microsoft products.

Columnstore indices in SQL Server also use VertiPaq for compression and storage engine.

The data storage layer in Microsoft Fabric utilizes parquet files in delta tables (same tech as Databricks). Part of the VertiPaq engine is a set of compression optimization heuristics that are quite good. They apply this optimization to get 30%-50% compression improvements in the parquet files they generate. They call this feature V-order (v for VertiPaq).

The standard compression algorithms in Parquet include dictionary and run-length encoding. The same are used (though obviously with different implementations) in VertiPaq's native storage format. The compression optimization heuristics include some for identifying superior sort orders to help maximize the effect of these styles of compression. V-order is the application of these heuristics to the data before applying compression for the parquet files.

To be 100% clear, V-order is a pre-processing step that yields vanilla parquet files, not any sort of extension to the parquet format. V-order is applied automatically in the storage layer for Microsoft Fabric, called OneLake.

You may come across documentation for xVelocity; this is an abandoned name for the same technology, VertiPaq.

I can talk about this literally all day. If you've got specific followups, feel free to ask here or email me at (any name) at (my username) dot com.

greggyb · 2024-09-26T19:37:41 1727379461

Follow up regarding the platform question:

Power BI is a desktop application for Windows and also a SaaS solution for hosting the solutions you build in the desktop tool.

The SaaS product offering is confusing for the time being due to the introduction of Fabric, which includes Power BI and a whole data stack besides.

Your Power BI power users will need a Windows platform available, though I have seen plenty of use of Parallels or a remote desktop infrastructure for users not using a Windows device.

There are authoring experiences in-browser that are improving and in some cases surpassing what is in Power BI Desktop, but if you're doing a lot of work authoring in Power BI, then the desktop application is where you want to be for now.

For pure querying or just data model development, there are third party tools that are superior to Power BI Desktop (disclosure: I do work part time with one of these third party vendors).

kragen · 2024-09-26T15:15:31 1727363731

does 'bi' mean 'olap'?

because the literal expansion 'business intelligence' (or the explanation in the Wikipedia article) is hard to interpret as something that makes sense in contexts like this where you're apparently talking about features of software rather than social practices. the reference to 'bi cubes' makes me think that maybe it means 'olap'?

marcosdumay · 2024-09-26T19:05:55 1727377555

It basically means "report editors".

And no, MS ones aren't miles ahead of the competition. But are bundled with Office, so the competition is mostly going out of business by now.

greggyb · 2024-09-26T19:57:55 1727380675

Except Microsoft's report editor, Power BI, includes a powerful columnstore database that can handle importing and rapidly aggregating 100Ms or even 1Bs of records with query response times of just a couple seconds.

That has been the differentiator. Power BI has lots going for and against it, but its database is miles ahead of competitors' report tools.

Edit: clarification: importing large volumes of data takes more than a few seconds. The query performance of the in-memory database after import is complete is what I referred to above in my original comment.

systems · 2024-09-26T16:00:51 1727366451

DSS became BI became AI

At some point in time, Decision Support Systems became Business Intelligence and nowadays this is switching ti AI

BI (formerly DSS) is a set of tools that enable Business Analytics , OLAP and technologies that implements OLAP (Cubes and Columnar databases) enables BI

gigatexal · 2024-09-26T15:32:28 1727364748

I think so but it’s overloaded and basically used for reporting and analysis stuff.

setr · 2024-09-26T17:48:24 1727372904

bi tooling is usually report/analysis builders, intended to be consumed by.. business users. More importantly, they're usually also meant to be used by business users to build their own reports in the first place -- although in practice it often ends up going back to IT to maintain/update anyways. Basically anything that competes with excel pivot tables is a bi tool.

OLAP/cubes usually underpins bi tooling, since the usecase is almost entirely across-the-board aggregations.

Guillaume86 · 2024-09-26T20:41:55 1727383315

I fully agree on the sql project type, it's amazing, another great side benefit is intellisense on your db schema (find all references, etc), git blame on the schema is also great. Doesn't redgate offer kinda the same tool commercially with support for more than mssql? Never tried it but IIRC it looked similar...

tianzhou · 2024-10-01T03:37:58 1727753878

>> I really wish someone will implement DB Projects (dacpac) for Postgresql

Bytebase did https://www.bytebase.com/docs/change-database/synchronize-sc...

pphysch · 2024-09-26T16:42:16 1727368936

> And finally the GEM that is database projects, you can program your DB changes declaratively, there is nothing like this in the market and again, no one is even trying

Automatic migration tools have essentially been doing this for a while (e.g. Django migrations). But I agree it would be nice if Postgres had better support built in.

koolba · 2024-09-26T17:58:29 1727373509

Database migrations to arrive at a target state are an interesting beast. It’s not just the destination that matters, it’s how you get there.

The naive approach of applying whatever DDL you need to turn database schema A into B is not necessarily the one anybody would actually want to run, and in larger deployments it’s most definitely not the one you’d want to run.

Intermediate availability and data migration go well beyond simply adding tables or columns. They’re highly application specific.

The main improvements in this space are making existing operations CONCURRENT (e.g., like index creation) and minimizing overall lock times for operations that cannot (e.g., adding columns with default values).

pphysch · 2024-09-26T19:34:36 1727379276

I agree automating migrations correctly from state A to B without any risk of data loss or downtime is essentially an intractable problem.

Still, the popular tools seem to have good heuristics and good DBAs recognize when migrations must be written by hand.

tianzhou · 2024-10-01T03:46:58 1727754418

(Bytebase co-founder) Bytebase implements this https://www.bytebase.com/docs/change-database/synchronize-sc...

sroussey · 2024-09-26T15:46:24 1727365584

At least both of these databases support clustered indexes. For decades.

manishsharan · 2024-09-26T15:15:09 1727363709

MS Azure SQL on is very cheap compared to any other self hosted database including PG and MS SQL. Unless they running a super heavy workload , this solution will meet most business requirements.

But this is also a gateway drug for cloud addiction.

kjax · 2024-09-26T15:53:02 1727365982

After scanning the Azure and AWS pricing pages and calculators for comparably speced Azure Manged SQL and Amazon RDS for PostgreSQL instances, AWS's RDS comes out at about 1/3 the price of Azure Managed SQL.

This matches my experience with deployments on AWS and Azure. Both are managed db instances, but Azure's is consistently more expensive for comparable offerings on AWS or GCP.

manishsharan · 2024-09-26T17:26:00 1727371560

I do not have experience with AWS RDS so I cannot speak to that.

In my experience, GCP Cloud SQL for Postgres has been more expensive than MS Azure SQL. In our tests, CloudSQL also was not comparable to the resiliency offered by Azure SQL. Things like Automated DR and automated failover etc. were not at par with what Azure offered. Not to mention , Column level encryption is standard for Azure SQL.

AlfeG · 2024-09-26T17:08:25 1727370505

Is there any analog for Azure SQL Elastic Pools for Postgres anywhere?

We pay in total something around 600 bucks to manage around 250 databases in MSSQL servers (with failover for prod databases, DTU based model)

We pay for log analytics more then we pay for Sql Servers.

Those Elastic Pools is a blocker for us on the way to migrate to Postgres from MSSQL...

ilkhan4 · 2024-09-26T18:16:09 1727374569

It's been a while since I've looked at elastic pools in Azure, but maybe Neon (https://neon.tech/) or recently Nile (https://www.thenile.dev/) might work in terms of spinning up a bunch of separate logical DBs with shared compute/storage.

clarkbw · 2024-09-26T20:13:34 1727381614

(neon employee) and Neon will be available on Azure very soon ( https://neon.tech/blog/neon-is-coming-to-azure )

nikita · 2024-09-26T18:18:46 1727374726

(neon ceo). We have lots of examples of this. Here is one with Retool. https://neon.tech/blog/how-retool-uses-retool-and-the-neon-a...

on_the_train · 2024-09-26T14:37:12 1727361432

Exactly. Supposedly the paid solution ensures long term support. The most fun part is that our customers need to buy these database licenses, so it directly reduces our own pay. Say no to non-technical (or rational) managers :<

remram · 2024-09-26T14:46:17 1727361977

Did you not pay for Oracle?

on_the_train · 2024-09-26T15:38:01 1727365081

We pay, but what hurts is that our customers need to pay, too. For both oracle and ms of course

marcosdumay · 2024-09-26T19:02:07 1727377327

Microsoft is way less likely to sue you and a couple of orders of magnitude cheaper than Oracle.

Besides, managing Microsoft licensing is a bliss close to Oracle's. And yeah, MSSQL is much better in almost every way than Oracle.

If you only compare those two, it's a non-brainier.

0cf8612b2e1e · 2024-09-26T18:28:45 1727375325

A boolean column type.

throw351203910 · 2024-09-26T18:59:54 1727377194

What your boss doesn't realize is your business already depends on FOSS. Here are a few examples:

- Do you use any cloud provider? Those platforms are built on top of open source software: Linux, nginx (e.g Cloudflare's edge servers before the rust rewrite), ha-proxy (AWS ELB), etc - Either the software your business builds or depends on probably uses open source libraries (e.g: libc, etc) - The programming languages your business uses directly or indirectly are probably open source

My point is that folks that make these kinds of statements have no clue how their software is built or what kind software their business actually depends on.

WuxiFingerHold · 2024-09-27T03:11:09 1727406669

SQL Server (MSSQL) is not bad at all, just expensive.

The part that your boss doesn't trust Postgres is hilarious, of course.

jmull · 2024-09-26T16:32:34 1727368354

Well, you can't necessarily trust open source for business software.

The more deeply your business depends on something, the more careful you need to be when selecting the source for that something. (And the db is often very deeply depended on.)

You want to see why their long-term incentives align with your own needs.

But a revenue stream is just one way to do this, and not a perfect one. (Case in point: Oracle.)

In my experience, SQL Server isn't bad though. I know a couple commercial products that started with SQL Server in the late '90s and remain happy with it now. The cost hasn't been a problem and they like the support and evolution of the product. They find they can adopt newer versions and features when they need to without too much pain/cost/time.

(Not that I don't think Postgres is generally a better green-field pick, though, and even more so porting away from Oracle.)

baq · 2024-09-26T20:54:57 1727384097

mssql is a great rdbms. t-sql is... different... in certain places but all in all if cost isn't a big issue you really can't go wrong by picking it.

Rican7 · 2024-09-26T20:09:17 1727381357

Wow, yea, the performance gains and new UX features (JSON_TABLE, MERGE improvements, etc) are huge here, but these really stand out to me:

> PostgreSQL 17 supports using identity columns and exclusion constraints on partitioned tables.

> PostgreSQL 17 also includes a built-in, platform independent, immutable collation provider that's guaranteed to be immutable and provides similar sorting semantics to the C collation except with UTF-8 encoding rather than SQL_ASCII. Using this new collation provider guarantees that your text-based queries will return the same sorted results regardless of where you run PostgreSQL.

ktosobcy · 2024-09-26T15:22:25 1727364145

Would be awesome if PostgreSQL would finally add support for seamless major version upgrade…

dewey · 2024-09-26T15:51:34 1727365894

Maybe https://github.com/pgautoupgrade/docker-pgautoupgrade is interesting for you.

ticoombs · 2024-09-26T21:29:29 1727386169

I use this for my Lemmy instance & lemmy-ansible and it's been great! No longer having to support upgrade scripts and write a complete upgrade process[1] for people to follow has made my life a lot easier! Amazing product

- [1] https://github.com/LemmyNet/lemmy-ansible/blob/main/UPGRADIN...

ktosobcy · 2024-09-27T07:07:35 1727420855

I heard about that project but it still somewhat convoluted. Imagine being able to simply use "postgres:latest" or better yet "postgres:15" and switch to "postgres:16" and it would just update (like any other minor version does, or any other db, like mysql, does)

justinclift · 2024-10-01T04:24:25 1727756665

Yeah, we've offered to make that happen but there's been zero response:

https://github.com/docker-library/postgres/issues/37#issueco...

Btw, what do you find convoluted about our project (pgautoupgrade)?

It started out as super simple like you mention above... but it also started out using Alpine images so didn't work so well for people coming from Debian based ones. We've added support for Debian based ones now too, but that's introduced more complexity to the README (as has other options). o_O

ellisv · 2024-09-26T15:26:20 1727364380

I'm curious what you feel is specifically missing.

levkk · 2024-09-26T15:43:00 1727365380

pg_upgrade is a bit manual at the moment. If the database could just be pointed to a data directory and update it automatically on startup, that would be great.

olavgg · 2024-09-26T16:28:38 1727368118

I agree, why is this still needed? It can run pg_upgrade in the background.

vbezhenar · 2024-09-26T18:43:17 1727376197

It needs binaries for both old and new versions for some reason.

justinclift · 2024-10-01T04:27:34 1727756854

When you say "in the background" what are you meaning?

Unless something has radically changed with this last release, then the PostgreSQL database needs to be offline while pg_upgrade is running.

ktosobcy · 2024-09-27T07:10:22 1727421022

Being able to simply switch from "postgres:15" to "postgres:16" in docker for example (I'm aware about pg_autoupdate but it's external and I'm a bit iffy about using it)

What's more, even outside of docker running `pg_upgrade` requires both version to be present (or having older binary handy). Honestly, having the previous version logic to load and process the database seems like it would be little hassle but would improve upgrading significantly...

jeltz · 2024-09-27T16:36:50 1727455010

It would be a major hassle to do so since it would mean keeping around major parts of the code for all old versions of PostgreSQL. Possible to do? Yes. A hassle for the dev team? Yes, a huge one. Worth it? Doubtful.

ktosobcy · 2024-09-28T11:25:34 1727522734

> It would be a major hassle to do so since it would mean keeping around major parts of the code for *all old versions of PostgreSQL.*

WTF?

It was only implied to be able to do migration between major versions, not all of them (which doesn't make any sense).

I don't expect PostgreSQL changing that significantly between major version the upgrade would be such a huge hassle…

TBH PostgreSQL seem to be the only tool/software that has such a bonkers upgrade path between major versions…

jeltz · 2024-09-30T08:19:08 1727684348

Patches welcome since you obviously know more than me about the code base. And I obviously meant only major versions when I said all versions. PostgreSQL releases one new major version per year so that adds up quickly.

> I don't expect PostgreSQL changing that significantly between major version the upgrade would be such a huge hassle…

PostgreSQL would need code which transforms between the old AST to the new AST for example to support views and check constraints so PostgreSQL would need to keep around a version of the AST for every major version of PostgreSQL plus the code to deparse it. It would also need to keep around the code to read every version of the catalog tables (it partially does this in pg_dump, but only partially since parts of pg_dump is implemented using server side functions).

There are also likely a bunch more complications caused by the pluggable nature of PostgreSQL: e.g. custom plan nodes which might make the whole project a non-starter.

So this would at least mean basically a total rewrite of pg_dump plus maintaining 5 to 10 versions of the AST. But likely much more work since I have likely forgot a bunch of stuff which needs to change. A huge project which increases the maintenance burden for a relatively small gain.

> TBH PostgreSQL seem to be the only tool/software that has such a bonkers upgrade path between major versions…

This is caused mostly by PostgreSQL being heavily extensible unlike its competitors.

justinclift · 2024-10-01T04:33:50 1727757230

> A huge project which increases the maintenance burden for a relatively small gain.

No, it would be for a huge gain for the people who run PostgreSQL. Every long running PostgreSQL installation use has to go through the process and/or potential downtime figuring out a (potentially forced) upgrade every few years.

Instead of that, PG being able to transparently (or at least automatically) upgrade major versions as needed would remove that absolutely huge pain point.

Other databases have recognised this as being a major problem, then put the time and effort into to solving it. The PostgreSQL project should too. :)

justinclift · 2024-10-01T05:26:01 1727760361

> I'm a bit iffy about using it

It's not perfect, as it doesn't (yet) recreate indexes or automatically run `vacuum analyze` afterwards. We're working on those though. :)

You do make backups of your database though yeah?

ktosobcy · 2024-10-02T15:37:15 1727883435

Yes I do, but for the simplicy I just stick with the same PG version and doesn't touch it once deployed on local RPi :D

justinclift · 2024-10-03T00:36:35 1727915795

Heh Heh Heh. Yeah, that's how most people seem to do it, even on non-docker systems. ;)

Works well for a few years as PG is super stable. But we tend to see people when they've run their systems longer than that and it's gotten to the point of their PG install is end of life (or similar circumstance), so they're looking at their upgrade options.

The pgautoupgrade approach is one potential way to fix that, and (in theory) should mean that once they're using it they don't have to worry about major version upgrades ever again (in a positive way).

lpapez · 2024-09-26T17:52:43 1727373163

Amazing release, Postgres is a gift that keeps on giving.

I hope to one day see Incremental View Maintenance extension (IVM) be turned into a first class feature, it's the only thing I need regularly which isn't immediately there!

andreashansen · 2024-09-26T16:59:57 1727369997

Oh how I wish for Postgres to introduce system-versioned (bi-temporal) tables.

qianli_cs · 2024-09-26T17:43:44 1727372624

What's your use case for system-versioned tables? You could use some extensions like Periods that support bi-temporal tables: https://wiki.postgresql.org/wiki/Temporal_Extensions

Or you could use triggers to build one: https://hypirion.com/musings/implementing-system-versioned-t...

baq · 2024-09-26T21:02:13 1727384533

any system of record has a requirement to be bitemporal, it just isn't discovered until too late IME. I don't know if there's a system anywhere which conciously decided to not be bitemporal during initial design.

jeltz · 2024-09-26T18:00:39 1727373639

It will hopefully be in PostgreSQL 18.

infamia · 2024-09-27T00:07:23 1727395643

Is there an active effort to get temporal tables into Postgres at the moment?

infamia · 2024-09-28T03:18:10 1727493490

It looks like Paul A. Jungwirth and others are trying to get Temporal Tables into Postgres 18.

https://illuminatedcomputing.com/posts/2024/07/temporal-reve...

yen223 · 2024-09-26T14:49:27 1727362167

MERGE support for updating views is huge. So looking forward to this

majkinetor · 2024-09-26T17:00:41 1727370041

Now it also supports RETURNS keyword !

clarkbw · 2024-09-26T15:47:41 1727365661

Some awesome quality-of-life improvements here as well. The random function now takes min, max parameters

SELECT random(1, 10) AS random_number;

clarkbw · 2024-09-26T15:57:25 1727366245

It's never available on homebrew the same day so we all worked hard to make it available the same day on Neon. If you want to try out the JSON_TABLE and MERGE RETURNING features you can spin up a free instance quickly.

https://neon.tech/blog/postgres-17

(note that not all extensions are available yet, that takes some time still)

dewey · 2024-09-26T17:52:23 1727373143

How is your hosted managed Postgres platform an alternative to Homebrew (Installing PG on your local Mac)? There's also many Homebrew formulas that have PG 17 already like (https://github.com/petere/homebrew-postgresql/commit/2faf438...).

clarkbw · 2024-09-26T19:18:10 1727378290

oh, i see I'm getting downvoted; this isn't a sales pitch for my "managed postgres platform".

every year a new postgres release occurs and i want to try out some of the features i have to find a way to get it. usually nobody has it available.

here's the list of homebrew options I see right now:

brew formulae | grep postgresql@ postgresql@10 postgresql@11 postgresql@12 postgresql@13 postgresql@14 postgresql@15 postgresql@16

maybe you're seeing otherwise but i updated a min ago and 17 isn't there yet. even searching for 'postgres' on homebrew doesn't reveal any options for 17. i don't know where you've found those but it doesn't seem easily available.

and i'm not suggesting you use a cloud service as an alternative to homebrew or local development. neon is pure postgres, the local service is the same as whats in the cloud. but right now there isn't an easy local version and i wanted everyone else to be able to try it quickly.

openrisk · 2024-09-26T19:18:05 1727378285

There was LAMP and then MERN and MEAN etc. and then there was Postgres.

Its not quite visible yet, but all this progres by postgres (excuse the pun) on making JSON more deeply integrated with relational principles will surely at some point enable a new paradigm, at least for full stack web frameworks?

winrid · 2024-09-26T19:24:21 1727378661

The problem with postgres's JSON support is it leads to last write win race conditions compared to actual document stores. Once they can support operations like incrementing a number while someone else updates another field, without locking the row, then maybe.

I don't think you'll see a paradigm shift. You'll just see people using other documents stores less.

sgarland · 2024-09-26T22:34:52 1727390092

We’re already in a new paradigm, one in which web devs are abandoning referential integrity guarantees in favor of not having to think about data modeling.

I’d say they’re then surprised when their DB calls are slow (nothing to do with referential integrity, just the TOAST/DETOAST overhead), but since they also haven’t experienced how fast a DB with local disks and good data locality can be, they have no idea.

pnt12 · 2024-09-27T07:31:42 1727422302

Can you elaborate on what's TOAST/DETOAST?

davidrowley · 2024-09-27T08:30:11 1727425811

The Oversized-Attribute Storage Technique. https://www.postgresql.org/docs/17/storage-toast.html

switch007 · 2024-09-26T20:07:41 1727381261

Props to all the people (and companies) behind Postgres https://www.postgresql.org/community/contributors/

lukaslalinsky · 2024-09-27T10:07:57 1727431677

Wow, it finally has failover support for logical replication slots. That was biggest reason why I couldn't depend on logical replication, since the master DB failover handling was too complex for me to deal with.

veggieroll · 2024-09-26T14:27:26 1727360846

Loving the continued push for JSON features. I'm going to get a lot of use out of JSON_TABLE. And json_scalar & json_serialize are going to be helpful at times too. JSON_QUERY with OMIT QUOTES is awesome too for some things.

I hope SQLite3 can implement SQL/JSON soon too. I have a library of compatability functions to generate the appropriate JSON operations depending on if it's SQLite3 or PostgreSQL. And it'd be nice to reduce the number of incompatibilities over time.

But, there's a ton of stuff in the release notes that jumped out at me too:

"COPY .. ON_ERROR" ignore is going to be nice for loading data anywhere that you don't care if you get all of it. Like a dev environment or for just exploring something. [1]

Improvements to CTE plans are always welcome. [2]

"transaction_timeout" is an amazing addition to the existing "statement_timeout" as someone who has to keep an eye on less experienced people running SQL for analytics / intelligence. [3]

There's a function to get the timestamp out of a UUID easily now, too: uuid_extract_timestamp(). This previously required a user defined function. So it's another streamlining thing that's nice. [4]

I'll use the new "--exclude-extension" option for pg_dump, too. I just got bitten by that when moving a database. [5]

"Allow unaccent character translation rules to contain whitespace and quotes". Wow. I needed this! [6]

[1] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...

[2] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...

[3] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...

[4] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...

[5] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...

[6] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...

nbbaier · 2024-09-26T14:46:16 1727361976

> I hope SQLite3 can implement SQL/JSON soon too. I have a library of compatability functions to generate the appropriate JSON operations depending on if it's SQLite3 or PostgreSQL. And it'd be nice to reduce the number of incompatibilities over time.

Is this available anywhere? Super interested

veggieroll · 2024-09-26T15:08:32 1727363312

No, it's not at the moment. Sorry!

The most useful part is doing set intersection operations on JSON array's. Probably the second is extracting a value by path across both.

It's not crazy to implement, SQLite was the harder side. Just a bit of fiddling with `json_each`, EXISTS, and aggregate functions.

nbbaier · 2024-09-26T20:00:01 1727380801

Might give it a whirl. Is the library pure sql or is it written in something else?

veggieroll · 2024-09-26T21:00:39 1727384439

I've been using Go with Goqu [1] for SQL for a lot of things lately. But, the language shouldn't matter much. The functions are just building the appropriate SQL clauses.

For some of the operations, the method I was using required marshaling the inputs to JSON before sending them over the wire. And that's nicer in a non SQL programming language. But both db's ultimately do have json_build_array/json_build_object for PostgreSQL or json_array/json_object for SQLite3.

[1] https://github.com/doug-martin/goqu

pbronez · 2024-09-26T15:28:02 1727364482

Yeah JSON_TABLE looks pretty cool. Here's the details:

https://www.postgresql.org/docs/17/functions-json.html#FUNCT...

SOLAR_FIELDS · 2024-09-27T02:39:40 1727404780

Looking forward to AWS DMS supporting this nice software release. They were super quick about supporting PG16 so this should be easy, right?

https://repost.aws/questions/QUbIn2WmXgTbiVu_4wua9tUw/dms-wi...

nojvek · 2024-09-26T21:12:01 1727385121

I wish postgres supports parquet file imports and exports. COPY command with csv is really slooooooooow. Even BINARY is quite slow and bandwidth heavy.

I wonder how open postgres is and what kind of pull requests postgres team considers? I'd like to learn how to contribute to PG in baby steps and eventually get to a place where I could contribute substantial features.

ioltas · 2024-09-26T21:57:18 1727387838

There has been a patch to extend the COPY code with pluggable APIs, adding callbacks at start, end, and for each row processed: https://commitfest.postgresql.org/49/4681/.

I'd guess that this may fit your purpose to add a custom format without having to fork upstream.

ssfak · 2024-09-27T09:00:59 1727427659

Using the pg_duckdb[1] is an option, if you can install extensions on your setup.

[1]. https://github.com/duckdb/pg_duckdb

ksec · 2024-09-26T14:54:04 1727362444

With 17, is Vacuum largely a solved issue?

ses1984 · 2024-09-26T15:04:15 1727363055

I’m not up to date on the recent changes, but problems we had with vacuum were more computation and iops related than memory related.

Basically in a database with a lot of creation/deletion, database activity can outrun the vacuum, leading to out of storage errors, lock contention, etc

In order to keep throughput up, we had to throttle things manually on the input side, to allow vacuum to complete. Otherwise throughput would eventually drop to zero.

sgarland · 2024-09-26T15:11:58 1727363518

You can also tune various [auto]vacuum settings that can dramatically increase the amount of work done in each cycle.

I’m not discounting your experience as anything is possible, but I’ve never had to throttle writes, even on large clusters with hundreds of thousands of QPS.

imbradn · 2024-09-26T17:42:23 1727372543

No, vacuum issues are not solved. This will reduce the amount of scanning needed in many cases when vacuuming indexes. It will mean more efficient vacuum and quicker vacuums which will help in a lot of cases.

jabl · 2024-09-26T16:52:36 1727369556

There was a big project to re-architect the low level storage system to something that isn't dependent on vacuuming, called zheap. Unfortunately it seems to have stalled and nobody seems to be working on it anymore? I keep scanning the release notes for each new pgsql version, but no dice.

qazxcvbnm · 2024-09-26T17:26:56 1727371616

I think OrioleDB is picking up from where zheap left things, and seems quite active in upstreaming their work, you might be interested to check it out.

__natty__ · 2024-09-26T20:11:39 1727381499

This release is good on so many levels. Just the performance optimisations and the JSON TABLE feature could be entirely separated release, but we got so much more.

paws · 2024-09-26T16:21:19 1727367679

I'm looking forward to trying out incremental backups as well as JSON_TABLE.

Thank you contributors!

monkaiju · 2024-09-26T15:37:48 1727365068

PG just continues to impress!

miohtama · 2024-09-26T14:46:30 1727361990

The titan keeps rocking.

h1fra · 2024-09-26T15:37:24 1727365044

Amazing, it's been a long time since I have been that much excited by a software release!