I have suggested MERGE as a last resort (due to the main table having circa 40(!) columns), not as the preferred option – please refer to the last paragraph in my comment. Yes, it is indeed difficult to recommend something more specific without knowing the exact details.
MERGE does have a number of its own peculiarities that are database engine specific and the MERGE behaviour is not portable across RDBMS's.
For instance, Oracle simply does not care how wide the transaction context around a MERGE is: the transaction will either succeed or fail no matter how long it takes. MS SQL Server, on the other hand, is very touchy and will kill off the transaction if, say, an index update caused by a MERGE is taking too long (from the SQL Server perspective). So for MS SQL Server it is better to minimise the transaction context containing the MERGE and to carefully assess the index update performance. I would expect Postgres to behave closer to Oracle here, but I have not looked into it.
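To illustrate what "minimising the transaction context" means in practice, here is a rough T-SQL sketch with invented table and column names (a sketch only, not a recommendation for any concrete schema): any slow preparation work stays outside the transaction, and the transaction wraps the MERGE alone.

    -- Invented names; the point is only the shape of the transaction.
    -- Slow preparation (loading a staging table) happens outside it.
    INSERT INTO dbo.staging (col1, col2, col3)
    SELECT col1, col2, col3 FROM dbo.incoming_feed;

    BEGIN TRANSACTION;
    MERGE dbo.target AS t
    USING dbo.staging AS s
       ON t.col1 = s.col1
    WHEN MATCHED THEN
      UPDATE SET col3 = s.col3
    WHEN NOT MATCHED THEN
      INSERT (col1, col2, col3) VALUES (s.col1, s.col2, s.col3);
    COMMIT TRANSACTION;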
More generally speaking (and if we ignore MERGE implementation specific details for a moment), remember that the UPSERT pattern has come about as a workaround for the missing MERGE functionality which was added into the specification very late, and vendors were slow to implement it. MERGE covers a few very useful use cases, especially for complex UPSERT scenarios, but it does not obviate UPSERT's in simple scenarios. Both are useful.
My main gripe is the idea that MERGE is somehow better than INSERT ON CONFLICT for upserts, which I don't think is ever true. I don't think I agree that it was added as a workaround - it was added rather as a proper solution, but for a more limited use case which is popular in OLTP workloads: simple thread-safe upsert.
MERGE definitely has its place in complex ETL-style workloads, where you may also want to delete stuff, or change data in the source table as well as in the target table.
Regarding SQL Server - it's not true that SQL Server will kill transactions based on some timeout (unless a deadlock is detected). I have worked with SQL Server extensively, including using MERGE for upserts (by the way, you have to use serializable isolation to make MERGE thread-safe, so it's not that easy to get rid of transactions altogether). SQL Server doesn't kill transactions except when there is a deadlock.
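To be concrete, the usual way to get that serializable behaviour in SQL Server is a table hint scoped to the single MERGE statement rather than a session-wide isolation change. A minimal sketch, with made-up table and column names:

    -- HOLDLOCK is equivalent to a SERIALIZABLE hint, scoped to this statement.
    DECLARE @id int = 1, @val nvarchar(100) = N'example';

    MERGE dbo.target WITH (HOLDLOCK) AS t
    USING (SELECT @id AS id, @val AS val) AS s
       ON t.id = s.id
    WHEN MATCHED THEN
      UPDATE SET val = s.val
    WHEN NOT MATCHED THEN
      INSERT (id, val) VALUES (s.id, s.val);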
FWIW, in the SQL Server community there's a well-known notion of MERGE being buggy (even though it's been supported for some 15 years!) and hard to reason about (especially when there are triggers on the tables involved), see for example: https://www.mssqltips.com/sqlservertip/3074/use-caution-with...
I agree with you that Postgres' MERGE is most likely modeled after Oracle, but haven't looked into it either.
> My main gripe is the idea that MERGE is somehow better than INSERT ON CONFLICT for upserts […]
I, for one, find the MERGE syntax to be easier to read, more flexible and versatile and, most importantly, standardised. It is, effectively, UPSERT++ if you like, as it also allows one to delete rows from a table if there is a condition match. Compare
    MERGE target
    USING source
       ON target.col1 = source.col1 AND
          target.col2 = source.col2 AND
          …
    WHEN MATCHED THEN
      UPDATE SET -- or DELETE
        target.col3 = source.col3,
        …
    WHEN NOT MATCHED BY TARGET THEN
      INSERT (…) VALUES (…)
with
    INSERT INTO target (col1, col2, col3, …)
    SELECT col1, col2, col3, …
    FROM source
    ON CONFLICT (col1)
    DO UPDATE
      SET col3 = EXCLUDED.col3;
Personally, I prefer the MERGE option, but your mileage may vary.
I also deem MERGE, as a technical term, to be more concise and semantically much closer to the intent it describes than INSERT ON CONFLICT. Rows are routinely updated in a database (that is its job, after all), and a row update is not inherently a conflicting update. Conflicts (semantically) connote exceptional situations that have to be dealt with, well, in exceptional ways. But I digress, as this is more of a linguistic subject.
> […] it was added rather as a proper solution, but for a more limited use case which is popular in OLTP workloads: simple thread-safe upsert.
As far as SQL (the language and the standard) is concerned, it is unaware of threads or thread safety; SQL is concerned with transactions and transactional integrity. The database provides and ensures the ACID behaviour and guarantees, and it may or may not use threads to accomplish that (as a matter of fact, we know that most do, but that is an implementation detail).
And MERGE is perfectly suitable for a variety of processing scenarios, OLTP included. In one of my past projects from a few years back, replacing a series of hand-rolled UPSERT's with a single, carefully written MERGE yielded a 1500% performance increase for real-time freight item scan events flowing into the main table (granted, the inability to improve the shoddily designed schema was a hard constraint, so a certain level of creativity was necessary). That was at a scale of 10+ million processed events per hour, which is not huge but substantial.
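The gist of that rewrite, in very rough strokes and with invented names (this is a sketch, not the actual project code), was going from one upsert call per scan event to a single set-based MERGE over each incoming batch:

    -- Each batch of events is bulk-loaded into a staging table first,
    -- then folded into the main table in one statement.
    MERGE dbo.scan_events AS t
    USING dbo.scan_events_staging AS s
       ON t.item_id = s.item_id AND t.scanned_at = s.scanned_at
    WHEN MATCHED THEN
      UPDATE SET status = s.status, location = s.location
    WHEN NOT MATCHED THEN
      INSERT (item_id, scanned_at, status, location)
      VALUES (s.item_id, s.scanned_at, s.status, s.location);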
> MERGE definitely has its place in complex ETL-style workloads
As an aside, ETL workloads are not expected to modify a database in place, which would otherwise make them one of the most vile and reviled integration anti-patterns. The (E) and (L) steps imply a distinct source and a distinct target, and the (T)ransform step takes place outside the DB, either in the integration layer or in a specialised ETL tool. That aside, I fail to see what is so ETL-specific about MERGE; if there is a complex UPSERT scenario, a MERGE could be a good candidate (or not) – the onus is on the engineer to carry out the analysis and make the right choice.
> Regarding SQL Server - it's not true that SQL Server will kill transactions based on some timeout (unless there's a deadlock detected).
It is true (or it was in SQL Server 2012). In the project I mentioned earlier, I inherited a design and an implementation that were nothing short of an unmitigated dumpster fire, which also included «let's add another few random compound indices because what could possibly go wrong?». That had to be scrapped and re-engineered from scratch. It turned out that SQL Server had a penchant for allotting a specific time window for a transaction to complete (irrespective of whether the transaction was serialised or not), and if an index update took longer than the window allotted to the transaction, the DB engine would kill the transaction off. This behaviour was so unexpected that it caused multiple catastrophic cascading failures during the first production deployment attempt, and the deployment had to be aborted and rolled back. I had to enlist a DBA to work on the post-mortem and the subsequent redesign to understand the root cause; it turned out to be documented SQL Server transaction engine behaviour. A painstaking, lengthy low-level index performance analysis and a subsequent meticulous, complete index redesign solved the problem in the end.
> As far as SQL (the language and the standard) is concerned, SQL is unaware of threads or the thread safety; SQL is concerned with transactions and with the transactional integrity. The database provides and ensures the ACID behaviour and guarantees
When I say thread-safety, I just mean the general notion that running a certain SQL statement or procedure concurrently from multiple processes/threads/users/connections will result in correct execution without race conditions.
SQL is definitely concerned with something very close to the notion of thread-safety, namely transaction isolation (the I in ACID). It's the property that controls concurrent execution of queries in the database, and it deals with what the SQL literature calls "phenomena". Phenomena are essentially a set of specific types of race conditions which occur under different isolation levels.
And because the default isolation level in most popular DBs is "Read Committed" - that is, a very relaxed isolation level allowing a lot of race conditions - some pretty basic operations such as upsert/merge are not thread-safe (or, if you dislike the term, you may say they "have race conditions" or "do not avoid certain phenomena").
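The textbook illustration is the check-then-act upsert. A minimal sketch with an invented table: under Read Committed, two concurrent sessions can both see that the row is missing and both attempt the INSERT, so one of them fails (or you get duplicates if there is no unique constraint); the atomic form lets the database resolve the race itself.

    -- Naive upsert: racy under Read Committed.
    UPDATE counters SET hits = hits + 1 WHERE name = 'page';
    -- ...and if no row was updated:
    INSERT INTO counters (name, hits) VALUES ('page', 1);

    -- Atomic form (Postgres): no race.
    INSERT INTO counters (name, hits) VALUES ('page', 1)
    ON CONFLICT (name) DO UPDATE SET hits = counters.hits + 1;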
> It turned out to be the documented SQL Server transaction engine behaviour.
Would appreciate the link - it's either something that I haven't seen, or we're just using different terms for something deadlock-related.
>(T)ransform step takes place outside the DB
In a perfect world, probably yes, but in reality there are plenty of cases where you have staging tables that are then merged into production tables - that's where MERGE is a good candidate, as it can handle all three of insert, update and delete.
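Roughly like this - a sketch with invented names; the BY TARGET/BY SOURCE branches shown here are the SQL Server flavour of the syntax:

    MERGE dbo.products AS t
    USING dbo.products_staging AS s
       ON t.product_id = s.product_id
    WHEN MATCHED THEN
      UPDATE SET name = s.name, price = s.price
    WHEN NOT MATCHED BY TARGET THEN
      INSERT (product_id, name, price) VALUES (s.product_id, s.name, s.price)
    WHEN NOT MATCHED BY SOURCE THEN
      DELETE;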