Hacker News new | past | comments | ask | show | jobs | submit login

From the Postgres docs,

ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency. This is also known as UPSERT — “UPDATE or INSERT”.

https://www.postgresql.org/docs/current/sql-insert.html

What are you referring to?




If I recall correctly (and it has been a while, so I'm not saying I am), the issue was with concurrent transactions inserting a record into tableA and another into tableB which has a foreign key constraint to tableA. The issue was likely specific to `ON CONFLICT DO NOTHING` and not `ON CONFLICT DO UPDATE`.

For example, let's saying you're building an index of open source packages and have two tables: package_type(id, name) and package(id, type_id, namespace, name).

If you receive two concurrent requests for `maven://log4j:log4j` and `maven://io.quarkus:quarkus`, a naive implementation to insert both "maven" and the packages if they don't exist might look something like this:

   WITH type_id AS (
     INSERT INTO package_type(name)
     VALUES (:type)
     RETURNING id
     ON CONFLICT DO NOTHING
   )
   INSERT INTO package (type_id, namespace, name)
   SELECT type_id, :namespace, :name
   FROM type
   ON CONFLICT DO NOTHING;
However, one or both inserts can fail intermittently because the primary key for `package_type` will be auto-incremented and thus the foreign key won't be valid. Also, as mentioned in another comment[0] this won't work if `maven` already exists in the `package_type` table.

[0]: https://news.ycombinator.com/item?id=36031790


Atomicity doesn't mean "doesn't fail", it means "either fails or succeeds, but does not succeed halfway".

There is nothing about what you are describing that is different from the behavior you'd get from a regular insert or update. If two transactions conflict, a rollback will occur. That isn't violating atomicity. In fact, it is the way by which atomicity is guaranteed.

The behavior of sequence values getting incremented and not committed, resulting in gaps in the sequence, is a separate matter, not specific to Postgres or to upsert.


I don't think this has to with concurrency; this query is fundamentally broken (besides typos), in that the CTE won't return anything if the package_type already exists.

You have two options:

(1) ON CONFLICT DO UPDATE, with dummy update:

    WITH
      type AS (
        INSERT INTO package_type (name)
        VALUES ($1)
        ON CONFLICT (name) DO UPDATE SET name = excluded.name
        RETURNING id
      )
    INSERT INTO package (type_id, namespace, name)
    SELECT id, $2, $3
    FROM type
    ON CONFLICT (type_id, namespace, name) DO UPDATE SET name = excluded.name
    RETURNING id;
(2) Separate statements with ON CONFLICT DO NOTHING (could be in a UDF if desired):

    INSERT INTO package_type (name)
    VALUES ($1)
    ON CONFLICT DO NOTHING;

    INSERT INTO package (type_id, namespace, name)
    SELECT type_id, $2, $3
    FROM package_type
    WHERE name = $1
    ON CONFLICT DO NOTHING;

    SELECT id
    FROM package
    WHERE (type_id, namespace, name) = ($1, $2, $3);


I think that actually more to the root of the problem, as other folks have noted, is that `ON CONFLICT DO NOTHING` means the RETURNING clause returns no rows if there is a conflict, which in my experience is rarely what people want. So instead people do `ON CONFLICT DO UPDATE` with a no-op update which has performance/locking implications, otherwise they need to do a complicated query (search stack overflow).

I wish that postgres would add some sort of backwards compatible option like `ON CONFLICT DO NOOP` or `ON CONFLICT DO RETURN` so that you got the semantics of `DO NOTHING` except that the conflicted rows are returned.


It increments the sequence associated with with a "serial" or "bigserial" field—usually used for primary keys. People are often surprised by this because they expect their primary keys to be sequential and sequences are designed to leave gaps in order to avoid a lot of locking.


That just seems like a fundamental misunderstanding of sequences to me. They are guaranteed to be increasing but not necessarily sequential.


I think it's more confusion of how `ON CONFLICT DO NOTHING` works. Incrementing sequences is not "doing nothing", even if it's valid and sensible behaviour, which leads to confusion.


In fairness to Postgres, “DO NOTHING” is accurate within the SQL domain if you squint. It’s reasonable for queries to have side-effects on the underlying implementation as long as the invariants defined by SQL are met. Surprising, yes, but the alternative would be leaking implementation details in the syntax.

On the other hand, it’s so easy to use and rely on autoincrement IDs and to assume they are monotonic and predictable based on naive testing. If we could do it all again I’d fuzz the IDs even during normal operation (occasionally increment by a random extra amount) so that they don’t become such a foot gun for developers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: