I am trying this out and I am still on the fence about whether I like it or not.
Create a table with a json column:
CREATE TABLE Doc (
    id UUID PRIMARY KEY,
    val JSONB NOT NULL
);
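For example, inserting a document would look something like this (gen_random_uuid() is built in from PG 13, earlier versions get it from the pgcrypto extension):

-- the whole document goes in as one jsonb value
INSERT INTO Doc (id, val)
VALUES (gen_random_uuid(), '{"user_id": "42", "title": "hello"}');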
Then later it turns out all documents have user_ids so you add a check constraint and an index:
ALTER TABLE Doc ADD CONSTRAINT check_doc_val CHECK (
    jsonb_typeof(val) = 'object' AND
    val ? 'user_id' AND
    jsonb_typeof(val->'user_id') = 'string'
);
CREATE INDEX doc_user_id ON Doc ((val->>'user_id'));
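Once the expression index exists, a lookup by user_id should be able to use it:

-- equality on the same expression the index was built on
SELECT id, val FROM Doc WHERE val->>'user_id' = '42';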
I think the postgres syntax for this is pretty ugly. And if you also want foreign key constraints you still have to move that part of the json out as a real column (or duplicate it as a column on Doc). I am not sure it's even worth it to have postgres check these constraints (vs just checking them in code).
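To make that concrete, duplicating the field as a real column so it can carry a foreign key would look roughly like this (the users table is hypothetical, the cast assumes the user_id strings are actually UUIDs, and the application then has to keep the two copies in sync):

-- hypothetical users(id) table, not shown above
ALTER TABLE Doc ADD COLUMN user_id UUID REFERENCES users(id);
-- backfill the new column from the existing JSON documents
UPDATE Doc SET user_id = (val->>'user_id')::uuid;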
I am also a little worried about performance (maybe prematurely). If that document is large, you will be rewriting the entire json blob each time you modify anything in it. A properly normalized schema can get away with a lot less rewriting?
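For instance, even a one-field update like this writes out a new copy of the whole document, because the jsonb value is stored as a single blob per row:

-- jsonb_set replaces one key, but the row (and its TOASTed value) is rewritten in full
UPDATE Doc SET val = jsonb_set(val, '{title}', '"new title"') WHERE id = $1;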
Just a note about using UUID as a primary key. Typically you will use a b-tree index, which likes to keep things sorted. So something like a serial number works best, because new values are already in order and just get appended at the end. Otherwise inserting a new row will touch the b-tree all over the place, which will hurt performance if you do a lot of inserts.
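The b-tree-friendly version is just the same table with a sequential key:

-- ids are allocated in increasing order, so inserts always land at the
-- right-hand edge of the primary key index
CREATE TABLE Doc (
    id BIGSERIAL PRIMARY KEY,
    val JSONB NOT NULL
);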
If you really want to use UUIDs and care about performance you might prefix them with something that's increasing, like a date, or perhaps (I did not try it) use a hash index (needs PG 10+).
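The hash index version would be roughly this, with the caveat that the PRIMARY KEY constraint itself is still backed by a unique b-tree index, since hash indexes can't enforce uniqueness:

-- hash indexes are only crash-safe and replicated from PG 10 onward
CREATE INDEX doc_id_hash ON Doc USING hash (id);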
(We're getting way off topic) but I think the problem with auto increment is that it can't be sharded easily, since multiple shards can increment to the same value. If you then try to go back to random ids you're stuck with 8 bytes, which (by the birthday bound) gives you a real chance of a collision once you're into the billions of rows. I guess it's pretty extreme premature optimization, but I think UUID is nicer for future-proofing at the cost of some performance. (I would love to see benchmarks to know exactly how much performance I am giving up though.)
By the way, UUIDv1 already embeds a timestamp! But unfortunately it stores the low-order time bits first, so the ids don't sort by time and it doesn't help with clustering new ids into the same pages. I think it was really designed for distributed systems where you would want evenly distributed ids anyway.