Author here with just a quick note that writing this was a pretty great learning experience. Postgres has always been the ultimate black box — it does some amazing things, but I generally have no idea how.

I read a lot of Postgres code to get this finished, and I'm happy to say that for a codebase with so many authors, the quality of Postgres' code is very high. Like any complex program it can be a little hard to trace through, but most of the naming is pretty self-explanatory, and it comes with some really amazing comments that walk you through the high-level design. I'd definitely recommend taking a look if anyone is curious.




Thanks for the writeup, it's at the necessary level of depth to convey the important concepts. I haven't had time to read through all of it yet, so maybe this is addressed, but there has always been something about transactions that I don't understand: how do queries that rely on indexes work within a transaction? If Postgres isn't keeping a copy of the index for each transaction, but you want to do a sorted query that may or may not include some of your newly inserted rows prior to committing, does it simply do the query without the use of indexes? Does it do the query with indexes to create a temporary table, then run it again non-indexed on the temporary table including your new records? It seems like that would have many edge cases to address.


Not OP, but I think the answer is that PostgreSQL updates the index right away, and the index entry points to the heap tuple (and its visibility information). This index value can already be accessed by concurrent transactions, but they decide whether it's visible or not by looking at the xmin/xmax of the heap tuple it belongs to. If the transaction is rolled back, it's up to VACUUM to remove both the dead heap tuple and its index tuples.
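To make that concrete, here's a rough sketch you can run in psql (the accounts table is made up; the xmin/xmax system columns are real Postgres features):

    -- Setup (hypothetical table):
    CREATE TABLE accounts (id int PRIMARY KEY, balance int);

    -- Session 1: insert without committing
    BEGIN;
    INSERT INTO accounts VALUES (1, 100);
    SELECT xmin, xmax, * FROM accounts;  -- xmin is this transaction's ID

    -- Session 2, concurrently: the index entry may already exist, but the
    -- heap tuple's xmin belongs to an uncommitted transaction, so the row
    -- is treated as invisible and filtered out.
    SELECT * FROM accounts;  -- does not return the new row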

AFAIK there are also optimizations that allow queries to avoid looking up individual heap tuples for visibility information by marking an entire index page as visible (useful for index-only scans). Yet other optimizations avoid updating indexes when an update changes none of the indexed columns (HOT).
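A hedged sketch of spotting HOT in action (pg_stat_user_tables and its n_tup_hot_upd counter are real; the table is a placeholder, and the counters update asynchronously):

    CREATE TABLE t (id int PRIMARY KEY, payload text);
    INSERT INTO t VALUES (1, 'a');

    -- payload is not indexed, so this update is HOT-eligible:
    UPDATE t SET payload = 'b' WHERE id = 1;

    SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 't';  -- n_tup_hot_upd should (eventually) show 1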

Maybe somebody with a more in-depth understanding can comment, but hopefully the above is somewhat correct :).


That is correct.

There is one small inaccuracy in your summary: a structure called the visibility map summarizes whether a heap page is all-visible; that's how index-only scans work. At no point do we mark an index page as all-visible, because a HOT update would invalidate that.
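If anyone wants to poke at this themselves, the pg_visibility contrib extension that ships with Postgres exposes the visibility map per heap page; the table name here is just an example:

    CREATE EXTENSION pg_visibility;

    -- One row per heap page; all_visible is the bit index-only scans rely on.
    SELECT blkno, all_visible, all_frozen
    FROM pg_visibility_map('accounts');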


Thanks for the clarification about the visibility map.

That being said, I don't understand why HOT would interfere with a hypothetical mechanism for marking index pages as all-visible. If a tuple gets a HOT update, I think the index values should remain unchanged? And if they do, shouldn't their visibility on an all-visible index page remain unchanged as well?

Unrelated: Thank you so much for your work on "UPSERT" :)


I guess I said HOT update because you talked about HOT. What you describe goes against how visibility information is maintained. There is an interlock between indexes and tables for VACUUM that would have to be considered. It would be hard to set an all-visible bit on a leaf page in a race-free manner.

A much simpler way of putting it would be that deletes would have to touch indexes if summarized visibility info was stored in indexes, even in cases where overall there is hardly any benefit from putting per-page visibility information in index pages. It's not impossible to make something like this work, but it's not obviously compatible with the current design.

I'm glad that you found UPSERT useful.


I wasn't really proposing to change how visibility information is maintained; my grasp on the internals is far too weak for that :). I just tried to make sure I understood your comment, which I think I do now. So thanks again for another insightful reply :)


I think what you're looking for is called a range lock. When a query sorts or filters rows using a range predicate under repeatable read isolation, a range lock is created to detect conflicts with other transactions inserting new matching rows.


That's not how Postgres works. You're talking about 2PL. The serializable isolation level in Postgres does have something called predicate locks, but those are actually quite a different thing.


No, I was talking exactly about predicate locking. Predicate locks detect writes which conflict with earlier reads; I don't think this could be called 2PL because they're only used to detect conflicts.

Correctness requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.
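As a rough illustration of the behavior (two concurrent psql sessions; the table and values are made up):

    -- Setup: CREATE TABLE t (k int);

    -- Both sessions start with:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM t WHERE k BETWEEN 1 AND 10;  -- predicate lock on the range

    -- Session 1:
    INSERT INTO t VALUES (5);
    COMMIT;

    -- Session 2 (either here or at COMMIT, one transaction is aborted):
    INSERT INTO t VALUES (6);
    COMMIT;
    -- ERROR: could not serialize access due to read/write dependencies
    -- among transactions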


Thanks for writing this up. As somebody who also likes to look into the internals of the PostgreSQL black box from time to time, posts like this are much appreciated :).



