FWIW, Marginalia Search has had nearly a billion rows in a table. It runs on a single PC with 128 GB of RAM.
It's survived several HN death hugs.
Though I would say there are things you can do with a 20M-row table that you can't with a 1B-row table. If a query can't use an index, it will take hours. Even SELECT COUNT(*) takes like 10 minutes.
Generally speaking, why does SELECT COUNT(*) take so long? I'd expect the database to maintain internal bookkeeping structures with table metadata that include the number of rows in each table.
I reckon this is probably not true? If so, is it because keeping a counter like that up-to-date would be inefficient?
Edit: I just realized I might be misunderstanding what that query does
While not quite "SELECT APPROXIMATELY COUNT(*)", both MySQL and PostgreSQL offer metadata tables with approximate (albeit completely unfiltered) total row counts.
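For example, something like this (schema and table names are placeholders):

    -- MySQL: estimated row count from the table statistics
    SELECT TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'big_table';

    -- PostgreSQL: the planner's row estimate from pg_class
    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relname = 'big_table';

Both are effectively free to read, but they're only as fresh as the last statistics update.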
Many DB systems have some sort of HLL function to provide a similar approximation (although I think you're overestimating the costs that MVCC imposes on large datasets).
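For instance, BigQuery and Spark SQL expose an APPROX_COUNT_DISTINCT / approx_count_distinct aggregate backed by HyperLogLog (the table and column here are just placeholders):

    -- BigQuery-style syntax; trades a small error bound for a much cheaper aggregation
    SELECT APPROX_COUNT_DISTINCT(user_id) FROM events;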
Essentially, what SELECT COUNT(*) does in InnoDB is choose the smallest index and fully scan it, in parallel if there's no WHERE clause.
Meanwhile the primary key is typically the largest index, since with InnoDB's clustered index design, the primary key is the table. So it's usually not the best choice for counting unless there are no secondary indexes.
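Rough sketch of what that means in practice (hypothetical table and column names): a narrow secondary index gives the optimizer a much smaller structure to walk than the clustered PK.

    -- The secondary index stores only (created_at, PK), so it's far
    -- narrower than the full clustered index.
    ALTER TABLE big_table ADD INDEX idx_created_at (created_at);

    -- With no WHERE clause, InnoDB counts by scanning the smallest
    -- available index, which is now idx_created_at.
    SELECT COUNT(*) FROM big_table;

It's still a full scan of that index, so it won't be instant on a billion rows, but it's a lot less I/O than walking the clustered index.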
As other commenters mentioned, the query also must account for MVCC, which means properly counting only the rows that existed at the time your transaction started. If your workload has a lot of UPDATEs and/or DELETEs, this means traversing a lot of old row versions in UNDO spaces, which makes it slower.
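A minimal two-session illustration of that snapshot behavior under InnoDB's default REPEATABLE READ (table and columns are hypothetical):

    -- Session A
    START TRANSACTION;
    SELECT COUNT(*) FROM big_table;  -- first read establishes the snapshot

    -- Session B
    INSERT INTO big_table (id, val) VALUES (1000000001, 'x');
    COMMIT;

    -- Session A again
    SELECT COUNT(*) FROM big_table;  -- same count as before: rows committed
                                     -- after the snapshot are filtered out
                                     -- via the undo log

The more concurrent churn there is, the more of that version-filtering work the count has to do.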