Hacker News new | past | comments | ask | show | jobs | submit login

Learn about MVCC storage. You might have multiple running concurrent transactions. Which is the "true" row count? Hint: you have to count.



Makes me wonder if anybody would find a SELECT APPROXIMATELY COUNT(*) useful, which would ignore the impact of current transactions.


While not "SELECT APPROXIMATELY COUNT (*)," both MySQL and PostgreSQL both offer various metadata tables with approximate (albeit completely unfiltered) total row counts.


Many DB systems have some sort of HLL function to provide a similar approximation (although I think you’re overestimating the costs that MVCC impose on large datasets).


Note that HLL is used for counting `unique` things. You don't need HLL for counting the rows of the table.


Doesn't SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED do that? Or maybe a NOLOCK hint equivalent in MySQL?


Seems like you could just count the number of primary key entries in the index.


Essentially what SELECT COUNT(*) does in InnoDB is choose the smallest index and fully scans that, in parallel if there's no WHERE clause.

Meanwhile the primary key is typically the largest index, since with InnoDB's clustered index design, the primary key is the table. So it's usually not the best choice for counting unless there are no secondary indexes.

As other commenters mentioned, the query also must account for MVCC, which means properly counting only the rows that existed at the time your transaction started. If your workload has a lot of UPDATEs and/or DELETEs, this means traversing a lot of old row versions in UNDO spaces, which makes it slower.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: