
You can safely go to a billion rows. Use indexes. Don't do silly things like count(*) or select *; select the small data sets you need using your indexes. If you need a full table count for some reason, use an auto-incrementing primary key and select its max, or use information_schema for an estimate. Nobody will sweat the slight inaccuracy on a table that size.
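For example, a minimal sketch of both approaches, assuming a hypothetical table `events` with an AUTO_INCREMENT primary key `id` (adjust names to your schema):

    -- Rough count via the PK: assumes ids start near 1 and few rows are
    -- deleted, so MAX(id) tracks the true row count closely. Index-only read.
    SELECT MAX(id) AS approx_rows FROM events;

    -- Or take the optimizer's estimate; TABLE_ROWS is approximate for InnoDB.
    SELECT TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'events';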



Agreed. I have a handful of tables with over 2 billion rows and have experienced no issues. One database is 2.5TB in size. I feel like 20 million rows is a relatively small MySQL table.


Nice! We're at 9TB... I agree that 20M is small enough that I'm not concerned about performance at all.


Could you provide some details, like MySQL version, IOPS and read/write performance?


What types of data are you storing with over two billion rows?


It's related to user activity, along the lines of products a user views. The other large tables in the billions of rows are about analyzing how products are performing on other websites over time, helping to determine past and current trends for businesses.


We have two tables at nearly 2 billion rows for line items on invoices (we aggregate data for a substantial portion of a major industry). We have ~20 tables that are over 100 million rows but less than 1 billion. Looks like another ~20 tables over 20 million.

To get this info on your own MySQL db: `select * from information_schema.TABLES;`. As noted above, the TABLE_ROWS values here are generally an estimate; see https://dev.mysql.com/doc/mysql-infoschema-excerpt/5.7/en/in... (a filtered version of the query is sketched below).

(same in MySQL 8; we use both 5.7 and 8)
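If you want the biggest offenders rather than a full dump, here's a sketch that sorts by the row estimate (column names per the information_schema docs; DATA_LENGTH and INDEX_LENGTH are in bytes):

    -- Largest user tables by estimated row count (TABLE_ROWS is approximate).
    SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS,
           ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 1) AS total_gb
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    ORDER BY TABLE_ROWS DESC
    LIMIT 20;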


Yeah, I was also under the 20M impression initially, until I built a system that worked with a 500M-row table without any problems (just a beefy RDS instance, nothing crazy distributed or anything). Schema changes became a pain at that scale, so I would probably split it if I were to do it again, but reads and writes never gave us much trouble.


Check out gh-ost migrations: https://github.com/github/gh-ost
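A minimal sketch of an invocation, with hypothetical host, schema, table, and credentials (flag names per the gh-ost docs; it's a dry run until you pass --execute):

    gh-ost \
      --host=replica.example.com \
      --user=ghost \
      --password=secret \
      --database=mydb \
      --table=events \
      --alter="ADD COLUMN source VARCHAR(32) NOT NULL DEFAULT ''" \
      --chunk-size=1000 \
      --max-load=Threads_running=25 \
      --verbose \
      --execute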


What were you using for schema changes? I’ve been using pt-online-schema-change on InnoDB tables with 200M+ rows for a while; it’s just a bit slow.
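For comparison, a typical pt-online-schema-change run looks something like this (hypothetical schema and table names; syntax per the Percona Toolkit docs):

    # Dry run first; swap --dry-run for --execute once it looks sane.
    pt-online-schema-change \
      --alter "ADD COLUMN source VARCHAR(32) NOT NULL DEFAULT ''" \
      D=mydb,t=events \
      --dry-run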



