Is 20M of rows still a valid soft limit of MySQL table in 2023?

marginalia_nu · on May 23, 2023

FWIW, Marginalia Search has had nearly a billion rows in a table. It runs on a single PC with 128 Gb RAM.

It's survived several HN death hugs.

Though I would say there are things you can do with a 20M table that you can't with a 1B table. If the query doesn't have an index, it will take hours. Even SELECT COUNT(*) takes like 10 min.

endorphine · on May 23, 2023

Generally speaking, why does SELECT COUNT(*) takes so much? I'd expect that the database maintains internal bookkeeping structures with table metadata that contain the number of rows in each table.

I reckon this is probably not true? If so, is it because it keeping a counter like that up-to-date would be inefficient

Edit: I just realized I might be misunderstanding what that query does

ddorian43 · on May 23, 2023

Learn about MVCC storage. You might have multiple running concurrent transactions. Which is the "true" row count? Hint: you have to count.

jl6 · on May 23, 2023

Makes me wonder if anybody would find a SELECT APPROXIMATELY COUNT(*) useful, which would ignore the impact of current transactions.

15155 · on May 23, 2023

While not "SELECT APPROXIMATELY COUNT (*)," both MySQL and PostgreSQL both offer various metadata tables with approximate (albeit completely unfiltered) total row counts.

qeternity · on May 23, 2023

Many DB systems have some sort of HLL function to provide a similar approximation (although I think you’re overestimating the costs that MVCC impose on large datasets).

ddorian43 · on May 23, 2023

Note that HLL is used for counting `unique` things. You don't need HLL for counting the rows of the table.

kgwxd · on May 23, 2023

Doesn't SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED do that? Or maybe a NOLOCK hint equivalent in MySQL?

clarkmcc · on May 23, 2023

Seems like you could just count the number of primary key entries in the index.

evanelias · on May 23, 2023

Essentially what SELECT COUNT(*) does in InnoDB is choose the smallest index and fully scans that, in parallel if there's no WHERE clause.

Meanwhile the primary key is typically the largest index, since with InnoDB's clustered index design, the primary key is the table. So it's usually not the best choice for counting unless there are no secondary indexes.

As other commenters mentioned, the query also must account for MVCC, which means properly counting only the rows that existed at the time your transaction started. If your workload has a lot of UPDATEs and/or DELETEs, this means traversing a lot of old row versions in UNDO spaces, which makes it slower.

evanelias · on May 23, 2023

In my direct experience, with small rows and a carefully optimized workload and schema, 1 to 2 billion rows was totally fine in a single MySQL 5.1 table (InnoDB) on SSDs twelve years ago. Granted, that was on bare metal. But hardware has also improved a fair bit since then...

totalhack · on May 23, 2023

Haven't had trouble into the low hundreds of millions on a single RDS server with 16GB of ram. YMMV.

halper · on May 23, 2023

We had about half a billion rows in one table back in 2008, and doing thousands of updates/inserts/deletes per second. Can't remember how much RAN we had, but we sure did have a few spinning disks.

candiddevmike · on May 23, 2023

Is that web scale?

bombcar · on May 23, 2023

You run MongoDB as a backing store. Very web much scale.

devoutsalsa · on May 23, 2023

I prefer/dev/null for write heavy workloads that need long term storage. There’s plenty of data in modern physics to suggest that there’s no information loss from going into a black hole, so there shouldn’t be any problems. Put “architected web scale data singularity with bulletproof disaster recovery plan” in your CV. You don’t need to mention the recovery plan is to interview for new jobs when someone starts asking to access the data.

rakejake · on May 23, 2023

Haha. Good one.

arthurcolle · on May 23, 2023

Literally pays for itself in scale

weard_beard · on May 23, 2023

Anyone who wants to get these references: https://www.youtube.com/watch?v=b2F-DItXtZs

arthurcolle · on May 23, 2023

"Does /dev/null support sharding?" - literally dying over here, so good

TylerE · on May 23, 2023

Someone needs to remake this, just global search and replace “web scale” with “blockchain”.

arthurcolle · on May 23, 2023

This is Google: https://www.youtube.com/watch?v=3t6L-FlfeaI&t=12s

I don't even know how to count that low

brootstrap · on May 23, 2023

same. we've got 3 to 4 hundred million rows in one table and our queries still complete in the 100ms range!

noduerme · on May 23, 2023

Any partitioning?

grumple · on May 23, 2023

You can safely go to a billion rows. Use indexes. Don't do silly things like count(*) or select *, select the small data sets you need using your indexes. If you need a full table count for some reason, use a primary key that auto increments and select the max, or use information_schema for an estimate. Nobody will sweat the slight inaccuracy on a table that size.

Guest9081239812 · on May 23, 2023

Agreed. I have a handful of tables with over 2 billion rows and have experienced no issues. One database is 2.5TB in size. I feel like 20 million rows is a relatively small MySQL table.

grumple · on May 23, 2023

Nice! We're at 9TB... I agree that 20M is small enough that I'm not concerned about performance at all.

pollux1997 · on May 23, 2023

Could you provide some details, like MySQL version, IOPS and read/write performance?

mensetmanusman · on May 23, 2023

What types of data with over two billion rows?

Guest9081239812 · on May 23, 2023

It's related to user activity, along the lines of products a user views. The other large tables in the billions of rows are about analyzing how products are performing on other websites over time, helping to determine past and current trends for businesses.

grumple · on May 23, 2023

We have two tables at nearly 2 billion rows for line items on invoices (we aggregate data for a substantial portion of a major industry). We have ~20 tables that are over 100 million rows but less than 1 billion. Looks like another ~20 tables over 20 million.

To get this info on your own mysql db: `select * from information_schema.TABLES;`. As previously disclaimed, the TABLE_ROWS here are an estimate generally, see https://dev.mysql.com/doc/mysql-infoschema-excerpt/5.7/en/in...

(same in MySQL 8; we use both 5.7 and 8)

twelve40 · on May 23, 2023

yeah i was also under the 20mil impression initially until building a system that worked with a 500mil row table without any problems (just a beefy RDS instance, nothing crazy distributed or anything). Schema changes became a pain at that point, so i would probably split it if I were to do it again, but reads and writes never gave us much trouble.

newtom · on May 23, 2023

check out gh-ost migrations: https://github.com/github/gh-ost

charrondev · on May 23, 2023

What were you using for schema changes? I’ve been using pt-online-schema-change on InnoDB tables with 200m+ rows for a while, it’s just a bit slow.

makmanalp · on May 23, 2023

Just off the top of my head I can think of a few cases where we easily have 100-1000x that on a single table in shared node on kube, with an ebs volume (point being, bog standard, if a tad large, commodity hardware). If you properly index things and your workload isn't too crazy in a way that needs access to all of this at once (e.g. point lookups with some recency bias), classic RDBMSs have this use case covered - and it's achievable by anyone!

pollux1997 · on May 23, 2023

Thanks for repost the blog! Some crazy guys have already told me that they have mysql table >1.4B rows (~180GiB), underestimated mysql's power lol

mh- · on May 23, 2023

My current employer has a single InnoDB table that is 14-15 TB.

spudlyo · on May 23, 2023

How long does it take you to do a schema change?

hu3 · on May 23, 2023

Big MySQL deployments typically use pauseless online schema changes tooling like gh-ost:

https://planetscale.com/docs/learn/online-schema-change-tool...

mh- · on May 23, 2023

This is what we do. So "how long" is a difficult thing to put a number on, since you're intentionally throttling it to avoid impacting the database's performance (or building up slave lag).

pbreit · on May 23, 2023

How many rows?

bombcar · on May 23, 2023

Three. But many columns.

arthurcolle · on May 23, 2023

Trying to solve the three-body problem numerically? Haha you fools this was done years ago in FORTRAN

mh- · on May 23, 2023

Several billion.

nonethewiser · on May 23, 2023

what is it?

mh- · on May 23, 2023

Events that are enriched off of other data in the same DB.

The system design predates me, but it is solid (albeit difficult to operate at this scale - usual stuff like schema changes, replication bootstrapping).

numbsafari · on May 23, 2023

What is your mean time to recovery like when you have to restart your system?

mh- · on May 23, 2023

We keep hot masters (typical MySQL master-master-slave(s) setup) on standby.

Bootstrapping one from scratch (like if we need to stand up a new replica)? We restore a disk image on GCP from point in time snapshots, then let it catch up on replication. So it'll depend on how far behind it is when it comes online.

TLDR: a few hours in the worst case. ~Zero downtime in practice.

Sparkyte · on May 23, 2023

I bet queries take minutes.

koolba · on May 23, 2023

An indexed read should be no different than on a table with on the order of 100K rows. And that’s even with good ol’ spinning rust.

It’s all about logical I/O operations which, for an indexed read, would only be a handful regardless of size as it’s operating on a btree: https://en.m.wikipedia.org/wiki/B-tree

Now creating that index from scratch might take a while though…

brootstrap · on May 23, 2023

have worked on 20TB+ tables before. proper schema / index / query, shit is still fast as hell when you do it right

mh- · on May 23, 2023

Yep, was just sharing this experience in the comments as well. By far the most painful part is schema migrations, and that's well-solved with pt-osc or gh-osc.

duskwuff · on May 23, 2023

With MySQL 8.0+, you might not even need pt-osc. A lot of schema changes can be performed online now.

https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-op...

SteveNuts · on May 23, 2023

And still my coworkers who 20 years ago were burned by some MySQL minor issue will laugh at the mere suggestion that we could probably use MySQL.

People still think it's a toy to this day and they've struggled to shake that perception.

icedchai · on May 23, 2023

10+ years ago I worked at a smallish company and we had a table with over 500 million rows. This was on a high end system with several spinning rust disks in a raid 10 and 256 gigs of RAM. Billions in modern hardware seems quite reasonable.

pbreit · on May 23, 2023

Isn't 20m rows super tiny?

hedora · on May 23, 2023

Yes. MySQL has been laughing at that for over 20 years (especially if it fit under 2GB, since CPUs used to be 32 bit).

whalesalad · on May 23, 2023

very cool repo from the author full of all kinds of tests and scripts: https://github.com/gongyisheng/playground/blob/main/mysql/ro...

might need to fork/run this on psql to compare performance.

andrewmcwatters · on May 23, 2023

I love real meat articles like this where software engineers actually explain why particular limits exist, or why technical things are they way they are. Good read. Thanks for sharing.

forgetfreeman · on May 23, 2023

When was this ever a thing?

dijit · on May 23, 2023

there used to be some hard and fast rules about scaling vertically in sysadmin circles.

You can see where they come from when you consider some pretty static numbers in systems: like that page sizes are 4k, block sizes are usually pretty standard (512b or 4k) and network interfaces (at least the throughput) haven't increased for a decade or more.

Some of those rules need to be challenged as the technologies or systems change over time. 20M rows in a highly updated table should be an old "rule" though, at the latest from 2010, when mech drives were popular, because having your entire table index fitting on a few sectors on physical disk made a pretty substantial performance difference. Indexes need to flush to disk on update. (or they used to)

ericbarrett · on May 23, 2023

Love the article, but one criticism: I've found gp3 to be better than gp2 in all aspects, especially consistency—and it's cheaper. For a long time you could only use RDS with gp2 but I know of no reason to choose it in 2023.

davgoldin · on May 23, 2023

We've got MySQL 8 running on high-end-ish consumer hardware, with 150 tables ranging from thousands to billions of rows. Largest table is a bit over 5 billion rows and 520GiB. Parallel network backup takes about 3 hours.

hashim · on May 27, 2023

What does that table do?

tbrownaw · on May 23, 2023

...yes, performance will drop when your set of active data exceeds available memory and your system needs to reference slower storage. I assume this would apply to anything.

nonethewiser · on May 23, 2023

but that threshold is surely not generally 20M rows? I mean wouldn't it be completely dependent on how much memory you have?

irjustin · on May 23, 2023

it's obviously based on the data size itself, memory available etc etc.

Pretty clearly people ran the exact same test, fill a table with a bunch of empty/meaningless data and see where performance degrades - then write a catchy headline title/post.

In the end, it's more nuanced than that but i think the overall theme is know your tuning once you get to certain data sizes.

bagels · on May 23, 2023

You should be able to handle a billion for the right (simple) table and hardware. It really depends on the data and queries.

jve · on May 23, 2023

I see no problems of benchmark apart from you'd be avoiding table scans in prod for tables of those sizes - that kind of benchmark kind of pointless. Well, it shows a point, but not something you would want to use anyway.

Once data is cached, using indexed lookups are fast, 0.5ms.

encoderer · on May 23, 2023

We get sub-ms response time on a table with over 250 million rows. We have a covering index for all high-volume queries, running on rds (aurora) with 16gb ram.

frodowtf · on May 23, 2023

Do you need to do aggregations/scans on the table as well or is it mostly direct row accesses?

encoderer · on May 23, 2023

Yeah, we do a lot of those and they do take a lot longer, and with much wider variance. On the order of 500ms-5s. These are all done async so it’s not a problem but we have begun dual writing to clickhouse to get us out of the aggregations business.

jerryjerryjerry · on May 23, 2023

In general, NewSQL (distributed database) may still be a good solution to address scalability limit encountered by a single server/node database like MySQL/PG. Currently, there are quite a few options in this domain: Spanner, PG-compatible (CockroachDB, YugaByte), MySQL-compatible (TiDB, MariaDB xPand), and other middleware products to handle sharding.

AtlasBarfed · on May 23, 2023

B+ trees scale vertically very well, until they don't.

Sparkyte · on May 23, 2023

Hardware has limits as well as software. You would need a completely different architecture to accomodate a 20M+ database. Usually schemeless the database overhead is typically what is responsible how many rows it handle and because your client interacts with the MySQL engine it is at limit and performance of that engine. So yeeeee-no. Time to use a different database.

Dylan16807 · on May 23, 2023

> You would need a completely different architecture to accomodate a 20M+ database.

The article shows that nothing notable happens at 20M, so what limits are you talking about?

One more level of depth on the B+ tree is very easy to deal with.

fastball · on May 23, 2023

Postgres is also a relational DB and rows per table are "limited by the number of tuples that can fit onto 4,294,967,295 pages"

Sparkyte · on May 23, 2023

Yep, it is all different. Software and hardware. But what it comes down to isn't the number but the performance and we often have to watch these metrics closely.