I wonder what the standard for "big" is in this context--for example, I always thought that a few million rows was very big, but only recently learned that it's a size an RDBMS such as PostgreSQL can handle with no problem.
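To make that concrete, here's a minimal sketch (assuming a local PostgreSQL instance and the psycopg2 package; the table name and connection string are made up) that loads a few million rows server-side and runs an indexed point lookup:

    import time
    import psycopg2

    # hypothetical connection string; adjust for your setup
    conn = psycopg2.connect("dbname=test user=postgres")
    conn.autocommit = True
    cur = conn.cursor()

    cur.execute("DROP TABLE IF EXISTS events")
    cur.execute("CREATE TABLE events (id bigint, payload text)")

    # generate_series synthesizes ~5M rows entirely inside Postgres
    cur.execute("""
        INSERT INTO events
        SELECT g, md5(g::text)
        FROM generate_series(1, 5000000) AS g
    """)
    cur.execute("CREATE INDEX ON events (id)")

    # indexed point lookup; typically sub-millisecond on commodity hardware
    start = time.perf_counter()
    cur.execute("SELECT payload FROM events WHERE id = %s", (4999999,))
    print(cur.fetchone(), f"{(time.perf_counter() - start) * 1000:.2f} ms")

Nothing exotic needed: no tuning, no partitioning, just a plain table and a B-tree index.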
Whatever "big" is defined as, it should at least mean that your data can't fit into RAM on a high-end server.
There's also the threshold where your indexes don't fit into RAM. And the threshold where your data no longer fits into PCIe SSDs on a single server. (The combined bandwidth of the SSDs will rival the RAM, but with more latency.)
These days I’d probably describe “big” as “doesn’t make sense to use SQL anymore”.
Qualitatively, I think it becomes “big” when you have to leave the space of generic “it just works” technologies and start doing bespoke optimizations. It’s amazing how far you can get these days without having to go custom.
These days terabytes is a medium-sized database. A trillion rows of indexed data will fit on a single cloud VM and be reasonably performant.
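Rough back-of-envelope math behind that claim (the ~100 bytes per row and the NVMe configuration are assumptions, not measurements):

    # 1 trillion rows at an assumed ~100 bytes/row including index overhead
    rows = 1_000_000_000_000
    bytes_per_row = 100
    total_tb = rows * bytes_per_row / 1e12
    print(f"~{total_tb:.0f} TB of storage needed")   # ~100 TB

    # an assumed but plausible storage-dense VM: 24 x 7.5 TB NVMe drives
    drives, drive_tb = 24, 7.5
    print(f"single-node NVMe capacity: {drives * drive_tb:.0f} TB")  # 180 TB

So at typical row sizes a trillion indexed rows lands around 100 TB, which fits within the local NVMe of a single large cloud instance.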
I think a good definition of "large" is "several times larger than will fit on a practical modern server". Servers with a petabyte or more of fast attached storage are increasingly common, so that threshold is pretty high. Machine-generated data (sensing, telemetry, etc.) routinely exceeds this threshold, though.
As someone who does this all the time on 10B+ record tables: not really? If you don't have the spare resources to build the occasional index, your DB is under-provisioned and you're close to falling over, cluster or not.
Ah, I forgot the grandparent comment said a trillion. Yeah, that's an order of magnitude I would distribute across a cluster if we were in active development and indexes were changing, etc. If you're storing that much data you should be able to afford it :)
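For what it's worth, here's roughly how I'd build that "occasional index" on a big table without blocking writers--a sketch assuming psycopg2 and a hypothetical events table with a user_id column:

    import psycopg2

    # hypothetical connection string; adjust for your setup
    conn = psycopg2.connect("dbname=prod user=postgres")
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction block
    conn.autocommit = True
    cur = conn.cursor()

    # builds the index online; slower and needs extra disk, but writes keep flowing
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS events_user_id_idx "
        "ON events (user_id)"
    )

    # an interrupted concurrent build leaves an INVALID index behind; list any
    cur.execute("SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid")
    for (idx,) in cur.fetchall():
        print(f"invalid index left behind, consider dropping: {idx}")

The trade-off is that the concurrent build takes longer and can fail partway, so you check for (and drop) invalid leftovers afterwards.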