
I wonder what order of magnitude of rows they deal with and how they optimize indices for querying that data.



Like everyone else?


I didn't downvote you, but you were downvoted for this, so maybe I'll explain further instead, to stir some conversation.

As you deal with more and more rows, it becomes imperative that all of your WHERE clauses hit indexes, but even that stops being enough to guarantee fast responses once tables grow large enough.

I suspect they deal with this in the form of some sort of caching outside of MySQL, but I haven't read into it.

I'd be curious how you respond to such challenges at the scale of billions or trillions of rows. Millions can be difficult enough, but beyond that you may have queries that effectively never return if you don't plan ahead.
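
To make the index point concrete, here's a rough, runnable illustration using sqlite3 from Python's standard library (the table and query are made up; against MySQL you'd run EXPLAIN on the real query instead):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
    )

    # Without an index on user_id, the WHERE clause forces a full table scan.
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42"
    ).fetchall())  # plan detail reads roughly "SCAN events"

    # Add an index covering the predicate and the planner switches to a lookup.
    conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42"
    ).fetchall())  # roughly "SEARCH events USING INDEX idx_events_user_id"

At millions of rows the difference is annoying; at billions it can be the difference between milliseconds and a query that effectively never returns.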

Related reading:

[1]: https://dba.stackexchange.com/questions/20335/can-mysql-reas...

[2]: http://webscalesql.org


> I suspect they deal with this in the form of some sort of caching outside of MySQL, but I haven't read into it.

It is handled by TAO, https://engineering.fb.com/2013/06/25/core-data/tao-the-powe...
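
For anyone curious: TAO is essentially a graph-aware read-through cache sitting in front of the sharded MySQL tier. Its API is internal to Facebook, but the general read-through pattern it implements looks roughly like this (everything below is a hypothetical stand-in, not TAO itself):

    from typing import Any, Callable, Dict

    class ReadThroughCache:
        """Tiny in-process stand-in for a cache tier (memcached, TAO, ...)."""

        def __init__(self, fetch_from_db: Callable[[str], Any]):
            self._store: Dict[str, Any] = {}
            self._fetch = fetch_from_db

        def get(self, key: str) -> Any:
            # Serve from cache on a hit; on a miss, fall through to MySQL
            # and populate the cache so later reads skip the database.
            if key not in self._store:
                self._store[key] = self._fetch(key)
            return self._store[key]

        def invalidate(self, key: str) -> None:
            # On writes, update MySQL and drop the cached entry rather
            # than trying to keep both perfectly in sync.
            self._store.pop(key, None)

    # Hypothetical usage; the lambda stands in for the actual SELECT.
    cache = ReadThroughCache(lambda key: {"id": key, "name": "example"})
    cache.get("user:42")  # miss: falls through to the database, caches the row
    cache.get("user:42")  # hit: served without touching the database

The point is that the hot read path rarely has to touch MySQL at all.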


Thank you for sharing this!


Facebook is heavily sharded, which keeps the size of each physical table at reasonable levels.

Similar story at nearly every large tech company using MySQL, aside from some more recent ones that go for the painful "shove everything in a huge singular AWS Aurora instance" approach :)
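
For context on what "heavily sharded" means in practice: the application (or a proxy layer such as Vitess) routes every query to one of many smaller databases based on a sharding key. A minimal sketch, with a hypothetical shard count and naming scheme:

    import hashlib

    NUM_SHARDS = 64  # hypothetical; real deployments fix this up front or reshard later

    def shard_for(user_id: int) -> str:
        """Map a sharding key to one of NUM_SHARDS logical databases."""
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return f"db_shard_{int(digest, 16) % NUM_SHARDS:03d}"

    # All of this user's rows live in one much smaller physical table.
    print(shard_for(123456))  # e.g. 'db_shard_017'

Each shard stays small enough that indexes, backups, and schema changes remain manageable, which is what keeps each physical table at a reasonable size.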


Thanks for that insight! Sounds like maybe you've seen an interesting thing or two given your background.


> the painful "shove everything in a huge singular AWS Aurora instance" approach :)

I haven't used Aurora - what's painful about this approach (apart from potentially being locked-in to AWS / Aurora)?


Aurora lets you grow your DB storage up to 128 TB, but things become difficult and slow long before that point, both operationally (e.g. schema changes, restoring from backups) and potentially in application/query performance.

That said, sharding can be very painful too in other ways. But at least it permits effectively unlimited horizontal scaling, and it makes many operational tasks easier, since they can be performed more granularly on each smaller shard.

Besides, once you get anywhere near Aurora's storage limit, you'll need to shard anyway. So it really just buys you some time before the inevitable.


Interesting, thanks for the insights





