
> no JOINs

Sharded environments still have some JOINs. Typically all data for a single user/customer/whatever should be placed on a single shard, and it's still very useful to join across tables for a single user.
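To make that concrete, here's a minimal sketch of a single-user join landing on one shard. Everything in it is an assumption for illustration: in-memory SQLite databases stand in for real shards, the `users`/`orders` tables are made up, and routing is a plain modulo on the user ID (real setups often use a lookup table or consistent hashing instead).

```python
import sqlite3

NUM_SHARDS = 4  # hypothetical shard count


def make_shard():
    # Every shard gets the same schema; that uniformity is the whole point.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
        """
    )
    return conn


shards = [make_shard() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    # All rows for one user live on the same shard (assumed modulo routing).
    return shards[user_id % NUM_SHARDS]


def add_user(user_id, name):
    shard_for(user_id).execute(
        "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
    )


def add_order(user_id, total):
    shard_for(user_id).execute(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total)
    )


def orders_with_name(user_id):
    # An ordinary JOIN, scoped to one user, executed on that user's shard only.
    return shard_for(user_id).execute(
        "SELECT u.name, o.total FROM users u "
        "JOIN orders o ON o.user_id = u.id "
        "WHERE u.id = ? ORDER BY o.id",
        (user_id,),
    ).fetchall()


add_user(7, "alice")
add_order(7, 10.0)
add_order(7, 25.5)
add_user(8, "bob")
add_order(8, 3.0)
```

Calling `orders_with_name(7)` only ever touches one shard. What you lose is the ability to join across users that live on different shards, which is an application-side limitation.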

> If you constrained the features on the non-sharded databases, you'd achieve the same net result.

No, that's not the main reason for the operational benefits. Rather, it's simply because the shards all have a uniform schema, a uniform query workload, and a relatively small data size (as compared to a large monolithic DB). You can perform operational maintenance at a finer granularity -- i.e. on a single shard rather than the entire logical data set. And if a complex operation is fine on one shard, it's very likely fine on the rest, due to the uniformity.

For example, performing a DBMS major version upgrade on a large monolithic DB is a nightmare. It's an all-or-nothing affair. If you encounter some unforeseen problem with the new DB version only in prod, you can expect some significant application-wide downtime. Meanwhile, for a sharded environment it's both easier and safer from an operational perspective, assuming your team is comfortable automating the process once it has been proven safe. You can upgrade one shard and confirm there are no problems after a week in prod before proceeding to the rest of the fleet. If unforeseen problems do occur, in the worst case they only impact a tiny portion of users.
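A rough sketch of that rollout logic, purely illustrative: `upgrade`, `is_healthy`, and `soak` are hypothetical callables you'd supply yourself (nothing here talks to a real database), and the "week in prod" is reduced to a hook that runs once after the canary shards.

```python
def rolling_upgrade(shard_ids, upgrade, is_healthy, soak=lambda: None, canary_count=1):
    """Upgrade shards one at a time, halting at the first unhealthy one.

    Returns (upgraded, failed): the shard IDs upgraded so far, and the ID
    of the shard that failed its health check (None if all passed).
    """
    upgraded = []
    for i, shard_id in enumerate(shard_ids):
        upgrade(shard_id)
        upgraded.append(shard_id)
        if not is_healthy(shard_id):
            # Stop here: only this shard's users see the problem.
            return upgraded, shard_id
        if i + 1 == canary_count:
            # Assumed hook for the soak period before touching the rest
            # of the fleet (e.g. wait a week and watch metrics).
            soak()
    return upgraded, None
```

If the canary fails its check, the function returns before any other shard is touched, so the blast radius stays at one shard.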

> it also restricts your ability to perform other operations (more complicated queries or reports are ~impossible). It is not without its tradeoffs.

Yes, this is why I said above "There are definitely major downsides to sharding, but they tend to be more on the application side in my experience." OP claimed the downsides were operational (e.g. more complex failures or larger downtime), which I do not agree with.



