
I'm unfamiliar with vitess, but what exactly is the achievement here?

If you have a shared-nothing architecture, you can keep indefinitely adding nodes to get more throughput. You can easily do a billion qps if you wanted to.

The downside is that you have to eliminate relations (or at least not expect them to be performant/scalable).

Am I missing something?




You understand the point. It's a real point. Show me a modern relational database that can scale this predictably on the cloud with this level of operability.


A relational database without relations is an oxymoron. As folks pointed out, you also have to throw ACID away. So what's left of the original database, SQL-dialect? I bet that gets limited too.

Look, I get it, you have to sell your product. Some folks want semi-infinitely scalable storage, and they don't understand that the only way to achieve it is turning their DB into a key-value store. As a side effect they would have to rewrite their whole application from scratch, but they would only realize it after they get vendor locked in.

You can advertise your solution as MySQL-compatible. And I can claim that it's dishonest.


> A relational database without relations is an oxymoron.

OK. You're the only one talking to this straw man though. :-) Every Vitess user that I'm aware of has a pretty typical 2NF/3NF schema design. A small sampling of them being listed here: https://vitess.io

You set up your data distribution/partitioning/sharding scheme so that you have data locality for 99.9999+% of your queries -- meaning that the query executes against a data subset that lives on a single shard/node (e.g. sharding by customer_id) -- and you live with the performance hit and consistency tradeoffs for those very rare cases where cross-shard queries cannot be avoided (Vitess does support this). You should do this even if the solution you're using claims to have distributed SQL with ACID and MVCC guarantees/properties. There's no magic that improves the speed of light and removes other resource constraints. In practice most people say they want perfect security/consistency/<name your desired property here> but then realize that the costs (perf, resources, $$, etc) are simply so high that it is not practical for their business/use case.
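To make the data-locality idea concrete, here is a minimal sketch of routing by a sharding key. It is not how Vitess actually routes queries; `NUM_SHARDS`, `shard_for`, and `shards_for_query` are hypothetical names invented for illustration, and the hash-then-modulo scheme is just one simple assumption about how a key could map to a shard.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(customer_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a customer_id to a shard index with a stable hash."""
    digest = hashlib.sha256(str(customer_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_query(customer_id=None, num_shards: int = NUM_SHARDS) -> list:
    """Return the shards a query has to touch.

    A query filtered by customer_id stays on one shard (the common case).
    A query without the sharding key must fan out to every shard, which
    is the rare, slower cross-shard case.
    """
    if customer_id is not None:
        return [shard_for(customer_id, num_shards)]
    return list(range(num_shards))
```

With this sketch, `shards_for_query(customer_id=42)` always returns a single shard, while `shards_for_query()` returns all of them, which is where the performance and consistency tradeoffs come in.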

I know MySQL fairly well (I started working at MySQL AB in 2003), and you can certainly claim that "MySQL-compatible" is dishonest, but I would offer a counterclaim: either you don't know this space very well or you're not operating in good faith here.


To be fair, I skimmed through your docs and did misread them initially: I thought you don't allow foreign keys, but you actually don't allow foreign key constraints.

If you are still allowing JOINs within a shard, then I need to apologize.


And Vitess supports shard-local foreign key constraints, even if PlanetScale disallows them. They don't work with all the cool online DDL tech.


We do allow joins within a shard.


To pile on your answer a bit, the manual bucketing you describe is exactly how ClickHouse works in most cases. It won't allow joins / IN on multiple distributed tables (i.e., sharded/replicated tables) unless you explicitly set a property called distributed_product_mode. [0] This is to prevent you from shooting yourself in the foot, either through bad performance or improperly distributed data.

This constraint will eventually be relaxed but most apps are able to work around it just fine. The ones that can't use Snowflake.

[0] https://clickhouse.com/docs/en/operations/settings/settings/...


"Relational" actually refers to tables, not foreign keys. https://en.wikipedia.org/wiki/Relation_(database)


Indeed. Unfortunately the terminology is so easy to confuse. I have heard the term "relationship" used to describe foreign keys to distinguish it from "relation". But usage is inconsistent and everyone who has heard of "relational databases" but has not heard of "relational algebra" and "relational calculus" will think "relational" refers to foreign keys.

Foreign keys are a form of strong typing.


More to "relational algebra" of which joins are a part. (But not necessarily "foreign keys")


When you have to eliminate data relationships for it to scale, you no longer have a "relational" database.


You would be surprised then. Most SaaS companies can easily shard by customer. All the customer data stays together, relational and all.

Cross-customer queries will be somewhat slower.


Sure, but what you are describing is no longer a multi-tenant application/database. It's essentially a single-tenant deployment of your tech stack per customer. Which is not very cost effective.


It’s multi-tenant: you have 10,000,000 customers, with 1,000,000 on each of 10 servers, for example.

Or I don’t understand what you mean…


I would still consider it a multi-tenant system. It's a single database to manage, which distributes your customers using their identity as a partitioning key.
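As a rough illustration of that, the sketch below spreads customers across a fixed set of servers using the customer ID as the partitioning key. The names (`NUM_SERVERS`, `server_for`, `customers_per_server`) and the plain modulo placement are assumptions made for the example, and the numbers are scaled down from the ones mentioned above so it runs instantly.

```python
from collections import Counter

NUM_SERVERS = 10        # hypothetical server count
NUM_CUSTOMERS = 10_000  # scaled down for the example


def server_for(customer_id: int, num_servers: int = NUM_SERVERS) -> int:
    """Pick a server from the customer ID (simple modulo placement)."""
    return customer_id % num_servers


def customers_per_server(customer_ids, num_servers: int = NUM_SERVERS) -> Counter:
    """Count how many customers land on each server."""
    return Counter(server_for(cid, num_servers) for cid in customer_ids)


# Sequential IDs 0..9999 spread evenly: every server hosts many tenants.
counts = customers_per_server(range(NUM_CUSTOMERS))
```

Each server ends up holding many customers, so this is still multi-tenant; it is just that any one customer's rows live together on one server.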


No, a single database manages multiple customers but the customers are distributed among multiple databases.



