
Interesting. We've implemented a metadata layer for HDFS and YARN using NDB (MySQL Cluster) - that also supports READ COMMITTED transactions. Do you support:

- row-level locking

- independent transaction coordinators at data nodes

- pruned index scans

- network-aware transactions (with user-defined partition keys for tables)

- any asynchronous/event API

?




- row-level locking -> yes, we use MVCC and take a row-level write lock when necessary for consistency.

- independent transaction coordinators at data nodes -> we have a tier called "aggregators" that acts as transaction coordinators. These are the nodes you connect to. Under the hood, leaf nodes in MemSQL also manage transactions.

- pruned index scans -> Do you mean information retrieval? Our indexes support seeks and range scans if that's what you mean.

- network-aware transactions (with user-defined partition keys for tables) -> yes, we have user-defined partition keys (shard keys) and transactions work across multiple nodes on the network.

- any asynchronous/event API -> no, we don't have an event API. Most of our use cases are "pull" oriented, which scales very well with MemSQL.


Great. Lots of good stuff there. Pruned index scans are index scans where the data is located on a single shard and the index scan doesn't flood all nodes in the DB. I'll definitely be looking into MemSQL.
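To make the idea concrete, here is a minimal sketch of a pruned scan over hash-partitioned data. It is a toy model, not how NDB or MemSQL is implemented: `ToyCluster`, `partition_for`, `NUM_PARTITIONS` and the `inode_id` shard key are all hypothetical names chosen for illustration. A lookup that supplies the shard key touches one partition; a lookup without it has to fan out to all of them.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size, for illustration only


def partition_for(shard_key_value: str) -> int:
    """Map a shard key value to a partition with a stable hash."""
    return zlib.crc32(shard_key_value.encode("utf-8")) % NUM_PARTITIONS


class ToyCluster:
    """In-memory stand-in for a sharded table keyed by inode_id."""

    def __init__(self):
        # One dict per partition: inode_id -> list of rows.
        self.partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

    def insert(self, inode_id, row):
        self.partitions[partition_for(inode_id)][inode_id].append(row)

    def scan(self, inode_id=None):
        """Return (rows, partitions_touched).

        With a shard key the scan is pruned to a single partition;
        without one it visits every partition.
        """
        if inode_id is not None:
            targets = [partition_for(inode_id)]
        else:
            targets = list(range(NUM_PARTITIONS))

        rows = []
        for p in targets:
            part = self.partitions[p]
            if inode_id is not None:
                rows.extend(part.get(inode_id, []))
            else:
                for stored in part.values():
                    rows.extend(stored)
        return rows, targets
```

Calling `scan(inode_id="inode-1")` returns a single-element list of touched partitions, while `scan()` touches all `NUM_PARTITIONS`. That difference is the whole point of pruning on large clusters.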


MemSQL partitions data across nodes by hash, not by range, so partition pruning is less applicable. However, when it can be applied, MemSQL does apply it. [1]

Within each node, for column store tables in MemSQL we do use segment elimination very aggressively, which is effectively the same thing as partition pruning. [2] [3]

[1] http://docs.memsql.com/latest/concepts/distributed_sql/#inde...

[2] http://docs.memsql.com/latest/concepts/columnar/#query-effic...

[3] http://docs.memsql.com/latest/concepts/columnar/#maintenance...
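For readers unfamiliar with segment elimination, a rough sketch of the general technique follows. It assumes each segment keeps min/max metadata for a column and that a range filter skips any segment whose range cannot overlap the predicate. `Segment` and `range_scan` are hypothetical names, and this is not MemSQL's actual storage code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A chunk of column values plus min/max metadata (illustrative only)."""

    values: List[int]

    def __post_init__(self):
        # Metadata computed once, when the segment is created.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def range_scan(segments: List[Segment], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return (matching values, number of segments actually read)."""
    matches = []
    scanned = 0
    for seg in segments:
        # Eliminate the segment if its [min, max] cannot overlap [lo, hi].
        if seg.max_value < lo or seg.min_value > hi:
            continue
        scanned += 1
        matches.extend(v for v in seg.values if lo <= v <= hi)
    return matches, scanned
```

With three segments covering 1-10, 11-20 and 21-30, a filter on 12..15 reads only the middle segment. Elimination works best when the data is roughly sorted on the filtered column, so that segment ranges overlap as little as possible.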


Shard key matching is effectively partition pruning, which is great. This is a feature not many people are aware of, but it is super important when scaling to large clusters and when you have "session-oriented" (or in our case inode-oriented) data spread across different tables.



