Eric, one of the cofounders, here. Happy to answer any questions on MemSQL 4 and...

mbesto · on May 20, 2015

Hi Eric!

Not sure if you remember me, but we spoke several (5?) years ago when you guys first started. I was the SAP HANA guy and I think we were talking about the landscape of in-memory solutions back then. First off, congrats on the success so far. Second, a few questions:

- How does MemSQL compare to HANA and Vertica? My understanding is that MemSQL provides the same infrastructure as those solutions (columnar, in-memory storage) but will run on commodity hardware (HANA, for example, is hardware-vendor locked).

- One of the interesting topics that has come up in the HANA space is that it's expensive to maintain and scale. Specifically, provisioning new servers for data growth and archiving old data out of memory. Are these issues present at all in MemSQL?

- Lots of your customers seem to be using it for company-specific strategic solutions. Are any using it for operations? (like financial close reporting, or as a transactional DB)

ericfrenkiel · on May 20, 2015

Of course we remember you. Please stop by our new office!

You are right about the commodity hardware. The other difference from HANA is that MemSQL rowstores are in memory for high-throughput applications, while columnstores can be stored on flash or disk. So it's economical to scale MemSQL to very large datasets.

- MemSQL is very easy to scale. It comes with an ops dashboard that lets you add nodes with just a few clicks.

- There are a lot of different use cases. Some companies use us for operational reporting, end-of-day financial reporting, high-throughput counters, real-time risk analysis, etc.

mbesto · on May 20, 2015

I might take you up on that! Shoot me your contact details so I can set it up (I just searched my emails and can't find anything). My contact is in my profile.

Thanks!

nikita · on May 20, 2015

eric at and nikita at memsql.

mbesto · on May 21, 2015

Cheers!

rjonesx · on May 20, 2015

"but will run on commodity hardware"

We run MemSQL 4.0 on an 18-machine cluster, all commodity hardware. It is awesome.

jamesblonde · on May 20, 2015

Interesting. We've implemented a metadata layer for HDFS and YARN using NDB (MySQL Cluster) - that also supports READ COMMITTED transactions. Do you support:

- row-level locking

- independent transaction coordinators at data nodes

- pruned index scans

- network-aware transactions (with user-defined partition keys for tables)

- any asynchronous/event API

?

ericfrenkiel · on May 20, 2015

- row-level locking -> yes, we use MVCC and take a row-level write lock when necessary for consistency.

- independent transaction coordinators at data nodes -> we have a tier called "aggregators" that act as transaction coordinators. These are the nodes you connect to. Under the hood, leaf nodes in MemSQL also manage transactions.

- pruned index scans -> Do you mean information retrieval? Our indexes support seeks and range scans if that's what you mean.

- network-aware transactions (with user-defined partition keys for tables) -> yes, we have user-defined partition keys (shard keys) and transactions work across multiple nodes on the network.

- any asynchronous/event API -> no, we don't have an event API. Most of our use cases are "pull" oriented, which scales very well with MemSQL.

jamesblonde · on May 21, 2015

Great. Lots of good stuff there. Pruned index scans are index scans where the data is located on a single shard and the index scan doesn't flood all nodes in the DB. I'll definitely be looking into MemSQL.

SkidanovAlex · on May 20, 2015

MemSQL partitions data across nodes by hash, not by range, so partition pruning is less applicable. However, in cases where it can be applied, MemSQL does apply it. [1]

Within each node, for column store tables in MemSQL we do use segment elimination very aggressively, which is effectively the same thing as partition pruning. [2] [3]

[1] http://docs.memsql.com/latest/concepts/distributed_sql/#inde...

[2] http://docs.memsql.com/latest/concepts/columnar/#query-effic...

[3] http://docs.memsql.com/latest/concepts/columnar/#maintenance...
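To make the segment elimination idea concrete, here is a minimal Python sketch. It assumes each segment of a column keeps min/max metadata and that a range scan skips any segment whose range cannot overlap the filter. The `Segment` layout, the function names, and the segment size are all hypothetical and are not taken from MemSQL's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A chunk of one column plus min/max metadata (hypothetical layout)."""
    values: List[int]
    lo: int
    hi: int


def build_segments(column: List[int], size: int) -> List[Segment]:
    """Split a column into fixed-size segments and record each segment's range."""
    segments = []
    for start in range(0, len(column), size):
        chunk = column[start:start + size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def range_scan(segments: List[Segment], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of segments actually read."""
    matches, scanned = [], 0
    for seg in segments:
        if seg.hi < lo or seg.lo > hi:
            continue  # eliminated from metadata alone, values never touched
        scanned += 1
        matches.extend(v for v in seg.values if lo <= v <= hi)
    return matches, scanned


timestamps = list(range(1000))              # sorted data makes ranges disjoint
segments = build_segments(timestamps, 100)  # 10 segments of 100 values
rows, scanned = range_scan(segments, 250, 260)
print(len(rows), scanned, len(segments))
```

The benefit depends on how the data is ordered: if values were scattered randomly across segments, most min/max ranges would overlap the filter and few segments could be skipped.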

jamesblonde · on May 21, 2015

Shard key matching is effectively partition pruning, which is great. This is a feature not many people are aware of, but it is super important when scaling to large clusters and when you have "session-oriented" (or in our case inode-oriented) data spread across different tables.
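A rough Python sketch of why shard key matching behaves like partition pruning under hash partitioning. The `Cluster` class, the CRC32 hash, and the partition count are assumptions made for illustration; they do not describe MemSQL's actual hashing or routing.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical cluster size


def partition_for(shard_key: str) -> int:
    """Map a shard key to a partition with a stable hash."""
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_PARTITIONS


class Cluster:
    """Toy model: each partition is just a list of (shard_key, row) pairs."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, shard_key, row):
        self.partitions[partition_for(shard_key)].append((shard_key, row))

    def lookup_by_shard_key(self, shard_key):
        """Equality on the shard key: exactly one partition is visited."""
        p = partition_for(shard_key)
        rows = [r for k, r in self.partitions[p] if k == shard_key]
        return rows, 1

    def lookup_by_predicate(self, pred):
        """No shard key in the filter: every partition has to be visited."""
        rows = []
        for p in range(NUM_PARTITIONS):
            rows.extend(r for _, r in self.partitions[p] if pred(r))
        return rows, NUM_PARTITIONS


cluster = Cluster()
for i in range(100):
    cluster.insert(f"inode-{i % 10}", {"inode": i % 10, "block": i})

pruned_rows, pruned_touched = cluster.lookup_by_shard_key("inode-3")
flood_rows, flood_touched = cluster.lookup_by_predicate(lambda r: r["block"] < 5)
print(len(pruned_rows), pruned_touched)
print(len(flood_rows), flood_touched)
```

Because the partition is a pure function of the shard key, rows from different tables that share a shard key value land on the same partition, which is what keeps inode-oriented lookups local to one node.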

aemadrid · on May 21, 2015

What is the replication picture for the community version? I can see that Enterprise has HA features but I have to guess that there is some form of safety if one node goes down in Community.

darkxanthos · on May 20, 2015

What's the catch here? :)

ericfrenkiel · on May 20, 2015

http://docs.memsql.com/latest/faq/#what-is-memsql-not-for

jhugg · on May 20, 2015

It seems like the improvements here are OLAP focused, and welcome ones at that, but the docs and product, if not the marketing, seem to be moving away from operational workloads.

From my interpretation of the docs, there are no "transactions" in the Jim Gray / ACID sense of the word. MemSQL offers transactional semantics with READ COMMITTED isolation. This is not just not SERIALIZABLE, it's also not REPEATABLE-READ or SNAPSHOT-READ.

For example, imagine a two statement transaction where statement 1 reads a counter value and statement 2 increments it. If two users run this transaction at the same time, the counter could lose an increment. This example is trivial and probably could be done in a single statement, but many other read-then-write operations could cause such an inconsistency.

Unless I'm misunderstanding something.
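The interleaving described above can be replayed deterministically in a few lines of Python. This is a toy model, not MemSQL's engine: the `Store` class is hypothetical and simply exposes the latest committed value to every read, which is all READ COMMITTED promises.

```python
class Store:
    """Holds only committed values, so every read is a READ COMMITTED read."""

    def __init__(self):
        self.committed = {"counter": 0}

    def read(self, key):
        return self.committed[key]

    def write_and_commit(self, key, value):
        self.committed[key] = value


def run_interleaved(store):
    """Two read-then-increment transactions whose statements interleave."""
    t1_seen = store.read("counter")                  # T1, statement 1
    t2_seen = store.read("counter")                  # T2, statement 1
    store.write_and_commit("counter", t1_seen + 1)   # T1, statement 2 + commit
    # T1 has committed, so no write lock blocks T2; it writes from a stale read.
    store.write_and_commit("counter", t2_seen + 1)   # T2, statement 2 + commit
    return store.read("counter")


def run_serial(store):
    """The same two transactions executed one after the other."""
    for _ in range(2):
        seen = store.read("counter")
        store.write_and_commit("counter", seen + 1)
    return store.read("counter")


interleaved_result = run_interleaved(Store())
serial_result = run_serial(Store())
print(interleaved_result, serial_result)  # prints "1 2"
```

Both transactions commit successfully in the interleaved run, yet the counter ends at 1 instead of 2. A stricter isolation level would have to block or abort one of them.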

nizamsql · on May 20, 2015

Hi @jhugg, a performant implementation of a counter usually does not read and update the value in separate statements within a transaction. Generally, people use UPDATE or INSERT...ON DUPLICATE KEY UPDATE (upsert) to implement this workload. In fact, transactional, high-throughput counters are an extremely common use case for MemSQL [1].

As a matter of fact, even Oracle and MS SQL Server offer READ-COMMITTED as the default isolation level. Moreover, there are known issues with using SERIALIZABLE isolation in Oracle [2].

[1] - http://blog.memsql.com/high-speed-counters/

[2] - http://stackoverflow.com/questions/11826368/oracle-select-im...
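As a small illustration of the single-statement approach, the Python sketch below uses the standard library's sqlite3 module as a stand-in database, since a MemSQL connection is not assumed here. The `counters` table and the `increment` helper are hypothetical. SQLite does not accept the MySQL-style INSERT...ON DUPLICATE KEY UPDATE syntax, so the sketch shows only the plain UPDATE form against a row that already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER NOT NULL)")
conn.execute("INSERT INTO counters (name, n) VALUES ('page_views', 0)")


def increment(conn, name):
    """Increment in one statement: the read and the write happen inside the UPDATE."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE counters SET n = n + 1 WHERE name = ?", (name,))


for _ in range(2):
    increment(conn, "page_views")

total = conn.execute(
    "SELECT n FROM counters WHERE name = 'page_views'"
).fetchone()[0]
print(total)  # prints 2
```

Because the application never holds a previously read value, there is nothing stale to write back; the database applies each increment to whatever the current row value is.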

jhugg · on May 20, 2015

Yes. I know there are other ways to do simple counters. The counter-example broadly applies to multi-statement operations that feed the output of reads into writes, i.e. in general transactions.

And yes, the defaults on many systems are low, but you can turn them up if you have a transactional workload. Read-committed might be fine for a Drupal backend, but it's not truly transactional.