
Eric, one of the cofounders, here. Happy to answer any questions on MemSQL 4 and the Community Edition. Some new features in MemSQL 4:

- fully distributed joins

- native geospatial index and datatypes

- lots of new SQL surface area

- concurrency improvements

- analytic optimizer

- Spark, HDFS, and S3 connectors




Hi Eric!

Not sure if you remember me, but we spoke several (5?) years ago when you guys first started. I was the SAP HANA guy and I think we were talking about the landscape of in-memory solutions back then. First off, congrats on the success so far. Second, a few questions:

- How does MemSQL compare to HANA and Vertica? My understanding is that MemSQL provides the same infrastructure as those solutions (columnar, in-memory storage) but runs on commodity hardware (HANA, for example, is hardware-vendor locked).

- One of the interesting topics that has come up in the HANA space is that it's expensive to maintain and scale. Specifically, provisioning new servers for data growth and archiving old data out of memory. Are these issues present at all in MemSQL?

- Lots of your customers seem to be using it for company-specific strategic solutions. Are any using it for operations (like financial close reporting, or as a transactional DB)?


Of course we remember you. Please stop by our new office!

You are right about the commodity hardware. The other difference with HANA is that MemSQL rowstores are in memory for high-throughput applications, while columnstores can be stored on flash or disk. So it's economical to scale MemSQL to very large datasets.

- MemSQL is very easy to scale. It comes with an ops dashboard that lets you add nodes with just a few clicks.

- There are a lot of different use cases. Some companies use us for operational reporting, end-of-day financial reporting, high-throughput counters, real-time risk analysis, etc.


I might take you up on that! Shoot me your contact details so I can set something up (I just searched my email and can't find anything). My contact is in my profile.

Thanks!


eric at and nikita at memsql.


Cheers!


"but will run on commodity hardware"

We run MemSQL 4.0 on an 18-machine cluster, all commodity hardware. It is awesome.


Interesting. We've implemented a metadata layer for HDFS and YARN using NDB (MySQL Cluster) - that also supports READ COMMITTED transactions. Do you support:

- row-level locking

- independent transaction coordinators at data nodes

- pruned index scans

- network-aware transactions (with user-defined partition keys for tables)

- any asynchronous/event API

?


- row-level locking -> yes, we use MVCC and take a row-level write lock when necessary for consistency.

- independent transaction coordinators at data nodes -> we have a tier called "aggregators" that act as transaction coordinators. These are the nodes you connect to. Under the hood, leaf nodes in MemSQL also manage transactions.

- pruned index scans -> Do you mean information retrieval? Our indexes support seeks and range scans, if that's what you mean.

- network-aware transactions (with user-defined partition keys for tables) -> yes, we have user-defined partition keys (shard keys), and transactions work across multiple nodes on the network.

- any asynchronous/event API -> no, we don't have an event API. Most of our use cases are "pull" oriented, which scales very well with MemSQL.
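To make the row-level locking answer concrete, here is a toy Python model of MVCC with READ COMMITTED reads plus a row-level write lock held until commit. The names `Store`, `Txn` and `Row` are hypothetical, and this is a sketch of the general technique under my own assumptions, not MemSQL's actual implementation.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Row:
    # Committed versions, oldest first; readers see only the newest one.
    versions: list = field(default_factory=list)
    # Row-level write lock: at most one open transaction may write this row.
    lock: threading.Lock = field(default_factory=threading.Lock)


class Store:
    def __init__(self):
        self._rows = {}
        self._rows_guard = threading.Lock()

    def row(self, key):
        with self._rows_guard:
            return self._rows.setdefault(key, Row())

    def read(self, key):
        # READ COMMITTED: never blocks on writers, never sees uncommitted data.
        versions = self.row(key).versions
        return versions[-1] if versions else None

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.pending = {}  # uncommitted writes, invisible to other readers
        self.held = []     # row locks to release at commit

    def write(self, key, value):
        row = self.store.row(key)
        if key not in self.pending:
            row.lock.acquire()  # blocks if another txn is writing this row
            self.held.append(row)
        self.pending[key] = value

    def commit(self):
        for key, value in self.pending.items():
            # Publish a new committed version; earlier versions are kept
            # (MVCC-style) rather than overwritten in place.
            self.store.row(key).versions.append(value)
        for row in self.held:
            row.lock.release()
        self.pending.clear()
        self.held.clear()


store = Store()
t0 = store.begin()
t0.write("x", 1)
t0.commit()

t1 = store.begin()
t1.write("x", 2)
print(store.read("x"))  # 1: the uncommitted write is not visible
t1.commit()
print(store.read("x"))  # 2
```

Readers never wait here; only concurrent writers to the same row do, which is the usual reason to pair MVCC with row write locks.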


Great. Lots of good stuff there. Pruned index scans are index scans where the data is located on a single shard and the index scan doesn't flood all nodes in the DB. I'll definitely be looking into MemSQL.


MemSQL partitions data across nodes by hash, not by range, so partition pruning is less applicable. However, in cases where it can be applied, MemSQL does apply it. [1]
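A rough Python sketch of hash partitioning by shard key and the pruning it allows: an equality predicate on the shard key maps to exactly one partition, while any other filter has to fan out to all of them. `SHARD_KEY`, `NUM_PARTITIONS`, and the MD5-based hash are assumptions for illustration; MemSQL's actual hash function and routing logic are not described in this thread.

```python
import hashlib

NUM_PARTITIONS = 8     # hypothetical cluster size
SHARD_KEY = "user_id"  # hypothetical shard key column


def partition_for(shard_key_value):
    # Stable hash of the shard key value (Python's hash() is salted per process).
    digest = hashlib.md5(str(shard_key_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def partitions_to_scan(predicate):
    """predicate: dict mapping column -> required value (equality filters only)."""
    if SHARD_KEY in predicate:
        # Equality on the shard key pins the query to a single partition.
        return [partition_for(predicate[SHARD_KEY])]
    # Hashing preserves no order, so non-key or range filters touch every partition.
    return list(range(NUM_PARTITIONS))


print(partitions_to_scan({"user_id": 42}))   # one partition
print(partitions_to_scan({"country": "US"}))  # all partitions
```

This is also why range predicates on the shard key get little benefit from hash partitioning: neighbouring key values land on unrelated partitions.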

Within each node, for column store tables in MemSQL we do use segment elimination very aggressively, which is effectively the same thing as partition pruning. [2] [3]

[1] http://docs.memsql.com/latest/concepts/distributed_sql/#inde...

[2] http://docs.memsql.com/latest/concepts/columnar/#query-effic...

[3] http://docs.memsql.com/latest/concepts/columnar/#maintenance...
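Segment elimination, as described in the reply above, can be sketched with per-segment min/max metadata: a range filter skips any segment whose [min, max] cannot overlap it, without reading its data. This is a simplified Python model with a hypothetical `Segment` type and made-up data, not MemSQL's columnstore format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Metadata kept per segment for one column: its min and max values.
    min_val: int
    max_val: int
    values: list


def build_segments(column, segment_size):
    segments = []
    for i in range(0, len(column), segment_size):
        chunk = column[i:i + segment_size]
        segments.append(Segment(min(chunk), max(chunk), chunk))
    return segments


def scan_range(segments, lo, hi):
    """Return values in [lo, hi] and how many segments actually had to be read."""
    hits, scanned = [], 0
    for seg in segments:
        if seg.max_val < lo or seg.min_val > hi:
            continue  # eliminated from metadata alone; data never touched
        scanned += 1
        hits.extend(v for v in seg.values if lo <= v <= hi)
    return hits, scanned


# Data loaded in roughly sorted order (e.g. by timestamp) eliminates well.
segments = build_segments(list(range(1000)), segment_size=100)
hits, scanned = scan_range(segments, 250, 260)
print(len(hits), scanned)  # 11 1
```

The pruning is only as good as the clustering of the data: if values were shuffled across segments, every segment's min/max range would overlap the filter and nothing would be eliminated.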


Shard key matching is effectively partition pruning, which is great. This is a feature not many people are aware of, but it is super important when scaling to large clusters and when you have "session-oriented" (or in our case inode-oriented) data spread across different tables.


What is the replication picture for the Community Edition? I can see that Enterprise has HA features, but I would guess there is some form of safety if one node goes down in Community.


What's the catch here? :)



It seems like the improvements here are OLAP focused, and welcome ones at that, but the docs and product, if not the marketing, seem to be moving away from operational workloads.

From my interpretation of the docs, there are no "transactions" in the Jim Gray / ACID sense of the word. MemSQL offers transactional semantics with READ COMMITTED isolation. That is not only weaker than SERIALIZABLE; it is also weaker than REPEATABLE-READ or SNAPSHOT-READ.

For example, imagine a two statement transaction where statement 1 reads a counter value and statement 2 increments it. If two users run this transaction at the same time, the counter could lose an increment. This example is trivial and probably could be done in a single statement, but many other read-then-write operations could cause such an inconsistency.
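The lost-update schedule from the counter example can be replayed deterministically in plain Python. No database is involved: the dict stands in for the table, and each line marks one statement of transaction T1 or T2 in an interleaving that READ COMMITTED does not prevent.

```python
def run_interleaved():
    """Two read-then-write transactions interleaved statement by statement."""
    db = {"counter": 0}

    # Statement 1 of each transaction: both read the same committed value.
    t1_read = db["counter"]
    t2_read = db["counter"]

    # Statement 2 of each transaction: write back the value read, plus one.
    db["counter"] = t1_read + 1  # T1 commits 1
    db["counter"] = t2_read + 1  # T2 also commits 1, overwriting T1

    return db["counter"]


def run_serial():
    db = {"counter": 0}
    for _ in range(2):  # each transaction runs alone, start to finish
        value = db["counter"]
        db["counter"] = value + 1
    return db["counter"]


print(run_serial())       # 2
print(run_interleaved())  # 1: one increment was lost
```

A row write lock alone does not help here: T2's write would simply wait for T1 to commit and then still write the stale `t2_read + 1`.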

Unless I'm misunderstanding something.


Hi @jhugg, a performant implementation of a counter usually does not read and update the value in separate statements within a transaction. Generally, people use UPDATE or INSERT...ON DUPLICATE KEY UPDATE (upsert) to implement this workload. In fact, transactional, high-throughput counters are an extremely common use case for MemSQL [1].
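A minimal sketch of the single-statement counter pattern, using Python's built-in `sqlite3` module as a stand-in database. SQLite spells the upsert `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24 or later), whereas MySQL-compatible systems such as MemSQL use `INSERT ... ON DUPLICATE KEY UPDATE`; the `counters` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")


def bump(conn, name):
    # The read and the increment happen inside one statement, so there is no
    # gap between a separate SELECT and UPDATE for another writer to slip into.
    # SQLite syntax; MySQL/MemSQL would use INSERT ... ON DUPLICATE KEY UPDATE.
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET hits = hits + 1",
        (name,),
    )
    conn.commit()


for _ in range(3):
    bump(conn, "page_views")

row = conn.execute("SELECT hits FROM counters WHERE name = ?", ("page_views",)).fetchone()
print(row[0])  # 3
```

As the reply notes, this covers counters; it does not help transactions that must feed an arbitrary read result into a later write.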

As a matter of fact, even Oracle and MS SQL Server offer READ-COMMITTED as the default isolation level. Moreover, there are known issues with using SERIALIZABLE isolation in Oracle [2].

[1] - http://blog.memsql.com/high-speed-counters/

[2] - http://stackoverflow.com/questions/11826368/oracle-select-im...


Yes. I know there are other ways to do simple counters. The counter-example applies broadly to multi-statement operations that feed the output of reads into writes, i.e., to transactions in general.

And yes, the defaults on many systems are low, but you can turn them up if you have a transactional workload. Read-committed might be fine for a Drupal backend, but it's not truly transactional.

Related and neat post:

http://www.bailis.org/blog/understanding-weak-isolation-is-a...

One of the relevant points Peter makes is that weaker isolation may work ok at low contention and low scale, which matches most DB workloads, but probably not the ones people on HN care about.



Well, if MemSQL supports locks, then you can implement any stronger isolation model using both locks and READ COMMITTED transactions. Does it support row-level locking?


Yes, but then you lose the performance benefits.
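The lock-based approach from the last two comments, sketched in Python: holding a row lock across both the read and the write (the role `SELECT ... FOR UPDATE` plays in MySQL-style SQL) removes the lost update, at the cost of serializing all writers on that row. A `threading.Lock` stands in for the database row lock; this models the idea only, and whether MemSQL 4 exposes `SELECT ... FOR UPDATE` is not stated in this thread.

```python
import threading

counter = {"value": 0}
# Stand-in for a database row-level lock (assumption: one lock per hot row).
row_lock = threading.Lock()


def increment_with_row_lock(times):
    for _ in range(times):
        # Hold the row lock across both statements, so no other transaction
        # can read the row between this transaction's read and its write.
        with row_lock:
            current = counter["value"]      # statement 1: read
            counter["value"] = current + 1  # statement 2: write


threads = [
    threading.Thread(target=increment_with_row_lock, args=(10_000,))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 40000: no increments lost
```

The trade-off raised in the thread is visible in the structure: every increment of the hot row waits its turn for `row_lock`, so contention on one row caps throughput no matter how many threads or nodes are added.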


seg-fault: Yep. That's me. You could take what I say with a skeptical eye because I work on a competing system, but it increasingly appears that that's not actually true.

VoltDB for transactions and ingestion-time analytics and MemSQL for deeper analytics might be a neat combo system. YMMV.



