I've actually written a compiler from database queries to native code (JIT'ed by...

vasilizo · on June 27, 2012

@Nitramp: Fortunately for all of us this world is full of unsolved problems many of which exist outside of consumer web.

There are use cases where 50ms is way too long, and where jitter associated with disk I/O is not acceptable. Capital markets for example. You are looking for sub-millisecond latency to be in the intraday post-trade game, and single digit milliseconds for quote-to-trade. The number of instructions in the hot code path (and cycles per instruction) actually starts to matter. Lock-free sturcutres utilized for storage are also important. This combination gives MemSQL some rather unique low-latency properties that certain folks can exploit to their delight.

To your other point, random reads and key-value lookups is where MemSQL shines. Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.

chris_wot · on June 27, 2012

Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.

I'm not sure where you are getting this information. According to the MemSQL documentation, they have one and only one isolation level, READ COMMITTED. But READ COMMITTED takes write locks and blocks reads, but doesn't take any read locks. Blocking behaviour still (and should!) still occur.

It sounds like you are looking for dirty read behaviour, but from the literature MemSQL doesn't support READ UNCOMMITTED.

Update: Read a later FAQ on the MemSQL website. They are using MVCC, so write locks are irrelevant. My apologies to the MemSQL team.

vasilizo · on June 27, 2012

Chris, MemSQL utilizes versioning to implement READ COMMITTED. In this implementation readers are never blocked.

It is addressed in this FAQ: http://developers.memsql.com/docs/1b/faq.html#c1-q4.

chris_wot · on June 27, 2012

OK, didn't see that you are using multiversion concurrency control.

I didn't realise you had a new FAQ up on the website - I'll have a look.

tintor · on June 27, 2012

@Criss Isolation levels are used to specify semantics only. DBMS is free to implement them in different ways. Take a look at this VLDB 2012 paper about optimistic concurrent control in main memory databases, like MemSQL, which also describes implementation of isolation levels: http://arxiv.org/pdf/1201.0228v1.pdf

chris_wot · on June 27, 2012

Ah. Sorry, didn't realise that you were using MVCC. That puts a different light on things.

gbog · on June 27, 2012

Get by id queries are majority when the database is used as a key-value store, and the only reason to such an underuse of RDBMSes is precisely because under heavy load some of them have hard time doing proper queries, with joins, subqueries and all the stuff databases are designed for.

Nitramp · on June 27, 2012

It's not "underusing" an RDBMs, those are just the lions share of queries in many applications. Also note that a join say from some parent item by ID to children who have a parent_id makes essentially two lookups by ID in two indices, so the same reasoning applies; this is not just SELECT * FROM bla WHERE id = :x.