RediSQL – A Redis module that provides a functional SQL database (github.com/redbeardlab)
90 points by geospeck on Dec 16, 2016 | 54 comments



I think that what the module does is actually not clear here in the comments I'm reading. It's mostly a nice "Hello World" module that shows how Redis can act as a proxy, but it's just a small glue module that calls the SQLite library with the string passed as the query. So you are running queries on a local SQLite database, proxied by Redis.

While this can be a good programming example for newcomers, I suggest looking at modules that really implement new DB paradigms on top of Redis. For instance, this module builds a graph DB on top of Redis data types, with a query language and so forth:

https://github.com/swilly22/redis-module-graph


I completely agree. As someone who uses both Redis and SQL heavily, I understand both have their use cases and wouldn't want to merge the two to begin with.

On top of that, this looks like a simple proxy, so it's only making things less efficient.

OP, I'd suggest explaining the benefits better. Maybe I missed it but I don't really get this at all.


The module was born just to fulfill one of our needs: we had several micro-services using Redis and no DB at all, but then we found ourselves in a situation where we needed SQL capabilities. Redis modules were being announced, so we gave them a shot.

What do you mean by efficiency? Operations per second? Sure, it is less efficient than Postgres or than SQLite embedded in your application.

However, if you mean convenience and simplicity, this solution may (arguably) be efficient enough.

This module sits between a "full" client-server DB, a Redis instance (which doesn't provide SQL capabilities), and SQLite (which is not client-server).

Being completely honest, as HN teaches us, building a great technical product without showing it as soon as it provides even a tiny amount of value is not desirable.

The comments in this thread indicate that there is some interest, and I got some valuable suggestions and feedback on what to develop next.


Hello siscia. What's not clear to me is whether this is intended for production or if it's an example. I understand the general problem of providing SQLite with a network interface/API, and I guess, but I don't know, there are many solutions for that. If you are trying to solve the problem of SQLite not having a network interface by default, by allowing access from Redis via the Redis protocol, then the implementation is not optimal AFAIK, because it does not use any threading but stops Redis while executing the queries in SQLite. Redis modules allow for non-blocking commands doing the bulk of the work in threads, so I would suggest at least this approach if this is what you are doing. IMHO there should also be a more sensible conversion of data types from the SQL reply to the Redis protocol.

If Redis has to be stopped during SQL queries, IMHO it's better to use another network layer with SQLite.


Ciao Salvatore,

To be clear, the module is alpha code.

There is a non-blocking implementation on the way. The problem I am facing is the tradeoff between using a single thread or multiple threads, and I came to the conclusion that it depends on the use case. (On write, SQLite locks the whole database, not a row at a time, so I expect a single lock-free thread to be faster on write-intensive workloads and multiple threads to be faster on read-intensive ones.)

What do you suggest as a conversion of data types from the SQL reply to Redis? Right now a SELECT returns a nested array: the outer for the rows, the inner ones for the columns.

Thanks


Hey, glad to know you are going multi-threaded on that, since I believe it will be very easy. To start you could just use a thread per request (and not just to start, probably even in the long run): immediately block the client via the module API, fire the thread, wait for the thread to complete the work and accumulate the reply, and resurrect the blocked client, feeding it the reply. AFAIK this can work very well.
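Something like this minimal sketch, assuming the blocked-client API in redismodule.h (exact signatures vary a bit across Redis versions, and the actual SQLite work is elided):

  #include <pthread.h>
  #include "redismodule.h"

  typedef struct {
      RedisModuleBlockedClient *bc;
      char *query;                    /* SQL text destined for SQLite */
  } QueryJob;

  /* Runs in the main thread once the worker unblocks the client. */
  int Reply(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
      REDISMODULE_NOT_USED(argv); REDISMODULE_NOT_USED(argc);
      const char *result = RedisModule_GetBlockedClientPrivateData(ctx);
      return RedisModule_ReplyWithSimpleString(ctx, result);
  }

  int Timeout(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
      REDISMODULE_NOT_USED(argv); REDISMODULE_NOT_USED(argc);
      return RedisModule_ReplyWithError(ctx, "ERR query timed out");
  }

  void *Worker(void *arg) {
      QueryJob *job = arg;
      /* ... run job->query through sqlite3_prepare/step here ... */
      RedisModule_UnblockClient(job->bc, (void *)"OK"); /* placeholder reply */
      RedisModule_Free(job->query);
      RedisModule_Free(job);
      return NULL;
  }

  int Exec_Command(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
      if (argc != 2) return RedisModule_WrongArity(ctx);
      QueryJob *job = RedisModule_Alloc(sizeof(*job));
      job->bc = RedisModule_BlockClient(ctx, Reply, Timeout, NULL, 0);
      job->query = RedisModule_Strdup(RedisModule_StringPtrLen(argv[1], NULL));
      pthread_t tid;
      if (pthread_create(&tid, NULL, Worker, job) != 0) {
          RedisModule_AbortBlock(job->bc);
          RedisModule_Free(job->query);
          RedisModule_Free(job);
          return RedisModule_ReplyWithError(ctx, "ERR can't spawn thread");
      }
      pthread_detach(tid);
      return REDISMODULE_OK;
  }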

About the reply: returning a nested array as you are doing may actually be enough after all, even if the reply needs to be parsed according to the exact query. The alternative would be to annotate the reply with field names and things like that.

Another thing that could probably be improved: provide direct support (by passing multiple arguments) for quoting strings passed to SQL, in order to give the user an easy path to avoid security issues.
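Purely as an illustration, the command shape could look like this (invented placeholder syntax, not something the module supports today):

  REDISQL.EXEC "INSERT INTO users VALUES(?, ?, ?)" "jdoe" "s3cret" "jdoe@example.com"
  > OK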


I tried to spawn a new thread for each request and it was embarrassingly slow. Maybe I was doing something wrong on my side, but I believe I will settle for a thread pool.

About the reply, the only thing that I see as manageable is to include the column names as the first row. I was considering the idea, but I am not completely sold yet.

I was thinking of binding a particular statement to a key, in order to provide the possibility of standard queries shared between different clients:

  REDISQL.CREATE_STATEMENT insert_user "INSERT INTO users VALUES($1, $2, $3)" 
  > OK
  REDISQL.EXEC_STATEMENT insert_user name password email
  > OK
or, with multiple inserts in the same transaction:

  REDISQL.CREATE_TRANSACTION
  > "REDISQL.TRANSACTION_12345"
  REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name1 password1 email1
  > OK
  REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name2 password2 email2
  > OK
  REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name3 password3 email3
  > OK
  REDISQL.COMMIT_TRANSACTION "REDISQL.TRANSACTION_12345"
  > 3 # the number of statements executed


Just to throw my two cents into the mix. The idea of being able to _query_ a single key in a SQL-like fashion is interesting, but I can't think of a production implementation that would be useful enough.

Instead, I'd look at using Redis in front of something like MySQL, so some operations go to Redis and others go straight to the server.

Having to write SQL statements as redis EXECs isn't elegant IMO.


> Just to throw my two cents into the mix.

You have no idea how much those 2 cents are appreciated, really, I mean it.

> The idea of being able to _query_ a single key in a SQL-like fashion is interesting

Not sure I understood what you mean here. Would you mind providing some kind of example or explaining it further?

> Instead, I'd look at using redis in front of something like MySQL so some operations go to redis and other right to the server.

Extremely interesting; how do you discriminate, though? If it is going to be a "regular" cache you don't need a module, Redis can already do that by itself. What use case were you thinking of?

> Having to write SQL statements as redis EXECs isn't elegant IMO.

I do completely agree, the API is going to get more structured for sure.


I've been working on a secondary indexing module for redis with a subset of SQL WHERE for selection predicates, if anyone's interested. It's not stable yet, but it's coming along nicely.

The idea was actually NOT to create an SQL-like database, but just to proxy ordinary Redis commands via a secondary index. So you select ids and perform HGET or HGETALL on them, etc. And the same goes for indexing: you perform something like HMSET "through" the index, and it performs the operation while indexing the hash automatically.

Also, indexes can be used as direct data types with no automation, you just insert ids and tuples of data into them, and you can query them and get ids back. https://github.com/RedisLabsModules/secondary


I tried to do the same (before the great Lua support, using PHP), but overall performance was similar to MySQL (which was much more reliable than my experiment), so I abandoned it.

So it would be very interesting to see benchmarks and comparisons with MySQL - I think right now we have more ways to implement it better.


I'd be really interested to hear the reasons for the downvoting. Do you think I'm lying, or is there a comparison somewhere?


PHP.


PHP wasn't the bottleneck in this case; most of the time was spent on communication with the DB (as is usual in web apps). I think (and hope) this solution can show better performance because less communication is required, but numbers are more interesting than assumptions.

upd: the comment about proxying SQLite eliminated this hope.


Author here, if you have any questions please feel free to ask.

Also I have introduced the module in a blog post here: http://redbeardlab.tech/2016/12/13/redisql


What would be a use case, from a technical perspective?

The blog mentions:

    If you are writing microservices, to avoid bothering the master DB with your ephemeral needs for a structured database

and

    I see this module as perfect for small datasets of unimportant data.

In other words, why would one use this module in place of a straight SQLite database (possibly in-memory, if there is really such a need for speed)? Such a module doesn't simplify anything (as far as I understand), since SQL is used anyway.

I would understand this as an experiment (something like "ReDoom"), but since there will be a paid version, I'm trying to position the product's technology.


SQLite gives you a lot of simplicity over client-server databases such as Postgres or MySQL.

A straight SQLite database doesn't quite give you the same kind of possibilities; imagine you have multiple machines that want to access the data.

It is a middle ground between "full" SQL databases and simpler solutions.

It does use Redis for the client-server communication, which gives reliability and is already well known in most companies.

My suggestion to use it for small, unimportant data is because there is still no persistence on disk; I am working on that right now.


Well, SQLite has to be accessed over a filesystem. This sounds like it would proxy to a sqlite db located on the Redis server. Still sounds dicey but I could see the appeal if you didn't have the DBA skills necessary to set up a decent PostgreSQL or other SQL server.


Well, although the requirements which would lead to such a setup are clearer now, it's still smelly to me.

Under these conditions there is still a false dichotomy between RediSQL on one side and "complex RDBMSs" on the other.

What about simple RDBMSs like H2? Such a solution would meet the requirements:

    - network-accessible RDBMS
    - no configuration

The only requirement I can think of which would make this solution inadequate would be "no JVM on the system", which sounds quite far-fetched.


Have you thought of ways to implement queries involving LIKE operations? I worked on a similar project some time ago, and one of the limitations I encountered for such an implementation was the use of glob-style patterns in Redis.


Didn't quite get your question.

SQLite does implement the LIKE operator.

From the module:

  127.0.0.1:6379> REDISQL.EXEC "CREATE TABLE text_search(t TEXT);"
  OK
  127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('hello');"
  OK
  127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('banana');"
  OK
  127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('apple');"
  OK
  127.0.0.1:6379> 
  127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search WHERE t LIKE 'h_llo';"
  1) 1) "hello"
  127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search WHERE t LIKE '%anana';"
  1) 1) "banana"
  127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('anana');"
  OK
  127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search;"
  1) 1) "hello"
  2) 1) "banana"
  3) 1) "apple"
  4) 1) "anana"
  127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search WHERE t LIKE 'a%';"
  1) 1) "apple"
  2) 1) "anana"


This is just a bridge to an in-memory SQLite accessible over a Redis connection - it does not emulate SQL for Redis data; both databases are completely independent, unfortunately.


Ah, I was under the impression that it emulates SQL semantics for Redis data (kind of like an extension of the InnoDB memcached architecture). Anyway, nice work @siscia, thanks for sharing!


That's not hard to do; I did that in Thredis:

  redis 127.0.0.1:6379> set foo bar
  OK

  redis 127.0.0.1:6379> sql "select * from foo"
  1) 1) "key"
     2) "val"
  2) 1) "1"
     2) "bar"


This paragraph is what I needed to see in the project readme. Thanks.


Why do you say unfortunately?

What are your use cases?

It is possible to expose the Redis keys inside SQLite; however, it requires a level of complexity that must be justified.


Can I use this to atomically delete thousands of existing keys? Is there a way to map the new SQL tables to existing keys? Or is this strictly for new SQL data tables?


This is a SQLite in-memory database running inside the redis service. It doesn't interface with the redis database.


As with the other question: for now, I am simply using Redis as support for SQLite.

You are talking about something definitely feasible, but it requires a really careful implementation.

For now, it is strictly for new SQL data tables.

However, I must say that putting all your keys inside an SQL table is not difficult.
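A hand-rolled sketch of the idea, using only commands already shown in this thread (the table name and values are made up):

  127.0.0.1:6379> SET foo bar
  OK
  127.0.0.1:6379> REDISQL.EXEC "CREATE TABLE kv(key TEXT, value TEXT);"
  OK
  127.0.0.1:6379> REDISQL.EXEC "INSERT INTO kv VALUES('foo', 'bar');"
  OK
  127.0.0.1:6379> REDISQL.EXEC "SELECT value FROM kv WHERE key = 'foo';"
  1) 1) "bar"

The complexity is in having the module do this automatically and atomically on every write.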


Redis, and NoSQL data stores in general, are for merge-on-write, not merge-on-read. That's why most of them are non-relational (no relations, foreign keys, or joins).

This project encourages merge-on-read, which is exactly the problem k/v stores try to solve the other way (merge-on-write).

The problem with relational data stores is that once you have relations, you need consistency, and then you need to move operations into the data store itself (stored procedures), and transactions, etc., and suddenly you need an army of certified DBAs and a $1M+ budget on datastores alone.

It's not the right way to go. Feel free to have SQL, but adding joins is a bad idea.


The problem of consistency is related to supporting multi-key, read-write transactions of any sort, and has nothing to do with support for joins, or relations. You're not escaping from it somehow by using a NoSQL database.


In a relational database you are almost certainly going to maintain data about the same "entity" distributed across different tables. And that is encouraged (hence the term "relational databases").

In k/v datastores, e.g. document-oriented ones, you can just put all the data related to one entity in one record and that's it... and dealing with multi-key operations is discouraged rather than encouraged.

Why discouraged? Because it's hard to load-balance, it creates the need for synchronization and locking at a broader level, it adds complexity to maintenance operations, and so on and so forth.

The SQL thing exists because dBase, one of the first database products, had it. And then every product that entered the market had to go with it to diminish friction.


Okay, so your solution to consistency is "grab a big ol' lock over all possible related data, and hope like hell that you got all of it and it can live in one key." That doesn't really work when the amount of related data you have gets large enough, and it's really easy to have an efficient SQL database behind the lock (that's essentially how VoltDB works). Behind the lock, you can still have multiple tables, joins, and everything people associate with RDBMSes, without any of the consistency issues you mention.

Your discussion about what's "encouraged" or not is frankly irrelevant, as key/value stores are super easy compared to general relational databases and most RDBMSes are already quite good key/value stores (by contrast, most key/value stores make lousy RDBMSes). The simple fact of the matter is: if your key/value store doesn't support multikey read/write transactions, it has strictly less functionality than a database that does, regardless of how you use it, and if it does support multikey read/write transactions it'll have all the same performance issues you associate with RDBMSes (well, that's a slight oversimplification because it may provide a more limited API).


It's not even that in this case - it's just SQLite with a network layer, no persistence, and each operation blocks Redis completely until finished.


I did something similar a while ago (2012) (it even made it to the top of HN).

http://thredis.org/

Thredis also added multi-threading to Redis. The code is here: https://github.com/grisha/thredis


It's a cute project; I remember you posted it before. Somebody else also had a go at this a few years back:

https://groups.google.com/forum/#!topic/redis-db/TLfwGfldgUU


It is a little different from Thredis since, in my understanding, Thredis is more of a Redis fork, while RediSQL is a Redis module.

Using a Redis module gets you A LOT of things for free, and you really stand on the shoulders of giants.


> Thredis is more of a Redis fork, while RediSQL is a Redis module.

Thredis author here - Redis did not support modules back then; had I done this today it probably would have been a module (though it's been a while since I hacked on Redis and I don't know how modules work).


Hi :)

Sure, I took a deep look at your project and really appreciate that you open-sourced it.

Do you have any feedback to share?

Cheers,

Simone


Where does it save the data? Just in a blob in Redis? Does it support replication? How about persistence - can I use standard Redis persistence? I'm kinda new to modules in Redis.

Definitely could use this for migration off SQLite to a solid DB.


This is the source: https://github.com/RedBeardLab/rediSQL/blob/master/rediSQL.c

> Where does it save the data?

It doesn't; it's an in-memory database.

> Does it support replication?

That's really, really unlikely, because you can't do that on the sqlite VFS layer.

> migration off SQLite to a solid DB.

This is SQLite (in redis).


Replication is always tricky to do right.

The very first and naive solution would be to simply stream all your statements to the slaves.
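For what it's worth, the modules API already exposes a primitive for exactly this naive approach; a hypothetical write handler could end like this (sketch only, not the module's actual code, and the replicas would need the module loaded too):

  #include "redismodule.h"

  /* After executing the statement locally against SQLite, ask Redis
   * to propagate the very same command to replicas and the AOF:
   * this amounts to streaming all the statements to the slaves. */
  int ExecWrite_Command(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
      REDISMODULE_NOT_USED(argv); REDISMODULE_NOT_USED(argc);
      /* ... run the statement through sqlite3_prepare/step here ... */
      RedisModule_ReplicateVerbatim(ctx);
      return RedisModule_ReplyWithSimpleString(ctx, "OK");
  }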


I just updated the readme.

For now it works in memory; after Saturday it will save data in a standard SQLite file.
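In SQLite the switch is essentially just which name gets passed to sqlite3_open(); a minimal sketch, with an invented filename:

  #include <sqlite3.h>

  /* ":memory:" gives the current volatile behaviour; any other name
   * opens (or creates) a standard SQLite file on disk. */
  int open_db(sqlite3 **db, int persistent) {
      return sqlite3_open(persistent ? "rediSQL.db" : ":memory:", db);
  }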

Replication and SQL are a fairly tricky beast together. What is your use case?


My use case is: one SQLite DB for every client, scaling by distributing clients across servers. I would like to have all clients in one DB and make scaling easier.


That is quite interesting, and achievable within reasonable bounds of complexity.

I could associate each DB with a Redis key and stream all the UPDATEs on each DB.

At this point you could simply listen to the stream and apply those updates to a master DB.

What do you think?


See ActorDB (it does the first, not the second).



It may. The real question in my opinion is: what kind of replication do we need?

Do we want to replicate just to be sure that the data is safe? Then you could really just stream to all the replicas whatever happens on the master.

Do you want to increase read performance? Then you first start caching results, then you start streaming to slaves and only read from them.

Do you want to increase write performance? That is quite a bad beast. You start by making the single instance as fast as possible (96k inserts per second [1]), and then you pretty much hit the ceiling, in my opinion.

You could partition the SQL tables, but then what happens to joins and selects? They must go over the network (high latency), you have to spot distributed deadlocks (highly variable latency), and it starts to become a hugely complex project.

[1]: http://stackoverflow.com/questions/1711631/improve-insert-pe...


Hello siscia, do you have any benchmarks against an SQLite DB on a ramdisk?


Hi Azureel, benchmarks are a little tricky in this situation.

You are sending data to the Redis instance, which reads your statement and passes it to the module; the module executes it and finally replies.

And everything really depends on your network and the load on Redis itself.

However, just to give some numbers: on my machine I do 1000 inserts in 0.6 seconds; the longest insert took 0.01 seconds, and overall I can claim a total of 1000 / 0.6 ≈ 1,666 inserts per second. Consider that each insert is an isolated transaction.

The test I ran is exactly the one you can find here: https://github.com/RedBeardLab/rediSQL/blob/381e3796ad31c231...

Of course, if you start to introduce JOINs, everything slows down.

Also, please consider that, at the moment, it is a BLOCKING version: while executing the statements, Redis itself is waiting.

Finally, I believe there are possibilities to improve these numbers.


It'd be really interesting to use redis modules like this in the future to have redis itself be both the main interface to the database for programs and a caching layer.


Fully in-memory SQL can also be accomplished using In-Memory OLTP (IMOLTP) via MS SQL Express, now on Linux as well. And you get a fairly feature-complete engine along with multithreaded execution, no blocking, and some great client options. The only limits right now are 4 cores and ~350MB of in-memory data per database.


My memory is a bit fuzzy right now, but hasn't MySQL itself provided that since forever, too? Or do you mean a fully in-memory DB with a hard copy on the hard drive?


I think most RDBMSs have a way to pin a table or database in memory. The difference with the Microsoft solution is that it is a separate engine designed explicitly for memory, so there is no locking or blocking of pages. There are some other neat features as well, like hash indexes. I have been able to push an 8-core box to ~80,000 inserts per second from a remote client, and over 200,000 per second for a test I ran on the host. I have not yet run tests for Express edition, but I will be interested to see how it performs.

To answer your other question: you have the option to either keep it all in memory (and lose it on reboot) or commit the data to disk. In my testing I only saw a ~10% performance penalty for committing data to disk (granted, we have extremely fast SSDs).





