I think that what the module does is actually not clear from the comments I'm reading here. It's mostly a nice "Hello World" module that shows how Redis can act as a proxy, but it's just a small glue module that calls the SQLite library with the string passed as the query. So you are running queries against a local SQLite database, proxied by Redis.
While this can be a good programming example for newcomers, I suggest looking at modules that really implement new DB paradigms on top of Redis. For instance, this module builds a graph DB on top of Redis data types, with a query language and so forth:
https://github.com/swilly22/redis-module-graph
The module was born to fulfill one of our needs: we had several microservices using Redis and no DB at all, but then we found ourselves in a situation where we needed SQL capabilities. Redis modules were just being announced, so we gave them a shot.
What do you mean by efficiency? Operations per second? Sure, it is less efficient than Postgres, or than SQLite embedded in your application.
However, if you mean convenience and simplicity, this solution may (arguably) be efficient enough.
This module sits between a "full" client-server DB, a Redis instance (which doesn't provide SQL capabilities), and SQLite (which is not client-server).
Being completely honest, as HN teaches us, building a great technical product without showing it as soon as it provides even a tiny amount of value is not desirable.
The comments in this thread indicate that there is some interest, and I got some valuable suggestions and feedback on what to develop next.
Hello siscia. What's not clear to me is whether this is intended for production or whether it's an example. I understand the general problem of providing SQLite with a network interface/API, and I guess, but I don't know, there are many solutions for that. If you are trying to solve the problem of SQLite not having a network interface by default, by allowing access from Redis via the Redis protocol, then the implementation is not optimal AFAIK, because it does not use any threading but stops Redis while executing the queries in SQLite. Redis modules allow for non-blocking commands that do the bulk of the work in threads, so I would suggest at least this approach if this is what you are doing. IMHO there should also be a more sensible conversion of data types from the SQL reply to the Redis protocol.
If Redis has to be stopped during SQL queries, IMHO it's better to use another network layer with SQLite.
There is a non-blocking implementation on the way. The problem I am facing is the tradeoff between using a single thread or multiple threads, but I came to the conclusion that it depends on the use case. (On writes, SQLite locks the whole database, not a row at a time, so I expect a single lockless thread to be faster on write-intensive workloads and multiple threads to be faster on read-intensive ones.)
What do you suggest as the conversion of data types from the SQL reply to Redis?
Right now a SELECT returns a nested array: the outer one for the rows, the inner ones for the columns.
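For instance, with a hypothetical two-column table, the reply looks like this in redis-cli (made-up data, same convention as the examples further down in the thread):
127.0.0.1:6379> REDISQL.EXEC "SELECT name, email FROM users;"
1) 1) "john"
   2) "john@example.com"
2) 1) "jane"
   2) "jane@example.com"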
Hey, glad to know you are going multi-threaded on that, since I believe it will be very easy. To start you could just use a thread per request (not just to start, probably even in the long run): immediately block the client via the module API, fire the thread, wait for the thread to complete the work and accumulate the reply, and resurrect the blocked client, feeding it the reply. AFAIK this can work very well.
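For reference, a minimal sketch of that pattern, assuming the modules API from redismodule.h (RedisModule_BlockClient, RedisModule_UnblockClient and friends are the real calls, though exact signatures vary slightly across Redis versions; the Job struct and the SQLite work are hypothetical, and most error handling is omitted):

#include "redismodule.h"
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    RedisModuleBlockedClient *bc; /* handle to the blocked client */
    char *query;                  /* copy of the SQL text */
} Job;

/* Runs in its own thread: do the slow SQLite work, then unblock. */
static void *worker(void *arg) {
    Job *job = arg;
    /* ... run job->query against SQLite and accumulate the reply ... */
    RedisModule_UnblockClient(job->bc, job); /* fires Reply() below */
    return NULL;
}

/* Called back in the main thread once the client is unblocked. */
static int Reply(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv); REDISMODULE_NOT_USED(argc);
    Job *job = RedisModule_GetBlockedClientPrivateData(ctx);
    /* ... turn the accumulated rows into the nested array reply here ... */
    free(job->query); free(job);
    return RedisModule_ReplyWithSimpleString(ctx, "OK");
}

static int Timeout(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv); REDISMODULE_NOT_USED(argc);
    return RedisModule_ReplyWithNull(ctx);
}

/* Command handler: block the client, fire the thread, return at once. */
int ExecCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    size_t len;
    const char *q = RedisModule_StringPtrLen(argv[1], &len);
    Job *job = malloc(sizeof(*job));
    job->query = strndup(q, len); /* argv is not valid inside the thread */
    job->bc = RedisModule_BlockClient(ctx, Reply, Timeout, NULL, 0);
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, job) != 0) {
        RedisModule_AbortBlock(job->bc);
        free(job->query); free(job);
        return RedisModule_ReplyWithError(ctx, "ERR can't start thread");
    }
    pthread_detach(tid);
    return REDISMODULE_OK;
}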
About the reply, returning a nested array as you are doing may actually be enough after all, even if the reply needs to be parsed according to the exact query. The alternative would be to annotate the reply with field names and things like that.
Another thing that could probably be improved: provide direct support (by passing multiple arguments) for quoting strings passed to SQL, in order to give the user an easy path to avoid security issues.
I tried to spawn a new thread for each request and it was embarrassingly slow. Maybe I was doing something wrong on my side, but I believe I will settle for a pool.
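For what it's worth, a common shape for that pool, given SQLite's writer lock, is a single worker thread that owns the SQLite connection and drains a job queue. A rough pthread sketch of just the queue part (the Job fields, the blocked-client handle, and the actual SQLite execution are left abstract):

#include <pthread.h>
#include <stddef.h>

typedef struct Job {
    struct Job *next;
    char *query;          /* SQL text to execute */
    void *blocked_client; /* e.g. a RedisModuleBlockedClient* */
} Job;

static Job *head = NULL, *tail = NULL;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

/* Called from the command handler: enqueue and return immediately. */
void submit(Job *job) {
    job->next = NULL;
    pthread_mutex_lock(&mu);
    if (tail) tail->next = job; else head = job;
    tail = job;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
}

/* The single worker owns the SQLite connection: writes never contend. */
void *worker(void *db) {
    for (;;) {
        pthread_mutex_lock(&mu);
        while (head == NULL) pthread_cond_wait(&cv, &mu);
        Job *job = head;
        head = job->next;
        if (head == NULL) tail = NULL;
        pthread_mutex_unlock(&mu);
        /* ... execute job->query on db, unblock job->blocked_client ... */
    }
    return NULL;
}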
About the reply, the only thing that I see as manageable is to include the names of the columns as the first row. I was considering the idea but I am not completely sold yet.
I was thinking of binding a particular statement to a key, in order to make it possible to share standard queries between different clients:
REDISQL.CREATE_STATEMENT insert_user "INSERT INTO users VALUES($1, $2, $3)"
> OK
REDISQL.EXEC_STATEMENT insert_user name password email
> OK
or with multiple inserts in the same transaction:
REDISQL.CREATE_TRANSACTION
> "REDISQL.TRANSACTION_12345"
REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name1 password1 email1
> OK
REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name2 password2 email2
> OK
REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name3 password3 email3
> OK
REDISQL.COMMIT_TRANSACTION "REDISQL.TRANSACTION_12345"
> 3 # the number of statements executed
Just to throw my two cents into the mix. The idea of being able to _query_ a single key in a SQL-like fashion is interesting, but I can't think of a production implementation that would be useful enough.
Instead, I'd look at using Redis in front of something like MySQL, so some operations go to Redis and others go right to the server.
Having to write SQL statements as redis EXECs isn't elegant IMO.
You have no idea how much those 2 cents are appreciated, really, I mean it.
> The idea of being able to _query_ a single key in a SQL-like fashion is interesting
Not sure I understood what you mean here. Would you mind providing some kind of example or explaining it further?
> Instead, I'd look at using Redis in front of something like MySQL, so some operations go to Redis and others go right to the server.
Extremely interesting; how do you discriminate, though? If it is going to be a "regular" cache you don't need a module, Redis can already do that by itself. What use case were you thinking of?
> Having to write SQL statements as redis EXECs isn't elegant IMO.
I do completely agree, the API is going to get more structured for sure.
I've been working on a secondary indexing module for redis with a subset of SQL WHERE for selection predicates, if anyone's interested. It's not stable yet, but it's coming along nicely.
The idea was actually NOT to create an SQL-like database, but just to proxy ordinary Redis commands via a secondary index. So you select ids, and perform HGET or HGETALL on them, etc. And the same goes for indexing: you perform something like HMSET "through" the index, and it performs the operation while indexing the hash automatically.
Also, indexes can be used as direct data types with no automation: you just insert ids and tuples of data into them, and you can query them and get ids back.
https://github.com/RedisLabsModules/secondary
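For anyone unfamiliar with the pattern the module automates, the manual version in plain Redis commands (hypothetical keys, not the module's actual API) looks roughly like this:

ZADD idx:age 25 user:1             # index user:1 under age 25
HMSET user:1 name "john" age "25"  # the actual record
ZRANGEBYSCORE idx:age 20 30        # select ids with 20 <= age <= 30
HGETALL user:1                     # ...then fetch each returned id

The module's value is doing the ZADD-equivalent bookkeeping automatically when the hash is written "through" the index.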
I tried to do the same (before the great Lua support, using PHP), but overall performance was similar to MySQL (which is much more reliable than my experiment), so I abandoned it.
So it would be very interesting to see benchmarks and comparisons with MySQL. I think right now we have more ways to implement it better.
PHP wasn't the bottleneck in this case; most of the time was spent on communication with the DB (as usual in web apps). I think (and hope) this solution can show better performance because fewer round trips are required, but numbers are more interesting than assumptions.
Update: the comment above about this just proxying SQLite eliminated that hope.
What would be a use case, from a technical perspective?
The blog mentions:
If you are writing microservices and want to avoid bothering the master DB with your ephemeral needs for a structured database
and
I see this module as perfect for small datasets of unimportant data.
In other words, why would one use this module in place of a straight SQLite database (possibly in-memory, if there is really such a need for speed)? Such a module doesn't simplify anything (as far as I understand), since SQL is used anyway.
I would understand this as an experiment (something like "ReDoom"), but since there will be a paid version, I'm trying to position the product technologically.
Well, SQLite has to be accessed over a filesystem. This sounds like it would proxy to a SQLite DB located on the Redis server.
Still sounds dicey but I could see the appeal if you didn't have the DBA skills necessary to set up a decent PostgreSQL or other SQL server.
Have you thought of ways to implement queries involving LIKE operations? I worked on a similar project some time ago, and one of the limitations I encountered for such an implementation was the use of glob-style patterns in Redis.
127.0.0.1:6379> REDISQL.EXEC "CREATE TABLE text_search(t TEXT);"
OK
127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('hello');"
OK
127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('banana');"
OK
127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('apple');"
OK
127.0.0.1:6379>
127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search WHERE t LIKE 'h_llo';"
1) 1) "hello"
127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search WHERE t LIKE '%anana';"
1) 1) "banana"
127.0.0.1:6379> REDISQL.EXEC "INSERT INTO text_search VALUES('anana');"
OK
127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search;"
1) 1) "hello"
2) 1) "banana"
3) 1) "apple"
4) 1) "anana"
127.0.0.1:6379> REDISQL.EXEC "SELECT * FROM text_search WHERE t LIKE 'a%';"
1) 1) "apple"
2) 1) "anana"
This is just a bridge to an in-memory SQLite, accessible from a Redis connection. It does not emulate SQL over Redis data; the two databases are completely independent, unfortunately.
Ah, I was under the impression that it emulates SQL semantics for Redis data (kind of like an extension of the InnoDB Memcache architecture). Anyway, nice work @siscia, thanks for sharing!!
Can I use this to atomically delete thousands of existing keys? Is there a way to map the new SQL tables to existing keys? Or is this strictly for new SQL data tables?
Redis, and NoSQL data stores in general, are for merge-on-write, not merge-on-read. That's why most of them are non-relational (no relations, foreign keys, or joins).
This project encourages merge-on-read, which is exactly the problem k/v stores try to solve another way (merge-on-write).
The problem with relational data stores is that once you have relations you need consistency, and then you need to move operations into the data store itself (stored procedures), and transactions, etc... and suddenly you need an army of certified DBAs and a $1+ million budget on data stores alone.
It's not the right way to go. Feel free to have SQL but adding joins is a bad idea.
The problem of consistency is related to supporting multi-key, read-write transactions of any sort, and has nothing to do with support for joins, or relations. You're not escaping from it somehow by using a NoSQL database.
In a relational database you are almost certainly going to maintain data about the same "entity" distributed across different tables. And that is encouraged (hence the term "relational databases").
In k/v data stores, e.g. document-oriented ones, you can just put all the data related to one entity in one record and that's it... and dealing with multi-key operations is discouraged rather than encouraged.
Why discouraged? Because it's hard to load balance, it creates the need for synchronization and locking at a broader level, it adds complexity to maintenance operations, and so on and so forth.
The SQL thing exists because dBase, one of the first database products, had it. And then every product that entered the market had to go with it to diminish friction.
Okay, so your solution to consistency is "grab a big ol' lock over all possible related data, and hope like hell that you got all of it and it can live in one key." That doesn't really work when the amount of related data you have gets large enough, and it's really easy to have an efficient SQL database behind the lock (that's essentially how VoltDB works). Behind the lock, you can still have multiple tables, joins, and everything people associate with RDBMSes, without any of the consistency issues you mention.
Your discussion about what's "encouraged" or not is frankly irrelevant, as key/value stores are super easy compared to general relational databases and most RDBMSes are already quite good key/value stores (by contrast, most key/value stores make lousy RDBMSes). The simple fact of the matter is: if your key/value store doesn't support multikey read/write transactions, it has strictly less functionality than a database that does, regardless of how you use it, and if it does support multikey read/write transactions it'll have all the same performance issues you associate with RDBMSes (well, that's a slight oversimplification because it may provide a more limited API).
> Thredis is more a Redis fork, while RediSQL is a Redis module.
Thredis author here. Redis did not support modules back then; had I done this today, it probably would have been a module (though it's been a while since I hacked on Redis and I don't know how modules work).
Where does it save the data? Just in a blob in Redis? Does it support replication? How about persistence: can I use standard Redis persistence? I'm kinda new to modules in Redis.
Definitely could use this for migration off SQLite to a solid DB.
My use case is: one SQLite DB for every client, scaling by distributing clients across servers. I would like to have all clients in one DB and make scaling easier.
It may. The real question, in my opinion, is: what kind of replication do we need?
Do we want to replicate just to be sure that the data is safe? Then you could simply stream to all the replicas whatever happens on the master (see the sketch below).
Do you want to increase read performance? Then you first start caching results, then you start streaming to the slaves and serving reads only from them.
Do you want to increase write performance? That is quite a nasty beast. You start by making the single instance as fast as possible (96k inserts per second [1]), and then you pretty much hit the ceiling, in my opinion.
You could partition the SQL tables, but then what happens to joins and selects? They must go over the network (high latency) and detect distributed deadlocks (highly variable latency), and it starts to become a hugely complex project.
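For the first case (streaming writes to the replicas), the modules API already provides a primitive: RedisModule_ReplicateVerbatim propagates the command as-is to replicas and the AOF. A minimal sketch, where exec_on_sqlite() is a hypothetical helper:

#include "redismodule.h"

int exec_on_sqlite(RedisModuleString *query); /* hypothetical: 0 on success */

int ExecAndReplicate(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    if (exec_on_sqlite(argv[1]) != 0)
        return RedisModule_ReplyWithError(ctx, "ERR SQLite error");
    /* Replicas receive the same command and replay the statement
     * on their own local SQLite copy. */
    RedisModule_ReplicateVerbatim(ctx);
    return RedisModule_ReplyWithSimpleString(ctx, "OK");
}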
Hi Azureel, benchmarks are a little tricky in this situation.
You are sending data to the Redis instance, which reads your statement and passes it to the module; the module executes it and finally replies.
And everything really depends on your network and on the load on Redis itself.
However, just to give some numbers: on my machine I do 1000 inserts in 0.6 seconds, the longest insert took 0.01 seconds, and overall I can claim a total of 1000 / 0.6, roughly 1,666 inserts per second. Consider that each insert is an isolated transaction.
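If anyone wants to reproduce this kind of number, redis-benchmark can send arbitrary commands, so something like the line below should work (assuming the module is loaded and a matching users table exists; -c 1 keeps a single connection so each insert is a separate round trip):

redis-benchmark -n 1000 -c 1 REDISQL.EXEC "INSERT INTO users VALUES('john', 'secret', 'john@example.com');"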
It'd be really interesting to use redis modules like this in the future to have redis itself be both the main interface to the database for programs and a caching layer.
Fully in-memory SQL can also be accomplished using In-Memory OLTP (IMOLTP) via MS SQL Express, now on Linux as well. And you get a fairly feature-complete engine along with multithreaded execution, no blocking, and some great client options. The only limits right now are 4 cores and ~350MB of in-memory data per database.
My memory is a bit fuzzy right now, but isn't MySQL itself providing that since forever too? Or do you mean a fully in-memory DB with a hard copy on the hard drive?
I think most RDBMSs have a way to pin a table or database in memory. The difference with the Microsoft solution is that it is a separate engine designed explicitly for memory, so there is no locking or blocking of pages. There are some other neat features as well, like hash indexes. I have been able to push an 8-core box to ~80,000 inserts per second from a remote client, and over 200,000 per second for a test I ran on the host. I have not yet run tests for Express edition, but I will be interested to see how it performs.
To answer your other question, you have the option to either keep it all in memory (and lose it on reboot) or commit the data to disk. In my testing I only saw a ~10% performance penalty for committing data to disk (granted, we have extremely fast SSDs).