From an information retrieval perspective, this is embarrassing, and the benchmarks aren’t close to being apples and oranges—it’s more like apples and Mack trucks. The linked benchmark is for a toy data set of 5.6GB, which we all know Redis will store in RAM; yet they don’t say whether they set “index.storage.type: memory” on ElasticSearch (guessing they didn’t).
At the same time, ElasticSearch/Lucene puts considerable effort into analysis at both indexing time and querying time that goes into ranking the search results. RediSearch’s ranking is “user provided”—how does that work exactly? What does that even mean? Ranking is at the heart of information retrieval—of what value is it to allow for 4x more queries to be run on a cluster when the result sets are terrible?
ElasticSearch could be better at scaling up on nodes to handle more query operations on a given cluster. However, if you care about full text search, this won’t do the job.
It’s amazing that you reached the conclusion that it won’t do the job after reading a benchmark for 2 minutes!
The benchmarks are quite realistic (5m docs, 25m products) for document search, and the settings reflect real usage. There is no “memory” storage type for ES; it has “mmapfs”, which the documentation explicitly says is dangerous and might be removed. The default `hybridfs` already has the ability to use memory-mapped files as an optimization. Forcing mmap would actually make it an unrealistic benchmark.
> ElasticSearch could be better at scaling up on nodes to handle more query operations on a given cluster
Based on? They are both running with 5 shards for the benchmark.
Well, you have MemoryIndex, which is a different proposition (it's used for the percolate feature). I get that RediSearch can be faster than Lucene-based search engines, but actually running the search is the easiest part. There's a lot of thought and effort in making a decent search engine, and the article doesn't tell me whether I can get facets, order by any field, run both full-text and geospatial queries, and many more things that are important when you build a search engine.
I remember long ago, when SQL Server added full-text search, we thought our time fiddling with Lucene.Net had come to an end. But when we tried it, it failed miserably, because the full-text engine and the relational one were like two different parts: if you wanted to run a full-text search and order by a numeric field, it had to build a temporary table with all the full-text results and then sort that. Those are the things that Lucene solves so well that I'm reticent to think that redis search has managed to make them ok at the first try. So, it's not an apples-to-oranges benchmark, but if you already have Redis and the search capabilities you want to add are covered by RediSearch, it can be a good product.
> I'm reticent to think that redis search has managed to make them ok at the first try.
Your example doesn't really match the reality of Redis. Modules in Redis can bring their own data types and algorithms and don't need to resort to the same kind of hack you mentioned. The ecosystem inside Redis is designed for modularity and clear, minimalistic interfaces between components.
RediSearch might not be perfect, but any problem will stem from a different set of causes, not because it tries to "emulate" a data model.
The whole point of Redis is not having to emulate data types and algorithms.
> Modules in Redis can bring their own data types and algorithms and don't need to resort to the same kind of hack you mentioned.
To emphasize this:
One thing people might misunderstand about Redis is that Redis extension developers aren't expected to fit their data structures to the needs of a storage-engine. It's not like Cassandra, where your data structures must 'boil down' to key-value pairs; nor is it like an RDBMS, where your data structures must 'boil down' to tuple-sets; nor like a graph DBMS, where your data structures must 'boil down' to EAV triples. Redis data structures aren't "implemented in terms of" any other simpler 'canonical' data structure.
Instead, when you look at something like Redis Streams or the Redis Graph module, the whole complex data structure for each stream/graph is a big opaque in-memory thing dangling off a single key. It doesn't need to be broken down into parts "legible" to Redis. It can just be what it is. "Objects" in the Redis keyspace (the things keys are holding) can't hold pointers to one-another, so the core Redis commands can blindly manipulate them (e.g. deallocate them.) For everything else, you go through the module's commands, which walk the internals of the data-structure as what it is: a plain-old in-memory C struct, defined in your module's header files.
The canonical representation of a data structure in the Redis AOF (WAL log) is just the sequence of commands used to build it; not the data-structure itself. So, as a module developer, to get AOF persistence of your module's types, you don't need to do a thing, other than ensuring that your module's commands are deterministic.
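A toy illustration of that idea (the `ToyStore` class and its two commands are hypothetical, not the Redis module API): the log stores commands rather than state, and replaying it into an empty store rebuilds the same data only because each command is deterministic.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, ...]


class ToyStore:
    """Hypothetical store that persists by logging commands, AOF-style."""

    def __init__(self) -> None:
        self.data: Dict[str, List[str]] = {}
        self.aof: List[Command] = []

    def execute(self, *cmd: str) -> None:
        self._apply(cmd)
        self.aof.append(cmd)  # log the command itself, not the resulting state

    def _apply(self, cmd: Command) -> None:
        name, key, *args = cmd
        if name == "LPUSH":
            # Deterministic: the outcome depends only on the prior state and the arguments.
            for value in args:
                self.data.setdefault(key, []).insert(0, value)
        elif name == "DEL":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown command {name}")

    @classmethod
    def replay(cls, aof: List[Command]) -> "ToyStore":
        """Rebuild a store from scratch by re-running every logged command in order."""
        store = cls()
        for cmd in aof:
            store._apply(cmd)
        store.aof = list(aof)
        return store


store = ToyStore()
store.execute("LPUSH", "list", "a", "b")
store.execute("LPUSH", "tmp", "x")
store.execute("DEL", "tmp")

restored = ToyStore.replay(store.aof)
print(restored.data)  # {'list': ['b', 'a']}
```

If `LPUSH` here pushed, say, a random or time-based value, the replayed store could silently diverge from the original, which is why module commands need to be deterministic for this kind of persistence to work.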
You do need to do a bit of work to get your module's types to serialize into Redis RDB snapshots. But it's completely up to you how to define your types' serializations. Redis just provides an API for writing and reading scalar types from the RDB file stream. How your module uses them to save/load a value of a type is up to you. (And you can just skip this if you like; RDB persistence is used far less often than AOF persistence, so support for it is not even a highly demanded feature for modules. If you don't bother, then loading your module just disables RDB persistence.)
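A sketch of what "scalar primitives plus your own layout" looks like. The `save_*`/`load_*` helpers are hypothetical stand-ins, loosely modeled on the module API's scalar calls such as `RedisModule_SaveUnsigned` and `RedisModule_LoadUnsigned`; the term-frequency table is just an example type whose on-disk format the module author chooses freely.

```python
import io
import struct
from typing import BinaryIO, Dict


# Hypothetical stand-ins for the scalar read/write primitives the host provides.
def save_unsigned(stream: BinaryIO, value: int) -> None:
    stream.write(struct.pack("<Q", value))  # 8-byte little-endian unsigned integer


def load_unsigned(stream: BinaryIO) -> int:
    (value,) = struct.unpack("<Q", stream.read(8))
    return value


def save_string(stream: BinaryIO, value: str) -> None:
    raw = value.encode("utf-8")
    save_unsigned(stream, len(raw))  # length prefix, then the bytes
    stream.write(raw)


def load_string(stream: BinaryIO) -> str:
    length = load_unsigned(stream)
    return stream.read(length).decode("utf-8")


# The module decides how its own type maps onto those scalars.
# Here: a term -> frequency table, written as a count followed by (term, freq) pairs.
def rdb_save(stream: BinaryIO, table: Dict[str, int]) -> None:
    save_unsigned(stream, len(table))
    for term, freq in table.items():
        save_string(stream, term)
        save_unsigned(stream, freq)


def rdb_load(stream: BinaryIO) -> Dict[str, int]:
    count = load_unsigned(stream)
    table: Dict[str, int] = {}
    for _ in range(count):
        term = load_string(stream)
        table[term] = load_unsigned(stream)
    return table


buf = io.BytesIO()
rdb_save(buf, {"hello": 3, "world": 1})
buf.seek(0)
print(rdb_load(buf))  # {'hello': 3, 'world': 1}
```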
The point is not that the benchmarks were set up incorrectly, or to dispute that RediSearch is faster. It’s that it’s doing far, far less because it’s not ranking, and full-text search without ranking is... less than useful.
I think the usual way to "level the playing field" between in-memory and disk-backed storage engines, for a clearer comparison, is to back the disk-backed storage engine with a tmpfs volume.
I believe that the combination of tmpfs + file access — and especially tmpfs + mmap(2) — results in a fast path that almost resembles direct memory access. (If the file or memory-region is opened read-only, the kernel can expose/share the relevant memory pages directly into the process.)
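A small Python sketch of that fast path, assuming a Linux host where `/dev/shm` is a tmpfs mount (the code falls back to the default temp directory otherwise, in which case it is just an ordinary page-cache-backed mapping): write a file, then map it read-only and read from it like memory.

```python
import mmap
import os
import tempfile

# Assumption: /dev/shm is a tmpfs mount (typical on Linux); otherwise use the default temp dir.
shm = "/dev/shm"
directory = shm if os.path.isdir(shm) and os.access(shm, os.W_OK) else None

with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
    f.write(b"inverted index segment bytes")
    path = f.name

try:
    with open(path, "rb") as f:
        # Read-only mapping of the whole file (length 0 = entire file).
        # On tmpfs the file's pages already sit in RAM, so no disk I/O is involved.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            header = view[:8]
finally:
    os.unlink(path)

print(header)  # b'inverted'
```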
That's completely untrue, it comes with a few scoring algorithms out of the box (https://oss.redislabs.com/redisearch/Scoring/) and results can also be ranked by the value of arbitrary numeric or short text fields. On top of that, there is a plugin API to provide custom scorers written in C/C++.
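To make "scoring algorithms" concrete, here is a textbook TF-IDF ranker over a toy corpus. It illustrates the general technique only; it is not RediSearch's TFIDF scorer, which may also factor in field weights, document scores, and other adjustments. The corpus and function name are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_idf_rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank documents by the summed TF-IDF weight of the query terms they contain."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))

    scores: Dict[str, float] = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        matched, score = False, 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue
            matched = True
            tf = counts[term] / len(tokens)    # term frequency, normalized by doc length
            idf = math.log(n_docs / df[term])  # rarer terms get a higher weight
            score += tf * idf
        if matched:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": "redis module for full text search",
    "b": "redis is an in memory data store",
    "c": "full text search with lucene",
}
print(tf_idf_rank("redis search", docs))  # order: a, c, b
```

Document "a" wins because it matches both terms; "c" beats "b" because it is shorter, so its single match carries a larger normalized term frequency.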
BTW, it's funny that in the end most uses of ElasticSearch are for analyzing logs, and all the IR scoring tricks aren't even used.
Search is hard. When ElasticSearch isn't enough, I reach for Algolia. For basic search this should be good, but ranking is very important... How are RediSearch's fuzzy search results? That's the second most important aspect of search.
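Fuzzy matching is commonly built on edit distance: expand each query term to the indexed terms within a small Levenshtein distance, then search for those. A minimal sketch of that general approach (not a description of RediSearch's implementation; `fuzzy_expand` and the vocabulary are hypothetical):

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the chars match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_expand(term: str, vocabulary: Iterable[str], max_distance: int = 1) -> List[str]:
    """Return the indexed terms within max_distance edits of the query term."""
    return sorted(w for w in vocabulary if levenshtein(term, w) <= max_distance)


vocab = ["hello", "help", "hallo", "world", "yellow"]
print(fuzzy_expand("helo", vocab))  # ['hello', 'help']
```

A real engine would walk a trie or automaton instead of scanning the whole vocabulary, but the matching criterion is the same.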
I’m trying to imagine why I would not use existing, mature search tech (Solr/Elastic) with existing mindshare I can hire.
Why would I pay to be off on an island, very locked into a proprietary search tech? Even if it is a little faster with the Redis brand attached to it? It doesn’t seem quite as turnkey as Algolia or as deeply featured as Lucidworks Fusion...
You don't have to pay for it. The original license was completely Free As In Speech and even today it costs only for the distributed version. There is even a fork of it preserving the original license.
It attaches to existing redis hash keys and allows you to add an index to your already existing data for example.
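In RediSearch 2.x this is done by declaring an index over a key prefix with `FT.CREATE ... ON HASH PREFIX ...`; hashes already stored under that prefix are indexed without being rewritten. The sketch below only assembles such a command (the index name, prefix, and fields are made up); with a client like redis-py it could be sent via `execute_command(*args)`.

```python
from typing import Dict, List


def ft_create_args(index: str, prefix: str, schema: Dict[str, str]) -> List[str]:
    """Build an FT.CREATE command that indexes existing hashes whose keys start with `prefix`."""
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for field, field_type in schema.items():
        args += [field, field_type]  # e.g. "title TEXT", "price NUMERIC"
    return args


args = ft_create_args("products-idx", "product:", {"title": "TEXT", "price": "NUMERIC"})
print(" ".join(args))
# FT.CREATE products-idx ON HASH PREFIX 1 product: SCHEMA title TEXT price NUMERIC
```

Because the index is attached by prefix, later writes to hashes under `product:` are indexed as they happen, with no separate ingestion step.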
Disclaimer - I'm the original author of RediSearch (started in 2016), before the switch to a non FOSS license. I'm not affiliated with it anymore and haven't followed its development for the past couple of years. But I can tell you that our first users were people who couldn't get ES working for their workloads, or had good redis infrastructure in place already and were able to utilize it for this as well.
What I was aiming for originally, beyond speed, was a simple API and an intuitive query language (https://oss.redislabs.com/redisearch/Query_Syntax/) that scales from simple textual queries to structured queries, and a big emphasis on real-time incremental indexing as the core features. There is no batch vs. incremental indexing: you feed it a document and it's searchable immediately, which I thought was very important for an in-memory database (it comes at the cost of giving up some nifty index compression techniques; the compression is pretty straightforward).
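A toy sketch of "no batch vs. incremental indexing": an inverted index whose postings are updated in place on every add, so a document is searchable as soon as `add` returns. The `IncrementalIndex` class is hypothetical and much simpler than a real engine (no ranking, no compression).

```python
from collections import defaultdict
from typing import Dict, Set


class IncrementalIndex:
    """Toy inverted index: every add updates the postings in place, with no batch step."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        if doc_id in self.docs:
            self.delete(doc_id)  # re-adding a document replaces the old version
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id: str) -> None:
        for term in self.docs.pop(doc_id, "").lower().split():
            self.postings[term].discard(doc_id)

    def search(self, query: str) -> Set[str]:
        """Intersect postings lists: documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = IncrementalIndex()
idx.add("doc1", "hello world")
print(idx.search("hello world"))  # {'doc1'}, searchable right after add
idx.add("doc2", "hello redis")
print(sorted(idx.search("hello")))  # ['doc1', 'doc2']
```

The trade-off mentioned above shows up even here: postings are plain mutable sets, easy to update in place but far less compact than the compressed, immutable segments a batch indexer can build.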
But TBH it started as a demo dogfooding project for the redis module system - something to demonstrate how powerful the API is, and at the same time test it, find bugs and design problems with it, and gather real developer asks. It became the reference project for the modules API and a lot of the module system's features were designed to accommodate it (most notably async execution of slow commands).
But then a few people started using it (which all of a sudden gave RediSearch itself user asks, testing, and all that), requesting features and being happy with them when they got implemented, and it slowly started to be a thing, with people even contributing code. Then we wrote the distributed version requested by enterprise users, and it became a product, which is now developed by a team. I left it a bit over two years ago.
I think it's not terrible; Redis itself is just brilliant!
But wow a search-redis ? I think they have a lotttt of ground to make up to compete in the search space. The amount of options/analysis/tokenization/other-search features you get out the box from anything Lucene based (solr,elastic) is just staggering.
Nice work, it’s pretty awesome to be able to plug in a search engine as easily as adding a plugin. I just have one question about the recently published benchmarks: are the latencies based on the two-word search "hello world"?
I was under the impression that Redis was intended for short-lived data, but a full text search suggests otherwise. Are people using Redis, with persisted storage, as their database?
That's not particularly expensive, but it's a dealbreaker for a hobby project I'd want to use it on.
Sadly, there is no non-commercial licence either.
I have no problem paying for it, but what I miss is a clear recommendation how to get started if you have the rather common case of your own AWS Account using ElastiCache.
If I buy into Redis Enterprise Cloud, what are its preferred usage scenarios, how do I secure the connection properly, etc. etc. etc.
This is just shoving a buy button in my face and I have no idea how to decide if what I'm buying is actually feasible for me, for example from a compliance pov.
Great.
How?
Is that recommended?
What kind of user rights do I need to give Redis to manage that instance in my VPC?
How to lock that down to the minimum without breaking it?
In that configuration, what risks do I have to control for otherwise?
...
You can use RediSearch on Redis Cloud Essentials with our free plan.
We currently support AWS/Mumbai (ap-south-1) and we plan to gradually make it available in other regions as well.
Yes, unless you use something like Redis Flash or KeyDB, which can also store data on flash. You should check the data structures and the operations performed on them to see whether they can also work efficiently against disk. For example, an increment in Redis is fast, but in another DB it may end up as a RocksDB get+update, which is not.
I wish Redis would focus a bit more on the standard clustering options. Cluster has some real drawbacks and Sentinel is about the most brittle service I’ve seen. We run all of our peer discovery for Squawk[1] on Redis but recently made the switch to KeyDB and have been thrilled with the resiliency. We’re definitely not Redis experts so I’m sure we’re doing some things wrong, but for anyone else looking for a rock solid HA solution, I highly recommend it.
> Cluster Support and Commercial Version: RediSearch has a distributed cluster version that can scale to billions of documents and hundreds of servers. However, it is only available as part of Redis Labs Enterprise.
This is a bummer because high availability is really important for many search use cases. For example, think about e-commerce, where search literally prints money.
EDIT: it seems like the open source version supports a read-only replica for failover, but my overall thoughts about not crippling/compromising the clustering story in the open source version still stand.
While I understand the rationale for this move, unfortunately not having HA is a non-starter.
I had a similar temptation when open-sourcing Typesense (https://github.com/typesense/typesense) and thought long and hard about keeping clustering as part of a closed-source commercial edition, but eventually decided against it. I understand that commercialising certain features is a necessary evil and trade-offs must be made. However, I think there are still many avenues to do that without keeping clustering closed source.
Apart from that, I am happy to see the search space heating up with a lot more interesting options.
> A counter-point to yours would be how Amazon's search is horrible
Citation needed as well :) What I actually meant is that for a class of use cases like e-commerce, search is an important feature with a direct impact on revenue. For example, search in e-commerce enables product discovery. So a downtime hits your revenue directly.
Redis Modules created by Redis Labs (e.g. RediSearch, RedisGraph, RedisJSON, RedisML, RedisBloom) are licensed under the Redis Source Available License (RSAL).
Basically you can't use it for a database product, caching engine, stream processing engine, search engine, indexing engine or ML/DL/AI serving engine. I feel just about any web app does at least one of these things.
> Licensor hereby grants to You a non-exclusive, royalty-free, worldwide, non-transferable license during the term of this Agreement to:
> ...
> (b) use the Software, or your Modifications, only as part of Your Application, but not in connection with any Database Product that is distributed or otherwise made available by any third party.
The license is basically there to stop AWS from using it as part of ElastiCache (or any other cloud provider from doing the same).
Not entirely correct: you can change the code and even redistribute it as part of your application, as long as your application is not a "Database Product" as defined in the license.
If you resell Redis that’s a DB product. If you sell widgets and use Redis for search it’s not. There’s obviously some grey areas but I thought it was relatively clear?
Good for them on building a business around redis.
That said, fuck paying for some bolt-on approximation of a search engine with no real durability. I'll use elastic or solr.
They're free, and have massive communities and user-bases. Issues are surfaced quickly. Compatibility is a priority. And all the other benefits of network effects.
I mean, speed is nice, but it is not the primary thing I am concerned with in a search engine; as long as it is acceptable, it is not on my list of requirements. Things that would be on my list (not necessarily in this order, but close):
1. What human languages does it support?
2. In these human languages, how do stemming and decompounding work in your implementation?
3. How is word importance determined in your index - TF-IDF? Other algorithm? Are least important words automatically dropped from queries?
4. Do you have ability to rank on both the stemmed/decompounded query/results and exact matches? So something like raw field access.
5. Can I create my own semantics? I remember seeing a post on here recently where someone had created a search engine (in Rust, I think) that was faster than ElasticSearch, but from what I could see you couldn't create your own field names, so you were stuck searching in title, description, body, creationDate, and a couple of other fields, which really decreases the usefulness.
I mean, these are the things that immediately spring to mind when someone tells me they have a new search engine, and when they say "look at my speed benchmarks", I'm thinking "what am I supposed to do with this?"
on edit: formatting
on second edit: So I guess, as with most things, I am interested in how the product actually fulfills what should be its primary functionality: how does the search engine function as a search engine? I suppose my questions could be answered quickly with something like: our search engine has feature parity with ElasticSearch/Solr where features A, B, and C are concerned; features D and E will be supported in the future.
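A crude illustration of item 4 above: keep both the raw tokens and their stems, count a stemmed match, and add a bonus when the exact surface form matches too. The suffix-stripping `naive_stem`, its suffix list, and the boost value are hypothetical placeholders; real engines use language-specific stemmers such as Snowball.

```python
from typing import Dict, List, Tuple

SUFFIXES = ("ing", "ed", "s")  # hypothetical, English-only, far cruder than Porter/Snowball


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank(query: str, docs: Dict[str, str], exact_boost: float = 2.0) -> List[Tuple[str, float]]:
    """+1 per query term whose stem matches, plus exact_boost if the raw token matches too."""
    q_terms = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        raw = set(text.lower().split())
        stemmed = {naive_stem(t) for t in raw}
        score = 0.0
        for term in q_terms:
            if naive_stem(term) in stemmed:
                score += 1.0
                if term in raw:
                    score += exact_boost
        if score:
            results.append((doc_id, score))
    return sorted(results, key=lambda kv: kv[1], reverse=True)


docs = {
    "d1": "connecting to redis",
    "d2": "connect redis cluster",
    "d3": "connected clients",
}
print(rank("connect redis", docs))  # [('d2', 6.0), ('d1', 4.0), ('d3', 1.0)]
```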
> 3. How is word importance determined in your index - TF-IDF? Other algorithm? Are least important words automatically dropped from queries?
> 4. Do you have ability to rank on both the stemmed/decompounded query/results and exact matches? So something like raw field access.
Thanks. I guess I was more focused on replying to the benchmarks link in the sub-thread, which didn't seem pertinent to me. That said, everything looks pretty nice.
That link is really well hidden in the article IMO. I read your comment and re-read the article and still had to ctrl+F to find it. (It's the 4th link under "blog posts")
ElasticSearch adds a huge overhead on top of Lucene. I suspect the same is true for RediSearch. The test is not testing the engine, but rather the implementation of its distributed aspect.
Of course. What I mean is that the performance of distributed FTS is mostly related to its distributed aspect, not FTS by itself.
If I were @aphyr, I would say that performance and correctness are competing, so a more performant distributed system is less correct, unless proved otherwise.
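A sharded search (like the 5-shard setups in the benchmark) typically scatters the query to every shard and merges the per-shard top-k lists on a coordinator. A minimal in-process sketch of that scatter-gather pattern, with dicts of precomputed scores standing in for shards; the network round-trips, which the comments above suggest dominate the cost, are exactly what it leaves out.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def shard_top_k(shard: Dict[str, float], k: int) -> List[Hit]:
    """Each shard scores locally and returns only its k best hits, best first."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard.items()))


def scatter_gather(shards: List[Dict[str, float]], k: int) -> List[Hit]:
    # Scatter: ask every shard for its local top k (in-process here; over the network in practice).
    partials = [shard_top_k(shard, k) for shard in shards]
    # Gather: lazily merge the already-sorted partial lists and keep the global top k.
    return list(islice(heapq.merge(*partials, reverse=True), k))


shards = [{"a": 0.9, "b": 0.2}, {"c": 0.8, "d": 0.7}, {"e": 0.95}]
print(scatter_gather(shards, 3))  # [(0.95, 'e'), (0.9, 'a'), (0.8, 'c')]
```

Each shard has to return a full k hits, since in the worst case all of the global winners live on a single shard; that per-shard over-fetching is part of the distributed overhead.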
One of the things I find so confusing with Elasticsearch is querying. Constructing a complex JSON document for a simple query just doesn't fit my brain. I have tried to put this down to bad/confusing documentation, but I'm not sure anymore.
It seems Redis is moving a bit in the same direction - although not as complex as ES has done it.
Being able to run this inside a Redis instance is a big win, although I suspect few or none of the cloud providers are willing to pay RedisLabs for the privilege of using the module.
For my standard use cases, the query_string query is "good enough"; it is the only thing I regularly implement and expose to users. That plus some predefined aggregations gets you very far. A query string query can do things like boolean operations (AND, OR, NOT), phrase search, fuzzy search, range search, date math, regex search, and boosting: https://www.elastic.co/guide/en/elasticsearch/reference/curr...
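As an illustration of how far a single string goes, here is a search request body using `query_string` (field names and values are hypothetical), built in Python only to keep the example self-contained; it would be sent to an index's `_search` endpoint.

```python
import json

# Field names and values are hypothetical; the string uses query_string syntax for
# field scoping (field:value), phrases ("..."), boosting (^), ranges ([a TO b]),
# fuzziness (~), and boolean operators.
query_text = (
    'title:"full text search" AND (redis OR lucene^2) '
    "AND price:[10 TO 100] AND NOT status:draft AND body:serch~1"
)

body = {
    "query": {
        "query_string": {
            "query": query_text,
            "default_field": "body",  # used for terms without an explicit field
        }
    },
    "size": 10,
}

print(json.dumps(body, indent=2))
```

The equivalent `bool` query, with separate `must`, `must_not`, `range`, and `fuzzy` clauses, would be several levels of nested JSON.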
I'm just getting ready to push instant search as a major feature on a client's web application, as both public-facing & admin-side. The MeiliSearch team has been great in terms of communication regarding issues, new features, the roadmap, etc. They're also moving quickly, constantly adding new functionality that has been useful as I've been working on the integration.
Depending on what you're planning on using it for, make sure to review the documentation, and check the GitHub issues/discussions. For example, I've had to use workarounds to handle some lacking auth/permissions support, but they're currently working on improving it.
Best part by far is the performance. And the memory requirements are completely reasonable, depending on the size of the database.
How is this multiple years old add-on module a loss of direction for Redis or Redislabs (who always have done exactly that: building things around redis) after the departure of Antirez, who wasn't in charge of setting direction at Redislabs?
I'm a bit dated, but I don't have a high confidence in Redis with persistent storage compared to a typical ACID database. How does this fit into an architecture?
It seems painful to have to write code to reload the search db if it fails.
How long is redis search going to exist and be supported?
If this is delivered as a module, what guarantees do I have that the module interface won't break and leave RediSearch in a broken state?
> but I dont have a high confidence of redis with persistent storage
Anything that kills Redis persistence is also going to corrupt whatever other database you'd have used instead. In fact, Redis persistence is such a simple model that I'd be surprised if most RDBMSs weren't more likely to corrupt data.
I've been running a redis database since 2013 and haven't had a case of lost data once and never even had to restore from a backup.