Speed is nice, but it is not the primary thing I care about in a search engine. As long as it is acceptable, it is not on my list of requirements. Things that would be on my list, not necessarily in this order but close:
1. What human languages does it support?
2. In these human languages, how do stemming and decompounding work in your implementation?
3. How is word importance determined in your index - TF-IDF? Some other algorithm? Are the least important words automatically dropped from queries?
4. Do you have the ability to rank on both the stemmed/decompounded query/results and exact matches? So something like raw field access.
5. Can I create my own semantics? I remember seeing a post on here recently where someone had created a search engine (in Rust, I think) that was faster than Elasticsearch, but from what I could see you couldn't create your own field names, so you were stuck searching in title, description, body, creationDate and a couple of other fields, which really decreases the usefulness.
These are the things that spring to mind right away when someone tells me they have a new search engine, and when they show me "look at my speed benchmarks" I'm thinking "what am I supposed to do with this?"
on edit: formatting
on second edit: As with most things, I am interested in how the product actually fulfills what should be its primary functionality, so how does the search engine function as a search engine? I suppose my questions could be answered quickly: our search engine has feature parity with Elasticsearch / Solr where features A, B, and C are concerned; features D and E will be supported in the future.
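To make questions 3 and 4 concrete, here is a rough Python sketch of what I mean by ranking on both a stemmed field and a raw field. Everything in it is my own illustration and not any engine's API: `naive_stem` is a toy suffix stripper standing in for a real stemmer, the `log(1 + N/df)` weighting is just one common TF-IDF variant, and `exact_boost` is a made-up knob.

```python
import math
from collections import Counter


def naive_stem(token):
    # Toy suffix stripper for illustration only; NOT a real stemmer.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    return text.lower().split()


def tf_idf_scores(docs, query_terms, analyzer):
    """Score each doc as the sum of tf * idf over the query terms."""
    analyzed = [Counter(analyzer(t) for t in tokenize(d)) for d in docs]
    n = len(docs)
    scores = []
    for counts in analyzed:
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in analyzed if term in c)  # document frequency
            if df == 0:
                continue  # term appears nowhere, contributes nothing
            idf = math.log(1 + n / df)  # rarer terms weigh more
            score += counts[term] * idf
        scores.append(score)
    return scores


def search(docs, query, exact_boost=2.0):
    """Rank doc indices by stemmed score plus a boosted raw (exact) score."""
    raw_terms = tokenize(query)
    stemmed_terms = [naive_stem(t) for t in raw_terms]
    stemmed = tf_idf_scores(docs, stemmed_terms, naive_stem)
    exact = tf_idf_scores(docs, raw_terms, lambda t: t)
    combined = [s + exact_boost * e for s, e in zip(stemmed, exact)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


docs = [
    "indexing large logs",
    "the index is large",
    "search engines rank results",
]
ranking = search(docs, "indexing")
print(ranking)
```

With only the stemmed field, the first two documents tie on the query "indexing", since both reduce to "index". The raw field is what lets the exact match rank first, which is the behavior I would want to be able to configure.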
> 3. How is word importance determined in your index - TF-IDF? Some other algorithm? Are the least important words automatically dropped from queries?
> 4. Do you have the ability to rank on both the stemmed/decompounded query/results and exact matches? So something like raw field access.
Thanks. I guess I was more focused on responding to the benchmarks link in the sub-thread, which did not seem pertinent to me. That said, everything looks pretty nice.
That link is really well hidden in the article IMO. I read your comment and re-read the article and still had to ctrl+F to find it. (It's the 4th link under "blog posts")
Elasticsearch adds a huge overhead over Lucene. I suspect the same is true for RediSearch. The test is not testing the engine, but rather the implementation of its distributed aspect.
Of course. What I mean is that the performance of distributed full-text search (FTS) is determined mostly by its distributed aspect, not by FTS itself.
If I were @aphyr, I would say that performance and correctness are competing, so a more performant distributed system is less correct, unless proved otherwise.