
Related: Meilisearch v1.0.0 release two days ago: https://news.ycombinator.com/item?id=34707727

I have been following these two libraries (Manticore and Meilisearch) very closely. Their simplicity, portability and performance gains over Elasticsearch are impressive.

For the past two days I have been writing Python bindings for the core search engine of each of these two libraries, starting with https://github.com/AlexAltea/milli-py. The idea is to get extreme performance, but as an embedded/self-contained package (basically the same goals as SQLite).




Regarding performance, I hope it's not the same as Grafana's Loki.

Grafana Loki advertises lower resource requirements, but it's just a disk storage system. Any query will read everything from disk.

Elasticsearch has big RAM requirements if you create a lot of indexes, of course. You can't have anything quicker than indexes, and you can't have lower resource requirements without having fewer indexes.


What do you mean by "any query will read everything from disk"? Is that when you do text search, or even when you look up by labels Prometheus-style?


Tags in Loki are things like host, application, and environment. When searching by those tags and a time interval, it will read everything from disk. So any query that filters by e.g. SessionId or a keyword from the log like Exception will read all the logs from disk. This can take ages if you have a lot of logs and a big time frame. Compare that with Elasticsearch, which can index anything, like SessionId or the log message, and return the result in an instant, without even reading the disk.
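To make the difference concrete, here is a toy sketch of the two approaches. Everything in it (the `LOGS` data, the function names, the whitespace tokenizer) is made up for illustration and is not how Loki or Elasticsearch actually store or query data. It only shows that a label index narrows the set of lines but still has to read each remaining line, while a token index answers a keyword query by lookup.

```python
from collections import defaultdict

# Hypothetical toy log store: (labels, line). Not a real Loki/Elasticsearch model.
LOGS = [
    ({"env": "prod", "host": "a"}, "request ok SessionId=111"),
    ({"env": "prod", "host": "b"}, "Exception in handler SessionId=222"),
    ({"env": "dev", "host": "a"}, "Exception while starting SessionId=333"),
]


def build_label_index(logs):
    """Map each (label, value) pair to the positions of the lines carrying it."""
    index = defaultdict(set)
    for pos, (labels, _line) in enumerate(logs):
        for key, value in labels.items():
            index[(key, value)].add(pos)
    return index


def build_token_index(logs):
    """Map each lowercased, whitespace-separated token to the lines containing it."""
    index = defaultdict(set)
    for pos, (_labels, line) in enumerate(logs):
        for token in line.lower().split():
            index[token].add(pos)
    return index


def label_then_scan(logs, label_index, label, keyword):
    """Narrow by one label, then read every remaining line looking for the keyword."""
    candidates = sorted(label_index.get(label, set()))
    hits = []
    lines_read = 0
    for pos in candidates:
        lines_read += 1  # each candidate line has to be read in full
        if keyword.lower() in logs[pos][1].lower():
            hits.append(pos)
    return hits, lines_read


def token_lookup(token_index, keyword):
    """Answer an exact-token query from the index alone, without reading any line."""
    return sorted(token_index.get(keyword.lower(), set()))


label_index = build_label_index(LOGS)
token_index = build_token_index(LOGS)

hits, lines_read = label_then_scan(LOGS, label_index, ("env", "prod"), "exception")
print(hits, lines_read)
print(token_lookup(token_index, "SessionId=222"))
```

The trade-off is the one mentioned above: the token index grows with the number of distinct tokens, so the faster lookup is paid for in memory or storage.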


> When searching by those tags and a time interval, it will read everything from disk

That's what I'm asking, actually. Isn't Loki's proposition that it only indexes the tags and time interval? Do you mean that even filtering by that there's still a lot of data to go through?

Because it seems like you're saying it always fetches everything from disk.


> Isn't Loki's proposition that it only indexes the tags and time interval? Do you mean that even filtering by that there's still a lot of data to go through?

Yes

> Because it seems like you're saying it always fetches everything from disk.

If you specify a tag, like environment, it will not read the disk for data from other environments. But the tags like environment/host/timeframe are not enough if you want to query for something like error/exception/sessionid, and you might have to wait minutes/hours for a query which covers a lot of data.


Loving it. I'm interested in milli-py.

A cool feature would be automatic backup to S3, or loading from S3.


That looks awesome, kudos! I've been looking for a way to do local-first high-quality FTS.


I wonder how long-lasting the support will be for such libraries.


Give Xapian a go also.


Xapian is a library but is licensed under GPL, so you can't build on it without making your whole app GPL.

You can get around that by having the search happen in a separate process or something, maybe. But this is a huge issue for something that one might want to embed.


that is a slight misunderstanding of how open source licensing works.

the GPL bleed only happens if you distribute your application, meaning to sell or give away binary packages for customers to install. if your product is a hosted api that you do not distribute, you do not invoke that clause.

also, a lot of open source projects handle this by licensing the core engine under a copyleft license (GPL, AGPL), while the language connectors and bindings are licensed under the less restrictive apache license. unless you are offering a saas service of the product itself, it is more likely you are actually interacting with the connectors anyway. mongodb is a classic example of this model.


Yes, if you don't distribute it, the license doesn't matter. That is more a flaw than the intention, but that is correct.

The point is that I can use SQLite, Tantivy, RocksDB, ... in my app, no problem. I can make it open core, I can make it AGPL, BSD, MIT, no problem. Because those things are meant to be embedded. But I almost definitely can't use Xapian.

Let's be honest, if I want a search solution for use in my SaaS, I will grab Elasticsearch or an equivalent, I have no need for a library. It seems to me that the only use case where Xapian could really shine is crippled by their license. That is a shame.


It can be a problem if you intended to make an embeddable search engine within applications meant to be executed by your end-users (as is the case with milli-py above).


github 404 fyi


Fixed, apologies! I had finished the PoC last night but the repo was still marked as private.



