Hello, I'm the Meilisearch CEO. Your issue could be that you sent your data without configuring your index first, which is what I gather from your comment. Just change your settings so that URLs are not indexed.
Hey, thanks for letting me know! I did add a few fields that weren't directly searched on, because I wanted to be a bit fairer to the other search engines (Postgres is holding the whole document plus the indexes).
I’m going to change the configuration and see how that goes.
One thing I'd love help with (and that would make an awesome recipe section for your docs site) is best practices around bulk insertion! I couldn't tell if there was an actual benefit to using addDocuments() vs addDocumentsInBatches().
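For context, this is roughly what my loader looks like (a rough sketch with meilisearch-js; the host, index name, and document shape are placeholders, not my real setup):

```ts
import { MeiliSearch } from 'meilisearch'

// Placeholder connection details and index name.
const client = new MeiliSearch({ host: 'http://localhost:7700', apiKey: 'masterKey' })
const index = client.index('posts')

// Toy documents standing in for the real data set.
const documents = [
  { id: 1, title: 'First post', descriptionHTML: '<p>hello</p>' },
  { id: 2, title: 'Second post', descriptionHTML: '<p>world</p>' },
]

async function load() {
  // Option 1: send the whole payload in one request; Meilisearch
  // indexes it as a single asynchronous task.
  await index.addDocuments(documents)

  // Option 2: let the client split the payload into chunks of a given
  // size (10k docs here) and enqueue one indexing task per chunk.
  await index.addDocumentsInBatches(documents, 10000)
}

load().catch(console.error)
```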
If you remove the URLs from indexation, it'll generally save a ton of space and indexing will be much, much faster. We are thinking about not indexing URLs by default; you can help us by explaining your use case here -> https://github.com/meilisearch/product/discussions/553
Just a detail: if you run `du -sh` on your machine, the size on disk will stay unchanged, because we do soft deletion ;). Don't worry, the data will be physically deleted after a while, so the space will be available again if you need it in the future.
Thanks! I removed the URLs and now the searchable attributes are only title, description and some author fields!
> Just a detail: if you run `du -sh` on your machine, the size on disk will stay unchanged, because we do soft deletion ;). Don't worry, the data will be physically deleted after a while, so the space will be available again if you need it in the future.
Ah, I was just wildly undershooting the size I gave the PVC! I gave it much more and it's fine -- right now it's resting around 19Gi of usage, which is actually a bit of a problem considering the data set was only about 4GB originally. That said, disk really isn't an issue, so I'll just throw more at it -- maybe leave it at 32GB and call it a day. It's around 1.6MM documents out of ~2MM, so it shouldn't grow too much more.
Thanks for this, I'll keep it in mind -- so I could actually pass off HUGE chunks to Meilisearch.
It seems like the larger the chunk, the more efficient? There didn't seem to be much change in how long it took to work through a chunk of documents; it was more that having lots of smaller chunks went slower overall. I started off with 10k in a batch, then went to 1k, then back to 5k -- maybe I should go up to 100k docs in a batch and see how it performs.
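If I do end up testing this, it'll probably be something like the sketch below (assuming a reasonably recent meilisearch-js where addDocumentsInBatches returns enqueued tasks and the client has waitForTasks; the index name and timeout are made up):

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700' })
const index = client.index('posts') // placeholder index name

// Enqueue the documents with a given batch size and wait for all of the
// resulting indexing tasks to finish, so the timing covers the actual
// indexing work and not just the HTTP round trips.
async function timeBatchSize(documents: Record<string, any>[], batchSize: number) {
  const start = Date.now()
  const tasks = await index.addDocumentsInBatches(documents, batchSize)
  await client.waitForTasks(
    tasks.map((t) => t.taskUid),
    { timeOutMs: 60 * 60 * 1000 },
  )
  console.log(`batchSize=${batchSize}: ${(Date.now() - start) / 1000}s`)
}

// e.g. run timeBatchSize(documents, 1_000) vs timeBatchSize(documents, 100_000)
// against the same data set and compare the totals.
```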
There's a blog post waiting to be written in here...
Thanks! Was this something someone requested? Is there a tangible benefit (were there customers who didn't want to split up the payloads themselves)? Otherwise it seems like unnecessary cruft in the API.
Check out the doc: https://docs.meilisearch.com/reference/api/settings.html#upd... with the payload `["title", "descriptionHTML"]`.
It will change everything!
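If you're using the JS client, it's a single call (a quick sketch; the host and index name are just examples):

```ts
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({ host: 'http://localhost:7700' })

// After this, only title and descriptionHTML are indexed for search;
// the other fields are still stored and returned in results.
client
  .index('posts') // placeholder index name
  .updateSearchableAttributes(['title', 'descriptionHTML'])
  .then((task) => console.log('settings update enqueued:', task))
  .catch(console.error)
```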