Scaling Indexing and Search – Algolia New Search Architecture

dreyfan · on Oct 15, 2021

Algolia is insanely expensive. Stay far away unless you have very few customers, hardly any data, or no plans to scale. They also offer zero protection against bots, so if someone decides to scrape your site, you get to pay for that privilege too.

jabo · on Oct 15, 2021

Disclaimer: I work on an open source alternative to Algolia called Typesense [1]

If I had a dollar for every time I've heard Algolia is expensive! And if I had a dollar for every time I've said it myself!

If cost is an issue, definitely check out Typesense. You can self-host it for free, or use the hosted Cloud version [2]. There are no per-record or per-search charges, which makes it very economical even for high-volume use cases. In either case, I've seen savings of up to 96% compared to Algolia's pricing.

[1] https://typesense.org, https://github.com/typesense/typesense

[2] https://cloud.typesense.org

FridgeSeal · on Oct 16, 2021

Tested out Typesense when we were evaluating search options for work. Performance and features were amazing. Are there any plans to add support for larger-than-RAM indexes?

jabo · on Oct 16, 2021

Great to hear that! No plans at the moment to support larger-than-RAM indices - that’s the trade-off we had to make for performance.

Mind if I ask how large an index you’re looking at?

ddlutz · on Oct 16, 2021

While you probably aren't using B+-tree indexes like databases do, can you use a setup like theirs, where you create a "paged" data structure and use a buffer pool to bring pages in and out?

karterk · on Oct 16, 2021

Certainly that's possible, but doing fast typeahead fuzzy search requires keeping large parts of the indexing data structure in memory.

There are, of course, many use cases where one does not need that speed. Eventually, like Redis, we will probably have an option to use SSD storage smartly for these non-latency-sensitive use cases. That's not our immediate focus though.
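For readers unfamiliar with the buffer-pool idea raised in this sub-thread, here is a minimal sketch of it, assuming a simple least-recently-used eviction policy. It is not how Typesense or any particular database works; `BufferPool`, `make_fake_disk`, and `PAGE_SIZE` are hypothetical names, and the "disk" is simulated in memory so the example runs on its own.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page; an arbitrary choice for this sketch


class BufferPool:
    """Keeps at most `capacity` pages in memory, evicting the least recently used."""

    def __init__(self, read_page, capacity):
        self._read_page = read_page  # callable: page_id -> bytes (e.g. a disk read)
        self._capacity = capacity
        self._pages = OrderedDict()  # page_id -> bytes, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self._pages:
            self.hits += 1
            self._pages.move_to_end(page_id)  # mark as most recently used
            return self._pages[page_id]
        self.misses += 1
        page = self._read_page(page_id)  # slow path: fetch from backing storage
        self._pages[page_id] = page
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
        return page


def make_fake_disk(num_pages):
    """Simulated backing store (hypothetical) so the sketch needs no real disk."""
    data = {i: bytes([i % 256]) * PAGE_SIZE for i in range(num_pages)}
    return data.__getitem__


if __name__ == "__main__":
    pool = BufferPool(make_fake_disk(100), capacity=2)
    for pid in [0, 1, 0, 2, 1]:
        pool.get(pid)
    print(pool.hits, pool.misses)  # prints "1 4"
```

Every miss is a trip to slower storage, which is the latency cost karterk is pointing at: with a pool much smaller than the index, a fuzzy typeahead query that touches many pages would see many misses.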

FridgeSeal · on Oct 17, 2021

We have fairly large variation in index size, from ~100MB through to 30-50GB, and probably larger still in future. In the order of thousands of indexes as well.

Application level sharding/merging for more consistent index size is on the list of things to do, but given the complexity of that it’s a “not before we really need to” kind of task haha.

pixard · on Oct 16, 2021

Agreed. I built a project on Algolia and while it generally works well I regret it. After about a year of production use they introduced new pricing which basically increased the price ten-fold, but I'm happy for this apparent "average" user that will see a decrease in price (according to the email they sent at the time).

The project is still "grandfathered" into the old pricing but of course that doesn't really help for any new projects, so all the time/knowledge invested into learning Algolia is wasted as I have no intention of ever touching it again (since it is just not economical).

A move to Meilisearch/Typesense is already in the planning phase.

karterk · on Oct 16, 2021

Happy to help with the transition to Typesense! Email and Calendly link in my HN profile.

danielvaughn · on Oct 15, 2021

Can you recommend any good alternatives?

dreyfan · on Oct 16, 2021

It’s not particularly difficult to set up a full-text search backend + API yourself: Solr, Elasticsearch, Typesense, or Couchbase are good document-oriented solutions that pretty much work out of the box. Postgres or MySQL have decent support for full-text search if you’re already on either, though you’ll need to integrate some sort of API layer.

There are plenty of managed versions of the document-oriented options too. The biggest issue with Algolia, imo, is that they charge per search request: if you enable autocomplete, every single key press costs you money, and they don’t offer a particularly unique product.
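To make the per-keystroke point concrete, here is a small sketch that counts how many search requests one typed query generates. The timestamps are made up for illustration, and the `debounce_ms` option is an assumption on my part (debouncing is a common client-side mitigation, not something this comment proposes); no real pricing figures are used.

```python
def requests_sent(keystroke_times_ms, debounce_ms=0):
    """Count search requests fired for a sequence of keystroke timestamps.

    With debounce_ms == 0, every keystroke fires a request. Otherwise a request
    fires only when the user pauses for at least debounce_ms after a keystroke;
    the final keystroke always fires, since nothing follows it.
    """
    if not keystroke_times_ms:
        return 0
    if debounce_ms <= 0:
        return len(keystroke_times_ms)
    count = 0
    for current, following in zip(keystroke_times_ms, keystroke_times_ms[1:]):
        if following - current >= debounce_ms:
            count += 1
    return count + 1  # the last keystroke is always followed by a pause


if __name__ == "__main__":
    # Hypothetical timings for typing a 9-character query with one pause.
    times = [0, 120, 240, 360, 900, 1020, 1140, 1260, 1380]
    print(requests_sent(times))                   # prints 9
    print(requests_sent(times, debounce_ms=300))  # prints 2
```

Under per-request billing, the first number is what you pay for; multiply it by queries per user and users per month to see how quickly autocomplete adds up.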

nikita2206 · on Oct 16, 2021

If your dataset is small enough or you have plenty of resources, and you don’t need any fancy customizability or multitenancy (always searching only a subset of all documents, filtered by tenant ID), then Typesense. Otherwise, if Typesense can’t fit the index in your RAM it will not work, and if you need to filter every search it will become slow.

If you need lots of customizability in how you index your documents and how you search, what you prioritize in the search, facets, nothing beats Elastic here. But it will need plenty of resources or it will be slow.

If you need fast but absolutely non-customizable search that can live off a lot less than 1GB of RAM (less than 100MB even), then you might have some success with https://github.com/valeriansaliou/sonic

If you’re constrained on resources but sonic is too limiting, then finally you might have some success with Manticore Search. It’s featureful, but using it with different languages can be a lot of work (I’m not sure why they don’t ship a distribution with all language plugins enabled and configured); the docs will be good enough to get you started. It can live off a few hundred MB of RAM with large indexes, and will still be faster than Elastic.

jabo · on Oct 16, 2021

Typesense supports multi-tenant indices via Scoped Search API Keys: https://typesense.org/docs/0.21.0/api/api-keys.html#generate...

ewalk153 · on Oct 16, 2021

Does this give independent term weighting per tenant, or is that common across the group?

karterk · on Oct 16, 2021

Terms will be common across the group. The scoped API key helps you guarantee that you can never accidentally access another customer's search data, since the filter condition is baked into the key itself and you also use a separate key per customer.
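The general idea of baking a filter into a key can be sketched with an HMAC signature. This is a generic illustration only, not Typesense's actual key format (see the docs linked above for that); `make_scoped_key`, `read_scoped_key`, and the `tenant_id` field are hypothetical names chosen for the example.

```python
import base64
import hashlib
import hmac
import json


def make_scoped_key(parent_key: str, embedded_params: dict) -> str:
    """Derive a per-tenant key that carries its own signed search parameters."""
    payload = json.dumps(embedded_params, sort_keys=True)
    sig = hmac.new(parent_key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{sig}.{payload}".encode()).decode()


def read_scoped_key(parent_key: str, scoped_key: str) -> dict:
    """Server side: verify the signature, then return the enforced parameters."""
    decoded = base64.urlsafe_b64decode(scoped_key.encode()).decode()
    sig, payload = decoded.split(".", 1)  # a hex digest never contains "."
    expected = hmac.new(parent_key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("scoped key signature mismatch")
    return json.loads(payload)


if __name__ == "__main__":
    parent = "search-only-parent-key"  # placeholder secret, kept on your backend
    key = make_scoped_key(parent, {"filter_by": "tenant_id:=42"})
    print(read_scoped_key(parent, key))  # prints {'filter_by': 'tenant_id:=42'}
```

Because the client only ever holds the derived key, it cannot change the embedded filter without invalidating the signature; the server applies the verified parameters to every query made with that key.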

sidi · on Oct 16, 2021

Disclaimer: founder here

We offer a hosted/managed app search experience on top of Elastic or OpenSearch with appbase.io.

You can leverage the best of Elasticsearch's state-of-the-art engine while getting Algolia-like features such as a search relevance control plane, query rules, out-of-the-box analytics / insights, caching, and UI components across all FE platforms (React, Vue, React Native, Flutter).

Sightline · on Oct 16, 2021

https://github.com/meilisearch/MeiliSearch

FridgeSeal · on Oct 16, 2021

TypeSense and MeiliSearch are top of my list.

A little bit more exotic/early are Lnx and QuickWit

zkldi · on Oct 15, 2021

Slightly off topic but I signed up for Algolia to try it out a month or so ago and it's done nothing but spam my inbox daily with so many things.

I tried clicking the unsubscribe link in the email and it just takes me to a blank page.

truemped · on Oct 15, 2021

Interesting, but it’s not really new?! Solr’s replication mechanism has worked like this forever.

supermatt · on Oct 16, 2021

Are there any resources on how something at the scale of Google/Bing works?

Not looking to build Google, lol, just would like to understand the architecture of that sort of search engine.

skunkworker · on Oct 15, 2021

Interesting article, but sad that the site doesn't support https. Even adding it to the URL redirects you to http.