Hacker News
Scaling Indexing and Search – Algolia New Search Architecture (highscalability.com)
79 points by yarapavan on Oct 15, 2021 | 22 comments



Algolia is insanely expensive. Stay far away unless you have very few customers, hardly any data, or no plans to scale. They also offer zero protection against bots, so if someone decides to scrape your site, you get to pay for that privilege too.


Disclaimer: I work on an open source alternative to Algolia called Typesense [1]

If I had a dollar for every time I've heard Algolia is expensive! And if I had a dollar for every time I've said that statement!

If cost is an issue, definitely check out Typesense. You can self-host it for free or use the hosted Cloud version [2]. There are no per-record or per-search charges, which makes it very economical even for high-volume use cases. Either way, I've seen savings of up to 96% compared to Algolia's pricing.

[1] https://typesense.org, https://github.com/typesense/typesense

[2] https://cloud.typesense.org


We tested out Typesense when we were evaluating search options at work. Performance and features were amazing. Are there any plans to add support for larger-than-RAM indexes?


Great to hear that! No plans at the moment to support larger-than-RAM indices; that's the trade-off we had to make for performance.

Mind if I ask how large of an index size you’re looking at?


While you probably aren't using B+-tree indexes the way databases do, could you use a similar setup, where you create a "paged" data structure and use a buffer pool to bring pages in and out of memory?
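
For readers unfamiliar with the idea, the "paged data structure plus buffer pool" design the question describes can be sketched as a fixed-capacity page cache with least-recently-used (LRU) eviction. This is a minimal Python sketch, assuming pages are addressed by integer IDs and that `load_page` (a hypothetical callback standing in for a disk read) returns a page's bytes; it is not how Typesense or any particular database implements its buffer pool.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """Fixed-capacity page cache with least-recently-used eviction.

    `load_page` is a hypothetical stand-in for reading a page from disk.
    """

    def __init__(self, capacity: int, load_page: Callable[[int], bytes]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.load_page = load_page
        self.pages: "OrderedDict[int, bytes]" = OrderedDict()  # oldest first
        self.misses = 0

    def get(self, page_id: int) -> bytes:
        if page_id in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: evict the least recently used page if full, then "read from disk".
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        data = self.load_page(page_id)
        self.pages[page_id] = data
        return data


# Usage: a two-page pool over ten fake on-disk pages.
disk = {i: f"page-{i}".encode() for i in range(10)}
pool = BufferPool(capacity=2, load_page=disk.__getitem__)
for pid in (0, 1, 0, 2):  # touching page 0 again makes page 1 the eviction victim
    pool.get(pid)
print(list(pool.pages), pool.misses)
```

Every miss costs a disk read, so this design only pays off when the pages a query touches are usually already resident in the pool.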


Certainly that's possible, but fast type-ahead fuzzy search requires keeping large parts of the index data structure in memory.

There are, of course, many use cases where one does not need that speed. Eventually, like Redis, we will probably have an option to use SSD storage smartly for these latency-insensitive use cases. That's not our immediate focus, though.
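
To make the latency point concrete, here is a minimal sketch of one common way to do fuzzy type-ahead matching: walk a trie while computing one row of the Levenshtein edit-distance table per character, and prune branches that can no longer fall within the typo budget. It illustrates the general technique only; it is not Typesense's actual data structure, and the function names are hypothetical. A single keystroke can visit many nodes scattered across the structure, which is cheap in RAM but would mean many random page reads if the trie lived on disk.

```python
from typing import Dict, Iterator, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a word ends at this node


def insert(root: TrieNode, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def _words_below(node: TrieNode) -> Iterator[str]:
    # Every complete word stored in this node's subtree.
    stack = [node]
    while stack:
        current = stack.pop()
        if current.word is not None:
            yield current.word
        stack.extend(current.children.values())


def _search(node: TrieNode, ch: str, query: str, prev_row: List[int],
            max_dist: int, best: Dict[str, int]) -> None:
    # New Levenshtein row: distance between each query prefix and the
    # trie prefix that ends with `ch`.
    row = [prev_row[0] + 1]
    for i in range(1, len(query) + 1):
        row.append(min(row[i - 1] + 1,                           # insert
                       prev_row[i] + 1,                          # delete
                       prev_row[i - 1] + (query[i - 1] != ch)))  # substitute
    if row[-1] <= max_dist:
        # The whole query is within max_dist of this trie prefix, so every
        # word below this node is a candidate completion.
        for word in _words_below(node):
            best[word] = min(best.get(word, row[-1]), row[-1])
    if min(row) <= max_dist:
        # Prune otherwise: extending the prefix can never lower the row minimum.
        for next_ch, child in node.children.items():
            _search(child, next_ch, query, row, max_dist, best)


def fuzzy_prefix_search(root: TrieNode, query: str, max_dist: int) -> Dict[str, int]:
    """Map each word having a prefix within max_dist edits of query to that distance."""
    best: Dict[str, int] = {}
    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        _search(child, ch, query, first_row, max_dist, best)
    return best


root = TrieNode()
for w in ["type", "typo", "typesense", "algolia"]:
    insert(root, w)
print(sorted(fuzzy_prefix_search(root, "tyoe", 1).items()))
```

With a budget of one typo, the query "tyoe" matches the prefix "type", so both "type" and "typesense" come back as completions, while "typo" (two edits away) does not.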


We have fairly large variation in index size, from ~100MB through to 30-50GB, and probably larger still in the future. There are on the order of thousands of indexes as well.

Application-level sharding/merging for more consistent index sizes is on the list of things to do, but given its complexity, it’s a “not before we really need to” kind of task, haha.
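
One simple way to plan that kind of application-level sharding/merging is first-fit-decreasing bin packing: split any index larger than a target shard size, then pack the pieces, largest first, into shards that stay under the target. This sketch assumes a hypothetical 10GB target and assumes an index can be split evenly (for example by hashing document IDs); `plan_shards` is an illustrative name, not part of any search engine's API, and the index sizes are made up.

```python
import math
from typing import Dict, List, Tuple


def plan_shards(index_sizes_gb: Dict[str, float], target_gb: float) -> List[List[str]]:
    """Hypothetical planner: group indexes into shards of roughly target_gb.

    Indexes larger than target_gb are split into ceil(size / target_gb) equal
    parts named "<index>#<part>" (assumes documents can be partitioned, e.g.
    by hashing IDs); the pieces are then packed first-fit decreasing.
    """
    pieces: List[Tuple[str, float]] = []
    for name, size in index_sizes_gb.items():
        parts = max(1, math.ceil(size / target_gb))
        for p in range(parts):
            label = name if parts == 1 else f"{name}#{p}"
            pieces.append((label, size / parts))

    # Largest pieces first, each into the first shard that still has room.
    pieces.sort(key=lambda piece: piece[1], reverse=True)
    shards: List[List[str]] = []
    loads: List[float] = []
    for label, size in pieces:
        for i, load in enumerate(loads):
            if load + size <= target_gb:
                shards[i].append(label)
                loads[i] += size
                break
        else:  # no existing shard has room: open a new one
            shards.append([label])
            loads.append(size)
    return shards


# Made-up sizes in the range the comment describes (~100MB up to ~45GB).
sizes = {"a": 0.1, "b": 45.0, "c": 3.0, "d": 6.5}
print(plan_shards(sizes, target_gb=10.0))
```

The 45GB index becomes five 9GB pieces, while the small indexes are merged into the leftover space; the trade-off is that queries for a split index must fan out to several shards.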


Agreed. I built a project on Algolia, and while it generally works well, I regret it. After about a year of production use, they introduced new pricing that basically increased the price tenfold, but I'm happy for this apparent "average" user who will see a decrease in price (according to the email they sent at the time).

The project is still "grandfathered" into the old pricing, but of course that doesn't help with any new projects, so all the time and knowledge I invested in learning Algolia is wasted: I have no intention of ever touching it again (it's just not economical).

A move to Meilisearch/Typesense is already in the planning phase.


Happy to help with the transition to Typesense! Email and Calendly link in my HN profile.


Can you recommend any good alternatives?


It’s not particularly difficult to set up a full-text search backend plus API: Solr, Elasticsearch, Typesense, or Couchbase are good document-oriented solutions that pretty much work out of the box. Postgres or MySQL have decent support for full-text search if you’re already on either, though you’ll need to integrate some sort of API layer.

There are plenty of managed versions of the document-oriented engines too. The biggest issue with Algolia, imo, is that they charge per search request. For example, if you enable autocomplete, every single keypress costs you money, and they don’t offer a particularly unique product.
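
The per-request cost of autocomplete is easy to quantify: search-as-you-type sends one query per keystroke, so typing a nine-letter word like "typesense" means nine searches. A common client-side mitigation is debouncing, where a search fires only after the user pauses. This hypothetical Python sketch only counts requests for a made-up sequence of keystroke timestamps; it says nothing about how Algolia actually meters or prices searches.

```python
from typing import List


def requests_without_debounce(keystroke_times_ms: List[int]) -> int:
    # Search-as-you-type: one request per keystroke.
    return len(keystroke_times_ms)


def requests_with_debounce(keystroke_times_ms: List[int], wait_ms: int) -> int:
    """Requests sent if a search fires only after wait_ms without a new keystroke."""
    if not keystroke_times_ms:
        return 0
    pauses = sum(
        1
        for current, nxt in zip(keystroke_times_ms, keystroke_times_ms[1:])
        if nxt - current >= wait_ms
    )
    return pauses + 1  # plus the search after the final keystroke


# Made-up timings: "typesense" at ~120 ms per key, one 500 ms pause after "type".
times = [0, 120, 240, 360, 860, 980, 1100, 1220, 1340]
print(requests_without_debounce(times), requests_with_debounce(times, wait_ms=300))
```

Debouncing trades a little latency (the wait before results appear) for far fewer requests, which matters most when every request is billed.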


If your dataset is small enough or you have plenty of resources, and you don’t need any fancy customizability or multitenancy (always searching only a subset of all documents, filtered by tenant ID), then Typesense. Otherwise, if Typesense can’t fit the index in your RAM, it will not work, and if you need to filter every search, it will become slow.

If you need lots of customizability in how you index your documents, how you search, what you prioritize in the search, and facets, nothing beats Elastic. But it will need plenty of resources, otherwise it will be slow.

If you need fast but absolutely non-customizable search that can live off a lot less than 1GB of RAM (less than 100MB even), then you might have some success with https://github.com/valeriansaliou/sonic

If you’re constrained on resources but Sonic is too limiting, then finally you might have some success with Manticore Search. It’s featureful, but using it with different languages can be a lot of work (I’m not sure why they don’t ship a distribution with all language plugins enabled and configured); the docs will be good enough to get you started. It can live off a few hundred MB of RAM with large indexes and will still be faster than Elastic.


Typesense supports multi-tenant indices via Scoped Search API Keys: https://typesense.org/docs/0.21.0/api/api-keys.html#generate...


Does this give independent term weighting per tenant, or is that common across the group?
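
For context on the question: "term weighting" usually refers to corpus statistics such as inverse document frequency (IDF), and whether those statistics are computed over the whole collection or per tenant changes the scores. This minimal sketch uses one common smoothed IDF variant, ln((N + 1) / (df + 1)) + 1, on made-up documents; it is not a description of how Typesense ranks results.

```python
import math
from typing import List


def idf(term: str, docs: List[str]) -> float:
    """Smoothed inverse document frequency: ln((N + 1) / (df + 1)) + 1."""
    n = len(docs)
    df = sum(1 for doc in docs if term in doc.lower().split())
    return math.log((n + 1) / (df + 1)) + 1


# Made-up documents for two tenants sharing one index.
tenant_a = ["red shoes", "red hat", "red scarf"]
tenant_b = ["blue shoes", "green hat", "black coat"]

shared = idf("red", tenant_a + tenant_b)  # statistics pooled across tenants
per_tenant = idf("red", tenant_a)         # statistics from tenant A only
print(round(shared, 3), round(per_tenant, 3))
```

For tenant A, "red" appears in every document and so gets the minimum weight, but pooled statistics make it look more discriminative because tenant B's documents lower its document frequency.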


Term weighting will be common across the group. The scoped API key helps you guarantee that you can never accidentally access another customer's search data, since the filter condition is baked into the query itself and you also use a separate key per customer.
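
The "filter baked into the key" idea can be sketched with an HMAC: the server signs a JSON blob containing the tenant filter with the parent key, and on every search it verifies the signature and ANDs the embedded filter into the query, so a client cannot widen its own scope. This is a minimal Python illustration under assumed conventions; the key layout, the `tenant_id` field, the filter string syntax, and the function names are all hypothetical and are not claimed to match Typesense's actual scoped-key format, which is documented at the link above.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def make_scoped_key(parent_key: str, tenant_id: str) -> str:
    """Hypothetical format: base64(hex HMAC-SHA256 signature + "." + JSON params)."""
    params = json.dumps({"filter_by": f"tenant_id:={tenant_id}"}, sort_keys=True)
    sig = hmac.new(parent_key.encode(), params.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{sig}.{params}".encode()).decode()


def verify_scoped_key(parent_key: str, scoped_key: str) -> Optional[dict]:
    """Return the embedded params if the signature is valid, otherwise None."""
    try:
        raw = base64.urlsafe_b64decode(scoped_key.encode()).decode()
        sig, params = raw.split(".", 1)
    except ValueError:  # bad base64, bad UTF-8, or no separator
        return None
    expected = hmac.new(parent_key.encode(), params.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes (tampered input may not be ASCII).
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(params)


def effective_filter(parent_key: str, scoped_key: str, client_filter: str = "") -> str:
    """Server side (hypothetical): the embedded tenant filter is always ANDed in."""
    params = verify_scoped_key(parent_key, scoped_key)
    if params is None:
        raise PermissionError("invalid scoped key")
    if client_filter:
        return f"{params['filter_by']} && {client_filter}"
    return params["filter_by"]


key = make_scoped_key("parent-secret", "acme")
print(effective_filter("parent-secret", key, "in_stock:=true"))
```

Because the tenant filter is covered by the signature, editing it invalidates the key, and the parent key itself never has to be handed to the client.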


Disclaimer: founder here

We offer a hosted/managed app search experience on top of Elastic or OpenSearch with appbase.io.

You can leverage the best of the Elasticsearch engine (state of the art) while getting Algolia-like features such as a search relevance control plane, query rules, out-of-the-box analytics/insights, caching, and UI components across all frontend platforms (React, Vue, React Native, Flutter).



Typesense and Meilisearch are at the top of my list.

A little more exotic/early-stage are Lnx and Quickwit.


Slightly off topic, but I signed up for Algolia to try it out a month or so ago, and it's done nothing but spam my inbox daily with all sorts of emails.

I tried clicking the unsubscribe link in the email and it just takes me to a blank page.


Interesting, but it’s not really new?! Solr’s replication mechanism has worked like this forever.


Are there any resources on how something at the scale of Google/Bing works?

Not looking to build Google, lol, just would like to understand the architecture of that sort of search engine.


Interesting article, but it's a shame the site doesn't support HTTPS. Even adding it to the URL just redirects you to HTTP.



