We're using Algolia (the free version) for all Facebook open source project websites (React, GraphQL, Yarn, ...) and it's been nothing short of awesome.
You just include a js file and add an input where you want the search and it just works(tm).
It has required zero maintenance, it didn't go down, the API didn't change, nothing to tweak on the backend... Users have gotten a high-quality, super fast search since then.
> You just include a js file and add an input where you want the search and it just works(tm).
Reading this I get the impression that they do the indexing themselves or something, but in the Quick Start the first thing you need to do is import your data. What am I missing?
Algolia is an amazing service and an absolute joy to use. They deserve all the praise they're getting and then some.
However, it's easy to exceed their record limits, especially since you need to duplicate your index every time you want to 'sort by' something.
For instance, if I wanted an option to sort my results by date, and to then search these, I'd need to create 2 new slave indexes for date ascending and descending respectively. Sorting by anything else, like price, means creating yet more indexes and suddenly it's easy to turn 30K records into 150K. This happened to me and I ended up having to roll something custom instead (Vuejs frontend and Sphinx Search backend), since my client balked at the extra cost.
But, if you have a small dataset, or are fine with the costs, then Algolia is spectacular.
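To make the record-count math above concrete, here's the multiplication from the example (a primary index plus one full data copy per sort order):

```python
base_records = 30_000

# one index copy per sort order: date asc/desc, price asc/desc
sort_orders = ["date_asc", "date_desc", "price_asc", "price_desc"]

# primary index + one full copy of the data per sort order
total_records = base_records * (1 + len(sort_orders))
print(total_records)  # 150000
```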
Yes, this is something our startup is working on fixing. How you sort and how you prioritize the fields to be searched are all configurable at query time.
If you pay for a million records, you should be able to store a million records.
Thanks! Actually, 40% of our current beta users are existing Algolia customers. Maintaining separate indexes for every sorting/ranking option and intentionally restricting the application to stay under the limit is a drawback that many complain about.
Our On-Premises option is something which a few potential customers have been interested in.
Yes, we are currently hosted only in NY. Can you please ping beta.searchera.net and check the latency from your location?
Once we are out of beta we will be offering distributed datacenters. West coast USA and Europe to start with. The option to install in your own servers / cloud provider is another option.
If you would like to try it out, I can always bring up a host quickly next to your location on DigitalOcean or AWS. Please send me an email at hello@searchera.io
Ours is a custom index written mostly in C and a bit of x86 assembly. It is very lightweight and extremely fast, even without the use of replica indexes for every sort order.
Thanks for signing up! We'll get you started as soon as we have our additional servers up.
especially if you have records that don't even change that often.
If you have 300,000 items in an index that you want to sort in 4 ways and want to update e.g. the price daily, you've already consumed 36 million operations of the biggest non-enterprise plan, which includes 50 million operations.
Just by testing and tweaking the index every other day, we already use up to 500,000 operations.
But then again, setting up search infrastructure in different countries and syncing it in real time also comes at a hefty price. So we will stick with Algolia for now; the speed is breathtaking and we will never be able to achieve 20ms responses with e.g. an Elasticsearch cluster.
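Spelling out the arithmetic above (300K items, 4 sort-order index copies, one full price update per day for a month):

```python
items = 300_000
sort_order_copies = 4   # each "sort by" is a separate index you also write to
days = 30

ops_per_month = items * sort_order_copies * days
print(ops_per_month)  # 36000000

plan_quota = 50_000_000  # biggest non-enterprise plan
print(ops_per_month / plan_quota)  # 0.72 -> 72% of the quota on updates alone
```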
(n.b: I am an engineer at Algolia) Algolia's engine computes relevance at indexing time by design, allowing us to deliver optimal search performance at query time. As a result, each new `sort` - by price, by name, by date added - requires several indices containing identical data.
To make this easy to implement, we provide a way to create index replicas, read-only indices that can have different settings from the master index.
When using replicas, every record added to a master index also gets added to the replica index. The same goes for deletion and update operations.
Indexing operations done on replicas are not billed.
By using replicas, you can adjust your calculation by removing the factor of four you included for each index,
meaning that 300K * 30 days = 9 million operations/month. This assumes you update the entire index daily;
you could also update only the prices that changed, which would further reduce the number of operations.
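For anyone curious what the replica setup looks like in practice, here's a sketch of the settings payloads involved. The index names are placeholders, and the `ranking` lists are simplified; you'd send these via an API client's set-settings call:

```python
# On the primary index: declare the replicas once.
primary_settings = {
    "replicas": ["products_price_asc", "products_price_desc"],
}

# On each replica: override only the ranking; the data stays identical.
# asc()/desc() are the attribute-sort modifiers in the ranking formula.
replica_settings = {
    "products_price_asc":  {"ranking": ["asc(price)"]},
    "products_price_desc": {"ranking": ["desc(price)"]},
}

# Every add/update/delete on the primary index then fans out to both
# replicas automatically, and only the write to the primary is billed.
```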
IME, I haven't been able to replicate Algolia's search responsiveness with ElasticSearch, even with good hardware. I don't think ES/Lucene was ever designed for that use case. IIRC, Algolia was designed to perform well even on mobile phones. I wouldn't dream of getting Lucene to run performantly on a mobile phone.
I'd love to see if someone has done any of the "realtime" Algolia demos backed by ElasticSearch.
In any case, ES excels at very different use cases - I've only seen Algolia provide "basic" search.
I think ES can get there, but it depends a lot on what hardware you deploy (SSDs!), how you build your index, and whether you can geographically distribute your search engine close to your users.
We have one ES cluster with hundreds of queries per second that gives median 9ms response times and 99th percentile around 160ms. Another cluster with 100x more data that gets 20-25ms median response times and 99th percentile at 360ms.
Now both of these are just the ES response time, there is additional overhead in responding to an API request and then you also start to get into where your data centers are located relative to the end users.
Totally agree with you. Algolia's speed is pretty amazing, but its price is pretty hefty too. We ended up switching over to Elasticsearch, which is much cheaper and more flexible in certain ways.
Oh, that's the Hacker News site search... Works fine for me, except I always switch from "results by Relevance" to "Results by date", as in "Has this been posted already...?"
Did not know Google is discontinuing its custom search engine. Looks like there may be a business opportunity here.
If you want to know if something has already been posted on Hacker News and are using Chrome, I've made an extension for that purpose. It's a reverse lookup that uses Algolia search. It's not perfect, but I use it daily and it solves this problem.
https://chrome.google.com/webstore/detail/hacker-news-lookup...
Why does the extension need browser history permissions? I see it states it does not use it, so curious what it's needed for. Cool idea regardless, just curious.
It's not used. I just need access to the current tab URL and the ability to open tabs, but I'm not sure how to request fewer permissions... The manifest file looks like this:
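(The actual manifest isn't shown here; for illustration only, a minimal MV2 permissions block matching "current tab URL + open tabs" might look something like the following, with the host pattern being a guess. Note that in Chrome, the broad `tabs` permission is what triggers the "Read your browsing history" install warning, which likely explains the question above; the narrower `activeTab` permission avoids that warning when it fits the extension's flow.)

```json
{
  "name": "Hacker News Lookup",
  "manifest_version": 2,
  "version": "1.0",
  "permissions": [
    "tabs",
    "https://hn.algolia.com/*"
  ]
}
```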
I want to do "search comments by popularity" a lot but it doesn't seem like they're actually ordering it -- does Algolia have access to comment ratings? Does HN expose that to anyone?
The problem here is consolidation: all big customers (successful websites) will end up being part of mega-corporations (Amazon, Microsoft, Google, Salesforce, etc.). And all these companies have their own search engine (which they use for other machine learning, etc. - not just search). For example, Twitch will probably switch to Amazon A9.
So the question here is: what is the game plan to return this $53M? Acquisition?
Algolia is better, so they need to stick with it for now. Amazon will get there eventually... and if Amazon finds out that writing code is harder than writing a check, they will write the check. But they are still trying.
I wouldn't assume that any SaaS requires the big 4 to be their customers to qualify for a high valuation. Users are used to search that's incredibly smart thanks to Google, and it is an extreme technical challenge to roll your own smart search. Many mid-size and small companies would save tons of money by buying it from someone else.
What I notice with their results is that they start strong, but fall off very rapidly into partial gibberish.
For example, say I want to make a battery-powered project. I search "lipo." The results start out promising; charging boards, including a few options tailored to specific popular boards. But by the time we get to the end of page 1, only every other result or so is actually relevant. Useful results like batteries and connectors are interspersed with random microcontrollers, LEDs, motors, etc.
Still, it works, and you can always refine by category if that sort of thing really bothers you. It seems like a pretty solid solution, and the issues I'm mentioning are probably caused by the implementation, which I'll bet searches the full-text description of entries and weights things towards not excluding a potentially relevant result.
Ideally you'll get everything you need on the first page anyway, but generally speaking, if you have complex part numbers or product series, it's harder to get right. I think that machine learning is essential to improve relevance over time without having to resort to manual tweaks.
Adafruit does have it tough; they try to bridge sites like Amazon and Digikey by having a very broad selection of products specific to electronics, but they don't actually have many parts in each of those categories. They focus more on supporting what they have with documentation and software and whatnot.
But sites like Digikey/Mouser/etc have searching down. If I want a capacitor, first I pick what kind (electrolytic, ceramic, tantalum, etc.), and then I am presented with dozens of menus representing specific attributes that I care about. Capacitance, temperature coefficient, size/packaging, manufacturer - whatever you could possibly want to select on, you can.
Sometimes I wish that other digital distribution platforms would take inspiration from that 'catalog' model. Discovery is difficult, these days.
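The "catalog" model described above is essentially faceted filtering; a toy sketch, with part data and attribute names invented for illustration:

```python
# A tiny parametric catalog: pick a part type, then narrow by whichever
# attributes you care about.
capacitors = [
    {"part": "C1", "type": "ceramic",      "capacitance_uF": 0.1, "package": "0603"},
    {"part": "C2", "type": "electrolytic", "capacitance_uF": 100, "package": "radial"},
    {"part": "C3", "type": "ceramic",      "capacitance_uF": 10,  "package": "0805"},
]

def facet_filter(parts, **wanted):
    """Keep only parts matching every requested attribute exactly."""
    return [p for p in parts if all(p.get(k) == v for k, v in wanted.items())]

hits = facet_filter(capacitors, type="ceramic", package="0603")
print([p["part"] for p in hits])  # ['C1']
```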
Sounds more like a design decision - don't exclude documents, where there is a low probability of relevance. For various reasons it might be better to have more results with some irrelevant ones, instead of 7 highly relevant ones.
I've actually been avoiding using Algolia after I used IndexTank and then they were acquired by LinkedIn. (I always said grats diego + team at IndexTank) but it was a frustrating experience to need to roll up everything and pack up shop.
Seeing a huge raise like this actually makes me feel better about the possibility of using Algolia rather than rolling my own Solr containers.
Thanks. We tried our best to transition our customers to Searchify but not everyone wanted to go with a bootstrapped company. More funding is always reassuring, I agree.
This is great for something like a public website. But what about a situation where I want some of my users to be authorized to view only some of my data, based on a set of rules I set in my backend application? Can Algolia accomplish this?
You could just as easily set up Solr/Elasticsearch locally and index your data for searching. It'll give you the advantage of being able to do more complex queries, faceting, grouping, stemming, synonyms, and things that you expect from a modern search experience.
I really enjoyed using http://searchkit.co/ a while back. If I were to do it again, I'd probably roll my own, as IIRC state management with searchkit was a bit of a black box, but it is a pretty nice drop-in lib.
We've been using Algolia as we build out our custom database of ~1.5million US-based nonprofits for patronage.org and I have become their #1 fanboy. The search is blazing fast, super customizable, and support has been incredibly helpful. It's one of those products that seems to provide "if you can dream it, you can do it" functionality. We went from a slow search that used EINs to a search that accounts for typos, alternative names of charities, EINs, locations, and is customized for the logged-in user since we can pass queries to Algolia clientside.
My only complaint, if I had to make one, is that pricing is a bit steep for our use case but I can't imagine how much time I'd need to spend to get ElasticSearch running comparably.
I miss the old HN search. The new one has some nice features, but it's missing basic search engine commands. I used to do search stuff like "ai OR artificial intelligence OR deeplearning..." to see if any articles on a certain subject have been posted today.
There's indeed no OR; the idea is that with instant results you can just type each of those three queries one after another.
If you really wanted to do that, you could use the optionalWords feature at query time using the API. :)
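For reference, `optionalWords` is a query-time parameter: listing words there makes each one optional, so passing every word of the query effectively turns it into an OR. A sketch of the parameter dict you'd pass to a search call (client and index setup omitted):

```python
query = "ai artificial intelligence deeplearning"

# Listing every query word as optional means a record matching ANY of
# them can be returned, i.e. an OR query.
params = {"optionalWords": query.split()}

print(params["optionalWords"])
# ['ai', 'artificial', 'intelligence', 'deeplearning']
```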
They are one of the very few companies in that space which still develop their own search technology instead of just adopting and mixing available open source packages.
Most other search software these days runs on some flavour of Lucene and rightfully so IMO because Lucene is a rock solid piece of software.
This means that creating a defensible USP in that space isn't easy. Algolia did it by essentially developing their own search software from the ground up. It isn't necessarily better than competitors like Elasticsearch in every measure but it certainly has some interesting properties and it can be pretty fast, especially for things like autocomplete.
Plus, from the reading I did, the founders have been working in the search space for a long time and really seem to know what they're doing.
Lucene is great, but, it does have limitations: tough to embed in non-JVM applications and poor support for tokenizing things like emojis are two immediate things that come to mind.
Elasticsearch is maybe my favorite data storage product in the last 15 years, but the Algolia folks have taken a different approach that is also quite interesting.
The Lucene standard tokenizer doesn't offer support for tokenizing emojis: it strips them out. I'm not aware of any Lucene tokenizer that supports emoji tokenization. This isn't a huge deal, as Lucene is open source and tokenizers can be written, but I'm just highlighting it as a shortcoming that's illustrative of the time and place Lucene evolved from.
I would say it is the libraries in various languages and ui helpers integrated with their engine.
We have been working on a more flexible alternative with additional features, and we also have a few current Algolia customers trying it in beta. The number one feedback has been how easy it is to get started with Algolia.
My team switched to Algolia, and it's been a game changer for us. Great product, simple yet powerful API, and a really helpful team. We use it both for our public-facing properties (e.g. https://tubitv.com/search/schwarzenegger) and also to power search for our internal video CMS.
I am using Algolia with React Native and Firebase to search users by name, bio, phone, and more, and all I needed was a single cloud function file in Firebase to link the two completely after I had imported my data with a simple node script.
Just their typo acceptance alone makes Algolia, imo, the best 3rd party search service available currently.
We used Algolia in production for 8 months before moving to self-hosted Elasticsearch.
Small dataset, 2.5 million records in 10 indexes IIRC; data structures and filtering were quite complex.
Algolia Pros:
* very nice UI/dashboard, stats, lots of options, flexible
* documentation is nice, 'onboarding' was not an issue
* it worked reliably most of the time
Cons:
* lock-in/proprietary. Think twice if you want to base your business on it.
* you have to trust them with very sensitive customer data
* it is expensive. We reduced costs from $800/month to $40/month by moving to a self-hosted open source solution with the same level/quality of customer experience.
* guys at Algolia like to rewrite client libraries and make them completely backwards incompatible.
Good luck rewriting almost everything.
* support is incompetent and honestly it was useless in our case. I don't want to share details here in public as lots of people were involved, but their support is a disaster, and for us that was the major reason to migrate away.
We had major issues for 2 weeks, and in the end one of our developers had to debug and fix the problem in their client library.
Long story, but that one was a disaster.
In short, it wasn't an extremely horrible experience overall, and the product is nice in general, but Elasticsearch is very, very good and Algolia just couldn't justify the price tag, sorry.
I am sad to see you had a bad experience with Algolia, and I can assure you that we put a lot of effort into backward compatibility:
* we have never discontinued a feature in the API since the launch
* we never broke our API clients; we proposed a new version when a new feature required a big change, but we kept the previous version (and this happened only on two API clients in 5 years)
For the support, it is the engineering team working on the product that does the support, and we put a lot of effort into making sure all our customers are satisfied and get relevant answers.
Then, if you got the same customer experience with a $40 machine, you have probably not used all the features/power of the engine. I am sad to see such feedback, and you can hold me accountable for making sure we do everything we can to satisfy all our users.
In any case, I would be happy to get your detailed feedback by email (julien at algolia.com). I see it as a good opportunity for us to improve.
I love Algolia. I worked with them for a week to make DocSearch accessible to screen readers and had a lot of fun. I'm especially grateful for their commitment to open source. Go Algolia!
Algolia is amazing. They are growing like crazy and somehow they still have time to help out startups. Both the CEO and CTO have been giving us advice. Great guys, amazing tech.
We use Algolia a lot at StackShare. Stack News[0] is powered by Algolia's client-side library. It's easy to integrate in your project and runs really fast. We love it! If you're looking for a way to add search for your project give Algolia a shot, there really is nothing like it on the market.
Algolia is a great company! I am using their docsearch but my friend used it in a mobile commerce app. He was super impressed with the integration and the end result was a great user experience.
highly recommend checking them out. Congrats to the Algolia team!
This is a place where there are tons of unstructured data: emails, slack, atlassian, code base. Search seems like a useful tool for employees. Permission control is another essential feature.
Yeah, definitely. It's a product category called, somewhat prosaically, "enterprise search." Algolia's appropriate for this in that we allow you to define data structures on a per-content source basis, and can show the most relevant results for each content source. Usually, enterprise search providers provide lots of out of the box connectors — with us, you'd just shove JSON into the proper index using one of the API clients.
Security is super important, and we can provide encryption-at-rest; we also have a secured API key [1] feature that allows you to segment your user base.
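The secured-key idea above can be sketched conceptually: the backend signs the restrictions with a parent key, so a client can search but can't tamper with or remove the filter. This is a simplified illustration of the scheme, not Algolia's exact implementation (the real client libraries expose a generate-secured-api-key helper); the key and filter values here are made up:

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode

def generate_secured_key(parent_search_key: str, restrictions: dict) -> str:
    """Sign query restrictions with the parent key and bundle both into
    an opaque token the server can verify but the client can't forge."""
    qs = urlencode(restrictions)
    sig = hmac.new(parent_search_key.encode(), qs.encode(),
                   hashlib.sha256).hexdigest()
    return base64.b64encode((sig + qs).encode()).decode()

# e.g. restrict a logged-in user to records tagged with their own id
key = generate_secured_key("parent-search-only-key",
                           {"filters": "visible_by:user42"})
```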
Any thoughts on all the comments around steep pricing because of needing to add a sort, or test? Sounds like a bait and switch as the rise in cost is quite unexpected for most people?
For testing, we propose free accounts.
For the different sorts, we chose to emphasize quality over cost, on purpose :)
In practice, it means we need to duplicate the data for each sort in order to do as much as possible at indexing time. We have seen a few users who were not ready to pay, but the big majority see the value, and this is aligned with our cost.
Custom-made C++ engine - baked into an nginx plugin - no elasticsearch nor lucene in the stack, except to gather analytics on the side.
It has been made to solve user-facing search only, where elasticsearch couldn't meet the same speed because it's trying to solve a lot of other things.