How MDN's Autocomplete Search Works (hacks.mozilla.org)
227 points by oedmarap on Aug 3, 2021 | 69 comments



We took a similar approach for our documentation search. [0]

You can see the "inverted index" is rendered inline in the page, since everything is generated at build time.

When you type something that matches a key in the index, we fetch that index key and add it to the results. [1] [2]

Obviously we could do a lot better in terms of relevancy, but it's simple and fast.

[0] https://docs.fastcomments.com/

[1] https://docs.fastcomments.com/index-ublJLBnXgz88.json

[2] https://github.com/FastComments/fastcomments-docs/blob/main/...
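
For readers who want to see the idea in code, here is a minimal sketch of a build-time inverted index with a key lookup at query time. It is not the FastComments implementation: the `Page` shape, `buildIndex`, `lookup`, and the choice of prefix matching on keys are all assumptions made for illustration.

```typescript
// Hypothetical shape: keyword -> paths of the pages that contain it.
type InvertedIndex = Map<string, string[]>;

interface Page {
  path: string;
  text: string;
}

// Build step: in the setup described above this would run at site build time.
function buildIndex(pages: Page[]): InvertedIndex {
  const index: InvertedIndex = new Map();
  for (const page of pages) {
    const words = page.text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w !== "");
    for (const word of words) {
      const paths = index.get(word) ?? [];
      // Pages are processed one at a time, so a repeat can only be the last entry.
      if (paths[paths.length - 1] !== page.path) paths.push(page.path);
      index.set(word, paths);
    }
  }
  return index;
}

// Query step: collect the pages of every key that starts with the typed text.
// Prefix matching is an assumption; exact-key matching would also fit the description.
function lookup(index: InvertedIndex, typed: string): string[] {
  const query = typed.trim().toLowerCase();
  const results: string[] = [];
  if (query === "") return results;
  index.forEach((paths, key) => {
    if (!key.startsWith(query)) return;
    for (const path of paths) {
      if (results.indexOf(path) === -1) results.push(path);
    }
  });
  return results;
}
```

Results come back in index order, with no relevancy ranking, which matches the "simple and fast" trade-off mentioned above.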


Relevancy is the huge game-changer. MDN uses pageview analytics to determine what a "popular" page is.
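
As a rough illustration of ranking suggestions by popularity, here is a minimal sketch. The `Doc` shape, the `popularity` field, its 0 to 1 scale, and the `suggest` function are hypothetical and are not taken from MDN's code.

```typescript
// Hypothetical document entry; `popularity` stands in for a score derived from pageviews.
interface Doc {
  title: string;
  url: string;
  popularity: number;
}

// Return titles containing the typed text, most popular first.
function suggest(docs: Doc[], typed: string, limit = 10): Doc[] {
  const query = typed.trim().toLowerCase();
  if (query === "") return [];
  return docs
    .filter((doc) => doc.title.toLowerCase().indexOf(query) !== -1) // filter() copies, so sort() below leaves `docs` untouched
    .sort((a, b) => b.popularity - a.popularity)
    .slice(0, limit);
}
```

With this ordering, a heavily visited page outranks a rarely visited one whenever both titles match what was typed.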


Indeed, that's a great idea.


I remember doing something similar a few years ago. I needed autocomplete for a shipping ports field, but the data was too big, so I ended up using a CSV file in an AWS Lambda function that filters based on the selected country and returns a much smaller subset. It lazy-loaded after the user selected the country. To keep response times low I had to do a binary search on the raw CSV bytes. It felt like I was reinventing databases, but I liked the idea of it being self-contained in a function.
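
A binary search over raw CSV bytes can be sketched roughly as follows. This is not the code described above: it assumes the file is sorted by its first column, that keys are plain ASCII, and that lines end in "\n". The `findRow` name and the sample rows in the usage note are illustrative only.

```typescript
// Find the row whose first column equals `key` in a CSV sorted by that column.
// Only the lines that are actually probed get decoded; the file is never split into rows.
function findRow(csv: Uint8Array, key: string): string | null {
  const NEWLINE = 10; // byte value of "\n"
  const decoder = new TextDecoder();
  let lo = 0; // start of the search window, always a line start
  let hi = csv.length; // end of the search window (exclusive), a line start or end of file
  while (lo < hi) {
    // Jump to the middle of the window, then back up to the start of that line.
    let start = lo + Math.floor((hi - lo) / 2);
    while (start > lo && csv[start - 1] !== NEWLINE) start--;
    // Scan forward to the end of the line.
    let end = start;
    while (end < hi && csv[end] !== NEWLINE) end++;
    const line = decoder.decode(csv.subarray(start, end));
    const comma = line.indexOf(",");
    const rowKey = comma === -1 ? line : line.slice(0, comma);
    if (rowKey === key) return line;
    if (rowKey < key) lo = Math.min(end + 1, hi); // continue after this line
    else hi = start; // continue before this line
  }
  return null;
}
```

For example, with a file containing the sorted rows `DEHAM,Hamburg`, `NLRTM,Rotterdam` and `SGSIN,Singapore`, `findRow(bytes, "NLRTM")` returns the Rotterdam line after probing only a couple of lines.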


Have you looked into the S3 Select API? You can run SQL against a flat file and get back just the result, all from S3. Not too shabby for building a poor man's serverless database.


My favorite autocomplete library is an ancient version of bootstrap-typeahead.js by Twitter. A single file with fewer than 400 lines of JavaScript. They don't make these anymore :)

I use it everywhere I need autocompletion. For example on the Music-Map:

https://www.music-map.com

I made a fork of the code which is available here:

https://www.gibney.org/0g-typeahead


Getting accessibility right is hard. We very much care about that. It's one of the strong reasons why we're using Downshift.


I did my first autocomplete search UI with that library.

These days, due to the rest of the project, I've been using Angular and Material's Autocomplete component, which I've found very easy to customise for in-memory indexes or hits to a remote ElasticSearch 'suggester' proxy endpoint.


It seems to be missing the most obvious feature: keyboard navigation. My guess is that accessibility is not well implemented either. Probably why other dropdown implementations are bigger code-size wise.


It has keyboard navigation. It is just not enabled on the Music-Map.

Gnoosic uses the same library, and here you can navigate with the cursor keys:

https://www.gnoosic.com/faves


They hi-jacked the browser's `/` key to focus the field, which is something I hate. As a user, I want `/` to bring up Firefox's quick search bar, especially when reading documentation.

They should have just had the search field focused automatically but that would have done away with their "clever" hack to lazy-load the DB containing every page name.

Also, I'm confused, I thought https://mdn.dev/ was the new thing because Mozilla was stepping back from MDN. Is it a fork? They both carry Mozilla logos, so what's going on there?


Yeah, Discourse does the same. Sometimes I want to search within a post for some keyword, but ctrl+f redirects you to the global search. That global search only helps if you want to find interesting posts; it does not support searching inside one, nor does it allow a search limited to a thread. So I started using / in Discourse discussions. Then that one was overridden as well. I've heard the recommendation that you turn JS off, which gives you a saner experience.


I hate this behaviour in discourse as well, but it hadn't occurred to me to try using it sans JS altogether, since it seemed to be pretty dependent on it. Will give that a shot for sure.


> But ctrl+f redirects you to the global search

Press ctrl-f twice.


Oh thanks for that trick. It violates the principle of least surprise so much but it does what I want. Thanks again!


I sometimes use Ctrl+G (Find Again) to get around sites hijacking / or Ctrl+F when I want to use the browser’s search functionality: if the current search doesn’t match anything or if there is nothing to search for, then it opens the browser’s find bar. If that was hijacked too, you could still focus browser chrome (e.g. Ctrl+L, Ctrl+K, Alt+D and sometimes even others to focus the address/search bar) and then press Ctrl+F.


I knew about the existence of "/", but never figured out why I should use it instead of Ctrl+F. What's the difference (other than having fewer features)?


This seems to be a good introduction to Quick Find.

https://www.tenforums.com/tutorials/120679-enable-disable-qu...


Ok, so the difference is:

1. It disappears after a few seconds.

2. It has no "next/previous/highlight all" etc. buttons (it still has these features, just no clickable buttons)

It still makes no sense to me.

I guess maybe a small portion of people would find the auto-disappearing thing useful, even though with normal Ctrl+F all you need to do is press Esc.

But the second "feature" totally baffles me. It's not like Ctrl+F is some expensive GUI to launch, so why would I want to not have these buttons? Even if you don't need them at all (I don't), you can simply not click them; there is no downside to having them.


Does the usual Ctrl+F GUI support filtering down to links only?


It doesn't, but "/" does not do that either. It's "'" that works that way (and is obviously useful). I just don't know about "/".


You can use the single quote character to search only links


And this collides with my Vim extension


The only difference I know of is that "/" focuses links. So when you press return, it loads the link instead of jumping to the next result.

It's quite nice for keyboard-only web navigation.


> "/" focuses links.

It's ' to trigger quick find in links only mode.


I had the same confusion with his comment but I think what he meant was that when you highlight a result in a link, pressing enter causes you to follow that link (which is true). You are correct that ' focuses on only searching within links though.

Enter never goes to the next result though, so I am not sure if that is just something different between his setup and mine. I have to use F3 to go to the next result.


When I talked about enter going to the next result I was comparing it to the Ctrl+F search bar.

Sorry if that wasn't clear. (English is not my first language)


Firefox lets you disable keyboard overrides on a per-site basis, if that's something you're interested in

Page Info -> Permissions -> Override Keyboard Shortcuts


> They hi-jacked the browser's `/` key to focus the field, which is something I hate.

You're not the first one to point it out. Please join github.com/mdn/yari to raise your voice. It's an Open Source project after all.

> They should have just had the search field focused automatically

Why? There's a lot of JS to load to make that work. If you never need to do a search (e.g. from a Google search) it would be a potential waste.

> Also, I'm confused, I thought https://mdn.dev/ was the new thing because Mozilla was stepping back from MDN. Is it a fork?

That domain is just an alias we don't currently use. It's still the old MDN from Mozilla. No fork.


> Why? There's a lot of JS to load to make that work. If you never need to do a search (e.g. from a Google search) it would be a potential waste.

I'm confused about what this comment is meant to say exactly, but just in case it's not known already: this situation seems to be what the autofocus attribute is for @ https://developer.mozilla.org/en-US/docs/Web/HTML/Global_att..., no JS needed


the op is saying "if we autofocussed, we'd need to load all the JS involved in performing the search"


> They should have just had the search field focused automatically

No, this would be extremely wrong: it’d open on-screen keyboards automatically on platforms that use them, mess with screen readers by dropping them in the search box rather than at the head of the page, and break keyboard functionality, most significantly things like arrow keys and Space for in-page navigation.

The autofocus attribute sometimes seems like a good idea, but it’s actually almost never desirable.


> Also, I'm confused, I thought https://mdn.dev/ was the new thing because Mozilla was stepping back from MDN. Is it a fork? They both carry Mozilla logos, so what's going on there?

It seems to me that mdn.dev is intended to be the future home of MDN web docs since it is collaborative now, and no longer exclusively managed by Mozilla. But they haven't actually made the transition yet, as any link on mdn.dev points back to the old (current) site at developer.mozilla.org


GitHub and GitLab do this too. Is there a way to prevent web pages from hijacking this key? I almost never want to use their search engine and when I do, I'm fine with clicking on the input box.


> They hi-jacked the browser's `/` key to focus the field, which is something I hate.

This is one of the two comments on the article itself (the other is a reply explaining the rationale for binding the '/' key to that behavior).


Great, even if I wanted to use this, I couldn't. On QWERTZ keyboards, you reach / by pressing Shift+7 which triggers the quick search bar but not the MDN field. Many programs use shortcuts like that and it really sucks.
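
One layout-independent way to write such a shortcut check is to look at the character the key press produced rather than at the Shift state or the physical key. The sketch below is an assumption about how a handler could be written, not a description of MDN's handler; `KeyPress` is a hypothetical stand-in for the relevant fields of a DOM `KeyboardEvent`, and `shouldFocusSearch` is an illustrative name.

```typescript
// Minimal stand-in for the parts of a DOM KeyboardEvent that matter here.
interface KeyPress {
  key: string; // the produced character, e.g. "/" for Shift+7 on a German layout
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  targetTag: string; // tag name of the focused element, e.g. "INPUT"
}

function shouldFocusSearch(e: KeyPress): boolean {
  // Compare the produced character; Shift is deliberately not inspected.
  if (e.key !== "/") return false;
  // Leave browser and OS shortcuts that use modifiers alone.
  if (e.ctrlKey || e.metaKey || e.altKey) return false;
  // Don't steal "/" while the user is typing into a form field.
  return ["INPUT", "TEXTAREA", "SELECT"].indexOf(e.targetTag) === -1;
}
```

A handler built on this check would react to "/" on QWERTZ as well, although it would still compete with the browser's own quick find for the same key.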


imho, they should've opted for CMD/CTRL + K, which Algolia's Doc search uses

> They should have just had the search field auto-focused automatically but that would have done away with their "clever" hack to lazy-load the DB containing every page name.

this would steal away the focus and is not good for accessibility (unless you're building a search engine)


Now I am curious whether, on the real MDN production site, the loading of search-index.json is triggered by the execution of /static/js/autocomplete.js, when both downloads should really be started in parallel by the shim.

Many websites leave a lot of performance on the table because of such behaviors.

My hypothesis is that, since this is easier for the developer and works well enough, not many people really care. But these things add up, and the web becomes slower and slower.


MDN has 2 search things: 1. client-side only which downloads a complete list of all titles. 2. full-text search on everything with Elasticsearch.


First of all, great write up and interesting solution, thank you for that!

I think GP's question was rather whether the page loads the autocomplete script on focus and the script then triggers the download of the JSON data, as in the pseudo code, or whether the real code triggers the download of both (the script and the JSON data) in parallel?
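
The difference between the two orderings can be sketched like this. The loader functions are injected and hypothetical; nothing here is taken from the real MDN code, and `loadInParallel` is an illustrative name.

```typescript
// Start both downloads before waiting on either, so they overlap on the network.
// The sequential alternative would wait for the script to finish loading and only
// then start the index request, costing one extra round trip.
function loadInParallel<A, B>(
  loadScript: () => Promise<A>,
  loadIndex: () => Promise<B>
): Promise<[A, B]> {
  const script = loadScript(); // request 1 is in flight
  const index = loadIndex(); // request 2 starts without waiting for request 1
  return Promise.all([script, index]);
}
```

In a browser, `loadScript` might wrap a dynamic `import()` and `loadIndex` a `fetch()` of the JSON file; both are assumptions about how one would wire it up.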


I like mosra's search, implemented in m.css for magnum. He wrote a blog post on it here: https://blog.magnum.graphics/meta/improved-doxygen-documenta... and you can try it on the magnum docs site: https://doc.magnum.graphics/magnum/#search

Fast and can be served from a static site.


Are they using react for just this one thing on this page!? Honestly it wasn't very clear to me, but they seemed to indicate that. I wonder if that's just because it's the pattern at MDN, but I feel like shipping react along with the JSON has got to be huge.


React's only 100kb (or ~30kb gzipped). You're not likely to notice that if it's fetched in the background.


100kb is a lot in the context of making one specific component work. If it's just one thing, you could do it with far less overhead. Maybe Preact or a native web component or something. If I'm not misunderstanding, then it does seem like how people would include jQuery just to select a few elements and change their classes or make an ajax call.


As an aside here... MDN docs are pretty awesome. I've been learning pure vanilla JavaScript on this site more than anywhere else.

I used to automatically include jQuery in every project as a habit/reflex.

Now, because of MDN, I never do that anymore, unless it makes total sense. Kudos guys!


I think adding search to the HTML standard makes more sense overall. The thing I hate about search like this is that they don't work with JS turned off (e.g. terminal browser). Why not just add a JSON search component to HTML itself?


Because there's no one-size-fits-all solution for search. It's arguable that such a thing wouldn't even suit the needs of a small fraction of folks who were interested in a similar feature.

Plus, if it was added to the spec, then Safari users wouldn't be able to use it.


There is though.. hence why they call them web standards. Nothing I hate more than each website implementing a non-standards way of doing things. If Safari doesn't want to stay current with web standards then so be it. That is their prerogative. If a client-side JSON search component was added to web standards then I have a feeling that eventually they'd have no choice but to adopt.


In the code snippet they show the `startAutocomplete()` function checks for the "started" variable being true; but never actually sets it to true.
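
For illustration, one way to make the "only start once" guard actually hold is to remember the pending load itself. This is a sketch under assumptions, not the article's code or MDN's real implementation; `createStarter` and the injected `load` function are hypothetical.

```typescript
// Wrap a loader so that it runs at most once, however often the trigger fires.
function createStarter<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null; // plays the role of the "started" flag
  return function startAutocomplete(): Promise<T> {
    if (pending === null) {
      // Assigned synchronously, so repeated focus or hover events reuse the same request.
      pending = load();
    }
    return pending;
  };
}
```

Every call after the first returns the same promise, so no second network request is made.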


It's pseudo code. The real code is TypeScript React and looks very different and it wouldn't serve the article to take snippets from that code to explain how it works.


Of course, otherwise any amateur would be able to see looping network requests, right? Just saying that it would have been good to have accurate pseudocode in an MDN article.


I'm wondering how many KB it loads before it's ready to search?

update: 144KB for JSON file

a little bit worrying, given their scale and potential bandwidth requirements


Yeah I would think this file size will increase well over time. Maybe a part 2 of the article can go over how updates to the file are made when new content is published and possible scaling solutions.


For content like this, it's much easier to download the entire search-index.json and run the auto-complete against that.

Rather that than hitting a search endpoint (after typing a certain amount of characters).


Sadly in 2021 adding 140KB to a page isn't a big deal (given how heavy the rest of the page probably is) - but it really should be.

A large chunk of the world's population still pays a locally-expensive rate for mobile bandwidth, and we're increasingly leaving them behind - or worse, pushing them into zero-rating internet plans which mean they can only use Facebook and WhatsApp while avoiding the rest of the web: https://en.wikipedia.org/wiki/Zero-rating


It's only added if the user shows an intent to search. And if you want to search, 144kb is a decent price to pay for instant search once it's downloaded


Oh I'd missed that - yeah loading it on-demand the first time they attempt to search is a much better strategy.


I thought this was going to be about advanced usage of <datalist>: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/da...


<datalist> is awesome! But I find it works better for short options. See https://www.peterbe.com/plog/datalist-looks-great-on-mobile-...


Typos and fuzzy searches cause <datalist> to break.


I can't wait until FlexSearch reaches 1.0.0. Reading the source code is like reading great literature.


(author here) We're still on FlexSearch 0.6 and the new 0.7 is a big refactor. I hope we can upgrade some time.


Any reason for not using semantic versioning?

I see in your npm page that despite compatibility "some adjustments" might be needed, aka you broke compatibility. If you did a breaking change, you need to up that major version my man.

Please stop with this sentimental versioning, it just causes issues for the rest of us who want to rely on npm's ability to not upgrade stuff on breaking changes. Now everyone's gonna have to lock your package version to 0.6 so they don't get your breaking stuff from 0.7.


Semver allows for breaking changes with major version 0: https://semver.org/#spec-item-4
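
Related to the locking concern: npm's caret ranges already treat a minor bump below 1.0.0 as potentially breaking, so `^0.6.0` does not match `0.7.0`. The sketch below re-implements that rule for plain `x.y.z` versions only, to show the logic. It is not the `semver` package, it ignores pre-release tags, and all function names are illustrative.

```typescript
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`not a plain x.y.z version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of ^base: bump the left-most non-zero component.
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// True when `version` falls inside the range written as ^base.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const b = parseVersion(base);
  return compareVersions(v, b) >= 0 && compareVersions(v, caretUpperBound(b)) < 0;
}
```

So a dependency declared as `^0.6.0` keeps receiving 0.6.x patch releases but is not moved to 0.7.0 automatically.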


I miss the old search that let me narrow things down by category.


What do you miss about it? Can you not find what you're looking for?


The new search tends to give me the CSS property when I want the JavaScript property and the SVG attribute when I want the CSS property etc etc. It’s always choosing the wrong category.


so, "144KB over the network" - how much does that equate to in memory?

And that's per page where the search has been activated, correct? With no sharing of that dataset between each page?




