I agree in general terms, but it's sad that search engines are not advanced enough to do a reasonable job even if your title is not very descriptive. My non-descriptive titles are lame, like "Redis weekly update number ...", but there are many artistic, paradoxical, weird-in-an-interesting-way titles in the writing tradition that may disappear because search engines are not good enough.
I actually think it's reassuring that there will always be creative and unexpected titles that titillate the brain in a way no search engine algorithm can match. By the way, that's why most CMSs allow an alias used in SEO-type descriptors like search-engine-friendly URLs or meta titles.
If we leave it to search engines to choose the most creative, interesting, and talented content creators for us, we are in deep trouble, I think. Until then, you can keep surfing randomly once in a while and finding out how far Google is from guessing what's most interesting on any subject. Google and other search engines always tend to return lowest-common-denominator results on any subject, rather than a new, creative, and rare perspective on that same subject. It's the nature of automation for the masses.
Search automation such as autocomplete is nevertheless useful. I also agree it can be vastly improved. Autocompletes are there to help you improve the quality of how you formulate your query, help you find content faster, and help you find related, misspelled, or otherwise relevant content. Here is a paper I wrote on advanced autocomplete features, most of which have yet to be implemented by Google, for instance: http://www.exorbyte.com/index.php/White-Papers/ajax-incremen...
I understand your concern, but I don't think those things will disappear. Not everyone cares about maximizing page views, and quirky, artistic, esoteric websites with narrow appeal will persist even in the face of SEO.
One nice side effect of a descriptive title is that it makes it much easier for those of us who use bookmarking services and things like the Hacker News bookmarklet to tag your article, since we don't have to come up with a descriptive title ourselves. People are lazy, and lazy people around the world are rejoicing! :-)
You're defending the ridiculous, and really the first two paragraphs of your entry are just unnecessary defensive noise that encourages readers to hit back.
Your title is now vastly better for everyone, from search engines to casual readers.
Anyway, what I did was switch to better titles :)
I too think it's a good idea in general, btw. I don't want to defend my lame titles, but clearly there is a technical limit in our technology. A human will easily recognize that a given article is about something regardless of the title. I'm just hoping that in the future we'll be OK even with lame titles, but the right thing to do now is indeed to switch to saner, descriptive titles.
To some degree I think that which you seek is already available -- everyone is going to link to your post with link text about autocomplete, and that will add the semantic meaning to it (in the same way that a googlebomb works).
How do the memory and speed requirements of the described technique compare to other systems like CouchDB or Solr? We use Solr at work for a range of projects, including autocompletes. Specifically, I believe it has lexicographic features built in, so we don't have to populate every version of a potential search.
I'm not sure about the performance and memory efficiency of the other systems you mentioned. In Redis it is possible, in theory, to add a command to perform this task against a sorted set without adding prefixes. Otherwise it's also possible to run a binary search directly against the sorted set using ZRANGE. This way the sorted set only contains the actual words, without prefixes (the memory complexity will be the same, but with a smaller constant factor).
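Something like this rough Python sketch of the binary search idea (redis-py; the names are just illustrative, and it assumes every member was added with the same score, so Redis keeps the set in lexicographic order):

    import redis

    r = redis.Redis()

    def complete_no_prefixes(key, prefix, count=10):
        # Bisect over member indexes to find the first member >= prefix,
        # then walk forward collecting members that still match.
        n = r.zcard(key)
        lo, hi, first = 0, n - 1, n
        while lo <= hi:
            mid = (lo + hi) // 2
            member = r.zrange(key, mid, mid)[0].decode()
            if member >= prefix:
                first, hi = mid, mid - 1
            else:
                lo = mid + 1
        results, idx = [], first
        while idx < n and len(results) < count:
            for member in r.zrange(key, idx, idx + count - 1):
                word = member.decode()
                if not word.startswith(prefix):
                    return results
                results.append(word)
                if len(results) == count:
                    return results
            idx += count
        return results

Every bisection step is a round trip to the server, so the memory saved on prefixes is paid for with O(log N) extra latency per completion.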
I don't think we'll add such a command, nor that a more complex implementation is a good idea, because this algorithm is pretty straightforward and very fast, and can be modeled using the pre-existing API. Also, in most completion systems you want to complete only frequent items, so I'm not sure it's a memory-bound problem in practice.
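For comparison, this is roughly the kind of prefix-populated scheme the post describes, using nothing beyond ZADD, ZRANK and ZRANGE (Python/redis-py again; the trailing '*' used to mark complete words is just one possible convention, not necessarily the exact one from the post):

    import redis

    r = redis.Redis()

    def index_word(key, word):
        # Add every prefix of the word, plus the word itself with a
        # trailing '*' so complete words can be told apart from bare
        # prefixes while scanning.
        for i in range(1, len(word) + 1):
            r.zadd(key, {word[:i]: 0})
        r.zadd(key, {word + "*": 0})

    def complete_with_prefixes(key, prefix, count=10):
        start = r.zrank(key, prefix)
        if start is None:  # no indexed word starts with this prefix
            return []
        results = []
        while len(results) < count:
            batch = r.zrange(key, start, start + 49)
            if not batch:
                break
            for entry in batch:
                entry = entry.decode()
                if not entry.startswith(prefix):
                    return results
                if entry.endswith("*"):
                    results.append(entry[:-1])
                    if len(results) == count:
                        return results
            start += 50
        return results

Memory grows with the total number of distinct prefixes, which is exactly the constant factor the binary-search variant above avoids.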
That said, I expect the proposed solution to be able to deliver at least 10k completions per second in a small virtual machine.
Hello! Like in most Redis benchmarks, without concurrent clients you are measuring latency this way, not real throughput! You need to spawn N concurrent clients, and if you do it the wrong way (N Ruby scripts, for instance) you won't be able to actually measure the performance, as the clients will use all the CPU...
I think the normalized method is indeed faster, but it's not possible to tell from this benchmark.
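Concretely, a throughput measurement needs several concurrent clients hammering the server at once, something along these lines (Python sketch; the thread count, duration, key name and the ZRANGE used as a stand-in completion query are all arbitrary choices of mine):

    import threading
    import time
    import redis

    N_CLIENTS = 8
    DURATION = 10  # seconds
    counts = [0] * N_CLIENTS

    def worker(i):
        r = redis.Redis()  # one connection per client thread
        deadline = time.time() + DURATION
        while time.time() < deadline:
            r.zrange("myindex", 0, 49)  # stand-in for one completion query
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_CLIENTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("%.0f ops/sec with %d concurrent clients" % (sum(counts) / DURATION, N_CLIENTS))

Even then a single client process can saturate its own CPU before Redis does, so tools like redis-benchmark, or clients spread over several machines, give more honest numbers.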
There's building a link farm, and then there's giving your article a descriptive title.