Applying BERT models to Search (blog.google)
165 points by moultano on Oct 25, 2019 | 109 comments



Maybe they're getting better at natural language, but for me Google searches have been getting gradually worse year after year. I want Google, not AskJeeves.

The "frustration" is "increasing" "when" I "have" to "quote" nearly every "word" to get Google to actually return results with what I searched for instead of what it thinks I meant to search for.

And there's the frustration: computing today tries as hard as it can to figure out what it thinks I actually meant. I don't know which is worse: that a person who knows what they want can't get it when the computer disagrees, or that the computer is actually mostly right and its algorithms start to push your desires in its own direction, for whatever motive.

Facebook already does this really, radicalizing people by engineering the most dopamine-driving content to the top either towards self-obsession or an us v. them bubble.

In other words, I just want a fucking regular expression instead of our new data-science overlords ruining our minds with artificial non-intelligence (for profit).


I hear you, but if you look at the examples in the link, the before and after the update results, you'll see that there's far less of that kind of ignoring of keywords.

For example a search for "can you get medicine for someone pharmacy" used to just show generic information about getting a prescription filled, skipping over the "for someone" bit.

The new results understand what the query is actually asking, which is pretty impressive.

I'm kinda with you, I grew up with a ctrl-f Google so I sort of prefer that behaviour, I think because I don't want to rely on an unreliable NLP AI.

...I was going to say "but" but.. no I think I just don't want to rely on an unreliable NLP AI. It's so frustrating when it doesn't work, which is often.


When you quote the words, do you find what you're looking for?

In my experience, the times google seems to have totally missed the point of what I'm looking for, it's usually the times that the answer I'm looking for isn't anywhere on the web. Things like "datasheet JK45690DFS" or "Types of asphalt available for local delivery today".

I wish Google had some way to understand your query and the results well enough to just be able to say "The answer isn't available on the internet".


Yes, usually it's because there's maybe 4 documents which match the search query. But if there are only 4 documents which match what I asked for then I want to see a page with those 4 documents! I don't want to see a page with 10 results which I have to manually scan through to see that actually more than half of them are irrelevant rubbish because they aren't hits for what I was searching for.


I usually fail to get hits on part numbers. Part of the issue is that part numbers seem to change, even while referring to the same device. Something that might have {8, 12, 16, 24} channels will have shared documentation, but the generic part number that documentation is under will be different than what’s on the package. If google really wants to show that they know better than me then they could identify these cases and show me what I want from a “bad” search term.


I find some luck quoting individual words.


If you can remember any of the queries that failed, I'd be happy to pass them along to debug. If you have it turned on you can look in your search history here: https://myactivity.google.com/myactivity?product=19


I can't at the moment, but as a hint, if I search for X Y Z then I expect the first results to contain all those terms, not 2 out of 3. Like others complain, it seems necessary to quote each one ("X" "Y" "Z") to get what I want.

If you have to have a clever read-my-mind search, then instead of blending it into the main search, have 2 search types of 'intelligent' and 'precise'.


Google already offers a "Verbatim" search type just as you describe -- it's the same as quoting all words automatically.

In Chrome, you can set your search engine to "https://www.google.com/search?q=%s&num=100&tbs=li:1" to get this behavior by default.
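
As a rough sketch of what that custom search-engine string does, the snippet below builds the same URL programmatically. The `q`, `num`, and `tbs=li:1` parameters are taken from the URL above; the function names `verbatim_search_url` and `quote_each_word` are made up for illustration, and Google could change how these parameters behave at any time.

```python
from urllib.parse import urlencode


def verbatim_search_url(query: str, num_results: int = 100) -> str:
    """Build a Google search URL with the Verbatim filter (tbs=li:1) set.

    The parameter names come from the URL quoted in the comment above;
    this helper itself is hypothetical.
    """
    params = {"q": query, "num": num_results, "tbs": "li:1"}
    # safe=":" keeps the colon in "li:1" unescaped, matching the quoted URL.
    return "https://www.google.com/search?" + urlencode(params, safe=":")


def quote_each_word(query: str) -> str:
    """The manual workaround: wrap every whitespace-separated term in quotes."""
    return " ".join(f'"{word}"' for word in query.split())


url = verbatim_search_url("X Y Z")
quoted = quote_each_word("X Y Z")
```

Either form can be pasted into a browser; the first relies on the Verbatim option, the second on per-term quoting.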


google silently replacing "interesting" with "cool" brings up advice on cooling things (temperature) instead of keeping things "interesting" as in "interesting"


Yikes, that sounds like a software architecture problem. The "infer which words are more likely to get good results" layer and the "search for things using words" layer are blind to each other's behaviors.

How do Google's engineering teams get away with this obvious error? For how much they're paid, you'd expect them to be better about things that passive observers readily notice. Don't they use their own product, anyway?


Do you remember the full query?


how to keep an aquarium interesting

how to keep a cat interesting

looks like that pattern generally


Awesome examples! Thank you!


Duck Duck Go seems to handle that much better, although the first result is still about cooling. I assume it's synonym replacement


Wow..that does fail spectacularly! The first few pages for me are all temperature-related.


Here's one:

when are bonuses usually paid

I had great trouble finding results which were not advice for business owners.


>I had great trouble finding results which were not advice for business owners.

This, so much. There are so many search queries nowadays that have been SEO'd to hell, and there are just pages upon pages of crappy 4 paragraph articles on B2B company blogs just parroting the same information over and over again.


Thanks! Did you finally find a good page?


I never really found the kind of thing I was looking for.

If you're looking for more search feedback, my email is in my profile.


Maybe this is a result of optimizing for most of the population, which probably decreases performance for the tiny minority who search for niche things that are hard to optimize for because there is less data.


The ML layer is probably getting in the way of the end user getting to the smaller samples.

Used to be you'd get the best matches from the meta data on a page.

Now there's linear algebra both trying to determine what the meta data means and what the question means, so it's going to have grouping biases.

And do things like exclude seemingly random strings of numbers, because in the training data, that's usually trash, but for you, it's a part or serial number that you're looking for


I don't think that fully captures it, Google is an advertising company, and so its incentives are all out of whack. For instance, Google probably benefits from having the top results be slightly less useful as that makes the ads at the top more likely to get clicks.


Yup, this is a huge problem with google search these days. It keeps trying to figure out what your intent was, and although it works about 70% of the time, the other 30% is a complete disaster.


Especially when it comes to "less acceptable" things. Suppose, for instance, one listened to a rap song and then googled "lean and sprite" (codeine cough syrup and soda, commonly referenced by rappers as a way to get high).

Instead of returning search results of relevant rap songs, you get a bunch of links on drug abuse, rehabilitation centers, etc. Thanks for assuming I'm a harrowed drug addict, Google, but really I was looking for music.


Yeah, when i used google more often in the past, my experience had been more like 60% (google works) vs 40% (disaster). Now that i use duckduckgo as my primary search engine, the resultant hits haven't really improved, but at least i feel like i have more privacy. ;-)


Yea, personally I would prefer google to have predictable results so I can fine tune my search. When I put something in quotes now they can still expand it by synonyms.

But my mom doesn’t type keyword searches like I do, she types out sentence/phrase questions. Maybe the average user benefits from this stuff?


What i want from the search engine isn't an easy life. I want a tool that would give me, the knowledgeable searcher(hopefully) the biggest possible advantage over non-experts.

On one hand, it seems that Google, by aiming for the least common denominator - greatly reduces that gap.

But not really. I still think there's a good advantage to having search knowledge. I hope it stays that way.


This is why I really like the DDG system of !bang operators. Because each search engine works differently, if you're knowledgeable about the different engines, you can pick the best one for the job.

Just for fun, I thought I would try to figure out when the next solar eclipse would be: I'll use the simple phrase "Next solar eclipse"

I'd probably try Wolfram Alpha. On DDG I would type "Next solar eclipse !wa". It returns a date "Thursday, December 26, 2019 (2 months from now)" Nice.

Next I look at plain DDG: I knew it probably wouldn't be useful, but first result is a website that calculates the next solar eclipse. It requires another click, but it gets "Dec 26, 2019" right at the top of the page. Not bad.

Now for Google, who I knew wouldn't be sure (since their interpretation of the very exact phrase would be fuzzy): In big, prominent and confident letters it reads: "July 2, 2019".... thanks Google. Wrong.

The skill in searching is no longer in using the search engine to the best of its ability, it's in picking the right place to look. This has sort of always been the case, but has become more of a skill as Google has strayed away from improving for the knowledgeable searcher.


It's worth noting that the web has grown a lot recently. The number of web pages is more than 100 times larger than 10 years ago, so the amount of information needed to pinpoint the exact page that you want should increase as well. Google did some work in this area, but it's possible (and very likely) that the speed of web growth simply outpaced Google's algorithmic improvements. I don't know if a single universal search engine can be the answer without very deep personalization plus pervasive tracking; maybe domain-specific search engines can do a better job?


This is what ruined Twitter for me.

They hired a thousand engineers to guess what I like when I already followed the people I want to see content from. I don’t (always) want it in some random order they think is best, mixed with liked tweets those people never intended to publicly share. I just want a chronological list of actual Tweets from the people I followed again.

Not a dice roll that the AI will get right 51% of the time.


Personally I was surprised how good the "intention search" algorithm is, most times I do not know the right exact terms to search for, especially when looking up things in a domain and the algorithm figures out what I actually wanted quite well. For the cases where I do need exact word match, like you said, quotes work fine.


This is because in this example you are expecting Google to be Ctrl-F for the Internet (and not even that, because you also expect it to somehow weed out what you think is spam in the process, which is not a feature of typical Ctrl-F).

This update addresses the other side of the search spectrum which is meaning. Google has a very tough job of moving the slider between exact keywords and meaning every time someone makes a search. This is a step in the right direction, but the fundamental problem still remains - Google interface is optimized for ad conversion, not user experience.


Such is life when we optimize for the majority. We end up optimizing to only about the top 80% of searches, which are sometimes the flavor of "what is that website where everyone is friends with each other and I can see my grandkids' photos?" or "word for ice in latin".

It would be great if the system could infer the level of specificity associated with the query. Some people are just exploring a topic while others want to get to a more detailed document sooner.


is there a reason not to have a setting "i m not the majority" somewhere?


I am getting more and more annoyed as Google increasingly corrects my language to the wrong thing.


Have you calculated how much compute time running a single regular expression over all content on the internet would cost?

$$$$$$'s I'd bet... One big tech company used to let employees do it on an internal internet mirror... It wasn't cheap.


I say regular expressions tongue-in-cheek really meaning I want more mechanical and predictable machines that do what I tell them to in a straightforward way.


BERT is truly amazing. Almost all innovation in NLP uses BERT and transformers somehow. ALBERT will be the next HUGE thing for the next months, as it shows results better than BERT with a small fraction of the parameters.

We did a "Semantic Similarity search" for some documents, where we represent a document as a vector using BERT, and had to look for documents close to a reference document.

The results were breathtaking. It really returned semantically similar documents. You can do it now using ElasticSearch (but you really should do it using Vespa.ai, it is much faster: https://github.com/jobergum/dense-vector-ranking-performance )


The first project I ever put together involving (extremely trivial) ML used BERT, and something about seeing it just work opened my eyes to the ML world and got me excited to work in the space.

If anyone is interested in hacking around with BERT, I work on an open-source project called Cortex that handles model deployment, and we have a full tutorial for deploying a sentiment classifier using BERT quickly and easily: https://github.com/cortexlabs/cortex/tree/master/examples/se...


That's very interesting! If you have the time for it, you should consider experimenting with swapping in SpanBERT[1] instead of BERT in your use case. They train on full-length segments instead of masked half segments (as in BERT). I suspect that this, besides the improvements that SpanBERT brings over BERT, should enable you to feed in bigger chunks (more sentences) to the model before the averaging step, leading to fewer vectors to average and, as a result, perhaps better clustering.

[1]: https://arxiv.org/abs/1907.10529


Thank you, I will read and try it. Looks very interesting!


I agree that it works very well for "more like this" document recommendations! But not great for user queries.


but the question is how much better they are compared to existing similarity measures. E.g. for documents in a domain, even simple cosine is pretty good.


I don't understand. BERT is not a similarity measure. For our use case we did use a simple cosine similarity to find the similar documents. But we have to represent those documents in a vector space. In our tests, representing the document as a MEAN of BERT embeddings gave some very good results, much better than BoW, GloVe, or the Lucene "More like this".
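
A minimal sketch of that pipeline, assuming the per-token embeddings have already been produced by some model: average the token vectors into one document vector, then rank candidates by cosine similarity to the reference. The toy 3-dimensional vectors stand in for real 768-dimensional BERT outputs, and the names `mean_pool`, `cosine_similarity`, and `rank_by_similarity` are illustrative, not taken from any library or from the commenter's code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-token vectors into a single document vector."""
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / count for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    reference_doc: Sequence[Sequence[float]],
    candidate_docs: Sequence[Sequence[Sequence[float]]],
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    ref_vec = mean_pool(reference_doc)
    scored = [
        (i, cosine_similarity(ref_vec, mean_pool(doc)))
        for i, doc in enumerate(candidate_docs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: each document is a list of token vectors (assumed, not real BERT output).
reference = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = [
    [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],  # points in a different direction
    [[0.9, 0.1, 0.0], [1.0, 0.0, 0.0]],  # close to the reference
]
ranking = rank_by_similarity(reference, candidates)
```

With real embeddings the brute-force loop would normally be replaced by a vector index, but the scoring logic stays the same.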


Dude, you are probably releasing trade secrets ;-)


Nah, lots of people are trying stuff like this :)


yes sorry. i meant a more naive, count-based vector representation instead of BERT embedding


I've been working on using BERT for search, for research and training development, with not so great results.

Note the quote "when it comes to ranking results, BERT will help Search better understand one in 10 searches". This is because of the "keywordese" point they noted earlier in the article. Most searches are 1 or 2 words - there isn't enough to grab onto for meaningful ranking with short queries and a similarity function for longer text documents.

Also, try keeping the systems afloat to handle search like this. BERT is not practical to use for search results by anyone without the scale of a company like Google. You need to have a server farm of GPUs to translate all your documents into tensors, and then keep them around somehow! A document of 10k text will balloon to ~1MB when converted to a multi-token vector representation. BERT-base uncased has 768 features, so that's 768 floats per token you need to keep around. If you compress it using PCA or averaging across tokens, you lose all the juicy context that you need for the matching and ranking. Also, there currently isn't a good way to keep this stuff around yet (though there are active projects ongoing to get this into Lucene [1],[2])

I think this is definitely a great achievement in NLP - but it needs breakthroughs in other areas to be useable by product teams implementing search, with any reasonably large content size.

[1] https://arxiv.org/abs/1910.10208 & https://github.com/castorini/anserini/blob/master/docs/appro... [2] https://github.com/o19s/hangry
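
For a back-of-the-envelope feel for the storage cost described above, here is a small sketch. The 768 features per token come from the comment; the 4 bytes per value (float32) and the token counts are assumptions, since the actual footprint depends on the tokenizer, the document length, and the precision you store.

```python
def embedding_bytes(num_tokens: int, hidden_size: int = 768, bytes_per_float: int = 4) -> int:
    """Raw size of storing one vector per token, with no compression or index overhead."""
    return num_tokens * hidden_size * bytes_per_float


# One token at float32: 768 * 4 bytes.
per_token = embedding_bytes(1)

# A hypothetical 340-token document kept as per-token vectors.
per_document = embedding_bytes(340)

# The same document after averaging across tokens: a single vector remains,
# which is the compression that discards per-token context.
mean_pooled = embedding_bytes(1)
```

Under these assumptions a few hundred tokens already cross the megabyte mark, while the mean-pooled form is about 3 KB per document.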


Distillation is usually used today to tame its resource problems at scale - you run BERT to squeeze out maximum signal from your training data and then distill the model, e.g. into a cheap CNN for inference.


Distillation reduces accuracy and removes the contextual precision. For example, reducing a whole document to some N (1k or so) dimensions has worked very poorly in my experiments for short queries - typically making the relevance worse than basic keyword search.


You seem to be talking about dimensionality reduction; that's not what I meant. Distillation is training a different model with a cheaper architecture (CNN, LSTM) on the outputs of an expensive teacher model like BERT. This has nothing to do with dimensions.
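
To make the distinction concrete, here is a minimal sketch of one common distillation objective: the student is scored on how closely its output distribution matches the teacher's temperature-softened distribution. The soft-target KL formulation, the temperature of 2.0, and the toy logits are assumptions chosen for illustration; the thread does not say which loss anyone actually used, and a real setup would backpropagate this loss through the student network.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between temperature-softened output distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


# Toy 3-class outputs (assumed values). Teacher and student share the same
# output space; nothing here touches the size of any internal representation.
teacher = [4.0, 1.0, 0.0]
loss_identical = distillation_loss(teacher, teacher)
loss_close = distillation_loss([3.5, 1.0, 0.2], teacher)
loss_far = distillation_loss([0.0, 1.0, 4.0], teacher)
```

The student only has to reproduce the teacher's outputs, so its architecture and hidden sizes can be whatever is cheap to serve.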


You might try vector quantization (instead of PCA) if you just need your 768 features to be smaller. ML features tend to be robust to some perturbation.
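
One way to picture this is product quantization, a common variant of vector quantization (the comment does not say which variant was meant). The vector is split into sub-vectors and each sub-vector is replaced by the index of its nearest centroid in a small codebook. In this sketch the codebooks are hand-written toy values; in practice they would be learned, for example with k-means, and all function names are illustrative.

```python
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]


def nearest_index(sub_vector: Sequence[float], codebook: Codebook) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    def squared_distance(centroid: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(sub_vector, centroid))

    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def pq_encode(vector: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Replace each equal-sized slice of the vector with a centroid index."""
    sub_dim = len(vector) // len(codebooks)
    return [
        nearest_index(vector[m * sub_dim:(m + 1) * sub_dim], codebook)
        for m, codebook in enumerate(codebooks)
    ]


def pq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx: List[float] = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx


# Toy setup: a 4-d vector, 2 sub-spaces, 2 centroids per sub-space (assumed values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
vector = [0.9, 1.1, 0.1, 0.8]
codes = pq_encode(vector, codebooks)
approx = pq_decode(codes, codebooks)
```

As a hypothetical sizing example, 96 sub-vectors with 256 centroids each would turn a 768-float vector (3,072 bytes as float32) into 96 one-byte codes, at the cost of the small perturbation mentioned above.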


Well it’s one problem or another. If you compress too much you lose the value, and if you leave it too large you have the size problem.

Inverted indices are very efficient. How much of that can you give up at what trade off? If I’m only going to be better for 10% of queries, is that a cost effective solution? What if I spend the same amount of time tuning a traditional engine a bit more and get better accuracy for 5% of queries? Tradeoffs rule the world of practical search implementations.


Just an idea: Maybe you could either train a model or use heuristics to translate from keywordese to English?


Haha I wish! Too much fidelity has been lost already. The model would just be guessing.

The sniff test is if a person can’t do it, then a model can’t either. Lots of queries look fine for matching, but you really have no idea what the intent or information need of the searcher is.


No, I mean before you feed it into the model.


I’m not sure what you mean. Keywords are keywords. The meaning behind what the user wants is in their head. You can't turn keywords into a sentence without guessing what they meant.


Search and a lot of the AI-based systems these days feel like talking to a hard-of-hearing grandparent; as long as you’re saying about what they expect then it’s fine, but if there are any nuances or homonyms it turns into a comedy routine.


An interesting snippet from TFA - "..with this release, anyone in the world can train their own state-of-the-art question answering system (or a variety of other models) in about 30 minutes on a single Cloud TPU, or in a few hours using a single GPU." https://ai.googleblog.com/2018/11/open-sourcing-bert-state-o...


How long till NLG becomes good enough that it can answer questions factually? I think integrating it with a knowledge graph might just make search obsolete.


I might be mistaken, but I believe that's already what Google is doing for a lot of factual knowledge (as a matter of fact they do call it knowledge graph).

Try for example to type "when was jfk born" in Google, you should see a factual answer fished from a kg.


> In fact, that’s one of the reasons why people often use “keyword-ese,” typing strings of words that they think we’ll understand, but aren’t actually how they’d naturally ask a question.

Funny thing, I've seen people mention right here on HN that for DuckDuckGo you need to adjust the style of your queries. This notion puzzles me, probably because I kept the habit of ‘keywordese’ from the olden days. Most of the time, results are about the same for me in Google as they are in DDG and even in Yandex—with the exception that Google is better at grouping related or similar results, and also if there's one or two sources having the search phrase almost-verbatim then they're at the top in Google. Apparently, I already need to learn talking to the site like it's self-aware, to regret ditching it.

Now, if some wondertech helps me to home in on the answer to my exact software or programming troubles instead of hundreds of vaguely related SO posts—I could really dig that.


Funny that you mention Google's grouping and related SO posts ... I still regularly run into that annoyance that SO hasn't figured out how to do canonical URLs and Google then proceeds to put the identical content from SO on two consecutive spots in the results page. SO does not want to fix it (it has been an issue for years and has been pointed out on meta many times), but I'm confused by Google's failure to consider them duplicates when their content is nearly identical (the difference would be just "hot network questions" in the sidebar etc which depend on the time of the page request).


I wish they would let us use the "keywordese" engine as it was 10 years ago if we want to instead of funneling us along with everybody else into the newer intent search engine.


This is the kind of search you see everywhere. 95% of queries I see for customers in most fields are only 1 or 2 words long (and they are typically a noun phrase).


I wonder how this affects their costs of serving a query. BERT isn't exactly computationally cheap to evaluate.


I guess the performance impact could be controlled by applying some cheap heuristics, something like having more than 3~4 words with a preposition. They're giving a pretty specific number (one in 10 searches), so this might be the case.
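
A sketch of what such a gate might look like, purely as a guess: only queries with at least four words and at least one preposition would be sent to the expensive model. The preposition list, the threshold, and the function name are all assumptions; nothing in the article says Google does anything like this.

```python
# Small, hand-picked preposition list (an assumption, far from exhaustive).
PREPOSITIONS = {
    "to", "for", "from", "with", "without", "in", "on", "at", "by", "of", "about",
}


def needs_expensive_model(query: str, min_words: int = 4) -> bool:
    """Cheap gate: long-ish queries that contain a preposition."""
    words = query.lower().split()
    has_preposition = any(word in PREPOSITIONS for word in words)
    return len(words) >= min_words and has_preposition


long_with_preposition = needs_expensive_model("2019 brazil traveler to usa need a visa")
short_keywordese = needs_expensive_model("datasheet JK45690DFS")
short_with_preposition = needs_expensive_model("word for ice")
```

A gate like this costs a string split and a few set lookups per query, so it could run on every request before any model is invoked.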


It is google, it is probably their secret sauce. I wouldn't be surprised if someday Google made a custom ASIC chip just to run transformer models.


That's actually old news, and mentioned in OP. They build their own chips (TPUs) that are super cool. You can even use them as part of google cloud! Still, moving to BERT ain't cheap.


TPU is old news. I think the actual news would be a chip that is customized/optimized to just run BERT.

The Transformer architecture itself has stayed mostly unchanged in the 2 years since it was proposed, and with BERT and its variants, most (competitive) NLP models are now Transformer based. It makes sense to make custom chips to just run Transformers, the same as CNNs.


Meh, not sure how much more there is to do to specialize for the transformer specifically. TPU and GPU are mainly just fantastic matrix multipliers. And the transformer is partly designed with this hardware in mind: the operations are basically the same operations you see in a CNN. And in fact, one of the nice parts of the transformer is that you can run it without an RNN, making it even better optimized for the matrix multipliers.

Furthermore, TPUs are a moving target themselves: as ML needs change, the team builds new operations and optimizations into the next generation of chips.


Does anyone have a nice resource they recommend on what BERT does? I've gathered it was trained by trying to predict missing words in a sentence, but I don't have an intuition on how this is useful for downstream prediction (like, say, learning a word embedding is).


I appreciated this blog post: http://jalammar.github.io/illustrated-bert/


Thanks!


Is there any public information on how BERT is actually being applied to IR?

For each of the scenarios they described they are just like "here's a potentially hard search query, and BERT adds magic language understanding which makes it all better". It's non-obvious how BERT is actually being used though, especially at the scale and latency they need.

(I get that this is Google's "secret sauce" and they might not say anything about this particular use of BERT. But I'm curious if anyone has seen anything related.)


This blogpost is very light on details. It doesn’t at all say at which stage in the search process BERT is used.


Totally off-topic. Is it just me, or is this "People also search for" the worst feature ever?

For those who haven't noticed, this is the box that shows up under a search result when you return to google from the page you've gone to. 50% of the time I click the back button, this freaking box shows up, the whole page shifts, and I click the wrong link.

Oh and did I mention I never ever, not even once, actually use it?


>By using new neural networking techniques to better understand the intentions behind queries

Great, so the search results are going to get even worse?


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

The comments below are much better.


[deleted]


Ok, but accurate-and-shallow is still shallow. What's needed is a substantive accurate statement. It's particularly the shallow dismissals which harm discussion—hence that guideline.


uhmm, weirdly enough it might explain why some (myself included) have found that Google's results were shit/sub-optimal. Turns out I might have been using Google wrong for the last few years. I am used to using specific boolean search parameters while it seems that Google have been optimizing for natural language. An example from the announcement may clarify:

"Here’s a search for “2019 brazil traveler to usa need a visa.” The word “to” and its relationship to the other words in the query are particularly important to understanding the meaning. It’s about a Brazilian traveling to the U.S., and not the other way around. Previously, our algorithms wouldn't understand the importance of this connection, and we returned results about U.S. citizens traveling to Brazil. With BERT, Search is able to grasp this nuance and know that the very common word “to” actually matters a lot here, and we can provide a much more relevant result for this query."

Personally I would never have formatted a search query in natural language. Perhaps I should have been.


Yes, this is the exact problem I'm running into. I hammer out my search queries like it's a wildcard CONTAINS SQL statement, because a simple inclusive search should bring back predictable results.


If you can remember any of the queries that failed, I'd be happy to pass them along to debug. If you have it turned on you can look in your search history here: https://myactivity.google.com/myactivity?product=19


"AC7X0_R3.2.0.7d_ENG_NB.exe"

I was after a specific legacy driver last night which the vendor no longer has available. Google returns zero results for this; Bing returned a few relevant results (but sadly still didn't help me get what I needed). In the end I went mooching through the Wayback Machine.

It's this class of search that bugs me the most, I know for a fact something with that exact filename is out there on the web, I just can't find where easily.

If I run into more issues or can recall anything else, I'll forward it on.


I am curious why you take the pessimistic view. Their own testing shows an improvement for at least 10% of their queries:

"Google says it can now offer more relevant results for about one in 10 searches in the U.S. in English"


The improvements are likely for non-power users. As they make up the majority of users, it's understandable that Google optimizes for them. Still, it gives power users worse results because they know what they are looking for and they (often) use certain words for a reason. Google's default assumption ("the users have no clue what they are doing") leads them to a paternal "I think I know what you're going to ask ..." approach which often fails and leads to shit results.

The method to combat that is to start wrapping every word in quotes which is annoying. I'm sure intermediate users will catch on at some point, start doing it too and Google will drop the quote modifier. Let's keep it a secret so that happens later rather than sooner.


Natural language is a lot more expressive than boolean expressions of keywords. Perhaps experienced searchers who use the same approach to query formulation that worked well two decades ago are no longer the true power users. Maybe they are just dinosaurs.


This assumes a simplistic view of how its core search algorithm works, though. It's obvious through even some basic querying that it uses different strategies for recognizing intent based on the nature of the input: single words, addresses, arithmetic, business names, full text queries, etc.

There's no reason to think that they will enact this to the detriment of other queries. It _could_ happen, but I am skeptical - optimistic even. As they mention, this improves ~10% of queries. The other 90% likely represent different forms of query input and I would hope remain unaffected.


I think most of us technologically inclined users are in the habit of using specific words to get search results as we have been doing that for years. But I think a large portion of newer users actually ask full questions so google is optimizing for it. For better search results we might need to actually start asking exactly what we are searching for instead.


I struggle to find substance in your comment.

Could you perhaps expand with a few supporting details for your thesis?


I agree with his sentiment based on my personal experience.

But we'll see, maybe this will actually be better?

also, maybe we should stop googling in "keywordese"?


Each time they move away from what people type and toward what profile data collected from various sources suggests, we move away from the ideal.

The best example is ads targeted to keywords on the page vs. ads shown based on past purchases.


Google works well when you are searching for things you don't know much about, your input is imprecise and it generally points you in the right direction.

However for the opposite case, when you are trying to find something highly specific, even down to an exact substring match I find the results to be very poor.


Aww :-(

I suppose you'll just have to make your own...


I just queried "I struggle to find substance in your comment."

Top results:

* Managing Your Feelings Without Substances ...

* Why We Should Treat, Not Blame Addicts Struggling to Get ...

Google missed the ball here, the word substance in this case is not about drugs.


I wish google would release google2008.com

On that website, they keep using their technology from 2008 and let me use it to search for what I want. I've had enough


One issue with that is that there has been an arms race since 2008 with "search engine optimization". If you used a crawler from 2008 on today's web, you'd probably get a ton of spam.


The results for half of my searches are already all spam. It would be better to just accept that and have google let me engineer my queries so that I avoid spam. More token-based searches, less word vector/machine learning based search. Let me query their index like it's an SQL database.


Are you sure they are spam? Google's engineers consider annoying things like recipe pages with 20 paragraphs of stories before the actual recipe not to be spam.


i am being tricked into pinterest link-pages multiple times a day. That is spam, and has very little to do with their AI work


Restricting it to 2008's web could be an interesting exercise.


I'm now super curious what percentage of pages from 2008 are still around/available today.


They'd also have to release "The Whole Web as it was in 2008"

What people don't seem to realize is, as much as you think Google has changed, the Web has changed even more. If you kept Google the same as it was ten years ago, your results would be far far more full of irrelevant, SEO'ed, spammed up content today.


BERT is such a great case of Clever Hans. If you scrub the shallow statistical similarities from question-answer sets, accuracy drops significantly.


Now google is trying to read my mind, my thoughts and my intentions. I've been running away from EVERYTHING Google.


And yet...

    $ host -t mx vuln.ninja
    vuln.ninja mail is handled by 10 alt3.aspmx.l.google.com.
    vuln.ninja mail is handled by 1 aspmx.l.google.com.
    vuln.ninja mail is handled by 5 alt1.aspmx.l.google.com.
    vuln.ninja mail is handled by 5 alt2.aspmx.l.google.com.
    vuln.ninja mail is handled by 10 alt4.aspmx.l.google.com.

Maybe you should try running a bit faster.


Ouch. The burn.


There's a lot of virtue signaling on HN.



