Hacker News
MUM: A new AI milestone for understanding information (blog.google)
257 points by chris_f on May 20, 2021 | 208 comments



A not-so-subtle reading shows Google is doubling down on ecommerce applications here:

> It could also understand that, in the context of hiking, to “prepare” could include things like fitness training as well as finding the right gear.

> fall is the rainy season on Mt. Fuji so you might need a waterproof jacket.

> MUM could also surface helpful subtopics for deeper exploration — like the top-rated gear or best training exercises

> you might see results like where to enjoy the best views of the mountain, onsen in the area and popular souvenir shops

Or, my favorite line:

> MUM would understand the image and connect it with your question to let you know your boots would work just fine. It could then point you to a blog with a list of recommended gear.

(in other words: "Thanks for showing you're interested in hiking gear. Here's a lot of hiking gear you can buy.")


There's an even bigger picture here than possibly monetizing ecommerce (through... ads?). The biggest impact is that they get to use all the content generated on the Internet to create search "results" that synthesize information from multiple sources, without ever having to share traffic or ad revenue with those content sources. Clever.


This really is an area that needs regulation. You basically have to use Google and allow it to crawl your site if you want your website findable by 95%+ of Americans, so websites really should be able to tell Google how they're allowed to use the scraped data instead of just 'for anything'. Maybe a meta tag would work well; something like the sketch below.
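For illustration only: the first two directives below already exist and are honored by Google today, while the third is purely hypothetical, since no such opt-out currently exists.

    <!-- real directives Google already supports -->
    <meta name="googlebot" content="noindex">    <!-- don't index this page at all -->
    <meta name="googlebot" content="nosnippet">  <!-- index it, but don't show snippets -->

    <!-- hypothetical: allow indexing, but opt out of derived/synthesized answers -->
    <meta name="googlebot" content="nosynthesis">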


> websites really should be able to tell google how they're allowed to use the scraped data

Isn't it a bit more complicated than that though? While you certainly aren't entitled to republish things, you (ie anyone and everyone) have traditionally been free to consume public material in whatever way you see fit. The precedent from the recent LinkedIn case regarding scraping supports this.

Also you focus on Google, but anyone with sufficient resources can scrape the public web (anti-bot cat and mouse games notwithstanding I suppose).

(Paywalled sites that allow Google to scrape them for search indexing purposes are an interesting edge case though.)


Should human experts, like someone preparing a blog post on gear, also have to pay all the blogs and books they read in their research?


I'm not proposing a link tax; I'm just saying that websites should be able to opt out of their content being used for some Google products, rather than having to give Google a very permissive license just to transact business online and be found in search results.


Would it be reasonable for a vendor to put their merchandise behind a login page?


What possible reason would they have for doing so? They would lose sales and search engine originated traffic.


No, because it makes finding the merchandise hard, and sales will tank.


I am sure your left-field question has a fancy term in the encyclopedia of fallacies, but are you equating Google's crawlers and AI with human equivalents?

Even if you got an answer to your question, how would it affect the parent's assertion?

Should the internet provider of the person reading the blog post get royalties for the book the person wrote? Should you be paid for your snarky question, as the generator of content?


My point is that we shouldn't extend IP protections to general knowledge gleaned from reading. If we start overreaching into that for AI, there would be little stopping us from overreaching into it for humans.


There is a reasonable risk that future research material will not be made publicly available under the traditional copyright system. Effectively, an "uber" Google could leverage all previous content to answer your query, severely limiting the utility of producing any new content or of keeping pre-existing content accessible.

If you break the distribution model of content producers there won't be any new content, and the old content will simply rot away.


Records broke the distribution model of taverns and pubs.

Radio broke the distribution model of records.

YouTube broke the distribution model of old content producers.

People still make music and more content than ever before. People are not gonna suddenly stop generating knowledge because of Wikipedia. The prediction that "there won't be any new content" is hyperbolic and just plain wrong as demonstrated by history.


The previous examples disrupted content distributors. An advanced q&a system disrupts content creators.


When the content people create is pushed aside and instead used by a giant corp. without any payment, it is safe to assume that people will stop writing content.


Books are indeed typically paid for (even if you get them through a library etc.). Blog posts are an edge case, but the rise of paywalled sites indicates that more authors think they should be paid for their work.

The interesting point here is: can a non-free site let Google crawl it in order to be searchable, while not allowing them to exploit their content otherwise?


The site can ask for such a setup, but Google is very much in a position to refuse and simply not index that site, unless it's one of, say, the top 1000 sites.

I don't see any legal ground for such a limitation, so I can't imagine what e.g. eBay or Walmart would put in their suit if they tried to impose such a limitation and then saw Google not honor it. Maybe someone with a legal background could comment.


There isn't currently a legal ground for that (in the US). The context of this thread, though, was to propose that such grounds ought to be introduced by the legislature.


I can see the rationale but at some point Google needs to make money on what it contributes to the picture.

On one hand, Google is in large part responsible for the idea that they’re just a free service and anyone can use them without payment, so they’ve in essence trained users to think of them as a non-profit utility, which they’re obviously not.

On the other hand, how do you allocate Google’s value when no-one has ever paid for it or been asked to pay for it?

You can legislate Google into the ground (unlikely, but theoretically possible), but what would replace it? Any other similar service operating at their scale would need to make money too.


And how else would a search engine work? I am not sure what your proposed solution would look like.


In the current world, information wants to be free. In the AI-powered future, knowledge wants to be free.


Those who created information had a problem with that past. Those who create knowledge will have a problem with that future.


I don't know if this is actually true or not but I suspect that a big part of their thinking is that "we are just presenting 'facts' and facts are not subject to copyright laws".


Forgive the analogy but that sounds like a parasitic relationship, and one that might kill off, or at least impoverish, its host. Even if Google isn't doing that, the potential exists. The counter is paywall, I suppose.


Another not-so-subtle reading shows google doubling down on being "responsible" which has a lot of collateral damage when they block or de-emphasize legitimate results that don't fit their own goals.


It rings a little hollow when they fired members of/disbanded their nominally independent internal ML Ethics unit after a member published a paper raising some flags on the kind of models Google is betting its future on.


“Responsible” AI was what Google invented after the Ethical AI purge.

The appearance is that it’s about AI being responsible for advancing corporate image and interests.


It’s also just a huge step back from ‘ethical’. Responsible implies one can hide behind ‘technical limitations’ or ‘business concerns’. It means you took the unethical option but you at least weighed the pros and cons first


I don't think the language you've quoted was explicitly intended that way. But I think you're onto something. I think high-context answers open up all kinds of new contextual surfaces where ads can be placed, products + product categories suggested.


I don’t know if it’s “e-commerce” specifically, or just the more general fact that Google owns a search engine and wants to surface URLs from its index as answers to questions, when appropriate. And, when you think about it, why would you be linking to a page — rather than giving a straightforward answer — unless you’re linking to a product page / review / other page that offers a direct means of solving a problem that goes beyond a conversational answer?


> unless you’re linking to a product page / review / other page that offers you a direct means of solving a problem

Embedded in this answer seems to be the mindset that only buying things will solve problems.

Don't get me wrong -- I'm not a consumerist luddite, I use my credit card points like any good and proper citizen -- but when your mindset is "all problems can be solved by buying more shit", well, that's a pretty lonely existence.

Google's gotta make money, and helping people buy useful shit is a fine way of doing it, but just don't fall into the mindset trap that every solution in life is just a Google Pay away.


No, let me rephrase: often, problems can be solved with words. In those cases, the conversational agent wouldn't link to anything. It would just solve your problem.

But if a problem being solved necessitates linking to something, then what kind of problem is that likely to be? Usually one where you need to stare at something, mull over a bunch of details, and make a decision. What kinds of webpages are those? Usually — for public clients — those are product pages.

(Another potential use-case is that a conversational agent could help people configure software/services by deep-linking to configuration screens — but that's not really a thing Google Search could integrate with.)


The Google of the future is a conversation with a salesman.


Or maybe it's a personal assistant that lives in your phone and asks you how you're doing everyday, acts like a friend, inquires about your mental health and well-being... And then subtly nudges you in the direction of buying X,Y,Z thing or service to help you fill that existential void in your life.


A digital cult leader that can talk to everyone in the world!


Search quality at Google has been decaying over the past decade. Accuracy and quality of search results is compromised to optimize advertising revenue, penalize competitors or neutralize threats, and cater to the various needs of political or regulatory authorities.

Google's search was at its peak in 2008, when advertising hadn't fully compromised search quality. Google is an advertising business that supports its otherwise money-losing properties. Why would things change in the future? Being able to synthesize data from multiple sources doesn't help if that quality is then compromised by the realities of Google's business model.


> Search quality at Google has been decaying over the past decade.

Is there any empirical evidence to back this up? If we’re talking anecdote, I swear that as soon as Google started labeling ads more clearly, people complained more about ads. And if Google really is getting worse, I would expect to get frustrated less often by DuckDuckGo not getting the job done.

I do share your concerns though. Just look at YouTube as an example. You search for something, and half way down the page are completely unrelated videos that you watched before. This is because YouTube just wants you to click, they don’t care about you finding what you were after.


The one example I usually give people is the one that led me to the realisation myself.

Try searching "how valve index works" or "how valve index controllers work". My interpretation of "how it works" is "technical information on how an item operates". Google will interpret this instead as "how well it performs its intended functions" and flood me with both links to purchase the Valve Index as well as endless reviews. Results on Google are not tailored toward retrieval of factual information anymore. They're tailored to ordinary, garden-variety consumers, and obviously designed to sell you a Valve Index.

To this day I still have not found really good information on how the controllers in the Valve Index actually work. All I get are pushes and nudges into getting me to buy something.


Those are good examples! I'll pass them along to debug. I think what's happening is that the wording is ambiguous enough that it's colliding with concepts like "how well does valve index work." If you search for "how does valve index tracking work" then you get results like this, which is more in line with what you're looking for. https://gizmodo.com/this-is-how-valve-s-amazing-lighthouse-t...


While this mic is on, also fix the time ranges for some queries. If you search “best Wordpress plugin for exporting data” , Google often gives me 4 to 5 year old links. Current top result is from 2017... where some plugins don’t even exist anymore :)


I think what happened was they pivoted from _document search_ to an interactive oracle app. I would have used “index controller principles” to get the documents describing it, which no longer works. And I think what you want is the document search back.

And these days they throw a lot of machine translated ripoff sites as well as some malvertising dummy type sites. It’s really something.


> I still have not found really good information on how the controllers in the Valve Index actually work.

Isn’t the Occam’s razor explanation here just that that information is not actually available on the web - not that Google is hiding it from you?


Not in this case:

See the first page of results for DDG's search on how valve index works:

https://duckduckgo.com/?t=ffab&q=how+valve+index+works&atb=v...

Compare to Google: https://www.google.com/search?q=how+valve+index+works


On Google I get some Wikipedia extracted information that says:

"The Valve Index Controllers have a joystick, touchpad, two face buttons, a menu button, a trigger, and an array of 87 sensors that allow the controllers to track hand position, finger position, motion, and pressure to create an accurate representation of the user's hand in virtual reality." with a link to Wikipedia: https://en.wikipedia.org/wiki/Valve_Index

The first raw result is: https://www.pocket-lint.com/ar-vr/news/steam/147913-valve-in...

That's the same link as DDG uses as its first result.

The second Google link is a YouTube video (https://www.youtube.com/watch?v=bD8Y9gcPGzs) that has details about "optics and resolution". The second DDG link is about sprinkers (https://www.sprinklersavings.com/blog/how-an-indexing-valve-...).

Google seems a lot better on this query.


Then they should just come out and say it: "no results found".

Not returning results at all seems to be stigmatised these days for every site.


Not being correct is stigmatized in all aspects of society these days, thanks to an ever-increasing business-leadership mentality invading our culture. Being wrong, failing, etc. is no longer acceptable. You have to provide the appearance of success in the absence of success.

I don't know why we as a culture can't accept that people fail and fail often. A bit more humility would do everyone some good, instead of setting constant unrealistic expectations that hamper all aspects of society. It's completely bananas.


I'll give you one: Google image search is so insanely hobbled by the copyright squad (and possibly right to be forgotten, etc) that it's essentially worthless now. Reverse image search used to be a valuable tool. Now it just spits out generic garbage, even for images that clearly have a wide presence on the net. These days if I need to try to hunt for something, I just pull up Yandex and get the results Google used to give five or ten years ago (better even, since there's a bunch of neat added features like object recognition and automatic OCR).


I have an example from just yesterday. I am new(ish) to the rails ecosystem and spotted a `.ruby-version` file in the root of the repository. I didn't know what it was so I googled `.ruby-version`. The results were less than helpful because Google interpreted that as a search for the term `ruby version`. Fine, whatever, I will just fall back on double-quoting the whole thing, like `".ruby-version"`. A couple of years ago this would have worked perfectly - I know, because I've been doing it for years. But Google no longer respects this kind of search query; instead it tries to be too clever by half and ends up being worse than useless.


I miss the days when punctuation marks were significant to Google searches. And the days when you could use logical operators in searches, like + & !

I’m glad I learned POSIX and especially Linux when searches were evaluated more literally. It was simple to locate relevant technical pages.

It’s a shame google doesn’t offer legacy search.


I can only give you anecdotal evidence, which is that myself and many others (per social media) are constantly appending "reddit" because the first N results are all e-commerce sites or thinly-veiled promos for them in the form of listicles.

e.g. Search for "camera with wifi" versus "camera with wifi reddit". If you're doing any research, you will find the latter more useful. Now I know some will say many people just want to buy the product and will be satisfied with a direct link to purchase, but the thing is a good search engine will mix in different types of results. What you get here is dozens of virtually identical results with any genuine info - e.g. a recent post on a reputable personal blog or a social media post - completely buried.

Do any other engines do it better? Maybe not. But Google itself certainly used to do it better, if only because it didn't have the majority of the internet trying to game its algo.


At this point, I basically need to know or find an authoritative source first. PCMag still appears to be a good resource, more so than Tom's Hardware and Wirecutter at times (I think). It's sort of the same shit, but they seem to put a little more work into being right. Too many listicles that are "10 best" are really "the first 10 the author saw while searching." When coronavirus started, there's no way anybody writing most of those "review rollups" ordered and tried on any of the masks they assembled into posts. There are fewer and fewer places that seem to be trying things themselves before recommending them.

https://www.pcmag.com/picks/the-best-sony-mirrorless-lenses

Google really needs an authoritative mode that strips out or deduplicates the news cycle and blogosphere. Something that can tell that every post is basically the same thing and turns it into one entry. I want uniqueness and quality. I don't need the same opinion repeated across 10 URLs.

A Ctrl+F of the MUM page didn't find the word "duplicate" once.


The next generation of search engines should have a config where people can customize their algorithm.

I don't feel that Google got worse over the years; it's more that the commerce part of the internet is overtaking the information part.

Actually you make a good point: since Google has a shopping tab, maybe they should show ads over there only and dedicate the "normal" Google to general info.


Not to mention that most of those listicles are an extremely shallow cross-section of available products. Reddit is far more willing to suggest off-the-wall options like used 5-year-old hardware that still performs better than the newest shiny thing, and uncovers far more options that are slightly off the beaten, consumerist path.


And also, a search engine with a greater bias on UX would be more personalised, so it would show those kinds of results to the people who regularly seek them.


It’s funny that Google Search still does well at what it was intended for: searching intelligence by keyword to gain understanding.

Google neither cares to confirm nor deny it, but the origin of Google Search is reportedly some CIA/NSA internal program. Imagine there’s a ton of random Soviet documents, and you wanted to know what the codename chikensandwich in the Slicebread division might refer to, or which document is referred to the most by other documents on the topic. Don’t you think Google Search, as you remember it, does exactly that?

And this conspiracy theory explains why Search, Maps, and Mail and very few other products built by such a laid-back, disorganized organization work so well, and only those work well: it’s because those are a technology dump from the NSA, and Google is just an elaborate museum shop allowed to capitalize on that heritage.

1: https://qz.com/1145669/googles-true-origin-partly-lies-in-ci...


I came here for a technical discussion of MUM, but your comment just triggered something: my wife and I pay for ad-free YouTube (part of the music bundle). As a paying customer, I just don’t understand why they would annoy me by showing videos that I have already seen. A better UI would be a top-level menu option to show the history of watched videos (and search within already-watched material). Then the default page could omit already-seen material.

I am a happy paying customer for GCP, Play Books+Movies, etc., but I think they need to step up the quality of their services for paying YouTube customers.

Thanks for your comment.


Not likely to find empirical evidence of search results quality, but I think there might be for an overall lowering of content quality. It is so much cheaper to mass produce unimpressive content than ever before.


How about image search? In 2008 there weren't product images and shopping campaign ads inline with the rest of the results. Also, reverse image search is now being supplanted by Google Lens search, which again serves up products and ads based on what it can tag in your photo.


Huh. Just ublock all those annoyances away. Didn’t know Google search is so ad ridden.


> If we’re talking anecdote, I swear as soon as google started labeling ads more clearly people complained more about ads

When did Google start labeling ads more clearly? See: https://searchengineland.com/figz/wp-content/seloads/2016/07...

Today the labeling consists of the letters "Ad" in black next to the result.

source: https://searchengineland.com/search-ad-labeling-history-goog...


Yes, the number of ads at the top of the page has increased. The colors blend ads into content. The number of sites shown has been reduced. The amount of indexed content available has been reduced.


> Is there any empirical evidence to back this up?

No. This is the same HN post as "Facebook is dying, pretty soon all their users will be gone and they'll collapse".


How much of it is Google getting worse and how much of it is garbage websites hyperoptimizing for SEO? Practically all news websites are chock full of ads. There's tons of filler websites that just copy/paste text from Wikipedia, etc. Of course, Google could do a better job, but it's codependent evolution.


It's Google's prioritization with ads and preferred sites taking priority even over those SEO-optimized sites.

Google would much prefer to be the sole source of your traffic instead of pushing you to other sites. Google's business is advertising. Why would they want to lose that traffic?

Check this article about the Google MUM announcement, which basically says the same thing:

"MUM is part of Google’s long-term shift away from ranked search results and toward the creation of AI algorithms that can answer user questions faster—often without ever clicking a link or leaving Google’s results page. (Think, for example, of the “knowledge panels” that now appear at the top of many search results pages and display an answer from a website so you don’t have to visit the site yourself.) This shift promises to reduce the amount of work it takes to find information through Google. But it’s not clear that this is a problem in need of a solution." [0]

The Google of today is not the Google of 2008. Google in 2008 was a search engine. Today it's an advertising business that would much prefer you not leave Google properties.

[0] https://qz.com/2010802/googles-mum-is-making-search-worse-by...


> This shift promises to reduce the amount of work it takes to find information through Google. But it’s not clear that this is a problem in need of a solution.

Getting people useful information faster is the problem in need of a solution when you're Google. There isn't a point where that problem is solved; organizing the world's information and making it universally accessible and useful is an unbounded goal.


Is this just speculation on your part or do you have a source for this claim?


What speculation? I quoted the above article stating that, and there's more on this topic. There's a reason for those Infocards.


FWIW I upvoted you upthread to try and counter the inexplicable downvotes; to me the points you made are uncontroversial and almost self-evident. (shrug)


Garbage websites hyperoptimizing for SEO have existed since the late 90s. I agree with the GP; the deterioration of search I have seen over the past 5-10 years is specifically a result of their business model:

1. Any remotely commercial search has an entire first page of ads, organic results are pushed way down.

2. Google has made the difference between ads and search results as minimal as possible. I long for the early-00s days of big yellow boxes.

3. On many pages the amount of content Google stuffs in at the top before you get to actual search results gets more annoying every year.

Honestly, I wish I had a button that made Google result pages look like they did 15 years ago.


I feel like the main problem I have with Google results is that they never surface anything interesting or old. There are a lot of searches where it returns nothing useful, but if you add "reddit" it becomes useful.

Besides that, they haven't fought SEO enough on image search, since Pinterest took it over for years.


Totally agree with this. If you're searching for something where the keywords happen to conflict with a current event, good luck finding it.


> I wish I had a button that made Google result pages look like they did 15 years ago.

Or a browser extension.


> How much of it is Google getting worse and how much of it is garbage websites hyperoptimizing for SEO?

Those are the same thing. If garbage websites can game their way up the search listing then Google is failing.

This is a simple problem of competition. Google doesn't have any, so they don't need to provide a good product. They can optimize for ad placement and revenue instead of search quality because users perceive that they have no real choice but to use Google. If another search engine manages to get some real market share Google results will get much better again.


Idk, in a sense I feel that Google has been getting better at fighting SEO over time. I used to get crap results, but then again I was less experienced and did not use ad blockers.


I would have agreed with you a few weeks ago. I recently switched my browser to the new Edge and I stuck with Microsoft's default Bing search because fuck Google and all that. I had two occasions in two days where Bing's search frustrated me with their results despite many efforts to tweak the query. I switched over to Google and its first result was exactly what I needed. These were cases where the page didn't contain the phrase I was looking for so it had to interpret/translate it to find the correct information and it did a great job.

A few years ago I felt that Bing and Google search were basically on par. Google has definitely upped the ante regarding search in the last couple of years. It may just be that it does more interpretation than you've come to expect so you need to retrain yourself how to query it. There are also occasions where verbatim search is required for technical topics. But Google's search quality has shown real improvements.


> Search quality at Google has been decaying over the past decade.

This one line is echoed again and again on HN, and yet in my experience all its competitors still pale in comparison. I hate Google now as much as the next HNer for its evil shenanigans, but their search is still superior, and if a browser comes with a default like Bing or DDG (like Firefox on Linux Mint), the first thing I do is change it back to Google, since the results are truly awful otherwise.


But it is worse than it was a decade ago. Across the board. There's more pollution than ever on the internet, and search engines are doing a worse job of separating the diamonds out.


Just look at the number and size of ads. Quite often, the entire first page of results is ads, and you have to scroll down to find the organic results.


Yeah - seems to jibe with my experience. It's a tough pill to swallow, but BM25 and TF-IDF alongside PageRank continue to be superior to dense vector methods for search. Even dense vectors with re-ranking models afterwards don't perform as well. I've been sad to see models like BERT becoming more prevalent in search, as they are a significant portion of why Google's search has gotten worse...
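(For anyone unfamiliar with the terms above, here is a minimal sketch of Okapi BM25 scoring with the usual k1/b parameters; a production engine adds an inverted index, field weights, and link-based signals like PageRank on top of this.)

    import math
    from collections import Counter

    def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_doc_len, k1=1.5, b=0.75):
        """Score one document against a query with classic Okapi BM25."""
        tf = Counter(doc_terms)
        score = 0.0
        for term in query_terms:
            df = doc_freqs.get(term, 0)      # how many documents contain this term
            if df == 0 or tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))    # rarer terms count more
            length_norm = 1 - b + b * len(doc_terms) / avg_doc_len  # penalize long documents
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        return score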


The key here is that transformer-based "search" isn't actually providing links to the sources of information, the way search works now, but rather synthesizing information as a result of being trained on the corpus of Internet data.

In this way, Google gets all the value from Internet properties they don't own without having to push any traffic to those sources. So, they get their cake and eat it too. They create a way to regurgitate information from the vast trove of info on the Internet without ever having to share traffic with those sources by moving traffic from their search engines to those sites, like they do now.

They get to sell advertising to those who want to capture eyeballs for search results, without having to share any ad revenue with the content providers that are powering that transformer-based search.

Ain't it grand?


This has been coming for some time now, to be fair.

Now that it's pretty close to actually being here, the grim reality is that anyone who was expecting the status quo to just march on like always is going to get screwed over, and a new wave of successful businesses will adapt to it and thrive.

It's called 'disruption', and it's a bit disappointing to see people here of all places complaining about it.

Sure, I get it, it's Google, and if it were some nippy unicorn doing it people would be more enthusiastic, but ML is hard to do right, and having someone who's actually pushing the boundaries of what's possible is, in my opinion, pretty cool.

BERT made a huge contribution, and if this eventually flows out to everyone else to use, that's great news.

...and, if google stops sending traffic to some websites, well, too bad. We'll adapt; so will others.

The ones that can't will disappear.


Disruption is a descriptor, not a moral imperative.


> They get to sell advertising to those who want to capture eyeballs for search results, without having to share any ad revenue with the content providers that are powering that transformer based search. Ain't it grand?

Reminds me of spammers making spun articles.


Source? I am quite intrigued by this anecdote for information retrieval.


Content of the article:

- 1000 times more powerful than BERT, but still transformer architecture

- trained on 75+ languages, can transfer knowledge between languages

- can do text and images (not audio and video yet)

- can understand context, go deeper in a topic and generate content

Not much apart from their words about how amazing it is. Paper? Demo?


Lol, they state that their model is 1000 times more powerful than BERT? Under what metric?


According to my understanding they are referring to parameter count. If we go by that logic, BERT has 340M parameters. GPT3 has 175B. So this will have 340B parameters?


That's what I was wondering! Such gibberish


Well, so far they're mostly talking about what it would be able to do, so it's probably more wishful thinking than any exact metric.


> trained on 75+ languages, can transfer knowledge between languages

There is zero possibility that Google accomplished proper "language transfer" with the vast majority of Silicon Valley programmers being native English speakers.

In some languages, if you accidentally use a wrong single syllable in any sentence, you can end up saying something extremely embarrassing--and entirely different. This is the case with many Slavic languages.

This is a memorable "classic" [1]:

> "Tony Henry belted out a version of the Croat[ian] [national] anthem before the 80,000 crowd, but made a blunder at the end. He should have sung 'Mila kuda si planina' (which roughly means 'You know my dear how we love your mountains'). But he instead sang 'Mila kura si planina' which can be interpreted as 'My dear, my penis is a mountain'."

Many languages are much more grammatically complex than English, and also have an unbelievable amount of implicit contextual information derived from the grammatical morphology. For example, Slavic languages tend to be this way. The Slavic language that I speak, Croatian, tends to be very clean, direct, and concise, while being extremely complicated grammatically. Also, we have a lot of the same words for the same thing in Croatian, which, in combination with the complicated grammar, makes it a very expressive language. English, however, can be more expressive in the sense that it allows for more figurative language, like with the usage of idioms.

[1] BBC: Anthem gaffe 'lifted Croatia': http://news.bbc.co.uk/sport2/hi/football/7109058.stm


Modern NLP architectures do not explicitly model language structure. Even in English, the model isn't directly told anything about how words work. So the native language of the human authors of the model is (in principle) irrelevant to how effective the system is.
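(A small illustration of that point, assuming the Hugging Face transformers library and its public multilingual BERT checkpoint: whatever the language, the model only ever sees flat sequences of subword pieces, with no grammatical annotation attached.)

    from transformers import AutoTokenizer

    # One shared subword vocabulary covers roughly 100 languages.
    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

    print(tok.tokenize("You know my dear how we love your mountains"))
    print(tok.tokenize("Mila kuda si planina"))
    # Both come out as lists of subword pieces; nothing tells the model
    # which pieces carry case endings, gender, or any other grammar.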


> There is zero possibility that Google accomplished proper "language transfer" with the vast majority of Silicon Valley programmers being native English speakers.

This speaks to ignorance of who Google employs. A ton of the engineers are immigrants there. When I was on Google Photos in MTV, I'd estimate it being about evenly split between native, English-first speakers, vs people who were either non-native English speakers or grew up with two languages simultaneously (children of first gen immigrants in the US).

Silicon Valley has a huge amount of cultural and ethnic diversity, so I don't know why you would make this mistake.


> There is zero possibility that Google accomplished proper "language transfer" with the vast majority of Silicon Valley programmers being native English speakers.

I don't know the people who worked at this project, but you do realise that Google employs swaths of programmers that are not native English speakers?


There is nothing here but a promise. Back in the day we called this "vaporware".


I don't think it's vaporware, but the blog post with all these big claims like "1000 times more powerful than BERT" (based on some arbitrary, cherry-picked metric) makes one cringe.

Here's my guess: some team under web search trained a large Transformer-based model, with some adjustments here and there, on a massive dataset of crawled web pages using tons of TPUs. It made an incremental improvement to the search quality metrics and was shipped to production.


We sort of already know that these models scale in such a way that a model with 1000 times the parameters is, indeed, 1000 times more powerful. We haven't found a ceiling effect yet, so the onus is on the skeptics. These things scale.


According to the scaling laws it scales on a log scale.
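(For reference, the empirical form from the Kaplan et al. 2020 scaling-law paper, which is presumably what's being referred to here, is roughly:

    L(N) ≈ (N_c / N)^0.076

i.e. test loss falls as a small power of parameter count N, so the improvement looks linear only on a log-log plot; a model with 1000x the parameters is nowhere near 1000x "better" by that measure.)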


It's Schrodinger's vaporware. We'll find out some years from now. In Perl 6's case, what, 12 years after the announcement?


Except this is Google not some startup.


Vaporware also happens with established companies.


Like IBM's Watson after Jeopardy.


What vaporware has come out of Google Brain? In fact, they've been publishing ground-breaking research after ground-breaking research that's completely changed the entire field in recent years.


After seeing Alpha* solve Go, Chess, and protein folding in the past ~3 years, I think it would be pretty silly for your prior to be discounting any Google AI project as vaporware.

Their models accomplish ridiculously powerful things. Tbh I think it's far _more_ likely the answer is "this is crazy powerful, but the engineers didn't feel like writing a blog post about it, and the marketing team hasn't figured out how to monetize it yet".


If there's anything SoTA AI researchers love and have experience doing, it's writing blog posts and papers explaining how they did it.

The lack of details makes me think they're either hiding a new technique they'd rather keep secret because it provides a competitive advantage, or that it's really only a marginal improvement over existing NLP models (or an ensemble of them with nearly no improvement on any given metric) and the 1000x improvement is on a metric that no actual ML scientist would respect.

I don't have the slightest bit of information about Google's AI team to know if those are the only two options and if so which is more likely.


It's not a secret at all. Transformer models scale. Big models are powerful. Everyone knows this. Google can afford to train very big models. It's not a new technique. I think the issue here is that people are uncomfortable with the idea of AI models displaying scale relativity.


A big model also means lots of data, including lots of unfiltered garbage used in training. Nobody can manually review that much data; all they can do at this scale is automated filtering. So the model has a large attack surface, and it is going to be coaxed into saying something bad and embarrassing itself when put in front of critics determined to find those gaps.

We have seen in the last few months attacks on Google Translate, GPT-3 and other language models from the PC crowd, including the famous AI Ethics firings. It's just tricky to show it in this climate.

The PC crowd don't believe language is fair and concepts neutral, instead saying they are an expression of systems of power. So language models are a natural target for them because they could amplify biases against their identity groups.

I find this critique hasty especially because big language models are nascent technology. We shouldn't throw away the baby with the bath water!


The PC crowd is right. Language encodes our cultural beliefs, and many of them are pretty rotten. But how do you update a culture's shared set of beliefs? Banning words is a symbolic exercise. What we tend to do instead is that we tell stories and share perspectives. We learn to empathize.

Figuring out how to feed language models with diverse sources of information is a tough challenge, but not impossible. I share Gebru's concern about "stochastic parrots".


I'll take logical reasoning over "stories" any day.

And calling language models "parrots" is insulting. Many people worked for decades to reach this accomplishment, and here come the critics to shit all over it.

> But how do you update a culture's shared set of beliefs?

It's not the place of AI models to do activism, and it's a slippery slope leading to AI based inquisition. Take a look at how China uses AI to oppress their own people.


Stories are compressed representations of complex spatiotemporal patterns. We use stories to make sense of the world and to share our insights with others. And if you think about it, stories are essentially containers for if-then relationships. So they're not as far removed from logical reasoning as you might imagine.

I don't understand why you'd find the use of the term 'parrot' offensive. Language models extract linguistic patterns. GPTs generate patterns based on those which they have been trained on. That's a process that can be described as parroting. If you find it offensive because you think it implies that the researchers coming up with these models aren't worthy of credit, I think you are reading something into it that isn't there. At least not from my perspective.

When I mentioned updating cultural beliefs, I was referring to the traditional way of going about it: through cultural products. My point was that the "PC crowd" would be better off if they relied on this strategy rather than attempting to halt the development of language models. I was absolutely not suggesting that language models should be used to "train" members of society. That's a dystopian nightmare.


> We use stories to make sense of the world and to share our insights with others.

True, as long as you empathize with the story, and empathy comes from feeling united. But the PC army abandoned empathy in favor of identity, and think they are on their own, fighting a war, a zero sum game. Noninclusive politics is asking for empathy, how ridiculous, it's the same as demanding tolerance for intolerance.

They teach a whole ideology of guilt in order to dehumanize their opponents and cut the empathy towards them. They can't complain.


I think showing the model would immediately trigger the critics to nitpick it like the famous "He is a doctor. She is a nurse." case, so they just don't show it until they figure out a way to avoid that. Moreover, language models are easy to trick into politically incorrect conversations and porn. AI Dungeon's GPT-3 was writing lots of porn, for example.


In most sci-fi, you ask the ship computer a question and it can answer using the sum total of all human information.

But judging by the comments here, when Captain Picard asks the ship how long to Starbase 17 at Warp 9, rather than answer, you want it to tell the Captain to visit WarpTravelCalculator.com.

If you publish information in this world, there’s nothing preventing people from learning it and rewriting it in a new way. Humans do it all the time and they don’t pay the people they learned it from a portion of proceeds.

Future AI will do this too. I want machine learning to read every book and paper ever written and be able to answer queries and summarize things for me.

We may need to find a better model for encouraging content contribution to society besides copyright and demanding royalties on every use.


The analogy here doesn't work well for a few reasons....

1. It conflates simple math calculations with published information like texts.

2. The AI in star trek worked to serve the end user, in this case Picard. In our world the AI systems are designed to serve the software's owner such as Google. It's not trying to give you the best answer. Instead it's trying to provide you responses that make Google the most money or get them into positions of power and influence the leaders want.

3. Star Trek takes place in a world where the Federation doesn't use money and everyone is motivated to put in a hard day's work. On most planets they don't have the poor. This does not fit the societal and cultural dynamic we have now.

> We may need to find a better model for encouraging content contribution to society besides copyright and demanding royalties on every use.

Right now we have a problem where people are trying to step on content creators. I was reading an example where singers were trying to get added to songs as writers, when they didn't write the songs, so they could get more of the writers' royalties from sales. We live in a world where some will beg, borrow, steal, plagiarize, and generally try to hurt others to get a leg up, including many at big businesses who would leverage AI for that.

We may hope for the best but we should plan for the worst.


Most of those starship computers are autonomous. In the current "AI" model, they would be reduced to a mere glorified Amazon Echo speaker. I think that's an important distinction to have.


Very much this. People yearn for a world of a giant number of websites and software packages like in the old days, but the reality is that a humane computer may not need a lot of different interfaces.


> When I tell people I work on Google Search, I’m sometimes asked, "Is there any work left to be done?" The short answer is an emphatic “Yes!” There are countless challenges we're trying to solve so Google Search works better for you.

Sorry to be off-topic but it's hard to get excited about blue sky ventures when the search UI offers no capability for simple things like delivering search results in date order. You can filter results by date, but not sort them.


I would bet that sorting isn’t so simple, at least if you want it to be any good. If you did a naive chronological sort, I imagine you would end up with a whole lot of irrelevant results at the top. There is just too much stuff out there.

To be useful, your “sort” would really just need to be another parameter to the existing relevancy model. And if you did that, then people would probably complain that “it’s not a real sort” and we’re back to square one.

Edit: You know what, this probably is simple for Google, because they’re freakin Google. To your point, I guess they probably don’t do this because money.


> I guess they probably don’t do this because money.

Exactly this. There are many controls they could have given us to trivially improve search for end users without needing this AI, but they would have made search less good for their customers, the advertisers.


It's really hard to design something like that that anyone would get value out of using because matching the query isn't a binary notion.


Another simple thing is that there's no way to not get localized results.

I'm currently in Spain. I'm not Spanish. If I want results that don't have to do with that country, and aren't in Spanish, I need to use Duckduckgo. Google is unable to not give localized results.


google.com/ncr or use a VPN


google.com/ncr doesn't work anymore. Both the interface and results are localized in Spanish when I try it.


Should you always be able to sort results by date? If I search for "California", does it really ever make sense to date-sort all the pages that match?


It's a good way to find things you saw in the past but don't recall the exact date of.


The range filter exists for that.

Sorting by date wouldn't work for that. You'll have a ridiculous number of pages of oldest search results (or newest, depending on sort order).


It depends what you're searching for. And searching within a range yielding lots of unsorted results is often unhelpful too. Of course, you can get around this with the API, but that is a lot of extra work for a student or non-computational researcher if they don't happen to already have those skills.


Within the range, the results are sorted by relevance.


Yes, that's what I'm complaining about. Pagerank's idea of relevance is often not what I need.


But if you can filter by range, you can start binary searching on events.

I do this quite often with photos.
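(Roughly what that looks like as a procedure; `found_in_range` here is just a stand-in for running the date-filtered search by hand and checking whether the thing shows up.)

    from datetime import date, timedelta

    def bisect_event_date(found_in_range, lo: date, hi: date) -> date:
        """Narrow down when something first appears, using only a range filter."""
        while hi - lo > timedelta(days=1):
            mid = lo + (hi - lo) // 2
            if found_in_range(lo, mid):   # rerun the search restricted to [lo, mid]
                hi = mid                  # it's in the earlier half
            else:
                lo = mid                  # it's in the later half
        return hi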


Sure it does. I might want things written about California in the past week, or in October last year, or in 2005.


You can already do that on Google -- press "Tools", then use the dropdown to change "Any Time" to "Past Week", or use the Custom Range option to select whatever time period you want.

The question is whether you should also be able to do a strict order by date, without regard to ranking. I'm not sure if you'd ever really want that.

---

[1] California in the past week: https://www.google.com/search?q=california&tbs=qdr:w

[2] California in October of last year: https://www.google.com/search?q=california&tbs=cdr%3A1%2Ccd_...


Yes, I use the filtering tool all the time. As a use case, consider where you're trying to find information about an incident or criminal case, where you can specify your search quite narrowly. But although news reports generally have the publication date extracted (and highlighted) as metadata by Google, there's no way to sort on this. It's a huge pain in the ass from a research perspective, because reports of crimes and arrests usually get far less coverage than reports of sentences or appeals, so the top results tend to be more sensational but less informative.


s/less/more, sorry for the confusion.


The way search works:

1) terms are split into tokens
2) tokens are looked up to find documents
3) documents are ranked by scoring functions

I suspect sorting in chronological order might require too much document metadata to be retrieved at step 2. (A lot of filtering occurs between steps 2 and 3.)
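(A toy version of those three steps; a real engine shards the index, aggressively prunes the candidate set between steps 2 and 3, and uses much richer scoring than raw term counts.)

    from collections import defaultdict

    index = defaultdict(set)   # token -> set of doc ids (the inverted index)
    docs = {}                  # doc id -> original text

    def add_document(doc_id, text):
        docs[doc_id] = text
        for token in text.lower().split():     # 1) split into tokens
            index[token].add(doc_id)

    def search(query, top_k=10):
        tokens = query.lower().split()
        candidates = set().union(*(index[t] for t in tokens if t in index))   # 2) look up documents
        scored = sorted(candidates,                                           # 3) rank with a scoring function
                        key=lambda d: sum(docs[d].lower().split().count(t) for t in tokens),
                        reverse=True)
        return scored[:top_k]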


A simple explanation is that, product-wise, this feature request doesn't make sense. The number of users who would use said feature is not worth the amount of effort needed to implement, maintain, and operationalize it in a product with literally billions of users.


I have the same feeling every time Amazon announces some amazing thing and their Top Rated sort still isn't weighted by review count.
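(The usual fix for that is something like a Bayesian average, which shrinks items with few reviews toward the site-wide mean instead of letting a single 5-star review win the sort; the prior weight below is just an illustrative choice.)

    def adjusted_rating(avg_rating, n_reviews, global_mean=4.0, prior_weight=25):
        """Bayesian average: sparse ratings get pulled toward the global mean."""
        return (prior_weight * global_mean + n_reviews * avg_rating) / (prior_weight + n_reviews)

    print(adjusted_rating(5.0, 1))     # ~4.04 -- one perfect review barely moves the needle
    print(adjusted_rating(4.6, 500))   # ~4.57 -- many reviews dominate the prior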


I really hope Google gets some competition in their NN endeavors, because they are creating an economy that sucks in free information and eventually spews out buying recommendations. In the past they would compensate websites, via advertising, for providing the precious raw material for their results. With DL models, websites don't get anything back. This will lead to stale information, or pretty much the end of the web.


You’re being downvoted but it’s actually an interesting issue. Many companies (yelp) already have suffered from quick results… at a certain point Google will have a hive mind but little reason to have you go any further. This is good as a user (hypothetically), but does not contribute back at all to the producers of such information who may have additional value to unlock.

Meanwhile the Reddits and whatnot can't afford to not have Google index them, so this is just the price of admission. I wonder if they need an expansion to "do not crawl" that lets you specify how the data can be used?


Are there other reasons than financial compensation that someone would put facts on a web page?


They believe it increases the probability of a world outcome the publisher prefers (e.g. activism, advancement of humanity, ...).


Do you think Wikipedia is driven by financial compensation?


Wasn't Google supposed to have some sort of AI that could make phone calls for you? It looked amazing when they demo'ed it but I haven't heard diddly squat since then. Did they cancel that project?


It works just fine and has been active for a year or so now, except in one state (Indiana?)


Might be region-specific, but I can use it from my Google Home devices. Less useful when the pandemic hit.


Their hiking question is an odd example. Technology like this is probably perfectly fine for asking questions with low downside for wrong answers. But if someone asks "I've hiked Mt Pirongia and now I want to hike Mt Taranaki; how do I need to prepare differently?" and Google erroneously answers "nothing", that could get someone killed.


Are you suggesting that's a reason to not do this research?


Not at all. I'm suggesting that when writing up a PR blog post, choose examples where applying your technology is a sensible and safe thing to do.


That makes sense. What would have been a better example?


"What is the difference between ebike model ABC and ebike model XYZ?"


I think that one is often answered explicitly by product review sites, for popular brands and models, at least.


The comparison matrix is basically infinite in size when you want to compare last-gen vs newest-gen across several manufacturers, for example.


Yeah, but the number of people trying to make money from affiliate links is apparently near-infinite as well. A vs B product comparison is amazingly well covered. For the times I've searched for it at least. Reddit forums handle a good chunk of it.

Come to think of it, that might be why Google wants to make these kinds of things automatic and structured. Perhaps they think the shopper's experience could be streamlined, and the shopper made more likely to purchase, if comparisons are more consistent.


OK. I assume there are questions that aren't answered explicitly by existing pages, where wrong answers don't get people killed. If there are no such questions, that's not my problem.


“What are the stylistic differences between Rembrandt and Monet?”


Googling for that particular question, it seems there are several pages answering it specifically. The article implied that no particular page answered the specific question about Mt Fuji, and that MUM had to synthesize the answer. Unfortunately, this article ruins the search results for the specific query it describes. But, the top result describing preparation for Mt Fuji is quite generic.


An AI named after the British diminutive for 'mother' is surely a wise choice. I would not trust this AI unless it kissed my forehead and tucked me into bed.


I'm reminded of the parody search engine/character named "MOM" depicted in the tower-building game "World of Goo". She promises to make lots of cookies and offers to send emails with many promotional offers.


You will do as your MUM says. Mum knows best. You will eat the bugs and you will like it.



“When in trouble come to Mum, Mum will do your little sum”

Don't know if it's related, but the above is Arup’s speech for the computer he christened Mumbo-Jumbo.


My first thought was comparing to “Mother” in the book/movie Alien.


it's just temporary until they perfect DADDY


In this day and age? Not likely. Daddy's turned into a eunuch. Mum's in charge now. There, there... Come to Big Mum.


A lot of knowledge on the internet is just wrong. Also a lot of scientific progress is driven by folks persisting against the current dogma. So that seems like a big problem. I imagine this is true for almost any subject where there is tribal domain expertise.


...and still, Google Suggestions cannot understand that in Switzerland, part of the population does not speak German (e.g. here in Geneva; we are a trilingual country), and only shows me search completions in German (from the browser search bar). And there is no way to change the language there. I would prefer English.


what is your `Accept-Language` header set to?


English.


This isn't "better search" it's entrenched market domination from the only player with enough smarts, data and (crucially) users to make this work.

While Google is building a bigger and "better" Behemoth we should ask if this kind of innovation is really doing anything at all to make the world a better place in a meaningful way. Better monetization of search seems like a way to make the world worse in my opinion.


I love how the example is a problem only a rich techie would have.


The examples are created by non-tech people


There is no doubt that given the current state of AI, these requests would produce bullshit answers. AI is just not capable of constructing the proper conceptual models for now. But it sure can give you some answers.

It's sad to see that they'll be spending so much time, effort and money on this...


Edit:

"Google MUM MultiTask Unified Model Introduction" https://youtu.be/s7t4lLgINyo

I originally posted the LaMDA video: https://youtu.be/aUSSfo5nCdM


Even in the video he is just reciting the same content as the article.


This video is so silicon valley, it's amazing. They've obviously spent a lot of money producing it, but it's all vague claims, there isn't even a compelling demo. I'm guessing they're aiming for an audience of mainstream journalists, but they're not actually launching a new product per-se. What gives? Why are they trying to hype something that's not ready, isn't going to be released as a product, and that they're not willing to properly showcase or even explain at any level of detail?


>Take this scenario: You’ve hiked Mt. Adams. Now you want to hike Mt. Fuji next fall, and you want to know what to do differently to prepare.

Ah yes, that totally common scenario which I'm faced with all the time.

I love this. It perfectly illustrates the peril we are in with the current state of AI research. That the author would choose this as a problem to solve shows exactly the socioeconomic class they come from, and how that influences the way they solve problems. It may seem like a trivial and meaningless example, but these subtle biases will creep their way into these systems and be amplified. And you can bet that this kind of work is the foundation for what will become the technology that eventually governs every facet of our lives once AGI is a thing.

I, for one, am terrified of the implications that a bougie tech bro AI overlord entails.


No need to resort to goofy phrases like "bougie tech bro." They're just out-of-touch rich people. Same as it ever was.

If that's your concern though, the good news is that in its purest form, machine learning tends to bend AWAY from this. You need large data sets to get good results, which means these projects tend to sample huge chunks of the general Internet, not just the isolated bubbles of SV types. Of course this still has limits; any data set has limits. You can only scrape data from the net if someone has posted that data in the first place, for example.

But in their initial form, a lot of these models are pretty diverse. That's why AI Dungeon had all kinds of "objectionable" content that kept getting the always-offended on their case: GPT-3 is just built off the general Internet, including a lot of weird, fucked up shit. The real problem is that inevitably someone complains, and they start hacking away at the ideal model to try to make it squeaky clean and ruin it in the process.

If you want to keep the tech from being perverted by "bougie tech bros," focus on the censorship. The models often start off pretty good.


I am surprised by how many people in this thread are equating mountain climbing with techbro culture. Really? How are those related? The fact that some techbros climb some mountains for fun?

How about the millions of people in rural counties and developing countries without access to vehicles who rely on walking across difficult terrain to make deliveries, get to work, get to school, or visit family? Are they also techbros? My grandfather was an electrician in Albania, and he would regularly walk dozens of miles on foot, including through mountain ranges, to get between jobs. Granted, this was decades ago, but there's no reason to believe there isn't someone doing the same thing today.

If anything, your own upper-middle-class bias is showing here, because you assume that everyone who navigates terrain is doing so for fun and not because they don't have other options.


Generally, it is the upper middle class that travels to different countries to go hiking. The lower class aren't traveling to Japan to hike Mt. Fuji. Also, hiking Mt. Fuji requires some care.

https://www.thesun.co.uk/news/10248155/climber-livestreams-d...

Indoor climbing is definitely an SF techie thing. Tons of tech people climbed at Mission Cliffs.


It's a behavioral shibboleth. They don't hike for pleasure or genuine reasons. They hike so that they can post a picture on Instagram. It's just a thing you do as part of the bland, petty, superficial, materialistic upper middle class bubble.


As someone with family members who immensely enjoy hiking for non-status related reasons, I can confirm you are entirely incorrect. They started in a pre-Internet era and as far as I know, haven't changed their motivations since.


Researching, travelling to and from the mountain, buying and maintaining equipment, and getting training for mountain climbing all cost time and money. Techbros have at least the latter in great abundance.


We are all, in general, smart. It's the data points from our surroundings that differentiate our opinions.


That's too simplistic.


Some Googler or Google fan replied to me yesterday with, "Sheesh. Why the FUD."

Ask MUM.


A bit off topic, but I am wondering if there are open knowledge graphs available to the public?

Ignoring AI etc., my kids play a couple of games where there is clearly some backend that "knows" Taylor Swift is a singer, is female, and has acted in this movie X.

You can go a long way in a Turing test with that, and I was wondering if folks knew where those graphs are built?


Wikidata [1]! They also offer a SPARQL endpoint [2], which you can use to programmatically answer those kinds of questions. As an example, the page for Taylor Swift is [3].

[1] https://www.wikidata.org/wiki/Wikidata:Main_Page

[2] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/...

[3] https://www.wikidata.org/wiki/Q26876
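
For example, here is a minimal sketch of querying the public endpoint at query.wikidata.org from Python (Q26876 is the Taylor Swift item linked above; P106 is, as far as I know, Wikidata's "occupation" property):

  import requests

  # Ask Wikidata for the occupations of Taylor Swift (Q26876) via the public SPARQL endpoint
  query = """
  SELECT ?occupationLabel WHERE {
    wd:Q26876 wdt:P106 ?occupation .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  """
  resp = requests.get(
      "https://query.wikidata.org/sparql",
      params={"query": query, "format": "json"},
  )
  for row in resp.json()["results"]["bindings"]:
      print(row["occupationLabel"]["value"])  # e.g. singer-songwriter, ...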


I think http://conceptnet.io/ is what you're looking for!
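
If I remember its API correctly, it also exposes a simple JSON endpoint at api.conceptnet.io (a rough sketch; double-check the paths and response fields against their docs):

  import requests

  # Look up a few edges ConceptNet has for the concept "singer"
  data = requests.get("http://api.conceptnet.io/c/en/singer").json()
  for edge in data["edges"][:5]:
      print(edge["rel"]["label"], "->", edge["end"]["label"])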


Makes sense. I want insights and context. If Google can do that synthesis, that's great. I do wonder about the training data and data quality, though. When I do these targeted searches, I have to filter out the spam... books are somewhat better, but nothing beats talking to someone who lives it or did it.


I can't see any link to an actual paper, anyone know if they released one for this?



I can't find one either and this article is just fluff.


I see a lot of people here expressing doubts and confusion. I want to try to clear up some of that.

The key notion here is scaling. This is the reason why transformer models have been so, well, transformative. Bigger models are better than smaller models in a roughly predictable, proportional way. Where is the limit? Where does this break down? We don't know. We haven't found the ceiling yet.

Another important notion is multimodality. When you can cross-reference your text-based knowledge of an apple with your image-based knowledge of an apple, you can use this information as leverage. Archimedes said, "Give me a place to stand on, and I'll move the Earth." It might seem ridiculous to say that the same is true when it comes to information, but it is. Informational leverage is powerful. Multimodality allows you to make very accurate predictions. The McGurk effect is a nice demonstration of how we do the exact same thing: what we see of a speaker's lips changes what we perceive them to be saying. In other words: we make use of multimodal leverage.

The twin notions of scaling and multimodality explain what makes MUM possible. As some of you have pointed out, there's another aspect that we can't ignore: utility. Google will be using MUM to make money, which means they'll have to train MUM to make you spend it. If you're uncomfortable with this idea, you are uncomfortable with capitalism in general. Which is fair, but I think it's important to keep in mind.

As I'm sure they've already considered at Google, MUM could be used to revolutionize education. Imagine people all over the world having access to an expert instructor who can answer all of their questions. You might think this sounds like a dream, but we're a mere stone's throw away from achieving it. That's the true power of scaling + multimodality: we can now build advanced systems that can communicate with us.

I appreciate the skeptics and naysayers here: you keep the rest of us sane. For that, I thank you. At the same time, I want you to open your eyes to the possibility that something very important and transformative is happening right now. You don't have to go full Kurzweil, but I think you would benefit from reflecting on the opportunities this new technology might offer.


Yeah, I'm a little surprised at all the negativity here considering the game-changing potential of this sort of research. The HN crowd has always been a pretty cynical bunch, but come on! A single model that can extract information from images, text, and webpages across multiple languages and generate answers in response to natural language questions written by a user? This feels like straight-up wizardry!


I find it hard to believe that Google wants search to be easier for the end user. For example, a very long time ago I believe you could set up sites to exclude from all of your searches; I don't think this is possible any longer.


As usual, a lot of AI hype from Google.


"Since MUM can surface insights based on its deep knowledge of the world" Which just means taken from the millions of websites written by humans and used without permission or any payment.


  "Is there any work left to be done?"
The short answer is an emphatic “Yes! Dismantling your monster of a corporation!”


Any millennial who has been using search for some time would easily know where to find what they need. This sounds like Google trying hard to drive more money out of its search business.


> "Is there any work left to be done?"

Google could search the captions on all YouTube (etc.) videos. Not sure why this doesn't happen. The same goes for a few other big resources that aren't indexed.

I think the big thing with the article (taken as a workable technology) is that it's not search; it's taking other people's information and transforming it into a Google resource.

Which does add to humanity's knowledge, but it's owned and profited from by Google.


When the text opens with "Is there any work left to be done?" and an emphatic "Yes!", I was sort of hoping they would announce that Pinterest would now be banned from all non-image search results...

Instead, it's an announcement that Google has made a new, even bigger pile of linear algebra that can sort of answer questions and won't end up like Watson.

I like that they put in a deadpan bit about how they are very ethical when they make and then exploit their huge collections of data found by their spiders. There sure hasn't been any AI controversy at Google this quarter, no sir-ee!


"When I tell people I work on Google Search, I’m sometimes asked, "Is there any work left to be done?" The short answer is an emphatic “Yes!”

Hands up everyone who is 100% satisfied with Search ... ... OK no one.

So now we have an unsolved problem left behind in favour of ... chat about mountains ...

"MUM has the potential to transform how Google helps you with complex tasks. Like BERT, MUM is built on a Transformer architecture, but it’s 1,000 times more powerful. MUM not only understands language, but also generates it."

Piss off and while you are at it, get BERT to explain my response to MUM or vice versa.

If MUM can decipher my immediately prior sentence given this input then I might start to get interested.


There is nothing in that press release that could not have been done in the 1980s with Prolog.

Yeah, it’d have been more code but you would not have needed to destroy a forest to train the thing.

This is the NLP trade off of the 21st century. The code is easier to write but the model is completely opaque, and you need to really burn a lot of electricity to make it work.


This is totally false; I dare you to write anything close to, e.g., BERT in Prolog.


> This is the NLP trade off of the 21st century. The code is easier to write but the model is completely opaque, and you need to really burn a lot of electricity to make it work.

This is basically a meme now. We actually have a pretty good understanding of how these models work. In fact, that understanding is how you can do things like build chatbots that don't spew hate.

Also, the electrical cost of training large language models is indeed high (e.g. GPT-3 has 175B parameters and is estimated at 190,000 kWh to train on GPUs). But the folks who pay that cost (basically OpenAI, Google, MSFT, Facebook, Amazon) are incentivized to make it go down (TPUs are way more efficient than GPUs), and they are incentivized to do it infrequently because it costs $$$.

FWIW, Google's datacenters are also technically carbon neutral. I know that's not great, because carbon credits don't have the impact that folks think they have, but there is definitely a difference in ecological impact between datacenter electricity and other kinds of energy usage (e.g. cars all burning fossil fuels).

Okay, also, let's compare to Bitcoin, which is the real ecological disaster if we want to talk about inefficient software: ~387,096,774 kWh PER DAY. _And_ it incentivizes things like cheap coal, and miners are definitely not using their crypto wealth to purchase carbon offset credits :(
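
A quick back-of-the-envelope comparison, using only the two estimates quoted above (both are rough figures, so treat the result as an order of magnitude):

  # Rough comparison using the estimates quoted above (both are approximations)
  gpt3_training_kwh = 190_000          # estimated one-off energy to train GPT-3 on GPUs
  bitcoin_kwh_per_day = 387_096_774    # estimated daily Bitcoin network consumption
  ratio = bitcoin_kwh_per_day / gpt3_training_kwh
  print(f"One day of Bitcoin mining ~= {ratio:.0f} GPT-3 training runs")  # roughly 2,000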


I mean, yes. But it is a funny example to choose to illustrate the power of an NN approach. They're talking about mountains, an entity that has very concrete and definable attributes (e.g., height). And the rest of the examples similarly deal with semi-structured data that could theoretically be represented in RDF or something like that.

There's been a bit of discussion on HN lately about the effectiveness of sophisticated models vs. just good metadata.


The better wager is: I dare you to write and train a sophisticated real-time neural network model that can interpret human language and provide reliably useful contextual search results within the compute power and memory constraints of the '80s.


Why would anyone take that wager? I see no reason to believe that's possible with either Prolog or NNs when you're restricted to 80s hardware.


Exactly my point. The wager was rhetorical



