Inside Google Brain (wired.com)
136 points by albertzeyer on July 17, 2014 | 76 comments



"Google isn't really a search company-- it's a machine learning company."

No, it's an advertising company. That's who pays the bills. At the end of the day all this cool tech is to better understand and model human beings in order to better push ads.

Sometimes it depresses me that so many of the world's most brilliant minds are working on that, but a generation or two ago they'd all be building doomsday bombs. I guess that's some progress.


Google is DoubleClick, literally. I don't think people would be so quick to use a web browser or mobile phone built by DoubleClick instead of Google.


Only 22% of revenue was from ads on other sites in the last quarter, so, no, Google is not DoubleClick, literally or no.

They do own what was DoubleClick, but it's quite different from the DoubleClick of yore (malware distributed over their network seemingly every other day, for instance).


Well, Google now owns DoubleClick, so yes, we are all using search and web browsers and phones made by a bigger, better DoubleClick.


I do think people understand that, and I think it's one of the reasons Glass seems like it's a failing product. People are drawing the line at wearing a heads-up mobile camera controlled by an advertising company.


You could say the opposite about Android: people embraced it in spite of it being a screen they take everywhere and look at all the time, controlled by an "advertising company".

And in practice, the same thing is true for Android and Glass: there aren't ads popping up in your face all the time, because that would be a sure way to alienate all your users (which is one of the reasons why the "they're just an advertising company" phrase doesn't give any real insight).

I think the more likely reason people seem to be drawing the line with Glass is that it's a giant thing on your face that doesn't actually do all that much. That and meme status ("glasshole") and echo-chamber effects (most people outside of tech circles don't care either way).


If there were anything else they could do that was profitable, they would. Do you have any suggestions?

The only reason Apple has as much money as it does is that it greatly overcharges customers, doesn't participate in research that benefits society, and takes advantage of its customers' psychological need to have the latest model (even if the improvements are minimal).


I wasn't even really dissing Google, just pointing out the reality of the world we live in. The fact is that very very few people really care about visionary things. There has to be a way to make it pay, and unfortunately surveillance-based marketing is the only big meal ticket in town for services like Google.


It's true that ads are their primary source of revenue. I've thought long and hard about that, but eventually accepted the ads as necessary to fund the other projects they work on.

Their research isn't solely focused on gathering data for ads. If these technologies can be applied to ads then it helps justify large costs because it'll increase revenue, but that is not the sole driving motivator for their moonshots and research.

Hopefully the percentage of people who are passionate about futuristic concepts and visionary ideas will rise in the near future. The faster we get to a post-scarcity society the greater chance we, as an intelligent species, will have to survive long-term.

Just my 2 cents. I love them as a company and have a huge amount of respect for what they've done and for sticking to their founding principles of transparency and "do no evil" (for the most part, within reason for a company of their size).

I know it's pretty subjective and irrelevant to bring Apple into any Google debate, but the number of people who put them on a pedestal drives me crazy. Anything I can do to redirect money from going to Apple is a positive thing in my opinion.


The original paper being discussed:

http://arxiv.org/abs/1312.6082


Strange that they didn't link to Google's own publication page:

http://research.google.com/pubs/pub42241.html


"They’ve also found that the models tend to become more accurate the more data they consume. That may be the next big goal for Google: building AI models that are based on billions of data points, not just millions. "

I'm not versed in machine learning, but it seems to me that any model whose output quality depends on the quantity of data it ingests is deeply flawed. There's no doubt a bigger number of samples will make the predictions more accurate, but isn't the challenge to develop a system that is as accurate as possible regardless of the number of data points it's fed, like the human brain?


If you think the human brain is so sophisticated that it can perform its cognitive duties with little data, that's simply wrong. While it is definitely not a simple organic construct, it does get stimulated significantly all the time. See [1] for what happens when you cut out these factors.

Regarding artificial systems, I think more data is the only way to reach super-performing classifiers. The data you supply doesn't have to be big but at least the data you extract from raw data should be big. For example, a method called Integral Channel Features [2] is designed to act in such a way.

[1] http://en.wikipedia.org/wiki/Sensory_deprivation

[2] http://pages.ucsd.edu/~ztu/publication/dollarBMVC09ChnFtrs_0...


Your first statement here is not true.

Humans are excellent at learning from very few or even 1 example. Show a toddler a single image of an elephant and the toddler will generalize perfectly on new examples; show a machine a few thousand images of elephants and it might generalize decently if your machine is really clever.

There are very few tasks where machine systems achieve anything resembling human level performance. But on all such tasks, the machine requires far more data and still underperforms.


That toddler has already processed lots of visual image data, examples of objects, nonliving and living, animals, mammals, etc. Don't you think that constitutes a large, important dataset for the problem of elephant recognition?


Yeah, I agree. When the child is shown the labeled example of an elephant and infers the traits that make an elephant an elephant, her previous visual experiences provide background knowledge that restricts the space of hypotheses she considers. After all, there's an infinite set of logically consistent hypotheses.

Nevertheless, if you provide your machine system with the video of all the child's visual input, it still won't generalize well from single examples, the way children do effortlessly.


This reminds me of the saying: It took me ten years to become an overnight success.

Humans generalize well from few examples because, well, they've already processed billions of examples. A toddler may have never seen an elephant before, but it may have seen cars, trucks, birds, dogs, people, trees, skies, buildings etc, giving it concepts for bigness, smallness, aliveness, humanness and much else. With all these concepts in place, then yes it becomes easy to see what makes an elephant distinct from a dog or a person. And it would be too for an artificial neural network.

An interesting fact is that newborns have very few concepts to begin with. It takes some months, for instance, for them to learn to differentiate between living and non-living things (the family cat vs. a teddy bear).


It does, but it's unsorted, then post-processed.

If you show a child 1,000 images of animals, then show different photographs of animals and tell the child what animal each photograph is, you can now go back to the original 1,000 and the explained ones will likely be recognized, despite them never being initially sorted or modeled as such.

Going from 5-10 examples to 1,000,000 is what computers have a problem with. They go from 1,000 to 1,000,000 easily, or even from millions to billions.


You're describing unsupervised training. It works with computers too, as in this article on using Google Brain to build an unsupervised image classifier. http://www.wired.com/2012/06/google-x-neural-network


Unsupervised training is the deep-learning equivalent of context clues. It sees people talking about cats, sees an image, and guesses "cat". It sees similar images plus similar words and eventually builds a cat-type prototype.

What I'm talking about is that a child can see a photo of an unknown animal, and then I can show that child a cartoon elephant (a cartoon of that original animal). When I then ask what the original animal is, the child likely responds correctly.

Reprocessing of already learned data as the scheme of the world changes based on new information.


This is the ability to abstract concepts and then recognize them in different settings (for instance, the idea of a child being a miniature version of a given animal, with less pronounced traits). In order to understand the clues you are talking about, an AI first has to be familiar with the terms used in the discussed topic, so as to be able to construct a definition by itself (what is "miniature", "traits", "pronounced"). These terms' definitions must be synthesized somehow beforehand, or perhaps as the discussion goes, but then the amount of necessary information in that discussion must be much larger for the AI to untangle them properly.


Using unsupervised learning (as humans do), there are machines that can learn from a picture of a single elephant. You first learn a compressed "representation" of images with fewer dimensions than an entire bitmap. Then you can compare that small vector to others.
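
A minimal sketch of that two-step idea, assuming scikit-learn's digits dataset as a stand-in for images and PCA as a stand-in for whichever representation learner you prefer: fit the embedding without labels, then recognize a class from a single labelled example by nearest-neighbour comparison in the embedded space.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    digits = load_digits()
    X, y = digits.data, digits.target

    # Unsupervised stage: learn a compressed representation (no labels involved).
    Z = PCA(n_components=16).fit_transform(X)

    # "One-shot" stage: take a single labelled example of the digit 3 and find
    # the images whose embeddings lie closest to it.
    example = Z[y == 3][0]
    dists = np.linalg.norm(Z - example, axis=1)
    nearest = np.argsort(dists)[1:6]   # index 0 is the example itself
    print("labels of the nearest neighbours:", y[nearest])

The nearest neighbours will typically come back as 3s, which is the point: the single labelled example only has to be compared against embeddings, not learned from scratch.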


I can only speak for my situation, but my daughter saw easily hundreds and maybe thousands of examples of elephants, including real live ones at the zoo, before she could ever seem to use the abstract concept to identify new examples.


You're sort of describing the problem of "overfitting", which is now very well understood in machine learning circles. That's when you get a model that describes its training data very well but doesn't generalise well.

The thing about using lots of data is that prior to publication of "The Unreasonable Effectiveness of Data" in 2009, most people did think that good algorithms were the most important thing. What that research showed was that a bad algorithm given more data will eventually outperform "better" algorithms, at least when those algorithms are initially judged based on their performance on smaller datasets.

So what happened with neural nets was that after some initial excitement about how they were more like the human brain, etc., it was found that "stupider" algorithms actually performed better and ANNs were written off for a while. It turns out that the reason NNs were performing badly was that they weren't being fed enough data.

Nowadays it's pretty easy to saturate a feed-forward neural network with data to the point where its performance will never get much better. Deep learning techniques allow you to train bigger and more complex models with more data, but these more complex neural nets won't perform very well unless you feed them tons of data.

So in reference to your point about the brain, the thing about brains is that they actually learn based on massive amounts of data too. Think about how much data you have from continually streaming video ~16 hours/day, plus sound, plus touch, proprioception, and other inputs, over the course of many years.

Deep learning tries to emulate this to some degree with "pre-training", which is where you feed lots of data into a deep network and have it learn "something" (it learns by itself at this stage). Then you start teaching it more complicated, high-level concepts. This pre-training allows it to do things like recognise common patterns in images, which the later training allows it to then associate with semantic ideas like "this is an apple", "this is a person", etc.
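
As a very loose analogy only (not how Google's systems are actually built): the two-stage shape of "learn something unsupervised first, then attach labels" can be sketched with scikit-learn, using PCA as the stand-in for the unsupervised stage and logistic regression as the supervised stage.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    digits = load_digits()
    X_train, y_train = digits.data[:1500], digits.target[:1500]
    X_test, y_test = digits.data[1500:], digits.target[1500:]

    # Stage 1 (unsupervised): compress raw pixels into a smaller representation.
    # Stage 2 (supervised): learn to attach semantic labels on top of it.
    model = make_pipeline(PCA(n_components=32),
                          LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

Real deep-learning pre-training uses stacked autoencoders or RBMs rather than PCA, but the division of labour is the same.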

TL;DR: What seems to work best is fairly "dumb" algorithms, scaled up to be able to handle vast amounts of information and fed a ton of data to learn from. This is also how the human brain works.


No. The outcome is the goal.

It's rapidly becoming apparent that some algorithms (eg Deep Learning related models) work much better at scale than on small amounts of data. It doesn't make sense to discount these better algorithms because they don't work as well as other models when tested against less data.

It is also apparent that these models require significantly more computing power to perform well than other models. That doesn't make them less worthy, just a cost people must consider.

It turns out that intelligence is hard..


That's another good point. The required computing power that is now cheap and widely available has changed our ability to even try these methods.


Every model's output quality is dependent on the quantity of data it ingests.

Statistics developed as a science because of the need to work around the fact that large samples are expensive. Machine learning has taken off as a direct result of the field's ability to take advantage of, and get serious performance gains from, the massive amounts of data being generated and leveraged recently.

Here is the best summation I can reference, and I can tell you from personal experience it is very true:

"The accuracy & nature of answers you get on large data sets can be completely different from what you see on small samples. Big data provides a competitive advantage. For the web data sets you describe, it turns out that having 10x the amount of data allows you to automatically discover patterns that would be impossible with smaller samples (think Signal to Noise). The deeper into demographic slices you want to dive, the more data you will need to get the same accuracy."

http://www.quora.com/Big-Data/Why-the-current-obsession-with...
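
A quick way to see the effect on a toy problem (a sketch with a synthetic, noisy dataset, nothing like the web-scale case the quote describes): train the same model on ever-larger slices of the data and watch held-out accuracy climb.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Noisy synthetic task: 20% of labels are flipped, so the signal is easy
    # to drown out on small samples.
    X, y = make_classification(n_samples=20000, n_features=50, n_informative=10,
                               flip_y=0.2, random_state=0)
    X_train, y_train = X[:15000], y[:15000]
    X_test, y_test = X[15000:], y[15000:]

    for n in (100, 1000, 5000, 15000):
        clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
        print(n, "training samples -> test accuracy %.3f" % clf.score(X_test, y_test))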


I completely agree with this and, as I said, it's obvious that more data produces better predictions, even with simple models. My point is that it looks backwards to me to put effort into finding more and better data (creating a corpus for a given subject is a challenge in itself) instead of trying to come up with a model that infers more and produces better predictions with less data. Once you have such a model you can surely collect and feed it a lot of data to improve the output, but until then, why even bother?


It's not like they aren't trying to improve the model as well, all the time. It's just saying that right now the benefit of getting more data for existing (already very sophisticated) models is greater than the incremental benefits of model improvements given existing data.


Bingo. More data beats better algorithms, see

http://anand.typepad.com/datawocky/2008/03/more-data-usual.h...


It may seem counterintuitive, but that's a relatively common result in machine learning. Often the issue isn't so much the quantity of data as the representative nature of your training data used to create the model. All other things being equal, a larger set of training data is likely to be more consistent with the true data. Caveats abound, but that's the general idea.

One way to think about it is to look at problems with human perception like forced perspective. It's relatively easy to create a situation where the only available information results in mental models that describe the size of an object incorrectly. Given a different point of view (i.e. more information) the faults in the model become obvious.


How quickly does a model's accuracy and precision improve with increasing sample size? Indeed, there are counterexamples (like the periodogram estimator for power spectral density) where variance does not decrease with window length.

We often don't know which model to use. Occam's Razor [1] can be effective in favouring simpler models, but I tend toward the view that a good data scientist is invariably needed to build good models. Hence I view Big Data more as a consulting business than SaaS.

[1] For an excellent Bayesian discussion on why Occam's Razor actually works, see Chapter 28 of David J.C. MacKay's book 'Information Theory, Inference and Learning Algorithms'.


Until you have reached a very large subset of all available information, more data allows you to make better predictions. Period. That is as true for machine learning as it is of the human brain.

You often want your models to also perform well when you have fewer data points. Those are two separate - if in effect related - design goals.


This is true to the extent that you are not overfitting your dataset. Neural networks and random trees are quite good at fitting anything! And still they can perform poorly on your validation set.


Overfitting will only occur when the dataset is too small for the model.


Not only. That case is too narrow. I would say that overfitting tends to occur when one does not understand the underlying dynamics of the system being modeled. Any model with enough degrees of freedom can fit anything and still explain nothing.
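
That degrees-of-freedom point is easy to demonstrate with hypothetical data: fit polynomials of low and high degree to ten noisy samples of a sine wave, and the high-degree fit threads (nearly) every training point while doing worse on fresh data from the same process.

    import numpy as np

    np.random.seed(0)
    x_train = np.random.uniform(0, 6, 10)
    y_train = np.sin(x_train) + np.random.normal(0, 0.3, 10)
    x_val = np.random.uniform(0, 6, 200)
    y_val = np.sin(x_val) + np.random.normal(0, 0.3, 200)

    for degree in (3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        # Degree 9 has ten coefficients, enough freedom to pass through all
        # ten training points, but that fit explains nothing about new data.
        print("degree %d: train MSE %.4f, validation MSE %.4f"
              % (degree, train_mse, val_mse))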


Possibly when a brain starts getting redundant information, its predictions start to peak in accuracy.



Doesn't this further prove my point? If you're saying that some tasks, like NLP, are too complex to tame and you should just throw more data at it, you're basically capitulating to complexity and taking the easier route. Isn't that the opposite of what researchers should be doing?


There's no shame in taking a non-optimal path, and then working out how to get better at it later. Research is about solving problems, answering questions. Why should that have to be the hard way?


Have you ever taught a kid to learn? I don't think there is a lot of evidence suggesting the human brain reliably learns without a number of data points...

Seriously, Norvig has been big on this since forever: the reality is that consuming large amounts of data with relatively subtle features tends to be one of the few areas where computers can easily outclass the human brain.


It would be great to handle problems with only a little data, but more is better is generally true. Also, the goal can be to calibrate the model with a lot of data, and then have it operate on small amounts of data.


These Wired click-bait headlines are terrible.


They certainly are. Can anyone suggest a better one here? We want accurate and neutral.


Could submissions from Wired (& other sites which practice link-baity titles) be automatically put in a moderation queue? They wouldn't appear in the "New" queue until the title had been reviewed by a moderator, and fixed if appropriate. That sort of fix seems to be necessary a large fraction of the time. Regularly having such titles on the front page is a small but real hit to HN's quality.


There actually is a status like that, sort of. But Wired isn't currently in that bucket. We may have to put it in there; their titles have gotten noticeably (to me) more sensational lately.


The section title: “Google is not really a search company. It’s a machine-learning company.” might be better, but ... meh, title fussiness. Anyway, it's an ad company!


> Anyway, it's an ad company!

Can this meme die soon? Google is an ad company in the same way the New York Times is an ad company. Sure, that's ultimately where the money comes from, but there'd be no ad revenue if search, maps, mail, etc. weren't all among the very best available, and that's a very real and very big engineering problem. Just like if the NYT stopped doing journalism, their ads would stop generating revenue.


AdSense is pretty big.


Inside Google Brain?


Good suggestion. We changed it. Thanks!


Seems inevitable that the company with the world's largest distributed compute engine (estimated at 40 petaflops as of 2012, must be 80 petaflops by now) would eventually get into machine-learning.


Anyone else notice the sample street number photos looked a lot like some of the CAPTCHA images going around not too long ago? Hmm...



I knew this was a Wired article without even looking at the URL.


Me too... argh, I hate those headlines. It's proof of how far tech and digital culture journalism has fallen.

I remember in my pre-broadband days here in Brazil subscribing to Dr. Dobb's Journal to the tune of 25 USD per issue and being happy. Each issue was filled with little gems that would advance my knowledge a lot... these days it's all glossy Photoshopped covers and over-the-top headlines.

:-(


I don't think tech journalism has really fallen - Wired has always been fluffy, and Dr. Dobbs is still around [1]. I think the change happened more on the shelves, as laymen with an interest in technology replaced geeks as the most profitable group.

[1] http://www.drdobbs.com/


I really don't know who buys Wired... despite it being a quality publication on the whole, it seems far too dumbed down for real geeks and too niche for the layman. I imagine it ends up on the reception table of lots of startups and design agencies that want some glossy 'we do technology' badge.


It seems they missed mentioning Ray Kurzweil, the AI master who's a Director of Engineering at Google.


Kurzweil seems more like a master of generating linkbait-like titles.

"The singularity is near: When humans transcend biology" "The age of spiritual machines: When computers exceed human intelligence"

I'd argue that Stephen Boyd is more of a master of AI than Kurzweil.


To clarify, Ray is a director, not the director. Google has many engineering directors.


Corrected it. But I do think he deserves a mention when talking about AI and Google in particular.


There's pretty good evidence that the entity we call Ray Kurzweil is just an early prototype of Google Brain built by Jeff Dean.


Google is not really a search company. It's a machine-learning company.

It wants to be seen that way, but until it abolishes closed allocation (and maybe it has, but I haven't heard anything to indicate that it has) it will just be an ads company. It's still a pretty good place to work, by industry standards, but the percentage of people who'll get to work on machine learning is very low.

Google definitely wants to have the image of being the machine learning company because that's a great way to attract talent (even if that talent is mostly wasted under closed allocation). And if you land in the right place, there is interesting work. The reality, though, is that most people (especially outside of Mountain View) aren't going to get real projects and won't be anywhere near the machine learning work.

Google does have a lot of talent and probably would be the undisputed #1 tech company if it implemented open allocation, though.


Google is an advertising company, and it delivers ads via platforms like search, mobile apps, etc.

It clearly is a tech company, but to pretend it's anything other than an ad delivery machine is sort of ignoring the elephant in the room.

Everything goes back to ads. Even Google doesn't bite the hand that feeds it.


That's a common trope, but it doesn't make sense. We don't call the New York Times an advertising company simply because the majority of its revenue comes from ads. Why apply the same standard to Google?


>About a year later, Google had reduced Android’s voice recognition error rate by an astounding 25 percent.

lol. This is the grand payoff?


Considering that for the preceding 25 years or so, progress in the state of the art had been annual reductions of far less than 1%, it's a pretty big deal.


From an academic perspective, maybe. In terms of practical use of machine methods, not much. Machine learning is largely hype. I pity the army of PhDs they must have building training sets.


Have you tried Google's voice recognition? It's been fantastic for me.


While I'm not sure, I assume the author is referring to "error reduction". Error reduction is defined as

    def ER(accuracy_before, accuracy_after):
        delta = accuracy_after - accuracy_before
        return delta / (1. - accuracy_before)
(It is often written as a percentage.) Note that ER(.5, .75) = ER(.98, .99) = .5. It is for this reason I dislike it. Without knowing what performance before and after was like, it's hard to tell whether this was an incremental improvement to an already impressive system or the innovation that made a system useable.

(It strikes me as unlikely that the author means to say that Google reduced the Android recognizer's word error rate [WER] by 25%, since WERs on many publicly available databases used for evaluation purposes were already well below 25% before the deep learning revolution. But, it is not impossible that Google's test set was particularly hard and the pre-deep-learning Android recognizer was unimpressive. Caveat: I know, I know, WER isn't really "accuracy".)


There's a bunch of stuff that gets optimised. You start with big gains, and those get smaller and smaller until you're spending a lot of effort to get a 0.5% increase in performance.

See, for example, compressing the ENWIK8 in the Hutter Prize.

http://prize.hutter1.net/

If you can compress this approx 100 MB file to less than approx 16 MB (including the decompressor) you win cash.

That website shows small decreases in filesize over a few years.

    Alexander Rhatushnyak   23 May 2009   6.27   1614€       Marcus Hutter
    Alexander Rhatushnyak   14 May 2007   6.07   1732€       Marcus Hutter
    Alexander Rhatushnyak   25 Sep 2006   5.86   3416€       Marcus Hutter
    Matt Mahoney            24 Mar 2006   5.46   pre-prize   -
The compression ratio goes from 5.46 to 5.86 to 6.07 to 6.27.


It was already as good as a bunch of PhD's could make it after working on it for years. Removing 25% of the remaining errors after that is a grand payoff!


"This form of internal code-sharing has already helped another cutting-edge Google technology called MapReduce catch fire." Map-Reduce is not a Google technology.


I believe it is. Obviously map and reduce operations have existed for a long time, but MapReduce was Google's work before it was popularized by Hadoop and friends. Soon after MapReduce became popular, there was a paper published (by non-Googlers) which 'described' the algorithm and poked some fun at the terminology:

http://lambda-the-ultimate.org/node/1669


Most people miss this important point: the most important part of MapReduce is the shuffle step, which is a global, partitioned disk-to-disk sort. See the FlumeJava paper for a bit more discussion of why this is relevant. Everything else about MapReduce is just framework to make programmers' lives easier. BTW, not having strong typing in classic MR is a major pain point. Flume goes a long way toward addressing this in a practical way.
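
For anyone who hasn't written one, here's a toy, single-process sketch of the programming model with the shuffle written out explicitly (just an illustration of the shape of the computation; in the real system the shuffle is a distributed, partitioned, disk-backed sort across machines):

    from itertools import groupby
    from operator import itemgetter

    def map_phase(documents):
        # Emit (key, value) pairs; word count is the classic example.
        for doc in documents:
            for word in doc.split():
                yield (word, 1)

    def shuffle_phase(pairs):
        # The step called out above: sort by key and group, so that each
        # reduce call sees every value emitted for one key.
        return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

    def reduce_phase(grouped):
        for word, group in grouped:
            yield (word, sum(count for _, count in group))

    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(dict(reduce_phase(shuffle_phase(map_phase(docs)))))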

BTW, this point elucidates the importance of this classic Google Interview question: http://www.glassdoor.com/Interview/Sort-a-million-32-bit-int...



