Hacker News
What Hinton’s Google Move Says About the Future of Machine Learning (moalquraishi.wordpress.com)
92 points by osdf on March 17, 2013 | 34 comments



I know HN thinks Google is much cooler than IBM, but is it weird that I like IBM's chances of progressing from ML to AI? For one thing, Watson was a very impressive demonstration. For another, IBM has the old materials-science know-how to create neuron-inspired chip architectures. And here's an important one: Fortune 500 companies with lots of valuable data trust IBM to solve their problems. Anybody here have other thoughts or caveats? Other companies with good shots at commercializing ML/AI technology?


Great point. Per the author's thesis, we are in phase 2, and phase 3 is going to be initiated by a startup, and not for another 20 years.

But IBM is currently doing (and making good progress at) exactly what the author describes as phase 3.


I think what makes Hinton's move surprising is that he has a long-established academic lab, and so many current top researchers went through it: Yann LeCun (ANNs/deep learning), Chris Williams (GPs), Carl Rasmussen (GPs), Peter Dayan (neuroscience and TD-learning), Sam Roweis (RIP). As you note, industrial research labs (with Nobel-prize-winning researchers) have been around at IBM, NEC, AT&T Bell Labs, etc. One thing I think about is what happens to the quality of research as top folks with an established record of producing new researchers are pulled from that role. Also, I'm not sure startups have anything to do with making technology real. Is Google still a startup?


Machine brains are already much "smarter" than human brains at certain tasks, like calculation. With increased computing power, the set of such tasks will keep growing. But will machines ever be REALLY smarter than humans? I will only believe that when I see it. This question might be (but does not necessarily have to be) related to the question of all questions: can machines have consciousness, like we do?


There is a growing realization that cognition is fundamentally _embodied_ cognition. If you think of the mind as an ethereal entity removed from its physicality (or at the very least made out of a different substance from the body, that is to say, substance dualism), then it is easy to imagine the following scenario: containers are unimportant, so minds can be uploaded and downloaded; whether the housing is machine or human doesn't matter.

If we come to accept cognition as fundamentally embodied then it becomes less sensible to compare cognition across differing architectures - human cognition will always be quite unlike any other type of cognition except itself. I think machines will have consciousness (why should they not be able to, what is so special about us that would limit this phenomenon to us?) but it will be a machine consciousness and radically different from ours.

I think we're going to have to get a lot more fine-grained about how we talk about features and functions of brains whether human or machine. You've already put "smarter" in quotes which shows that already you're aware of how blunt and crude our terms are.

Does this all seem reasonable?


I understand your point of view, which is basically the one shared by many people in CS. But personally, I don't think that machines will ever develop consciousness as we have it, because I understand how current technology works, and there is no consciousness there. I would have no qualms shutting down a machine, even if it begged me to keep it running.


Like a typical computer science freshman, I imbued computer systems with magic. Oh look, I feed this machine numbers and it spits out words (text to speech). Or I search for something and a magic algorithm finds me the result. As I learned more about algorithms and data structures, that magic disappeared. I had the same feeling about hardware: this magic black square on the motherboard that can execute a set of a couple hundred or so assembly instructions many billions of times per second. Then I took a hardware architecture class and poof! The magic disappeared. We started with transistors and built up to designing our own CPU.

I am guessing something similar is going on with our understanding of the brain and mind. I think we just haven't figured out a good way to model and represent knowledge. There was wild optimism at the end of the 50s that superhuman AI would take over in just a few decades, but it didn't happen. We have sort of been stomping our feet since (I personally don't consider playing chess an AI achievement). I think there will be a breakthrough -- maybe it will be a simple reorganization of existing ML and knowledge representation methods (neural networks mixed with evolutionary algorithms) or some new framework -- OR enough advances in very specific applications (chess playing, image recognition, speech recognition) will slowly chip away at this "magic" AI core until maybe nothing is left. And we'll look back at that and at our brains and say "ah, it wasn't that complicated after all, it is just all these specific subsystems working together"...


Current technology != future technology. Sorry, induction doesn't work that way.


I'm more interested in teaching computers lateral thinking, and thus the beginning of creativity - what I believe to be the real hallmark of human thought.

Can a computer be "processing" water pipes, analyzing pipe construction for the best flow, and then jump to half-pipes and building a new half-pipe so skateboarders can flow better, produce better tricks, get more air, etc.? It's a kind of lame example, but that jump is crucial, and something we do flawlessly. There's no hard guideline for what triggers our brains to jump. It could be audible, visual, or tangentially related to the task at hand. It could be the body language of someone talking to us, which reminds us of somebody else, which reminds us of... Logical thought isn't that beautiful to me. It's predictable. Lateral thinking is, though, and that's where all of the good inventions/discoveries begin anyway.


Analogy-making is an important part of perception.


As far as I can tell, people only wonder about this if they assume that consciousness has some kind of supernatural aspect to it.


I'm not the OP, and I most certainly do not believe there is any "supernatural" aspect behind the human mind, but what will really, really convince me that the Singularity has arrived will be the moment when robots/machines understand humor. Them, the machines, being able to actually make new jokes will be the decisive proof that we humans are not the only "intelligent" entities on this planet.

And even more OT, this reminded me that I don't recollect any "robot jokes" in any of the science fiction books I've read. Granted, there weren't that many (just the basics: Asimov, Frank Herbert, Philip K. Dick, some Stanislaw Lem), but I'm curious whether any SF writer wrote "robot jokes", more exactly jokes that we humans think will be made by robots in the not-so-distant future.


Have you ever read transcripts from the Loebner Prize Competition, a Turing (con)test they hold each year? Machines keep getting funnier.

http://www.worldsbestchatbot.com/Competition_Transcripts


Another in a long line of goalposts that assert "this is intelligence". Chess fell, driving fell, machine translation is falling. Robot storytellers (which, I think, would cover humor) are only a matter of time.


Lem's best work (IMO) is the stuff about robot culture, including jokes. Try The Cyberiad and Mortal Engines.

Edit: Though, to be fair, Lem didn't write near-future SF, his robot stories were more like alternate universes.


Robert Heinlein, The Moon is a Harsh Mistress. Read it.


Not really. Believing a Turing machine can create consciousness rests on implicit assumptions that may or may not be true. Is consciousness completely computational, or does it piggyback on some qualitative attribute of the substrate? Can the processes of the brain be reduced to data structures and algorithms? Can simplified models be an adequate replacement for "chaotic" processes of the brain which are not computable without remainder using silicon? There are plenty of known unknowns which have implications for the possibility of such simulations. If you think the simulation of consciousness is a given, then you probably have a hand-wavy understanding of the problem.


This is reading way too much into it. Google happens to have a very nice confluence of money, data, people, and interesting applications at the moment. But there is and always has been back-and-forth of ideas and people between academia and industry in machine learning and all other fields.


As an ML researcher, I don't find this article persuasive, for a few reasons:

- Computing power is getting exponentially cheaper even as computing requirements increase. The resources available to a university lab in the future will be much greater than those available today, even given the same budget. Of course this is also true for industry, but this growth is not a unique advantage of industry.

- Other scientific fields already have equipment costs that are orders of magnitude larger than CS. Physicists regularly write grant proposals for multimillion-dollar pieces of equipment. If building large clusters is necessary for academic research to stay relevant, academics will start building large clusters. The foundational work done at Bell, IBM, Xerox, etc. in the 70s and 80s was not due to resource constraints in academia (academics had expensive computers too, and also did plenty of good work during that time); it was because those companies had the right combination of smart people and an immediate need to find practical solutions to difficult problems.

- Finally, and most importantly, even in the age of big data almost all fundamental research can be done quite successfully at small scales with modest hardware requirements. Notice that Hinton et al. have spent 6+ years developing deep learning in academia, and it's only in the past couple of years that it's matured to the point of implementation at scale.

Here's the basic pipeline of most machine learning research: you come up with a new approach for training SVMs, or multilayer perceptrons, or some new type of more interesting model. First you develop your ideas conceptually, with some equations on a whiteboard. If you're a theorist, you might prove some theorems. Next you write a toy implementation in Matlab or Python to show that your method actually works, and that you get improvement over previous work for the dataset size you're using. This could mean that your method is faster -- which indicates it'll be able to scale to bigger data -- or that it's smarter / taking advantage of some new type of structure, in which case it still ought to get decent (if not state-of-the-art) results on small data. Only then, usually after publishing a few papers and working out the kinks, does it generally make sense to put in the effort to implement and test a big, efficient distributed version of your algorithm. And while that last part might be best done by industry, the first few steps are easily possible in academia and will continue to be for the foreseeable future.
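To make that toy-implementation stage concrete, here is a minimal sketch of the kind of thing I mean, assuming scikit-learn and a synthetic dataset (the linear SVM baseline and the MLP "candidate" are stand-ins for illustration, not any particular method):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import LinearSVC

    # A small synthetic dataset -- the scale a single laptop handles easily.
    X, y = make_classification(n_samples=2000, n_features=50,
                               n_informative=10, random_state=0)

    # Baseline: a linear SVM. The MLP stands in for whatever you just
    # derived on the whiteboard.
    models = [("linear SVM baseline", LinearSVC(dual=False)),
              ("candidate method", MLPClassifier(hidden_layer_sizes=(64,),
                                                 max_iter=500, random_state=0))]

    for name, model in models:
        scores = cross_val_score(model, X, y, cv=5)
        print("%s: %.3f +/- %.3f" % (name, scores.mean(), scores.std()))

If the candidate doesn't beat the baseline at this scale for the reason you predicted, you've learned something cheaply; if it does, that's when the big distributed implementation starts to be worth the effort.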

Case in point: Google Translate is a massive system whose performance rests squarely on exploiting big data, in that they use the Internet as their training set. But academic machine translation research still runs quite effectively with smaller datasets on small clusters. The academics come up with ideas, implement and test them, and some ideas flop while others take off. The ideas that take off get picked up by Google and implemented in Translate, where they hopefully end up pushing the envelope. So even though the academics don't have the resources to work at massive scale (which most of them don't want to do anyway -- ML researchers are usually more interested in ML than in building distributed systems), their research still has impact, through transfer to industry. This sort of relationship has been the model for academic/industry research collaboration for quite a while, and I don't think it's dead yet.


Having worked with a lot of ML guys who were ahead of Google on numerous fronts, I have to agree. With Knol's death, Google failed to control Wikipedia, arguably one of the more important ML datasets. People can fire up the Common Crawl on demand from Amazon. Anyone who thinks Google is the real bleeding edge just isn't browsing recent academic papers.

I've got no formal CS training, and if I get funding for jkl.io the objective is to have (most of) a Google News (English) competitor implemented in a year, part-time. Google has thousands of ML employees, but there are three million users on GitHub. If I need facial recognition, it's on GitHub. Topic modelling to layer on top of my NLP, or to aid in entity resolution: on GitHub. Crawlers: got it. Next-gen databases (http://hyperdex.org/): got it. The jkl.io site is only just over 1000 lines of code written by me at the moment, but it probably uses tens of thousands from the Python libraries alone, before we even talk about the DB and the OS.
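To give a flavour of how off-the-shelf this stuff is, here's a rough sketch of topic modelling with the open-source gensim library (the toy corpus is made up for illustration; it's not my actual pipeline):

    from gensim import corpora, models

    # Toy stand-in for tokenized news articles.
    docs = [["election", "parliament", "vote", "policy"],
            ["goal", "league", "match", "score"],
            ["budget", "tax", "policy", "vote"],
            ["match", "tournament", "score", "injury"]]

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # A small LDA model; num_topics is an arbitrary choice for toy data.
    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                          passes=10, random_state=0)

    for topic_id, words in lda.print_topics():
        print(topic_id, words)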

The more people understand the filter bubble and information-diet concepts, the more personalisation will be a thing only for side interests and friendship networks. I don't think people want black-box, advertising-oriented algorithms manipulating their political and economic news. The computation required for me is therefore so much smaller and cheaper. I know it's not HN's focus, because people want their exit money, but donation models, as Wikipedia beating Knol shows, can actually be the most efficient solution in many domains where you can't trust a corporation with a fiduciary duty to maximize shareholder profit.

People might say "but what about really huge data, like location services using not just GPS but mobile data and wifi response times, or pictures from Google's new alt-reality game and Street View"; they might say "Google just can't be caught up to" and point to the failure of Apple's maps. But I worked with some guys who scaled a solution using SIFT features => Lucene that could geo-locate instantly on massive datasets of images. You can prove an algorithm can scale theoretically without having 10,000 machines to run it on; one of the key points separating computer science from just programming is the analysis of algorithms in theoretical terms. Apple's failure was because they are primarily a luxury-product company, not an ML company, but people just think "technology". Even so, Apple can get stuff done, or buy companies that can (Siri). Microsoft, Yandex, Yahoo, Amazon, huge rising data powers in Asia, thousands of computer science professors, tens of thousands of postdocs and doctoral students, and millions of GitHub tinkerers are not going to fall behind. Google isn't even the major search engine in a lot of countries.
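I can't share the internals of that particular system, but the usual shape of SIFT => Lucene is bag-of-visual-words: quantize local descriptors into discrete tokens and hand them to an ordinary inverted index. A rough sketch with OpenCV and scikit-learn (the vocabulary size, paths, and token scheme here are illustrative guesses, not their design):

    import cv2
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def sift_descriptors(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        sift = cv2.SIFT_create()
        _, desc = sift.detectAndCompute(img, None)
        return desc  # (n_keypoints, 128) array, or None if nothing found

    # Placeholder paths; a real vocabulary is trained on many images.
    train_image_paths = ["geo_photo_001.jpg", "geo_photo_002.jpg"]

    # 1. Build a visual vocabulary by clustering descriptors from a sample.
    all_desc = [sift_descriptors(p) for p in train_image_paths]
    train_desc = np.vstack([d for d in all_desc if d is not None])
    vocab = MiniBatchKMeans(n_clusters=1000, random_state=0).fit(train_desc)

    # 2. Turn an image into a "document" of visual-word tokens that a text
    #    engine like Lucene can index with its usual inverted index.
    def visual_words(path):
        desc = sift_descriptors(path)
        if desc is None:
            return ""
        return " ".join("vw%d" % w for w in vocab.predict(desc))

Geo-location then becomes a text query: the visual words of a new photo are searched against the indexed corpus, and the top hits vote on a location.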


I attended a talk by Quoc Le at UCSD recently, and he made the case that it is necessary to get algorithms tested at large scale, rather than spending too much time on them at small scale.

He presented a graph comparing some models and their accuracy as the number of features was scaled up into the tens of thousands, his point being that some models that work best at smaller numbers of features fall off as the number is scaled up. Unfortunately, the slides he has on his web page are outdated, so I haven't been able to find that reference. I'd be very happy if one of you knows which paper he was referring to. In the old slides he refers to this paper, which makes something of the same point: http://ai.stanford.edu/~ang/papers/nipsdlufl10-AnalysisSingl... It shows how simple unsupervised models with dense feature extraction reach the state-of-the-art performance of more complex models.

Of course, I can see how it makes sense to at least do some small-scale prototyping, to work out kinks like you say -- but the lesson is that if you are planning to do large-scale machine learning you can't necessarily use small-scale tests as a good guide to large-scale performance. It's certainly promising if you get very good accuracy, speed, or both at small scale, though neither necessarily carries over to large scale. On the flip side, if your method is worse than state-of-the-art at smaller scales, that doesn't mean it won't beat state-of-the-art at large scales.


Data shows, as you say, that small scale performance is no indicator of large scale performance.

How then do you decide which projects are worth trying on the large scale?


Distributed-systems building is not a precondition for big-data ML. Most of those systems have been built and commoditized, to such an extent that the difference between having one and not boils down to a command-line flag. I routinely run ML algos in local mode on my Mac on a small dataset. Once it's up to snuff, I turn off the --local flag, and it runs on giant MR clusters over terabytes of data. I personally have not made any changes other than turning off the local flag.
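Our tooling isn't mine to share, but the pattern is roughly this kind of hypothetical runner (the cluster-submit command is made up; substitute whatever job-submission tool your shop uses):

    import argparse
    import subprocess

    def process_record(line):
        # Placeholder for the actual ML step.
        return len(line.split())

    def run_local(input_path):
        # Small dataset: just stream the file on this machine.
        with open(input_path) as f:
            return sum(process_record(line) for line in f)

    def run_distributed(input_path):
        # Same job handed to a (hypothetical) cluster submission tool.
        subprocess.run(["cluster-submit", "--input", input_path, "my-ml-job"],
                       check=True)

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("input_path")
        parser.add_argument("--local", action="store_true",
                            help="run on this machine instead of the cluster")
        args = parser.parse_args()
        if args.local:
            print(run_local(args.input_path))
        else:
            run_distributed(args.input_path)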


Sure, lots of existing ML algorithms have efficient big-data implementations. But for new algorithm development, my (admittedly limited) experience is that the Matlab-prototyping stage usually comes well before the implement-at-scale stage. You're right that modern tools effectively abstract out a lot of the difficulty of implementing at scale, but IMHO it's still generally not the first thing you'd want to do.


I think that ML people should take a look at the AGI field. I also think that more powerful techniques, specialized hardware like Qualcomm's baby Brain Corporation is building, and/or large peer-computing networks will make general intelligence accessible to small groups or individuals in fewer than twenty years.


AGI has cool ideas, and is in some sense the "right" theoretical framework for AI, but it's not clear that it gives any kind of practical path forward for AI research. The main problem is that its basic idea -- an AI performing Bayesian inference over a hypothesis class of all potential environment-generating computer programs, with a Kolmogorov complexity prior -- is wildly uncomputable, so to make it practical we'd need to find simple, computable approximations that work on real problems. But this is basically what modern ML research is already trying to do -- finding models that are complex enough to capture interesting structure in the world, but still simple enough for efficient inference to be practical.
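For anyone who hasn't seen the formal version: the prior being referred to is, as I understand the standard formulation, the Solomonoff prior, where U is a universal Turing machine, p ranges over programs, and \ell(p) is program length:

    M(x) = \sum_{p \,:\, U(p)\text{ starts with }x} 2^{-\ell(p)}

Every program whose output begins with the observed data x contributes weight 2^(-length), so shorter programs dominate. Evaluating the sum requires knowing which programs halt with the right output, which is where the uncomputability comes from and why anything practical has to approximate.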


"an AI performing Bayesian inference over a hypothesis class of all potential environment-generating computer programs, with a Kolmogorov complexity prior, -- is wildly uncomputable, so to make it practical we'd need to find simple, computable approximations that work on real problems"

That's not what AGI is trying to do or how they are trying to do it.


It's at least one approach that has been advocated by leading researchers in the field. If you think differently, you should give references and explain what your definition of AGI is.


I'd wager human brains do a lot of stuff unrelated to solving the problem at hand, like keeping the heart beating. Given that machines don't need to do all of the underlying biological stuff, you can probably get away with fewer connections.


It might be the same amount of overhead an OS needs to keep tabs on its hardware, cluster, etc. Just as brains have to translate to the physical world via the nervous system, pure software has to translate to the physical world via an OS.


"by which time (2050s-2060s) we will have machine brains that are orders of magnitude smarter than human ones (!)"

That's a fascinating yet chilling thought (granted, one that is orders of magnitude dumber than those future thoughts of the machines).


Dear Googlers, it would be interesting to know how computational resources are allocated to new ideas (e.g. Kurzweil's PRTM-based NLU system) at each stage, from prototype genesis to mature technology. What are the factors that come into play?


There is machine learning that is not related to big data, you know. Many interesting problems in machine learning, and most of the hard ones, have computational demands for which a single i7 and 16 GB of RAM are more than enough.


I've been thinking about this exact same trend ever since I saw Hinton's move to Google, but I didn't have the historical background to make these comparisons. Really nice job.



