Hacker News
Geoffrey Hinton spent 30 years on an idea many other scientists dismissed (torontolife.com)
193 points by varunagrawal on Feb 3, 2018 | 69 comments



>For more than 30 years, Geoffrey Hinton hovered at the edges of artificial intelligence research, an outsider clinging to a simple proposition: that computers could think like humans do—using intuition rather than rules.

This is so disrespectful to the thousands of researchers who have been studying machine learning since well before 2012. It was well established in the 80s/90s, by researchers like Michael Jordan (https://en.wikipedia.org/wiki/Michael_I._Jordan) and his students, that the future of teaching computers lay in statistics and not rules.

It was even ingrained in popular culture: neural networks are how the AI brain worked in Terminator 2, in 1991! https://www.youtube.com/watch?v=xcgVztdMrX4

edit: I don't want to downplay Hinton's accomplishments; I've been lucky to have been surrounded by and motivated by his work since I started learning machine learning. I did my master's research on neural networks that were partly inspired by his work, and it was a deep networks paper he presented at a NIPS 2006 workshop that got me really excited to stay in machine learning while I was starting my career.


There've been cycles & fads in the meantime, though.

I remember that in the early 90s, neural nets were supposed to be the huge new thing. Many scanners shipped with built-in OCR, the Apple Newton had handwriting recognition on a PDA, and my Centris 660AV could do speech recognition & text-to-speech out of the box.

But they ultimately weren't powerful enough to satisfy customers or meaningfully change how people interacted with computers, so they failed in the market, and the hype cycle moved on to the World Wide Web.

I guess this shows the power of continuing to study something when the fad goes away, so that you're well positioned to capitalize when the next fad hits.


> I guess this shows the power of continuing to study something when the fad goes away, so that you're well positioned to capitalize when the next fad hits.

So true. VR popped up in the '90s/'00s (Virtual Boy, VRML, etc.), went nowhere, and here we are again.

I suspect something similar will happen with blockchain/cryptocurrency. Only once all the hype and speculation dies off will meaningful uses for the tech become evident.


Are you trying to suggest that Cryptokitties aren’t meaningful? Blasphemy :-)


The hard part is surviving the 20 year drought of research funding in the meantime. Most researchers want to hang onto an idea and develop it until it changes the world but they need to keep their job, so they follow the funding and that means following the fads.


So true. I did part of my PhD on reinforcement learning, finished in 2004. I couldn't get money for continuing work on the topic (wasn't cool), so ended up getting into P2P and Big Data. Still do a bit of ML, but it's not core anymore.


Could not have said it better. Hinton is one of the most important contributors to the field, but giving him all the credit is a disservice to others who made important contributions. Probabilistic modeling was popular way before DNNs got big. At that point it was already clear that logic based inference was only so powerful, and different methods were necessary.


>> At that point it was already clear that logic based inference was only so powerful, and different methods were necessary.

Er, logic inference is "only so powerful"? The first order predicate calculus (first order logic) is Turing-complete, and there is plenty of maths proving the soundness and completeness of various logic inference rules. In other words, if you can compute a function, you can compute it with first order logic.

The switch from symbolic to statistical AI happened partly because of the AI winter that cut the funding to all AI, which at the time was primarily symbolic AI, partly because it became evident that developing and maintaining huge databases of logic rules was inefficient. Some of the early work in machine learning focused on overcoming this inefficiency by inducing rules from data.


>> The first order predicate calculus (first order logic) is Turing-complete

Brainfuck is Turing-complete; does that make it as powerful as a general-purpose programming language such as Java?

I'm not criticizing first order logic; it has use cases, no doubt. On the other hand, there are many cases where codifying knowledge into a well-formed KB isn't practical. Some of those cases are better suited to probabilistic approaches or deep learning, and some are unsolved. That pretty clearly puts limits on the power of FOL.

Also, thanks for mansplaining turing completeness.


>> Also, thanks for mansplaining turing completeness.

Snark is not OK:

Be civil. Don't say things you wouldn't say face-to-face. Don't be snarky. Comments should get more civil and substantive, not less, as a topic gets more divisive.

https://news.ycombinator.com/newsguidelines.html


>> It was well established in the 80s/90s, by researchers like Michael Jordan (https://en.wikipedia.org/wiki/Michael_I._Jordan) and his students, that the future of teaching computers lay in statistics and not rules.

As a counterpoint, one of the most successful classes of machine learning algorithms is decision tree learners, whose models are decidedly symbolic.

There's also plenty of work on learning first-order logic theories with neural nets (see for instance the work of Artur d'Avila Garcez).

The problem with rules is that it's hard to develop and maintain large rule-bases. However, it's perfectly possible to do machine learning for rules, and logic-based machine learning is totally a thing (full disclosure: it's the thing I'm doing a PhD on). You probably haven't heard of it, though, because data scientists tend to be good with statistics but bad with symbolic logic.
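
To make the "decidedly symbolic" point concrete, here's a minimal scikit-learn sketch (my own toy example, not from the papers above): a fitted decision tree can be dumped back out as human-readable if/then rules.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # A learned decision tree is a symbolic model: its splits can be
    # printed back out as nested if/then rules over the features.
    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
    print(export_text(tree, feature_names=list(iris.feature_names)))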


While I won't dispute that many in AI thought machine learning was the correct path, the majority of researchers in machine learning in the 2000s were dismissive of neural networks as the way forward. When I started my PhD in 2007 with the specific goal of studying deep learning, many of my peers told me this was a very bad career move, because neural networks were a dead technology. Convex optimization, kernels, and variational Bayesian methods were the future. They were wrong.

AI has had a lot of fads, but neural networks are here to stay, in my opinion.


Convex optimization, kernels and Bayesian methods are also here to stay. They are all in the toolbox. Optimization-based methods are still rapidly spreading in robotics, for example, even as we add CNNs to the mix.


Convex optimization is heavily used in industry for real problems. It just doesn't have the sexiness that neural networks do, so it doesn't get as much attention.


I agree. My comment about being wrong was regarding neural networks being a dead end. Convex optimization and Bayesian techniques are both incredibly useful.


I see the article just as an effort to highlight Hinton and his AI research. I don't think he would say he had done it all; Toronto Life is the one saying that.

I have just finished reading an interesting Wikipedia article titled "AI winter", defined as a period of reduced funding and interest in AI research:

https://en.m.wikipedia.org/wiki/AI_winter

There have been such winters since the 60's.


Didn't Hinton stick with neural networks as his primary (often only?) research focus for all that time, whereas most of those other 1000s of researchers moved in and out of neural network research as the popularity of neural networks waxed and waned?

I think that is what the article may be getting at.


With respect to the first quote from the article, "computers could think like humans do -- using intuition rather than rules": is this really a good description of neural networks? I don't think so. To me it sounds like a bad misrepresentation of what neural networks are, by a writer who can't describe the way AI works without humanizing it inaccurately.


Those are probably Hinton's own words.


If that is the case, I am eating my words.


We get blasted with claims that X is the greatest sports player of all time, or that Y is the most important historical figure. None of those are really accurate, but those people are important and accomplished enough to deserve someone lauding them so highly, along with the footnote in history they receive, whether or not they merit the prestige or sole credit for the accomplishment. I am not disputing whether the article's portrayal is entirely accurate.


Winner takes all; that's how the human brain works... That's why I am fundamentally pessimistic about the human future.


The article is misleading, if not false. Neural nets were hot in academic AI research 30 years ago (1988). The original Perceptron had fallen out of favor in part because of arguments that it could not implement an exclusive or (XOR), made in Minsky and Papert's book Perceptrons:

https://en.wikipedia.org/wiki/Perceptrons_(book)

Neural nets fell out of favor in the 1970's but came back and became hot in the early 1980's with work by John Hopfield and others that addressed the objections.

https://en.wikipedia.org/wiki/John_Hopfield

Practical and commercial successes were limited in the 1980's and 1990's which led to a reasonable decline in interest in the method. There were some commercial successes such as HNC Software which used neural nets for credit scoring and was acquired by Fair Isaac Corporation (FICO).

https://en.wikipedia.org/wiki/Robert_Hecht-Nielsen

I turned down a job offer from HNC in late 1992 and neural nets were still clearly hot at that time.

Some people continued to use neural nets with some limited success in the late 1990's and 2000s. I saw some successes using neural nets to locate faces in images, for example. Mostly they failed.

AI research is very faddish with periods of extreme optimism about a technique followed by disillusionment. One may wonder how much of the current Machine Learning/Deep Learning hype will prove exaggerated.

Also, traditional Hidden Markov Model (HMM) speech recognition is not rule-based at all. It uses an extremely complex, maximum-likelihood-based statistical model of speech.
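
For anyone who hasn't looked at HMMs: the scoring side is just the forward algorithm. A minimal sketch with made-up numbers (real recognizers use thousands of states and Gaussian-mixture or neural emission models, not this toy discrete one):

    import numpy as np

    # Toy HMM: 2 hidden states, 3 discrete observation symbols (placeholders).
    start = np.array([0.6, 0.4])            # P(state at t=0)
    trans = np.array([[0.7, 0.3],           # P(state_t | state_{t-1})
                      [0.4, 0.6]])
    emit  = np.array([[0.5, 0.4, 0.1],      # P(observation | state)
                      [0.1, 0.3, 0.6]])

    def forward_log_likelihood(obs):
        """Log P(observation sequence) via the forward algorithm, with rescaling."""
        alpha = start * emit[:, obs[0]]
        log_p = np.log(alpha.sum())
        alpha = alpha / alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ trans) * emit[:, o]
            log_p += np.log(alpha.sum())
            alpha = alpha / alpha.sum()
        return log_p

    print(forward_log_likelihood([0, 1, 2, 2, 1]))

Recognition then roughly amounts to picking the word/phone models that assign the observed acoustics the highest likelihood -- statistics, not rules.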


Hinton himself did just fine in terms of academic popularity in the '80s and '90s. We can look at his citations on Semantic Scholar:

https://www.semanticscholar.org/author/Geoffrey-E-Hinton/169...

Citations to his papers have been rising steadily from between 88 and 107 in 1987 to between 685 and 826 in 1999. That's hardly an unpopular researcher.

And for a bit of comparison with other machine learning researchers, here's a link to a data set of family relations from a 1986 paper by Hinton:

https://archive.ics.uci.edu/ml/datasets/Kinship

At the bottom of that page, in the Relevant Papers section, there are links to two papers using the data set: Hinton's own paper that introduces it, and one by Quinlan.

Clicking on the [Web Link] links for the two papers, I can see the references to those papers. There is a single reference to Quinlan's paper. There are 43 to Hinton's, of which all but 6 are from 1999 and earlier. And those are not self-references, nor references by Bengio, LeCun, et al. If there is a clique, it is hard to see it.

So there was a lot of interest in Hinton's work even in the years he was supposed to be "exiled to the academic hinterland", as another article put it.


> AI research is very faddish with periods of extreme optimism about a technique followed by disillusionment.

Truth! I happened to go through grad school at a time when SVMs and kernel methods were cool, neural nets were the opposite, and Hinton and his students were oddball outcasts at NIPS. Kudos to Hinton for sticking to his story until the current "deep learning" hype wave presumably allowed him to buy a yacht and an island. I imagine we'll be hearing about some other non-linear optimization technique in a few years.


The XOR story is misleading. :-) The Wikipedia page does a good job of explaining what the book actually proved:

"What the book does prove is that in three-layered feed-forward perceptrons (with a so-called "hidden" or "intermediary" layer), it is not possible to compute some predicates unless at least one of the neurons in the first layer of neurons (the "intermediary" layer) is connected with a non-null weight to each and every input. This was contrary to a hope held by some researchers in relying mostly on networks with a few layers of "local" neurons, each one connected only to a small number of inputs. A feed-forward machine with "local" neurons is much easier to build and use than a larger, fully connected neural network, so researchers at the time concentrated on these instead of on more complicated models."


Most other scientists dismissed neural networks? Is there some history I am unaware of? That doesn't seem true. Did the article want to push the idea of the lone rogue thinker a bit too much?


http://www.dataversity.net/brief-history-deep-learning/

Neural Networks weren't really thought of very highly until CNNs started winning image recognition competitions in the early 2010's.

I think most people had the feeling that they were interesting tools to learn how the brain worked, but too slow and opaque to be practical statistical tools. I've been following Hinton for a while (because of Hinton and Shallice 1991), and my understanding is that it was really hard for him to get funding especially when he was just starting out.

The fact that so much of the work from the mid-80's to 2000 came from just a few labs should tell you how hard it was to get funding for that kind of research.


> The fact that so much of the work from the mid-80's to 2000 came from just a few labs should tell you how hard it was to get funding for that kind of research.

If you look through papers from that era I don't think you'll find that's true at all. You could fairly say that only a few of the prominent labs from the first neural-net boom lasted long enough to still be prominent labs now in the second neural-net boom (though even then there are several: Hinton, Bengio, Schmidhuber, LeCun, etc.). Since they're still around doing interviews and putting out new papers, understandably their work has a higher profile now than that of people who aren't in the field anymore. But there have been a ton of others over the years too, just many of them from the first wave have moved on or retired by now.

Especially in the '90s the field was hot and reasonably large (and it was pretty easy to get funding, too). If you look at e.g. the NIPS 1992 proceedings, it's definitely not just a handful of labs: https://papers.nips.cc/book/advances-in-neural-information-p...


>You could fairly say that only a few of the prominent labs from the first neural-net boom lasted long enough to still be prominent labs now in the second neural-net boom (though even then there are several: Hinton, Bengio, Schmidhuber, LeCun, etc.)

I agree, I don't mean to imply that Hinton did all of the work on NN's, but he certainly contributed a lot to the field, even as the popularity waxed and waned.

Even if you just click on a few of the names in your link, you can see most of those authors submitted ~1-6 papers. Hinton, Bengio, and Jordan have submitted >50 each. I wouldn't say that NN's were exactly thought of as a joke before 2011, but from the people I talked to about them, they weren't considered very promising as practical statistical systems.


Definitely seemed like that in the 2000s when I was doing my PhD. Yann LeCun even wrote an open letter to the computer vision community around 2012 complaining about them automatically rejecting papers that used neural networks. I've been told that letter led to him launching the conference ICLR. Only a couple years later he was a keynote speaker at CVPR, the big computer vision conference.


Maybe, but let's not get hung up on the title—that tends to keep discussion shallow. We'll change "most" to "many" above.


I wouldn't go so far as to say "dismissed," but there was definitely a period or two when they weren't as shiny.

When I did my M.S. research in 2010, SVMs were far more popular than neural networks. Deep learning was just over the horizon, and most people had given up on neural nets being useful for learning real-world problems. We had proven that NNs could learn anything given enough data and the right structure, but the hard part in real life was having enough data and finding the correct structure. So SVMs seemed more robust with the available datasets, and most people used those. There were/are several other popular methods, like nearest neighbors and tree-based methods, and NNs just weren't as sexy as they seemed in the late 80's.
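
For a rough flavor of that era's default comparison, here's a toy sketch with scikit-learn (my own example, not the actual research setup):

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    # Small, noisy, nonlinear dataset -- the regime where kernel SVMs
    # were the default choice around 2010.
    X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
    mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)

    print("SVM test accuracy:", svm.score(X_te, y_te))
    print("MLP test accuracy:", mlp.score(X_te, y_te))

The point being: on small datasets the SVM was competitive with far less tuning, which is why it won by default back then.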


They are referring to Minsky and Papert's 1969 book "Perceptrons", which argued that a single-layer NN can't be Turing-complete because it can't even do XOR. The big breakthrough that came later was multilayer networks... which actually were also described in the Perceptrons book, though at the time computers weren't powerful enough to implement them.
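
For anyone who hasn't seen the XOR point made concrete, here's a minimal sketch (my own toy example, not from the book): a single threshold unit is a linear separator and can't produce XOR, but one hidden layer of threshold units with hand-picked weights does it.

    import numpy as np

    def step(z):
        return (z > 0).astype(int)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

    # A single-layer perceptron computes step(w.x + b), a linear threshold.
    # No w, b separates {(0,1),(1,0)} from {(0,0),(1,1)} -- the XOR objection.

    # Two layers fix it: h1 = OR of the inputs, h2 = AND, output = OR-but-not-AND.
    h1 = step(X @ np.array([1, 1]) - 0.5)   # OR
    h2 = step(X @ np.array([1, 1]) - 1.5)   # AND
    out = step(h1 - h2 - 0.5)               # XOR

    print(out)   # [0 1 1 0]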


I don't understand how it took an entire book to show that a linear classifier can't learn the xor function.


There's more to the book than my one-sentence summary. If Minsky and Papert have something extensive to say on a subject, it's probably not shallow. There's an OK, if brief, discussion of the book's influence on NN research in its Wikipedia entry.

It's not a super long book, and it's an excellent, eye-opening read even 50 years later.


You should check out the book. It is really fun (especially the illustrations).


As time goes on, faster computers will increase the number of AI tasks that are possible in a fixed period of time.

If you took half the apps built today and tried to run them on compatible hardware from 20 years ago, they would all flop.

Without the understanding of modern computing, one would have dismissed all such inventions as useless wastes of time.


I thought this too. Neural networks definitely seemed like a credible research topic in the 90s at the very least (which is when I first became aware of them).


I wonder when they'll write this about Michael Jordan. "History doesn't repeat itself, but it often rhymes"

Probably they'll never mention Friedman and Breiman, which seems pretty unfair considering their gizmos have arguably had a bigger impact in "actual machine learning gizmos deployed..."


Hah. I am genuinely confused about which Michael Jordan you were referring to. I am assuming you mean Michael I Jordan.


I see this article as an opportunity to learn a little bit more about Hinton's personal life and personality. It's not a neural nets article, so we shouldn't dig too deep into the controversy about who invented what and whether they were alone or not.


In my single interaction with Hinton, we were talking about a theory; he told me how he had thought of it years ago and remembered people distinctly disagreeing with him. I feel Hinton carries a tiny bit of salt with him wherever he goes, which explains his sarcasm as well.


Jeepers, his family re-invented logic, worked on the Manhattan Project, created the jungle gym. Plus he looks a bit like Davros, the (fictional) creator of the Daleks.


This is an odd story that seems to gloss over the downsides of neural networks. The computational power needed to build some of the models is enough to explain how slow uptake was, at least in large part. I would be interested to see just how many multiplications go into a typical model nowadays, in particular the training of one.

But that still skirts the big issue, which is generalization. We are moving, it seems, toward transfer learning. The danger is that we don't seem to have a good theory of why it works. At the practitioner level, I don't think this is as much of a problem. For the research, though, it is pretty shaky.

I think there is more than a strong chance this remains the future for a while. And I am a layman in this field, at best. But this story presupposes that the past was wrong for not being like the present. That is a tough bar.


I believe neural networks are not quite the right approach, and their success is misleading.

Historically, almost all ML approaches were based on separation in Euclidean (vector) space, which is understandable because they were developed for much weaker computers. However, really useful ML tasks require dealing with huge nonlinearities, and the fact that you're training in linear space becomes less relevant.

Neural networks have surmounted the nonlinear difficulty by increasing the number of layers. But it's an open question whether a similar result couldn't be achieved with Bayesian networks on binary representations (which is the approach I favor).

There is some evidence that the precision of the linear calculation in a neuron (for example, the resolution of the weights) doesn't really make much difference to the performance of the neural network. Could it be that neural learning through vector manipulation is just an artifact of the origin of neural networks, and the really important thing is the overall organization of the network (the layers)?
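
That claim is easy to poke at with off-the-shelf tools. A rough sketch (my own, assuming scikit-learn; not a careful experiment): train a small net, then crudely round every weight onto a handful of levels and see how little the accuracy moves.

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                        random_state=0).fit(X, y)
    print("full precision:", net.score(X, y))

    def quantize(arr, levels=16):
        """Round an array onto `levels` evenly spaced values."""
        lo, hi = arr.min(), arr.max()
        width = (hi - lo) / (levels - 1)
        return lo + np.round((arr - lo) / width) * width

    # Overwrite the learned parameters with low-resolution copies.
    net.coefs_      = [quantize(w) for w in net.coefs_]
    net.intercepts_ = [quantize(b) for b in net.intercepts_]
    print("16-level weights:", net.score(X, y))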


I'm assuming people are trying Bayesian methods. Any good research or practical examples showing advantages?


I find it strange that you mention transfer learning, since one of the reasons neural networks are so popular is that they tend to excel at it. Adapting (i.e., fine-tuning) networks trained on a task with a lot of data (e.g., image classification on ImageNet) to different tasks, such as image segmentation, has proven a very successful approach.
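
The standard recipe, for reference (a minimal PyTorch sketch, assuming torchvision; the 10-class head and hyperparameters are placeholders):

    import torch
    import torch.nn as nn
    from torchvision import models

    # Start from an ImageNet-pretrained backbone and adapt it to a new task.
    # (The weights argument assumes a recent torchvision; older versions
    # used pretrained=True instead.)
    model = models.resnet18(weights="IMAGENET1K_V1")

    for p in model.parameters():      # freeze the pretrained feature extractor
        p.requires_grad = False

    model.fc = nn.Linear(model.fc.in_features, 10)   # new task-specific head

    # Only the new head's parameters get optimized.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # Training loop (per batch):
    #   logits = model(images); loss = criterion(logits, labels)
    #   loss.backward(); optimizer.step(); optimizer.zero_grad()

Unfreezing some of the later layers with a small learning rate is the usual next step when the new task has enough data.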


That was my intent in mentioning them. My understanding, though, is that we don't have a solid foundation for why it works. Just that that is the direction we are moving, and neural networks are good at it.


Researchers are supposed to be looking toward the future, and shouldn't be hindered by the limitations of the present. So even if your methods are not feasible today, you can still investigate them and be ready for when they will work. That many of those other scientists didn't have the foresight to see that this would work someday is definitely a big miss on their part. Let's not give them a pass.


That is somewhat silly. Especially when you can't investigate using the tools of the day.

Let's be clear on that: he was not able to show success using these techniques on older hardware. Period. People weren't dismissing him; they were doing better.


"I cannot imagine how a woman with children can have an academic career..." This is the real truth and reality for anyone actively parenting while trying to deeply understand and research anything. Grateful that this article chose to include the quote.


There weren't many, but there was a strong contingent of neural network researchers going strong since the time I was in high school in the early 80s. Jerome Feldman (University of Rochester) was a neighbor. James McClelland (Department of Psychology) was a mentor of mine in the mid-80s. This field was far from ignored. We used different names (connectionism, backpropagation) and most importantly we had computers that were tens of thousands of times less capable than what is available today.


His model for contrastive divergence learning pre-2000, IIRC, was what really set the base for his breakthrough in the mid-2000s. I think it took him some time to make the jump from contrastive divergence learning to RBMs that learned good priors for deeper layers...
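
For reference, the core CD-1 update for a binary RBM is only a few lines. A from-memory numpy sketch (not Hinton's code; sizes, data, and learning rate below are placeholders):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n_visible, n_hidden, lr = 6, 3, 0.1
    W  = 0.01 * rng.standard_normal((n_visible, n_hidden))
    bv = np.zeros(n_visible)            # visible biases
    bh = np.zeros(n_hidden)             # hidden biases

    def cd1_update(v0):
        """One contrastive-divergence (CD-1) step on a batch of binary data."""
        global W, bv, bh
        ph0 = sigmoid(v0 @ W + bh)                        # P(h=1 | data)
        h0  = (rng.random(ph0.shape) < ph0).astype(float) # sample hidden units
        pv1 = sigmoid(h0 @ W.T + bv)                      # one-step reconstruction
        ph1 = sigmoid(pv1 @ W + bh)
        # positive statistics from the data minus negative from the reconstruction
        W  += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        bv += lr * (v0 - pv1).mean(axis=0)
        bh += lr * (ph0 - ph1).mean(axis=0)

    data = rng.integers(0, 2, size=(100, n_visible)).astype(float)
    for _ in range(50):
        cd1_update(data)

Stacking RBMs (training one, then using its hidden activations as data for the next) is the greedy layer-wise pretraining that kicked things off in 2006.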


He was a well-known name in the '80s and back again with RBMs in the '00s: https://www.youtube.com/watch?v=AyzOUbkUf3M . He and Sejnowski are some of the few names I remember from when I took an NN class a long time ago. He was insistent on working on it when many others saw it as a peripheral curiosity to their career.

What's with everyone here?


There is something about Canada, because the best book (that I tried to understand) about neural networks back in the '90s was "Neural Networks: A Comprehensive Foundation" by Simon Haykin [1].

[1] https://en.wikipedia.org/wiki/Simon_Haykin


> His great-great-grandfather was George Boole

Is that true?


According to https://www.geni.com/people/Geoffrey-Hinton-FRS/600000006697... , yes:

Geoffrey Everest Hinton → Howard Everest Hinton → George Hinton → Mary Ellen Hinton → George Boole


That is a pretty awesome bit of trivia. I assume they did some basic fact checking, and if Hinton's great-great-grandfather was not Boole, he probably would have noticed this in the article and corrected it.

It certainly seems plausible: https://en.wikipedia.org/wiki/George_Boole


Sorry, this was just a terrible boolean algebra joke that I was unable to delete in time.


In a video chat between Andrew Ng and Hinton (part of the Deeplearning.ai course), Hinton himself confirms this.


This was a very interesting article, but as a juggler, the most interesting thing to me was how he learned to juggle grapes with his mouth. I need to run to the store to pick up some grapes now!


   "an outsider clinging to a simple proposition: that computers could think like humans do—using intuition rather than rules. "
I stopped reading right there.


Please don't post shallow dismissals here, especially not internet trope ones.


We know humans go by intuition first (the book The Righteous Mind supports this, if you want to read further).

So why isn't this something that's in the realm of possibility for computers?


Intuition is an ill-defined term that often distances us from the actual processes at hand.


Is it though? We might not understand it fully, but we understand its role in allowing us to make quick decisions and not overload our cognition with the mundane.


The opposite viewpoint is that intuition is an ill-defined term for a phenomenon that, if better investigated and nailed down, could lead to better processes.


Despite the hype and success, machine/deep learning has its own limitations, which is a generally admitted fact.

At a fundamental level, are our brains actually comparable to how ML works (beyond some basic analogies)? Do we have a statistical engine running inside our heads, needing tremendous "CPU power" to do anything remotely useful/accurate?

I'd say no, and that this conceptual mismatch indicates that the next big iteration of AI will be something more like what D. Hofstadter advocates/researched.

(Using ML as a sidekick, why not. No need to trash the current progress.)



