Yann LeCun on the IBM neural net chip (facebook.com)
120 points by luu on Aug 9, 2014 | 45 comments



What LeCun is missing is that the IBM chip was designed to be ultra low power (63 mW to do pattern recognition on real-time video). One of the novel techniques this chip uses to achieve low power is to use spikes to send information between neurons.

It is not clear how LeCun would propose to send information in his convolutional neural networks in specialized hardware (from what I can tell, he uses FPGAs, which are very high power in comparison). In the worst case, that approach sends data between neurons on each time step, which is very power-inefficient. If they do something more clever, I would bet it would start to look like sending spikes.

It also seems very short-sighted to say that if the hardware is not specifically designed for convolutional neural networks, then it is not the right architecture. It seems like the IBM chip does support convolutional networks, but it might require a few extra time steps for the spikes to average out.

Biology has evolved to use spikes (across many different species). Perhaps evolution didn't get the memo that non-spiking convolutional neural networks are the only architecture worth building. Maybe it will take a few more thousand generations before evolution catches up, but until then spiking neuron architectures seem like a decent gambit ...


> Biology has evolved to use spikes (across many different species).

Those different species didn't independently arrive at their own neuronal implementation, we're all using the same basic pattern (with heavy modifications).

More importantly though, in biology, most cells use temporal encoding - one spike isn't meant to be a binary event. Instead, the frequency of pulses over time is used to encode intensity. The receiving end can then continuously integrate the signal over time. I'm not sure the IBM chip uses spikes in the same way.
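A minimal sketch of the rate-coding idea described above (plain Python, names made up purely for illustration): the sender emits spikes with a frequency proportional to an intensity, and the receiver recovers the value by integrating (here, just averaging) over a time window.

    import random

    def encode_rate(intensity, timesteps=256):
        # Emit a binary spike train whose firing probability tracks intensity (0..1).
        return [1 if random.random() < intensity else 0 for _ in range(timesteps)]

    def decode_rate(spike_train):
        # The receiver integrates (averages) spikes over the window to recover intensity.
        return sum(spike_train) / len(spike_train)

    spikes = encode_rate(0.7)
    print(decode_rate(spikes))  # roughly 0.7, up to sampling noise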

> Perhaps evolution didn't get the memo that non-spiking convolutional neural networks are the only architecture worth building.

Temporal encoding in nature is used for the obvious reason of hardware efficiency. If every axon had 8+ binary signal lines it would become unreasonably complex and large, and prone to failure. This somewhat mirrors the design decision in the TrueNorth chip, again for obvious reasons.

However, the two diverge algorithmically. In nature, signal processing is used to transport relative values over the network. The chip, on the other hand, really seems to use Boolean encoding.

You're right about one point though: it's fairly certain that a large number of configurations and algorithms do produce "working" neural networks. Nobody said that imitating our own implementation details will lead to better results than anything else we might want to try.

And in fact, that's why so many computational implementations of neural networks do not use temporal encoding: encoding values as floating-point numbers or even integers comes more naturally to computers than it does to wetware. But it is an abstraction that takes up comparatively many resources.


I tend to agree that evolution reaches a local optimum given enough time, but it seems that this chip is geared towards machine learning rather than biological accuracy. And currently integrate-and-fire spiking neurons don't appear to work better on the data we're interested in.

In this light, and although CNNs aren't the only architecture, his criticisms may be a little more reasonable.


From my understanding, the chip is geared towards implementing neural networks with low power consumption, which makes using spikes a reasonable design choice. So one can argue that using spikes is not about biological accuracy, but power efficiency.

It is great that people want general-purpose machine learning chips; the question is how to do it at low power. My guess is that the right architecture will be a mix of ML primitives with things like spikes (and perhaps other primitives found in biology).


Yes, that's also my understanding. Important background is that this is not (or at least not solely) a commercial initiative by IBM to produce a machine-learning chip, though I'm sure they would love to sell some too. It's a DARPA initiative to find a way to greatly reduce the power budget needed for large-scale data-processing. And one of the starting hypotheses of this particular program, SyNAPSE, is that sparseness in time, aka spikiness, is part of why biological organisms seem capable of processing large amounts of video/etc. data with lower power budgets than computers seem to require. Here's an excerpt from their program statement [1]:

Current computers are limited by the amount of power required to process large volumes of data. In contrast, biological neural systems, such as the brain, process large volumes of information in complex ways while consuming very little power. Power savings are achieved in neural systems by the sparse utilizations of hardware resources in time and space. Since many real-world problems are power limited and must process large volumes of data, neuromorphic computers have significant promise.

That may or may not be a good hypothesis, but it seems interesting to investigate. In any case, LeCun's real beef is with the DARPA program managers: he thinks a different area of ANN research would've been a better allocation of funds, because in his view this is not among the most promising lines of research. Not an uncommon reaction to DARPA choices, and not always wrong either, but DARPA's got the money.

[1] http://www.darpa.mil/Our_Work/DSO/Programs/Systems_of_Neurom...


> Biology has evolved to use spikes (across many different species). Perhaps evolution didn't get the memo that non-spiking convolutional neural networks are the only architecture worth building.

I think this is representative of a lot of AI now. This chip doesn't obviously improve the state of the art on an arbitrary (but standard) benchmark, so LeCun dismisses it. That type of attitude strikes me as overfitting to an arbitrary benchmark (a local optimum) and missing the bigger picture. This chip (and line of research) could help identify how the brain works (which may very well also unlock strong AI).


Did you make a throwaway account just to mock LeCun without people finding out who you are?


IBM seems to be really good at getting press releases published and patents filed, and really bad at actually turning any of that into purchasable products or services. They've been gutting their core US businesses now for a while (revenues down, costs down, profits up, and everything going to India) and a lot of areas that should be strengths have been horribly mismanaged. I'm curious to see whether they'll continue investing in these kinds of prestige products, and wondering what their rationale is. Patent battles down the line? Effectively part of their ad budget?


To be fair, in this case it's pretty clear this was a DARPA-funded project.

It would make even less sense for them to have just said, 'oh no, we're busy focusing on other things at the moment' when DARPA tried to start throwing money at them.


My understanding is that TrueNorth is a product of the DARPA SyNAPSE program, and spiking neurons are a program requirement, so IBM delivered what it was asked to deliver.


> Now, what wrong with TrueNorth? My main criticism is that TrueNorth implements networks of integrate-and-fire spiking neurons. This type of neural net that has never been shown to yield accuracy anywhere close to state of the art on any task of interest (like, say recognizing objects from the ImageNet dataset). Spiking neurons have binary outputs (like neurons in the brain). The advantage of spiking neurons is that you don't need multipliers (since the neuron states are binary). But to get good results on a task like ImageNet you need about 8 bit of precision on the neuron states. To get this kind of precision with spiking neurons requires to wait multiple cycles so the spikes "average out". This slows down the overall computation.
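(A back-of-the-envelope illustration of the "average out" point in that quote, with made-up numbers rather than anything from the chip's spec: distinguishing 2^8 activation levels by counting binary spikes means waiting on the order of 256 ticks per value, versus a single multiply-accumulate on a conventional 8-bit datapath.)

    # Illustrative only: rate-coding a value to ~n-bit precision with binary spikes.
    def spikes_needed(bits):
        # Distinguishing 2**bits levels by counting spikes needs at least 2**bits
        # time steps (more in practice, if the spike train is noisy).
        return 2 ** bits

    print(spikes_needed(8))  # 256 ticks per activation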

Surely you can do this in the spatial domain instead of time? That is, each neuron is one bit, so you have 8 each computing one of the 8 bits he says are necessary? Perhaps the problem is using that value afterwards, I guess.


It's not clear to me how that would work. Sending 1 bit of information to 8 processors is not the same as sending 8 bits to one processor.


This neuron works by providing one bit of output. LeCun claims you need to send the same signal to the machine 8 times to get 8 bits of output, so it is slower than a neuron that can output 8 bits at once. The question is why you can't send the signal to 8 different neurons and get your 8 bits in the same time as one bit.


The answer here is that you send different subsets of input pixels to different neurons.

E.g. you have a 4x4 pixel image you want to feed into the net for recognition. You subdivide it into 2x2 images (of which you'll have 9), and you send each of those 2x2 images to its own neuron. Then the output of a neuron is one if it sees Waldo, zero if it doesn't. Why not send everything to every neuron? That won't work, and the neurons don't have enough inputs anyway at real image sizes.
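A toy version of that tiling, just to make the geometry concrete (an assumed layout for illustration, not the chip's actual wiring): a 4x4 image scanned with a 2x2 window at stride 1 gives the 9 patches mentioned above, one per neuron.

    def patches_2x2(image):
        # Slide a 2x2 window over a 4x4 image (stride 1) -> 9 overlapping patches.
        return [[image[r][c],     image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1]]
                for r in range(3) for c in range(3)]

    image = [[0, 1, 0, 0],
             [1, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

    for i, patch in enumerate(patches_2x2(image)):
        # Each patch would be wired to its own neuron, which outputs 1 if it
        # "sees Waldo" in its patch and 0 otherwise.
        print(i, patch)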

Actually you'd send more than 9. You'd also include a 2x2 that only contains the corner pixels, to achieve scale independence (recognizing a car whether it's 5x5 pixels or 50x50). Then you'd send that 36 times, each time rotated by, say, 10 degrees. That's how human vision works ("how God does it") and, well, that's how AI is trying to do it. In humans it's not a full Haar cascade: we can only see small features using the centre of the retina and only large features using the rest, and we only allow for limited rotation (meaning human brains rotate the source image a limited number of times along an exponential curve that only goes up to ~40 degrees of rotation).

(This is comparable to a Haar cascade.)

Now spiking neural networks have suboptimal performance, true, but they have a major advantage: they do unsupervised learning only. You show them a world, and they will build their own model of the world (which isn't as good as our state-of-the-art models for known "worlds" like ImageNet).

Here's what you're doing with spiking nets (more or less). You show them ImageNet (or any dataset) and you keep showing it to them. They will build up an internal model of what the world looks like. After training, you train a second algorithm that searches for which neuron encodes what. So the idea is that one of the neurons in the network will encode "I saw the letter A", another will encode "I saw B", a third will encode "I saw C", and you look for which neuron does that.

Spiking neural networks need time, which is another disadvantage. This is "simulated" time, but it nevertheless requires computation to happen to advance time. It takes spiking networks some amount of this simulated time to recognize things (just like animals/humans need time). So to have them recognize something you have to "put a picture in front of them for X time" (meaning keep triggering their inputs in the same way for a while), then wait to see if any of the identified neurons fires in the first, say, 5 seconds after showing the picture.
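A hedged sketch of that "show it for X time, then watch" loop; the network object, its step() method, and the neuron id are hypothetical placeholders, not TrueNorth's API.

    import random

    class DummySpikingNet:
        # Stand-in for a real simulator: fires neuron 42 at random, at a low rate.
        def step(self, inputs):
            # Advance one simulated tick; return the set of neuron ids that spiked.
            return {42} if random.random() < 0.01 else set()

    def present_and_wait(net, image, present_steps=500, watch_steps=500, waldo_neuron=42):
        for _ in range(present_steps):      # keep triggering the inputs the same way
            net.step(image)
        for _ in range(watch_steps):        # then watch the identified readout neuron
            if waldo_neuron in net.step(image):
                return True                 # the "I saw Waldo" neuron fired
        return False

    print(present_and_wait(DummySpikingNet(), image=[[0] * 4] * 4))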

Where spiking networks wipe the floor with convolutional nets is on unpredictable tasks. Suppose you had an "open-ended" problem (like, say, a lifeform does), and the environment changes. A convolutional net that was trained will, quite simply, start giving random results. A spiking network will do something. Not necessarily the right thing, but it will try things.

Say you were building a robot that has to deliver supplies across Iraq. Convolutional nets won't adapt. Spiking models will adapt (assuming you let them, and I imagine DARPA will let them). The problem with letting them adapt, of course, is that you may lose control.

And before you say "what about morality?", I would say that spiking nets are actually more moral. In both cases, spiking or convolutional, you don't actually know how it will respond to unpredictable stimuli. However, if a convolutional net is confronted with something it wasn't trained for, it will simply have random reactions (it's a robot; it'll send random instructions to the higher levels, meaning if it has a gun, it is extremely likely to fire the gun, probably aimed at the first thing it recognizes), while a spiking model will try something (which, of course, may be "kill all humans", but it might also decide to wait and see if there are hostile moves, or ...). The difference is the spiking model won't simply lose control. I would argue that spiking models will respond much more like soldiers would.

You should think of convolutional nets as classifiers. You train them to answer a yes/no question, and then they can respond. Spiking neural nets are more like puppies. You can train them to bark if they see a car, and then use that to detect cars, but you can also train them to retrieve a ball. (In practice you "read the mind" of the spiking model, and because it's stored in memory, that's easy)


Probably a neural network analog of an analog-to-digital converter.


Are you claiming that there is a difference in the class of problems that can be computed by an 8-bit processor versus a network of 1-bit processors?


It's unclear why the 'wait multiple cycles' issue is such a deal-breaker. Even when using a GPU (in the standard way) there are cycles spent processing the different stages of a computation. But, more significantly, the spiking-neuron approach doesn't (necessarily) have to work at a low clock rate, or any clock rate, since the same integration-through-time approach could even work asynchronously, or with jitter, etc. It's a pretty robust & low-power design (and evolution found it, who would have guessed?)


LeCun mentions disadvantages of the chip and then mentions a special purpose chip he has had a hand in producing. It doesn't seem surprising that he likes the idea of special purpose chips.

Aside from the particular criticism he makes of the IBM algorithm, it seems to me that the approach of jumping from one special chip to another abandons the advantages of a general-purpose computer itself. If your algorithm has to be cast in silicon each time, tuning the algorithm would depend on the chip's lifecycle. Also, only the few who have the resources to build a chip would be able to supply algorithms, narrowing the number of minds working on this.

The alternative I'd like to see is a general-purpose, highly parallel chip.

The one I know of is the Micron Automata Processor. http://www.micron.com/about/innovations/automata-processing

Anyone know of anything similar?


Reading the comments here suggests that it would be cool to use both of the chips together. If it is true that spiking does well with unsupervised learning, then you could feed a bunch of input to a spiker, then scan it for components that could be mostly mimicked with a convolutional chip and reroute the inputs/outputs. (Yeah, the interconnect would suck.) The point is not to come up with some magic better-performing hybrid, but rather to explore an intermediate point in the design space of augmenting/replacing actual neurons with silicon. The IBM chip isn't that close to biology, but it's closer than a convolutional network, and a convolutional network is a much smaller step than a general purpose processor. We might learn about some simple augmentations that are likely to work in practice.

Also, the whole "airplanes don't flap their wings" analogy can be taken too far. Little flying things are qualitatively different from big flying things. You'll notice that a lot of small artificial flying things are flapping, and the biggest natural fliers tend to glide a lot. There are other reasons why nature didn't evolve large fliers. (Although I'm willing to believe some large fliers may occasionally have hot gases shooting out of their back ends, I do not believe propulsion is their purpose.)


The main criticism should be that these neurons are not like real neurons, because integrate-and-fire is an oversimplification of neurons. So it's not really like the brain at all. There is a lot of fanfare from IBM about it, but truly we've had these models since the 80s. I think it's bad science to just "build a machine with a shit ton of IF neurons and see if it does anything".

The fact that TrueNorth can learn approximations is not really surprising; we know that thresholded units can approximate well [1]. They should have implemented compartmental neurons [2].

[1] http://en.wikipedia.org/wiki/Universal_approximation_theorem [2] http://en.wikipedia.org/wiki/Compartmental_modelling_of_dend...


Bad science? Trying things? That's fundamentally what true science is all about. Experimentation is where theories are supposed to come from. Remember Nature just put a ton of neurons together to see if it did anything.

What would be called 'good science'? Reading articles and spinning tales about what comes next? Regurgitating summaries of others' work?


This money would be better spent on experiments to find out how neurons work. IF neurons are well studied, and large-scale models of the brain using IF neurons have been built before [1]. The result? Nothing.

It would be an experiment if they were testing a new model. This is a simulation.

[1] http://www.izhikevich.org/human_brain_simulation/Blue_Brain.... (Simulation of Large-Scale Brain Models)


This experiment was different in some way? More 'neurons' this time? That qualifies as an experiment.


This experiment was different and inferior in almost all ways: both fewer and simpler neurons than have been simulated before. We had already simulated 100x more neurons in more biological detail than the TrueNorth team did.

This is great asynchronous-circuit power-efficiency research, and not neuroscience research at all.



Out of curiosity, does anyone know why they chose I&F instead of Izhikevich neurons [1], which model more biologically realistic spike forms? Perhaps to meet the low power consumption goals?

Also, how do convolutional neural networks model time? I thought that was one of the benefits of spiking networks and STDP.

1. http://www.izhikevich.org/publications/whichmod.pdf


The Izhikevich neuron is a two-dimensional system, whereas I&F neurons are usually one-dimensional, and the former has more parameters. Mathematical analysis (and perhaps implementation) is easier for the simpler I&F.
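For the curious, here is roughly what that dimensionality difference looks like in code (standard textbook forms with simple Euler steps; parameter values are the usual defaults, not anything specific to TrueNorth): a leaky integrate-and-fire neuron tracks one state variable, while the Izhikevich model tracks two and has more parameters.

    def lif_step(v, I, dt=1.0, tau=20.0, v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0):
        # Leaky integrate-and-fire: a single state variable, the membrane potential v.
        v += dt * (-(v - v_rest) + I) / tau
        if v >= v_thresh:
            return v_reset, True       # spike and reset
        return v, False

    def izhikevich_step(v, u, I, dt=1.0, a=0.02, b=0.2, c=-65.0, d=8.0):
        # Izhikevich model: two state variables (v and the recovery variable u).
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            return c, u + d, True      # spike: reset v, bump u
        return v, u, False

    # Drive both models with a constant input current for a few steps.
    v1 = -65.0
    v2, u2 = -65.0, -13.0
    for _ in range(100):
        v1, fired1 = lif_step(v1, I=20.0)
        v2, u2, fired2 = izhikevich_step(v2, u2, I=10.0)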


>My main criticism is that TrueNorth implements networks of integrate-and-fire spiking neurons...Spiking neurons have binary outputs (like neurons in the brain).

Isn't this better? LeCun is looking at this only from a machine learning perspective.


Machine learning is the goal. This chip should be judged on its learning performance, not on how well it adheres to an oversimplified and incomplete model of how biological neurons might work in the brain.


FWIW, there is no learning on-chip. Machine learning is not the goal of this project, nor is its success dependent on its learning capabilities (at least not at this phase). Where it does succeed, however, is in low-power computation in an architecture that is scalable and fault-tolerant. LeCun is criticizing an orange for not tasting like an apple.


Learning may not happen on-chip, but the network is still learned, and the performance of the chip is dependent on the learning. The spiking architecture of the chip means that the best learning algorithms can't be used. An ASIC implementing a convolutional neural net could also be low-power, scalable, and fault-tolerant, while taking advantage of the best currently known learning algorithms and ultimately performing a lot better on real tasks.


Just because the brain does it one way, doesn't mean it's the best way. Birds versus jets, nature versus artificial.

That being said, the brain is basically the only working example of true intelligence we have, so perhaps trying to emulate it isn't a bad idea.

But I have this sci-fi notion that eventually the AI community will produce some sort of intelligence that is unimaginably different from our current notion of a brain.


The neurons in the IBM chip are only slightly better approximations to biological behavior, nowhere near close enough to hope that it will accidentally result in brain-like intelligence.

Last I heard, the behavior of biological neurons was so badly understood that even the behavior of the 300-neuron C. elegans worm could not be accurately simulated, even though the neuron connections have been fully determined. That was about 8 years ago, though.


The C. elegans situation remains the same today, although there is now a serious project to get it done.

http://nemaload.davidad.org/


Thanks, this is the coolest thing I've seen in weeks.


I believe that project is dead. David Dalrymple took a job at Twitter a few months ago.


I'm close to the source; the project is not dead. Despite his full-time employment, it has made some recent progress and is currently seeking funding.


When I heard about spiking neurons, I assumed that IBM was trying to speed up the Blue Brain Project or cognitive research in general.

But it seems like the SyNAPSE group is trying to push this chip into common machine learning applications.

http://www.research.ibm.com/articles/brain-chip.shtml


I should note that the Blue Brain Project is very critical of the group that produced TrueNorth. The director of the Blue Brain Project wrote an open letter titled "IBM's claim is a hoax" and considers the approach completely useless for neuroscience research.

http://technology-report.com/2009/11/neuroscience-expert-dr-...


If this is the same team that did the "cat brain" in 2009, then I am on Yann's side.


Published on Facebook? Really?


There are some great posts about machine learning and database technology on Facebook.com (by their staff).


Yeah, that's what I thought - unless he actually works for Facebook. But then the comments on his post are actually quite informative, so to each their own; maybe he has a great audience on FB.


Yann LeCun does work for Facebook.


I believe he's director of AI research at Facebook.



