This neuron works by providing one bit of output. LeCun claims you need to send the same signal to the machine 8 times to get 8 bits of output, so it is slower than a neuron that can output 8 bits at once. The question is why you can't send the signal to 8 different neurons and get your 8 bits in the same time as one bit.
The answer here is that you send different subsets of input pixels to different neurons.
E.g. you have a 4x4 pixel image you want to feed into the net and recognize. You subdivide it into 2x2 images (of which you'll have 9, sliding the window one pixel at a time), and you send each of those 2x2 images to a different neuron. Then the output of a neuron is one if it sees Waldo, zero if it doesn't. Why not send everything to every neuron? That won't work, and at real image sizes the neurons don't have enough inputs anyway.
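A minimal sketch of that subdivision in Python (assuming stride-1 windows; the function name and toy image are just for illustration):

```python
# Sliding a 2x2 window over a 4x4 image with stride 1 yields
# (4-2+1) * (4-2+1) = 9 patches -- one per neuron.
def extract_patches(image, k=2):
    """Return all k x k sub-images of a square image (stride 1)."""
    n = len(image)
    patches = []
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            patches.append([row[c:c + k] for row in image[r:r + k]])
    return patches

image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

patches = extract_patches(image)
print(len(patches))   # 9 patches for a 4x4 image
print(patches[0])     # top-left patch: [[0, 1], [4, 5]]
```

Each of those 9 patches would then feed a separate one-bit neuron.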
Actually you'd send more than 9. You'd also include a 2x2 that only contains the corner pixels, to achieve scale independence (recognizing a car whether it's 5x5 pixels or 50x50). Then you'd send each window 36 times, each time rotated by, say, 10 degrees. That's how human vision works ("how God does it") and, well, that's how AI is trying to do it. In humans it's not a full Haar cascade: we can only see small features using the centre of the retina, and only large features using the rest, and we only allow for limited rotation (meaning human brains rotate the source image a limited number of times along an exponential curve that only goes up to ~40 degrees of rotation).
(This is comparable to a Haar cascade.)
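A sketch of the scale trick in Python: the corner-pixel window gives the coarse view of the whole image, and the 36 rotation angles are simply enumerated (actually rotating an image by arbitrary angles needs interpolation, which is omitted; all names here are hypothetical):

```python
def corner_window(image):
    """A coarse 2x2 view built from the four corner pixels of a square
    image: the neuron receives the same-sized input whether the object
    spans 4x4 pixels or 40x40."""
    n = len(image)
    return [[image[0][0], image[0][n - 1]],
            [image[n - 1][0], image[n - 1][n - 1]]]

image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

coarse = corner_window(image)
print(coarse)                          # [[0, 3], [12, 15]]

# The 36 rotated copies, 10 degrees apart, that each window is sent as.
angles = [a * 10 for a in range(36)]
print(len(angles))                     # 36
```

So the total number of inputs per image is (patches + coarse windows) times 36 rotated copies.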
Now spiking neural networks have suboptimal performance, true, but they have a major advantage: they do unsupervised learning only. You show them a world, and they will build their own model of that world (which isn't as good as our state-of-the-art models for known "worlds" like ImageNet).
Here's what you're doing with spiking nets (more or less). You show them ImageNet (or any dataset), and you keep showing it to them, and they build up an internal model of what the world looks like. After training, you run a second algorithm that searches for which neuron encodes what. The idea is that one neuron in the network will encode "I saw the letter A", another will encode "I saw B", a third will encode "I saw C", so you look for which neuron does which.
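That readout search could be sketched like this: record each neuron's spike counts on labelled examples, then pick, for each class, the neuron that fires most selectively for it. The selectivity score here is deliberately crude (mean firing inside the class minus mean firing outside it); real readouts are usually trained classifiers:

```python
# Hypothetical readout step: given spike counts recorded per neuron on
# labelled examples, find which neuron fires most selectively per class.
def find_readout_neurons(spike_counts, labels):
    """spike_counts[i][j] = spikes of neuron j on example i."""
    classes = sorted(set(labels))
    n_neurons = len(spike_counts[0])
    readout = {}
    for cls in classes:
        best, best_score = None, float("-inf")
        for j in range(n_neurons):
            inside = [s[j] for s, y in zip(spike_counts, labels) if y == cls]
            outside = [s[j] for s, y in zip(spike_counts, labels) if y != cls]
            score = sum(inside) / len(inside) - sum(outside) / len(outside)
            if score > best_score:
                best, best_score = j, score
        readout[cls] = best
    return readout

# Toy recording: neuron 0 mostly fires on 'A', neuron 2 on 'B'.
counts = [[9, 1, 0], [8, 2, 1], [1, 0, 7], [0, 1, 9]]
labels = ["A", "A", "B", "B"]
mapping = find_readout_neurons(counts, labels)
print(mapping)   # {'A': 0, 'B': 2}
```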
Spiking neural networks need time, which is another disadvantage. This is "simulated" time, but advancing it still requires computation. It takes spiking networks some amount of this simulated time to recognize things (just as animals and humans need time). So to have one recognize something you have to "put a picture in front of it for X time" (meaning keep triggering its inputs in the same way for a while), then wait to see if any of the identified neurons fires in the first, say, 5s after showing the picture.
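A toy leaky integrate-and-fire neuron shows why the input has to be held for a while: the membrane potential needs several steps of the same input before it reaches threshold, and a weak input never gets there. The parameters below are arbitrary, not taken from any particular model:

```python
# One leaky integrate-and-fire neuron driven by a constant input.
def steps_until_spike(input_current, threshold=1.0, leak=0.9, max_steps=100):
    """Return the first simulated time step at which the neuron spikes,
    or None if it stays silent for max_steps (the picture was 'shown'
    but never recognized within the waiting window)."""
    v = 0.0
    for t in range(max_steps):
        v = v * leak + input_current   # decay the potential, add the input
        if v >= threshold:
            return t                   # spike: recognition needed t+1 steps
    return None

print(steps_until_spike(0.3))    # takes a few steps of integration: 3
print(steps_until_spike(0.05))   # too weak, never reaches threshold: None
```

The "show the picture for X time" step in the text corresponds to holding `input_current` constant across the loop.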
Where spiking networks wipe the floor with convolutional nets is on unpredictable tasks. Suppose you have an "open-ended" problem (like, say, a lifeform has), and the environment changes. A convolutional net that was trained will, quite simply, start giving random results. A spiking network will do something. Not necessarily the right thing, but it will try things.
Say you were building a robot that has to deliver supplies across Iraq. Convolutional nets won't adapt. Spiking models will adapt (assuming you let them, and I imagine DARPA will let them). The problem with letting them adapt, of course, is that you may lose control.
And before you say "what about morality?", I would say that spiking nets are actually more moral. In both cases, spiking or convolutional, you don't actually know how the net will respond to unpredictable stimuli. But if a convolutional net is confronted with something it wasn't trained for, it will simply react randomly (it's a robot, so it will send random instructions to the higher levels, meaning if it has a gun, it is extremely likely to fire the gun, probably aimed at the first thing it recognizes). A spiking model will try something (which, of course, may be "kill all humans", but it might also decide to wait and see if there are hostile moves, or ...). The difference is that the spiking model won't simply lose control. I would argue that spiking models will respond much more like soldiers would.
You should think of convolutional nets as classifiers. You train them to answer a yes/no question, and then they can answer it. Spiking neural nets are more like puppies. You can train them to bark when they see a car, and then use that to detect cars, but you could also train them to retrieve a ball. (In practice you "read the mind" of the spiking model, and because its state is stored in memory, that's easy.)
Are you claiming that there is a difference in the class of problems that can be computed between an 8-bit processor and a network of 1-bit processors?