New technique would reveal the basis for machine-learning systems’ decisions (sciencebulletin.org)
115 points by renafowler on Oct 28, 2016 | 32 comments



>> Neural networks are so called because they mimic — approximately — the structure of the brain.

Grumble.

ANNs are "approximately" like the brain [1] as much as Pong is "approximately" like the game of Tennis. In fact, much less so.

ANNs are algorithms for optimising systems of functions. The "neurons" are functions, their "synapses" are inputs and outputs to the functions. That's an "approximation" of a brain only in the most vague sense, in the broadest possible strokes, so broad in fact that you could be approximating any physical process or object [2].
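
To put it concretely, here's a minimal Python sketch (all names mine) of what an artificial "neuron" actually is - a function of a weighted sum, nothing more:

    import math

    def neuron(inputs, weights, bias):
        # An artificial "neuron": weighted sum of inputs plus a nonlinearity.
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid "activation"

    # A "layer" is just several such functions applied to the same inputs;
    # the "synapses" are nothing but the values passed between them.
    layer_out = [neuron([0.5, -1.2], w, 0.1) for w in ([0.3, 0.8], [-0.5, 0.2])]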

Like, oh, I dunno- trains.

Trains, right? The functions are like coaches and the parameters they pass between each other are like rails. Artificial Neural Networks --> Artificial Train Networks; they mimic - approximately - the structure of the train.

Stop the madness. They're nothing like brains, in any way, shape or form.

And grumble some more.

_____________

[1] Wait- which brain? Mammalian brain? Primate brain? Human brain? Grown-up brain? Mathematician's brain? Axe-murderer's brain?

[2] Because... that's what they do, right? They approximate physical processes.


You are right that the learning process which ANNs use is pretty much nothing like the brain. But an argument can be made that the feature representations which they learn are quite brain like. For example, in the visual cortex you will find feature detectors for things like edges and whatnot, with more abstract features as you climb to higher layers. You see very similar things in convolutional nets.
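
For what it's worth, the edge-detector comparison is easy to see in code. A hedged sketch (the kernel here is hand-set, not learned, but first-layer filters of trained convnets often end up looking similar):

    import numpy as np
    from scipy.signal import convolve2d

    # A hand-set vertical-edge kernel, similar in spirit to the oriented
    # filters that emerge in the first layer of trained convolutional nets.
    kernel = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]], dtype=float)

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0                       # a step edge down the middle

    response = convolve2d(image, kernel, mode="valid")
    print(response)                          # strongest activation along the edge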

Spiking neural network models are actually very brain like, but no one really knows how to use them effectively.

I think it's also important to realize that we don't have to simulate the brain to do what the brain does. If you want to build a flying machine, you could model a bird with flapping wings, but it's much easier to just build a fixed-wing aircraft. Back propagation may be a similar case. Evolution found a very good way to learn with neurons, but it may not be optimal for the things we care about. We may be better off sticking to back propagation because it works so well. Also, people are investigating backpropagation alternatives which start to look more and more brain like.

Basically my point is that there seem to be intrinsic characteristics of neural computation which ANNs and real brains both demonstrate.


quite brain like*

* for just two layers of the visual cortex, V1 and V2 (out of 6 layers). But never mind any other brain regions, and never mind CNNs applied to speech, robotics, NLP, finance... basically any non-vision domain.


You may be right that NN activations don't resemble neural receptive fields (though Tomaso Poggio and others disagree [1]), but I strongly suspect that to the extent that NNets are able to learn to represent statistical invariants, we will see clear homologs to biological neural networks.

The grandparent post made the point that there could be intrinsic characteristics of neural computation shared by natural and artificial networks. There's something that feels right about that, but I'd take it a step further: both natural and artificial networks learn to explain the statistics of the natural world in a compact representation. How many possible equivalently compact representations are there? Wouldn't it seem like a monstrous coincidence if multiple representations were possible, yet both DNNs and the mammalian visual system decided to represent edges first?

By the way, you should get your anatomy right if you wish to speak so authoritatively: V1 and V2 are not layers of cortex, but rather adjacent cortical subdivisions whose cytoarchitecture corresponds to Brodmann areas 17 and 18.

1. https://mitpress.mit.edu/books/visual-cortex-and-deep-networ...


Notwithstanding the fact that this is just his theory, Tomaso Poggio mostly does not seem to disagree. Reading the summary of that link, it appears he asserts that the ventral stream (one of two streams) is well explained by convolutions. So he does not assert the same about the dorsal stream, any other brain region, or any other non-visual domain of deep learning.

It bears noting that our visual cortex is one of the simpler areas of our brain, considering that equivalent functionality is in all sorts of animals.

It's not a coincidence that DNNs and mammalian visual systems both decide to represent edges first. That's exactly what Yann LeCun was trying to do when he designed convolutional neural nets! You've got the causality mixed up. LeCun explicitly wanted to model the visual system, so he came up with a system that would give him the results he wanted (edge detection in the first layer), which meant linear filters.


Well, the ventral stream is the "what" stream, so that's the natural one to draw comparisons to for the task of representing objects. The "where" pathway for the most part represents space with a place code, that is, different neurons "care about" different parts of space in an increasingly abstract way going from representing eg "the left visual hemifield" early in the dorsal stream to representing "the left side of objects" in parietal cortex. It would be interesting to see if this kind of invariant emerges in NNs. Convolutional networks are able to caption images with labels like "a cake on top of an oven". What do their activations look like?

I reject the notion that visual cortex is "simple", but I will concede that it is highly conserved across species. This just means that the representations that it uses are effective.

You are no doubt right that LeCun was inspired by the biology of visual cortex (along with theorists before him), but you missed my meaning: can we build a network that doesn't start with edge detectors and does better? My guess is no.


This is just speculation, however. (Speculation about captioning neural nets and the direction of evolution and edge detectors)


What's your favorite JS framework?


Oh stop already. The original paper on ANNs, published by McCulloch and Pitts, said they were directly inspired by biological models of neuronal activation functions [1] - hence the name. Yes, it's potentially misleading to a non-practitioner, but everyone gets so unnecessarily mad at tech writers about this point.

[1] https://books.google.com/books?id=gJhBBAAAQBAJ&pg=PA209&hl=e...


You're not technically wrong, but I think this is a bit pedantic. Most people realize this. Convolutional networks don't actually seem to be a totally bad approximation of feedforward visual processes in the brain at a computational level. And I say this as a neuroscientist.


Most people don't realize this. Most people will just read popular science articles in the media. They will never read discussions on HN/Reddit/Twitter.


which is precisely why analogies are helpful in communicating with them.


Which is precisely why false analogies are not helpful.


>> Convolutional networks don't actually seem to be a totally bad approximation of feedforward visual processes in the brain at a computational level. And I say this as a neuroscientist.

I'm saying that a "(not) totally bad approximation" can still be bad enough that it's completely useless. And even a good one can.

The problem is that you can model pretty much anything with an analogy that's broad enough. Say, you can model any distribution with a straight line... except you will often not learn anything you didn't know before. Or maybe you'll learn a lot about a whole class of distributions, but not about a specific distribution, or how it differs from all the others in its class.

If you've read Foucault's Pendulum- it basically makes this point about the Tree of Life [1]. They're always "fitting" it to everything from pinball machines, to cars, to people's sex organs. And it always fits so well! Maybe that's because it's truly divine?

Come to that- why are ANNs based "on the brain" and not the Tree of Life? They sure look a lot like it, superficially. Maybe ANNs are really based on the Mystic Qabbalah, Geoff Hinton is a Rosicrucian and it's all a scheme of the Illuminati. CNNs are probably a model of The Eye on the Pyramid. It all fits, innit! That way lies madness- or at least a whole big bunch of confusion and waste of time.

You mentioned computation- look at Turing machines for a good model of a thing. It's an analogy that's broad enough to represent a whole class of computational devices, yet at the same time it only represents those devices and nothing else. You can't mistake a Turing machine for a potato, or a cricket, fnord what have you. Why can't we have that sort of thing, instead of "Neural Networks"?

____________

[1] https://en.wikipedia.org/wiki/Tree_of_life#Kabbalah


Can you name something that's a better approximation to the brain's computational processes than a neural network?


You know, Pong is a LOT like tennis. Certainly more than most other ball games. Perhaps your specialized knowledge makes you more critical than necessary? Neural networks certainly work analogously to brain structures, at least at the layman's level that readers of this article are likely to understand.


Though this is the prevailing view in neuroscience, in truth there is very little information on the actual computation the brain is doing biologically.

The majority of neuroscience literature on the CNS focuses on sensory-system signal processing and on the low levels of those pipelines (vision, sound, and taste). There has been recent work in olfaction, but there are inherent problems with state space and encoding to consider (olfactory neurons also seem to project all over, causing even more experimental problems), which is one of the reasons vision, sound and taste encoding are attractive.

All higher-order processing really is a black box. This is one of the primary arguments for pursuing a connectomics-based approach. Some cool work on targeted connectomics is being done at HHMI and the Allen Institute (both with fluorescent and EM microscopy), where the latter focuses more on the fly because of the resolution.

https://www.janelia.org/project-team/mouselight https://www.janelia.org/project-team/flylight https://www.janelia.org/project-team/flyem https://www.alleninstitute.org/

So in short, we don't know if NNs are like the brain or not.


"Towards an integration of deep learning and neuroscience" - https://arxiv.org/abs/1606.03813


Dismissing factual kinship of basic functionality because of differences in the details of implementation. Well done, you! Have a cookie.


Title is a little general. This specifically is a technique for breaking down text analysis, where the goal is to give semantic meaning to a block of text. In their example, they want to condense beer reviews into star ratings of a few categories. A totally black box technique would take the review and spit out the scores, whereas their technique has two jointly trained networks: one identifies relevant text fragments for each category, and the other gets the corresponding category score for the fragment.
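
Roughly (and much simplified), the two networks look like a token-scoring "generator" and a rating "encoder" trained jointly with a penalty on rationale length. A toy PyTorch sketch with soft masks instead of the paper's sampled binary selections; dimensions and names are made up:

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        # Scores each token; high score = token belongs in the rationale.
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)
        def forward(self, emb):                                 # emb: (seq, dim)
            return torch.sigmoid(self.score(emb)).squeeze(-1)   # (seq,)

    class Encoder(nn.Module):
        # Predicts the category rating from only the selected tokens.
        def __init__(self, dim):
            super().__init__()
            self.out = nn.Linear(dim, 1)
        def forward(self, emb, mask):
            pooled = (emb * mask.unsqueeze(-1)).mean(dim=0)
            return self.out(pooled)

    dim = 16
    gen, enc = Generator(dim), Encoder(dim)
    opt = torch.optim.Adam(list(gen.parameters()) + list(enc.parameters()), lr=1e-2)

    tokens = torch.randn(20, dim)            # stand-in for review word embeddings
    target = torch.tensor([0.8])             # e.g. the "aroma" star rating

    for _ in range(200):
        mask = gen(tokens)
        pred = enc(tokens, mask)
        # Prediction loss plus a sparsity term that keeps the rationale short.
        loss = (pred - target).pow(2).mean() + 0.05 * mask.mean()
        opt.zero_grad(); loss.backward(); opt.step()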

This is not groundbreaking, but still a good example of a larger trend in trying to understand neural network decision making. Here's a cool paper that analyzes how CNNs can learn image features for attributes like "fuzziness" and other higher level visual constructs while training for object recognition: https://pdfs.semanticscholar.org/3b31/9645bfdc67da7d02db766e...


From a business point of view (getting executives to want to use ML) understanding "the black box" is important. But the two-step process you outline would tend to be less accurate than a one-step process, no?


It sounds almost exactly like work from 2 years ago: "Extraction of Salient Sentences from Labelled Documents"

https://arxiv.org/abs/1412.6815


The general concept: we can figure out what led to a particular classification for an item by finding smaller subsets of the item that still give the same classification.

For example, this can show which snippet of text implies a particular review should be classified as "very negative", or which part of an image led to a classification of "cancerous" for a biopsy image.

This doesn't give you much predictive power about the network however, or tell you how it actually works in general. It simply tells you how it made a particular classification.
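
A toy, model-agnostic version of that subset search (brute force over contiguous snippets, with a stand-in "classifier" of mine - not the paper's learned rationale generator):

    def minimal_snippet(words, classify):
        # Return the shortest contiguous snippet that preserves the full-text label.
        full_label = classify(words)
        for size in range(1, len(words) + 1):             # shortest first
            for start in range(len(words) - size + 1):
                snippet = words[start:start + size]
                if classify(snippet) == full_label:
                    return snippet
        return words

    # Trivial keyword "classifier" just for illustration:
    label = lambda ws: "very negative" if "awful" in ws else "ok"
    print(minimal_snippet("the service was awful and slow".split(), label))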

Paper link: https://people.csail.mit.edu/taolei/papers/emnlp16_rationale...


From the paper:

"Our goal is to select a subset of the input sequence as a rationale.

In order for the subset to qualify as a rationale it should satisfy two criteria: 1) the selected words should be interpretable and 2) they ought to suffice to reach nearly the same prediction (target vector) as the original input. In other words, a rationale must be short and sufficient."


Any ideas on how this is similar to or differs from the prospects and structure of DARPA's explainable AI (XAI) contract?

https://www.fbo.gov/index?s=opportunity&mode=form&id=1606a25...

Curious about the inherent trade-off between predictive power/complexity in ML models and the accuracy of the system explanations inferred from these models.


IIRC the last XAI approach I read about basically gives you a local linear approximation or Jacobian of the decision surface. You take a look at the biggest terms and call them an explanation. Not too shabby, probably. I don't know if all of the XAI project follows that approach.
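
Something in that spirit, as a toy sketch (not any particular XAI system's code): take the gradient of the output with respect to the input and read off the biggest terms.

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                                torch.nn.Linear(16, 1))

    x = torch.randn(10, requires_grad=True)
    model(x).squeeze().backward()        # d(output)/d(input), the local linear terms

    saliency = x.grad.abs()
    print(saliency.topk(3).indices)      # the "biggest terms" = the explanation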

Edit: Framing


So this is pretty limited. It only works on text data, and just picks the part of the text that most determines the output. The basic idea of doing this isn't new. You can easily ask a naive Bayes spam filter what words it thinks are the highest evidence of spam or not spam in a document. It is interesting to see this done with neural nets though. I recall reading something similar that uses gradients to find what words changed the network's output the most; I'm not sure if this new method is actually better, and it's not as general.
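
For instance, with a bag-of-words naive Bayes filter the per-word evidence is just the gap between the class-conditional log probabilities. A quick sklearn sketch on toy data of my own:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["free money win prize", "meeting notes attached",
            "win a free prize now", "project status update"]
    labels = [1, 0, 1, 0]                                  # 1 = spam

    vec = CountVectorizer()
    X = vec.fit_transform(docs)
    nb = MultinomialNB().fit(X, labels)

    # Per-word log-odds of spam vs. not spam: the filter's "explanation".
    log_odds = nb.feature_log_prob_[1] - nb.feature_log_prob_[0]
    words = vec.get_feature_names_out()
    for word, score in sorted(zip(words, log_odds), key=lambda p: -p[1])[:5]:
        print(word, round(score, 3))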

But this is a long way away from the NN being able to give understandable reasons for its decisions. These methods will always be limited to pointing to a part of the input and saying "that part seemed relevant". But they can never articulate why it's relevant, or what the network is "thinking" internally.

I think this is ok though. I mean, looking at what features the model is using to make predictions is pretty useful and should give you a rough idea of how it works.

I've wondered in the past if neural networks could train humans to understand them. The human would be shown an input, and try to predict the value of a neuron in the network. So the human would learn what the network has learned, and gain intuition about the inner workings of the model.

You can also do a similar process with other machine learning methods. You can train a simpler, more understandable model, like a decision tree, to predict the neurons of a neural net. And then the human can study that. You can even train a smaller neural network to fit a bigger, more complex one.
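
The surrogate-model idea is easy to try; a hedged sklearn sketch (here the tree is fit to the network's predictions rather than to an individual neuron's activations, and the data is synthetic):

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = np.random.rand(500, 4)
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

    # Fit an interpretable tree to the *network's* outputs, then read it off.
    tree = DecisionTreeClassifier(max_depth=3).fit(X, net.predict(X))
    print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))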


I'm still curious about whether it is possible at all to deal with "who sunk the boat" or "the straw that broke the camel's back" cases or whether there will just end up being a list of a thousand reasons that summed up to a decision, perhaps ranked by their weight.


Noting that neural nets are associative memories, one would think the best one could get out for the rationale of a specific outcome would be the high-level landmarks of the dynamics, starting from a given input all the way to the outcome... Is there anything else one would expect to get out of the system?


The name "neural network" is unfortunate. But then again you can't call them "continuous-function best-fit support-vector feedback machines on cocaine" networks.

The name will cause confusion for many years to come, but it's what we have. Nice article; it's good they're trying.


Yes! I wrote about this in a widely-discussed ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H totally-unread blog post back in 2014: https://www.brinckerhoff.org/blog/2014/12/12/the-why-button/


Hmmmm

You're making quite a bold claim. I don't see you make any reference to ML; if I were to draw an inference from your post, I would conclude you were talking about better ways to instrument software so you can track what it is doing - not a particularly novel idea :)

The paper referenced in this post talks about a technique for teasing out the "logic" for a learning model's decision.



