I found it easier to understand the idea by skimming the introduction in the paper [1]:
> TCAV uses directional derivatives to quantify the model prediction’s sensitivity to an underlying high-level concept [...] For instance, given an ML image model recognizing zebras, and a new, user-defined set of examples defining ‘striped’, TCAV can quantify the influence of striped concept to the ‘zebra’ prediction as a single number. In addition, we conduct statistical tests where CAVs are randomly re-learned and rejected unless they show a significant and stable correlation with a model output class or state value.
> “AI is in this critical moment where humankind is trying to decide whether this technology is good for us or not,” Kim says. “If we don’t solve this problem of interpretability, I don’t think we’re going to move forward with this technology. We might just drop it.”
AI interpretability is a very important and exciting field, and I don't mean to detract from the rest of this article, or from the speaker's work. However:
1) Technology is neither good nor bad in and of itself. It is a tool that can be used for one or the other.
2) If it's useful for something, some people will use it, despite the protests of others. Have we ever collectively decided to "drop it" in the past, where "it" is a powerful technology?
CFCs and asbestos are very useful, and we've also decided to largely ban them outright because of the horrible side effects. It turns out there are hardly any safe ways to use them, so they're not really worth the benefits.
> Because CFCs contribute to ozone depletion in the upper atmosphere, the manufacture of such compounds has been phased out under the Montreal Protocol, and they are being replaced with other products such as hydrofluorocarbons
Maybe this success is at least somewhat due to the fact that there is an alternative technology that has very similar utility and cost, but without the significant negative effects.
The CFC-free albuterol inhalers (for us asthmatics) introduced in the last ten years are objectively worse than the ones they replaced. On the bright side, the new formulations meant new patents for drug manufacturers and the elimination of cheap generic alternatives for patients, so there's that.
I'm not sure how applicable this is because of the nature of the AI beast. It took decades for asbestos to be banned, and given how slow we were to make that decision collectively, we might just be too slow in turning away from AI/ML tech.
The other problem with AI is one of definitions: how do you even define it? It's obvious what leaded gasoline or radium watch faces are, whereas a technique contained in featureless server racks produces outputs, not substances. You can't really ban the outputs (they're just decisions), and banning the techniques is probably impossible short of melting down all the GPUs or whatever.
One possibility is that a bad AI decision could create huge liability costs on whoever is running the algorithm. If the algorithm can't explain that its decision was reasonable, courts could get... testy. And that would drive people towards at least explainable AI.
That is, it might end up being cheaper to have some racist employees than a big monolithic AI making racist decisions (or other human failings like being a bad driver).
I feel like one big difference with AI is that, supposedly, at some point it's not just technology, it's conscious, and conscious things can definitely be inherently bad.
At least up to a certain level of consciousness, the goals of an AI are still determined by its creator (assuming the creator understands how to set goals for machines that are advanced enough to be called "conscious" on some level).
> Macaque monkey: (2017) First successful cloning of a primate species using nuclear transfer, with the birth of two live clones, named Zhong Zhong and Hua Hua. Conducted in China in 2017, and reported in January 2018.[59][60][61][62] In January 2019, scientists in China reported the creation of five identical cloned gene-edited monkeys, using the same cloning technique that was used with Zhong Zhong and Hua Hua – the first ever cloned monkeys - and Dolly the sheep, and the same gene-editing Crispr-Cas9 technique allegedly used by He Jiankui in creating the first ever gene-modified human babies Lulu and Nana. The monkey clones were made in order to study several medical diseases.[63][64]
> Over the past 10 years, technological advances and innovative platforms have yielded first-in-man PSC-based clinical trials and opened up new approaches for disease modeling and drug development.
Imagine making a linear regression to predict runners’ speed using data from two individuals timing their speeds. Not a very interesting problem. Assuming they’re both competent measurers, you would probably be best off taking half of the first measure and half of the second measure (aka the mean). But the clever linear solver might notice that you can technically get a better answer by weighting the first measurer by 3.001 and the second measurer by -2.002 due to natural variance and the way things happened to land. We can adjust the solver to not do this by punishing it for large coefficients (ridge regression) and that’ll mostly settle that.
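Roughly what that looks like in scikit-learn terms (a toy sketch; the data, seed, and alpha value are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

true_speed = rng.uniform(3, 6, size=20)                  # runners' actual speeds
measurer_a = true_speed + rng.normal(0, 0.05, size=20)   # two noisy, nearly identical
measurer_b = true_speed + rng.normal(0, 0.05, size=20)   # timings of the same runs
X = np.column_stack([measurer_a, measurer_b])

ols = LinearRegression().fit(X, true_speed)
ridge = Ridge(alpha=1.0).fit(X, true_speed)

# With two almost perfectly correlated columns, plain least squares can land on
# large offsetting weights that exploit the noise; the ridge penalty on large
# coefficients pulls both weights back toward the sensible ~0.5 / ~0.5 split.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```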
But for a neural network with hundreds of mini logistic regressions built in... seems tough. Interpretability isn’t just knowing how decisions are made but also how much irrationality is being built into it. If one factor is being marked as a huge negative influence, what does that actually mean about the problem space? Maybe nothing. If you actually want an interpretable neural net model, you should probably hand craft layers of domain specific ensembles that you can individually verify and are closer to being self evident. Maybe you won’t know how it determines stripes or horselike, but if you feel good about those two models individually then it’s a much easier task to follow the last step which is: it’s a zebra if and only if it’s horselike and has stripes.
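As a toy sketch of that last step (the two sub-model functions are hypothetical stand-ins for separately trained and verified classifiers):

```python
# Hypothetical stand-ins for two independently verified sub-models; in practice
# each would be its own trained network, validated on its narrow task.
def stripes_prob(image) -> float:
    return 0.9   # placeholder score from a vetted "striped" detector

def horselike_prob(image) -> float:
    return 0.8   # placeholder score from a vetted "horse-shaped" detector

def is_zebra(image, threshold: float = 0.5) -> bool:
    # The final step stays human-readable: zebra iff horselike AND striped.
    return horselike_prob(image) > threshold and stripes_prob(image) > threshold
```

Each sub-model may still be opaque internally, but the rule that combines them is self-evident.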
Very interesting, although I thought the chainsaw analogy was poor: while the user may not understand how it works, the person who built it certainly does.
That's what struck me when I first read about neural nets 15 years ago: researchers were admitting that nobody really understands why neural nets do what they do. Maybe there are new publications that overturn this view?
The question of explaining how a neural network made a decision also comes up with us humans, of course, and is often not easy to answer. Don't know why I didn't call... https://www.youtube.com/watch?v=tO4dxvguQDk
Human rationalisation has its issues, e.g. “So convenient a thing to be a reasonable creature, since it enables one to find or make a reason for every thing one has a mind to do.”
Just finished reading the paper.
TL;DR: a linear classifier is trained on the activation values of an already-trained network as it is fed examples of a specific high-level concept. Assuming the concept is isolated to a certain linear subset of the activation space, a vector orthogonal to the classifier's decision boundary (the CAV) can be computed. When feeding new examples, the directional derivative of the class logit along this vector can be computed on the activations, yielding a measure of how much the concept is responsible for the prediction.
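Roughly, in numpy/scikit-learn terms (just a sketch of my reading of the paper; `get_activations` and `grad_logit_wrt_activations` are placeholders for whatever framework hooks you'd actually use to read a layer's activations and the class logit's gradient with respect to them):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder hooks into the already-trained network (framework-specific):
def get_activations(images):                      # -> (n, d) activations at a chosen layer
    raise NotImplementedError
def grad_logit_wrt_activations(image, class_k):   # -> (d,) gradient of the class-k logit
    raise NotImplementedError

def concept_activation_vector(concept_images, random_images):
    """Train a linear classifier separating concept activations from random ones;
    the CAV is the unit vector orthogonal to its decision boundary."""
    X = np.vstack([get_activations(concept_images), get_activations(random_images)])
    y = np.array([1] * len(concept_images) + [0] * len(random_images))
    clf = LogisticRegression().fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

def tcav_score(cav, test_images, class_k):
    """Fraction of test inputs whose class-k logit increases when the activations
    move in the concept direction (i.e. positive directional derivative)."""
    derivs = [grad_logit_wrt_activations(img, class_k) @ cav for img in test_images]
    return float(np.mean(np.array(derivs) > 0))
```

The paper's headline score is that fraction of positive directional derivatives; it also runs the re-learn-and-reject statistical test over many random CAVs, which is omitted here.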
Seems like an obvious technique. A few months back, someone posted a similar linear reverse engineering technique to make a customizable face generator from already trained GANs.
"how much the feature is responsible for the overall derivative" part is obvious. But I'm curious how the assumption can be justified - how do you isolate a feature to a certain linear subset? And how do you know that this feature is (i.e.) "stripes"? The mapping from vectors to human language is the part that seems hard.
By training a binary linear classifier over the vector space of the other NN's responses to a set of inputs with that specific feature.
Language shouldn't be necessary if your feature can be conveyed through examples.
But yes, it's a big assumption to say that all features can be isolated as linearly separable subsets of the activation space.
I would guess one could get better results with stronger, non-linear classifiers combined with more abstract generalizations of the directional derivative.
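One way to read that (purely a sketch of the idea above, nothing from the paper): replace the single global CAV with the local gradient of a non-linear concept classifier, so the "concept direction" can differ from point to point in activation space:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def local_concept_direction(concept_clf: MLPClassifier,
                            activation: np.ndarray,
                            eps: float = 1e-3) -> np.ndarray:
    """Finite-difference estimate of the gradient of the concept classifier's
    probability at a single activation vector. For a non-linear classifier this
    direction varies per point, unlike a single global CAV."""
    base = concept_clf.predict_proba(activation[None, :])[0, 1]
    grad = np.zeros_like(activation)
    for i in range(activation.size):
        bumped = activation.copy()
        bumped[i] += eps
        grad[i] = (concept_clf.predict_proba(bumped[None, :])[0, 1] - base) / eps
    norm = np.linalg.norm(grad)
    return grad / norm if norm > 0 else grad
```

You'd then take the directional derivative of the class logit along this local direction instead of along the fixed CAV.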
Well, aren't we all powered by organic decision-making algorithms?
Can it be emulated with our current tech/knowledge? Maybe, maybe not.
In the end it all boils down to: is intelligence/consciousness 100% material?[0]
If that's the case, it would be theoretically possible to replicate it. In practice it's much more complex.
If these principles can't be explained by materialism I think we'll have even bigger questions to answer.
It could be material while not being expressible by a Turing machine. If by "algorithm" you mean something that runs on a Turing machine, then human intelligence doesn't have to follow any algorithm. If you mean that human intelligence is governed by equations, then that must be true if materialism is true, because the laws of physics (or "material reality") are governed by equations.
Sure, but I don't think anyone can say with confidence "machines will never think" or similar things (unless we're debating about the definition of "thinking").
I remember reading that people who worked on "flying machines" said heavier-than-air planes wouldn't be possible for decades or centuries (or even ever); little did they know that such planes had already existed for a few months, but the news hadn't reached them yet.
Am not going to argue whether a machine can “really” be alive, “really” be self-aware. Is a virus self-aware? Nyet. How about oyster? I doubt it. A cat? Almost certainly. A human? Don’t know about you, tovarisch, but I am. Somewhere along evolutionary chain from macromolecule to human brain self-awareness crept in. Psychologists assert it happens automatically whenever a brain acquires certain very high number of associational paths. Can’t see it matters whether paths are protein or platinum.
What if there is no evidence that the path matters, and on top of that, there is evidence that human brains and society are biased toward unjustified belief that the path matters?
I understand this; however, it's not a fair point. Without a proper understanding of 'swimming' and 'flying', these technologies wouldn't be possible. So the link between thinking and AI is not totally trivial, in my opinion.
>> (...) whether Submarines Can Swim.
> Or planes can fly, right?
It's very interesting that there is no problem with describing the action of planes as "flying" (because we humans do not fly), but OTOH it feels very weird to say that a submarine swims... because WE do swim...
[1]: https://arxiv.org/abs/1711.11279