Hacker News
Capsule Networks Explained (kndrck.co)
120 points by kendrick__ on Nov 11, 2017 | 20 comments



Here are some other posts explaining the nature of capsule networks, their goals and how they work:

- https://medium.com/@pechyonkin/understanding-hintons-capsule...

- https://hackernoon.com/what-is-a-capsnet-or-capsule-network-...


Here's a short, fluffy piece about Geoffrey Hinton + Capsule Networks: [1]

[1]: https://www.wired.com/story/googles-ai-wizard-unveils-a-new-...


When it comes to translation/rotation invariance, this is a similar idea to the "Harmonic Networks: Deep Translation and Rotation Equivariance" paper:

- https://arxiv.org/pdf/1612.04642.pdf

- https://www.youtube.com/watch?v=qoWAFBYOtoU

Maybe they can be combined?
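
For context, the equivariance in question is f(ρ·x) = ρ·f(x). Plain convolution already has it for 90° rotations, provided you rotate the filters along with the input; harmonic networks constrain the filters so it holds for continuous rotations, without keeping rotated filter copies. A quick numpy sanity check of the discrete case (a sketch, all names mine):

    import numpy as np

    def corr2_valid(x, k):
        # naive 2D "valid" cross-correlation, like a conv layer without padding
        m, n = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
        return np.array([[np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
                          for j in range(n)] for i in range(m)])

    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8))
    k = rng.standard_normal((3, 3))

    # Rotating the input *and* the filter rotates the feature map:
    assert np.allclose(corr2_valid(np.rot90(x), np.rot90(k)),
                       np.rot90(corr2_valid(x, k)))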


Interesting links. That paper indicates that its primary benefit is with rotations rather than translations. Regular CNNs are perfectly capable of dealing with translations.


Looks like the part about translational invariance is wrong. Translational invariance is an invariance to translations, not rotations. If a model detects a rotated cat as a cat, then it is rotationally invariant.
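
To make the distinction concrete, here's a minimal 1D numpy sketch (all names and numbers mine): a convolution's feature map shifts along with a shifted input (translation equivariance), and global max-pooling on top of that is translation invariant. Neither property holds for rotations out of the box.

    import numpy as np

    def conv1d_valid(x, k):
        # naive "valid" cross-correlation, like a conv layer without padding
        n = len(x) - len(k) + 1
        return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

    x = np.zeros(16)
    x[4:7] = [1.0, 2.0, 1.0]        # a small "feature" in the signal
    k = np.array([1.0, 2.0, 1.0])   # a kernel tuned to that feature

    y  = conv1d_valid(x, k)
    ys = conv1d_valid(np.roll(x, 3), k)   # same input, translated by 3

    # Equivariance: the feature map shifts with the input (away from borders)...
    assert np.allclose(np.roll(y, 3)[3:-3], ys[3:-3])
    # ...and global max-pooling then gives translation *invariance*:
    assert np.isclose(y.max(), ys.max())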


I suspect transform invariance is what is meant, although we find some transforms much harder than others, which may hint at a more discrete process than a transform matrix in human visual systems.


I'd say translations are more important than rotations, as in a 3D world we'll almost never see an object from a perpendicular viewpoint, but most of the time we'll see objects that are the right way up.


> in a 3D world we'll almost never see an object from a perpendicular view point

True; however, "transforms" would be more useful in this context as an umbrella term for the subset of transforms that includes perspective + orientation of a fixed geometry. Visual systems only need to care about this subset in almost all cases...

In which case it's conceivable that we infer geometry through a set of discrete transforms somewhat like rotations, translations, and scalings, or perhaps there is a component that happened to converge on something more unified, resembling an arbitrary transform matrix. If only we could identify these pieces in biological systems.
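
A small numpy sketch of why the two hypotheses are hard to tell apart from behaviour alone: any chain of discrete rotations, translations, and scalings collapses into a single affine matrix (homogeneous coordinates; the numbers are arbitrary):

    import numpy as np

    def rot(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

    def trans(tx, ty):
        return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]])

    def scale(k):
        return np.array([[k, 0, 0], [0, k, 0], [0, 0, 1]])

    # Three "discrete" steps compose into one arbitrary-looking matrix:
    composed = trans(2, -1) @ rot(np.pi / 6) @ scale(0.5)

    p = np.array([1.0, 1.0, 1.0])   # a point in homogeneous coordinates
    step_by_step = trans(2, -1) @ (rot(np.pi / 6) @ (scale(0.5) @ p))
    assert np.allclose(composed @ p, step_by_step)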


If the point is to easily reconstruct geometry, then mimicking humans should mean using 3D imagery (same object seen from two eyes) to get a better idea of its shape. Wonder if that might some day become part of best practice in computer vision too.
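
For reference, the standard way "two eyes" recover shape is triangulation on a rectified stereo pair. A toy calculation, with made-up numbers:

    # Pinhole stereo triangulation: depth from disparity on a rectified pair.
    # All values below are made-up, for illustration only.
    focal_px  = 700.0   # focal length, in pixels
    baseline  = 0.06    # distance between the two cameras, in metres
    disparity = 14.0    # horizontal pixel shift of the same point between views

    depth = focal_px * baseline / disparity   # = 3.0 metres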


In the same vein, I've always thought that operating on short (< 1s) video clips would help a lot with overfitting and object differentiation.


There's one thing in the paper that has me stumped:

> Each primary capsule output[sic] sees the outputs of all 256 × 81 Conv1 units whose receptive fields overlap with the location of the center of the capsule.

What does that mean? The capsules are bundles of convolutions, and the output of the "256 × 81 Conv1" is a 1D manifold. What does "overlap" mean, and what is the center of the capsule?

Note on [sic]: it seems like it should read "input".


I think it is a pretty unnecessary sentence. The 81 comes from the 9x9 kernel size. It is obvious that those will overlap despite the stride of 2. Maybe they mean the projective field.
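
For what it's worth, the 256 × 81 falls straight out of the shapes in the paper, assuming the MNIST architecture (28×28 input; Conv1: 256 kernels of 9×9, stride 1; PrimaryCaps: 9×9 kernels, stride 2). A back-of-the-envelope check:

    # Shape bookkeeping for the CapsNet MNIST architecture (Sabour et al., 2017).
    conv1_size = 28 - 9 + 1                 # Conv1 output: 20x20, 256 channels
    prim_size  = (conv1_size - 9) // 2 + 1  # PrimaryCaps grid: 6x6 (stride 2)

    # Each primary capsule at a grid position is computed from a 9x9 window
    # over all 256 Conv1 channels -- the "256 x 81" in the quoted sentence:
    inputs_per_capsule = 256 * 9 * 9        # = 20736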


Thanks. So maybe it is saying that the field overlap with capsules is implicit in the network, not a step in the calculation? That's my conclusion.


I feel like capsule networks are one step closer to a hybrid between standard deep learning tools and Hofstadter's conceptual-slippage networks.



I thought CNNs were also translationally invariant; why are they saying they're not?


These guys don't seem to understand capsule networks all that much; they had a translation image showing a rotated cat, which has now been modified to properly show it translated.


Aw... I thought this was a post explaining active networking using capsules.


The startup? Is that what you are referring to?




