Deep Quaternion Networks (2017) [pdf] (arxiv.org)
151 points by adamnemecek on March 17, 2018 | 73 comments



There's a lot of space to be explored at the intersection of ML and hypercomplex numbers. There's a Clifford SVM that, unlike a regular SVM which learns a hyperplane, learns any manifold.


That's also what I thought. This is a dissertation on multilayer perceptrons using backprop over Clifford type neurons by Sven Buchholz: https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertati...


Look into work by Bayro-Corrochano. I can't find a public link.



The fine article generalizes complex numbers to quaternions. Okay.

But quaternions are themselves generalized by Geometric Algebra. And there is plenty of information about the use of GA in the field of neural computing: https://arxiv.org/pdf/1305.5663.pdf (page 3). For example, a universal approximation theorem for GA is presented at https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertati...

I think that fine article is a step back.


Thanks for sharing this. There's a lot to digest in there, but there were a few highlights that stood out as possibly relevant to the OP paper.

> Theorem 6.4 ([2]) Complex FCMLPs having (6.9) as activation function are only universal approximators in L∞ for the class of analytic functions, but not for the class of complex continuous functions.

> ... the complex numbers (C0,1) are a subalgebra of the quaternions (C0,2). Hence the quaternionic logistic function is also unbounded. Neither could it give rise to universal approximation (w.r.t. L∞) since this does not hold for the complex case. One may argue that such things become more and more less important when proceeding to higher dimensional algebras since less and less components are affected. This is somehow true, but it hardly justify the efforts.

> ... Summarising all of the above the case of Complex FCMLPs looks settled down in a negative way. ... Hence Complex FCMLPs remain not very promising.

Unless I'm misreading, it seems already known that you _can_ use complex numbers (or quaternions) in neural networks...but you don't really gain anything from doing it.


One of the authors here. One thing about quaternion convolution is that you can write a color image into quaternion space by considering each channel as an imaginary axis. This lets the convolution act on the entire color space in a different way compared to real-valued networks, which may make it do better for things like segmentation where you need to be more sensitive to changes in the color space.
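
Concretely, the encoding is something like this (a minimal numpy sketch; the array layout is just illustrative, not necessarily what the released Keras layers use):

    import numpy as np

    # A hypothetical H x W x 3 RGB image with values in [0, 1].
    h, w = 4, 4
    rgb = np.random.rand(h, w, 3)

    # Each pixel becomes a pure quaternion 0 + R*i + G*j + B*k,
    # stored as an H x W x 4 array of (real, i, j, k) components.
    quat_img = np.concatenate([np.zeros((h, w, 1)), rgb], axis=-1)

    print(quat_img.shape)  # (4, 4, 4): real part zero, RGB on the three imaginary axes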


Geometric algebra is the Rust of mathematics. Can't you guys simply let people do their work without trying to evangelize?


In my opinion, GA is the Haskell of mathematics. Just like Haskell subsumes Rust as a library (OCaml does [1], thus Haskell does too), GA subsumes quaternions (which are the Rust here). Using a subpar tool is not good practice when a better tool is readily available.

And it is better tooling that I am trying to advertise here.

[1] https://news.ycombinator.com/item?id=16597329


Less general does not necessarily mean subpar. On the contrary.


I skipped to the table at the end. The gains don't seem enormous. Is there a kind of problem where we would expect quaternions to perform dramatically better than other kinds of numbers?


Not to mention that it appears they're comparing against networks of the same architecture. If you build your quaternion components with the same number types as your reals, you effectively have 4 times the number of parameters, which could be most of the benefit. They should also benchmark against similar architectures with equivalent parameter counts.


Hi, I'm one of the authors of this paper. Sorry if it is unclear, but we reduce the number of filters per layer to account for this. The quaternion networks actually have fewer parameters.


Can you provide some technical details on what you do? Do you divide the number of channels in each layer of the real-valued network by 4? I don't see anything describing this in the paper.


Yes that is exactly what we do.
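
Back-of-the-envelope, it works out like this for a 3x3 conv layer (ignoring biases; the exact bookkeeping in the paper may differ slightly):

    def real_conv_params(c_in, c_out, k):
        # Standard real-valued conv: one weight per (in-channel, out-channel, kx, ky).
        return c_in * c_out * k * k

    def quaternion_conv_params(c_in, c_out, k):
        # Each quaternion weight holds 4 real numbers, but a quaternion channel
        # already carries 4 real components, so channel counts are divided by 4.
        return 4 * (c_in // 4) * (c_out // 4) * k * k

    # A layer that is 64 real units wide, with 3x3 kernels:
    print(real_conv_params(64, 64, 3))        # 36864
    print(quaternion_conv_params(64, 64, 3))  # 9216 -> 4x fewer parameters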


This paper was mostly to lay out the framework and give out the Keras layers for others to use. We expect that the biggest improvements will come from segmentation where the gains may come from treating each color channel as an imaginary axis. And from architectures like PointNet https://arxiv.org/abs/1612.00593.


I’m able to follow neither the article nor the discussion. What would I have to learn in order to be able to?

Even if it was, like, years of studying. I’m just curious how deep this rabbit hole is.


Quaternions are an extension of the idea of complex numbers. Complex numbers have a real and an imaginary part, while quaternions have a real part and three imaginary parts. So the basic idea is that these richer types of number, when used to build a network (instead of plain real numbers), have benefits.

So to get started with reading this paper you just need to learn about deep learning, and then also the very basics of quaternions, which would be taught in, for example, a first course on abstract algebra.


Quaternions are used to describe rotations in 3D space. For a unit quaternion, the three imaginary components give the rotation axis and, together with the real part, encode the rotation angle. They are used instead of Euler angles because they don't have a singularity problem (gimbal lock). I don't know if this plays a role here. Could these neural networks represent 3D transformations really well?
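
For concreteness, rotating a vector by conjugation with a unit quaternion (a minimal numpy sketch, nothing specific to the paper):

    import numpy as np

    def quat_mult(a, b):
        # Hamilton product of quaternions given as (w, x, y, z).
        w1, x1, y1, z1 = a
        w2, x2, y2, z2 = b
        return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2])

    def rotate(v, axis, angle):
        # Unit quaternion for a rotation of `angle` radians about `axis`,
        # applied by conjugation: v' = q (0, v) q^-1.
        axis = axis / np.linalg.norm(axis)
        q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
        q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
        return quat_mult(quat_mult(q, np.concatenate([[0.0], v])), q_conj)[1:]

    # Rotate the x axis 90 degrees about z: ~[0, 1, 0], with no gimbal lock anywhere.
    print(rotate(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.pi / 2))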


Geoffrey Hinton's idea of "capsules", which I don't really know anything concrete about, tries to address the recognition of objects subject to rotations, etc. That's a topological/structural strategy within the neural network, though, so quite removed from an idea like quaternions.

It's worth observing that what distinguishes complex numbers from 2-vectors like (x,y) is that there's a multiplication rule that corresponds to rotation around the origin. Similarly with quaternions. But you can also just use them as glorified vectors of 2 or 4 elements.
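
E.g.:

    import cmath

    p = complex(1.0, 0.0)               # the point (1, 0)
    rot = cmath.exp(1j * cmath.pi / 2)  # multiplying by e^(i*theta) rotates about the origin
    print(p * rot)                      # ~1j, i.e. the point (0, 1)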

See this https://www.technologyreview.com/s/610278/why-even-a-moths-b... for a recent interesting finding on dimensionality.


I believe you missed a step there. After learning deep learning, GP would want to learn about how complex numbers are an improvement. Only then would one want to consider how quaternions are an improvement.


I don’t think quaternions would be taught in a typical first course of abstract algebra. Do you know of a textbook where they are featured prominently?


I don't know of any book which features them "prominently", but I also don't think you'd really need one. They are taught in various abstract algebra books, they're just taught in the fashion of, "Here's an exercise that introduces a peripheral topic it's useful to know about." For example, groups and rings of quaternions show up in MacLane & Birkhoff's Algebra (62, 426; 282) and Lang's Algebra (9, 545, 723, 758).

Edit: In an effort to find more applied information I put down my math books and picked up the information theoretic ones. You can find more information about the use of quaternions in the two volume Handbook of Digital Signal Processing and Salomon's Data Compression. More generally, when quaternions aren't explicitly referred to it's helpful to look up the coverage of complex rotations, especially with respect to the Discrete Fourier Transform.

For a discussion of rotations with quaternions in the context of animation, this is a reasonably short paper: http://www.cs.cmu.edu/~kiranb/animation/p245-shoemake.pdf.


https://en.wikibooks.org/wiki/Abstract_Algebra/Quaternions or https://books.google.ie/books?id=ouvZKQiykf4C&pg=PA98&lpg=PA...

I am from Dublin, where quaternions were invented, so they get mentioned a lot by mathematicians and physicists here, maybe getting a higher billing than they do elsewhere. Computer graphics is obviously a place to go for introductions also, but it is typically going to be a more applied and less rigorous treatment.


When I took abstract algebra at Berkeley years ago, they were taught but they weren't the focus of the course. Basically, they're an example of a skew field (division ring), so they have some interesting properties that were briefly studied. But obviously, one has to study more to understand their applications.


Probably some of the introductory-level computer graphics books. The field uses a lot of quaternions, so they might have a good explanation of them.


In summary: Just as the complex numbers are defined by adding a new element i such that i^2 = -1, the quaternions are defined by adding elements i, j, k such that i^2 = j^2 = k^2 = ijk = -1.
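
Those relations pin down the whole multiplication table, which is easy to check numerically (quick sketch):

    import numpy as np

    i = np.array([0, 1, 0, 0])  # quaternions stored as (1, i, j, k) coefficients
    j = np.array([0, 0, 1, 0])
    k = np.array([0, 0, 0, 1])

    def mul(a, b):
        # Hamilton product derived from i^2 = j^2 = k^2 = ijk = -1.
        w1, x1, y1, z1 = a
        w2, x2, y2, z2 = b
        return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2])

    print(mul(i, i), mul(j, j), mul(k, k))  # each is (-1, 0, 0, 0)
    print(mul(mul(i, j), k))                # ijk = -1
    print(mul(i, j), mul(j, i))             # ij = k but ji = -k: not commutative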



Turns out we can approximate many-body wavefunctions using networks with complex weights: https://arxiv.org/abs/1606.02318.


Does the use of complex numbers really provide an improvement? How does this work? (Other than cramming 2 numbers into 1... which itself is suspect... the complex plane has the same cardinality as the reals...)


Cardinality is a complete red herring here, we don't care about the set theoretic structure, and ultimately we're taking finite approximations anyway. The structures we care about are the metric which tells us which solutions (neural nets in this case) are nearby to each other and the algebra which tells us how to compose solutions.

The algebra of real numbers is simply less structured than the complex numbers. One of the key properties of the complex numbers is that they naturally have both a magnitude and a phase. This lets them capture phenomena that have a notion of superposition and interference.

As you correctly pointed out, you can simulate a complex number with two real numbers. The key is to exploit the particular geometric and algebraic properties of the complexes. One example in neural networks is the phenomenon of synchronization, where the outputs of neurons depending on the presence of a particular stimulus all have the same phase. This can be exploited for applications such as object segmentation.
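
As a toy picture of synchronization (made-up numbers, just to show the magnitude/phase split):

    import numpy as np

    # Complex activations: units responding to the same object share a phase.
    acts = np.array([0.9 * np.exp(1j * 0.3),   # object A
                     0.8 * np.exp(1j * 0.3),   # object A
                     0.7 * np.exp(1j * 2.1)])  # object B
    print(np.abs(acts))    # magnitudes: is a feature present, and how strongly?
    print(np.angle(acts))  # phases: which object does the feature belong to?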

So the widest possible view of this line of research is that putting more algebraic structure on your parameters can improve the behavior of your learning algorithms. My extremely hot take on how far this can go is a full fledged integration of harmonic analysis and representation theory into the theory of deep learning.


Ah, fantastic explanation, thanks. =)

I'm coming from a signal processing background, so thinking in terms of magnitude and phase is comfortable to me. Does synchronization, in the sense you're describing, really happen in deep learning (ANN) systems? I'd love a link or reference.



Any suggestions for readable introductions to how complex numbers help neural networks?


https://arxiv.org/pdf/1312.6115.pdf

There aren't really any introductions, just research papers. If you have some understanding of real valued neural networks you'll probably be able to work your way through the literature.


Cardinality has nothing to do with this discussion. Quaternions have a different structure than the real numbers. Over any set (of the right size) you can put any (reasonable) structure you want, but that's not a very interesting observation. For example, Q and Z have the same cardinality, but they look nothing like each other, other than the fact that Q is the fraction field of the Euclidean domain Z.


It isn't just the size of the space that matters, but how smooth and connected it is.


Are quaternions more smooth and connected than complex numbers? My understanding was that higher-dimensional hypercomplex numbers tend to lose useful structure. I'm also curious what being connected in this context means.


I'm still having trouble wrapping my head around how complex numbers are an improvement over the reals. >.<


Maybe it would help to think of the Turing machine as an analogy. Many programming languages are Turing complete; you can express any computation in any of them, but some languages are more expressive than others and let you reach and work with ideas you wouldn't conceive of in a less expressive language.

Lots of things in math are similar. Simon Altmann's Icons and Symmetries makes a case that using representations with insufficient symmetry impeded our learning of the laws of magnetism.


Complex numbers are a particular 2d slice of 2x2 matrices that happen to capture rotation and other periodic phenomena very well. If you are trying to solve some problem that you suspect to involve periodicity, focusing on complex numbers helps you get there faster.
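
Concretely, the slice is the matrices of the form [[a, -b], [b, a]], and matrix multiplication reproduces complex multiplication exactly:

    import numpy as np

    def as_matrix(z):
        # a + bi  <->  [[a, -b], [b, a]]
        return np.array([[z.real, -z.imag], [z.imag, z.real]])

    z1, z2 = 2 + 3j, 1 - 1j
    print(z1 * z2)                        # (5+1j)
    print(as_matrix(z1) @ as_matrix(z2))  # [[5, -1], [1, 5]], i.e. the same number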


You can use complex numbers to represent higher-dimensional objects using only primitive operations, scalar values, and an imaginary number for each dimension. However, computing with these values is significantly more challenging than with real vectors. This book on 'Geometric Algebra' starts to explain: http://www2.montgomerycollege.edu/departments/planet/planet/...



They have some pretty useful properties. For example, every polynomial of degree n has exactly n complex roots (counted with multiplicity), and if a complex-valued function is differentiable with respect to a complex variable then it's also infinitely differentiable and analytic.
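
E.g.:

    import numpy as np

    # x^3 + 1 has only one real root but exactly three complex roots.
    print(np.roots([1, 0, 0, 1]))  # -1 and 0.5 +/- 0.866j (in some order)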


I remember being introduced to quaternions recently by this post[0] which recommended this book[1].

[0]: https://www.haroldserrano.com/blog/best-books-to-develop-a-g...

[1]: https://www.amazon.com/Quaternions-Computer-Graphics-John-Vi...


This assumes numerics is free at very large scale, which is not reasonable if you want to create efficient biologically inspired AI.


Mathoma videos on geometric algebra: https://youtu.be/ERpcSJzX448


May I ask what its applications are?


if quaternions, why not octonions?


Quaternions are known to represent spatial transforms, and there is a little bit of prior work that demonstrates quaternion filters 'make sense'.

However, octonions are the obvious next step here: if you look at Appendix Figure 1 of "Deep Complex Networks" [1], the authors used (Real + Complex), and Figure 1 of our paper [2] with quaternions uses (Real + Complex + Complex + Complex)!

[1] https://arxiv.org/pdf/1705.09792.pdf [2] https://arxiv.org/pdf/1712.04604.pdf


I don't think that octonions are the obvious next step here. They are a non-associative algebra, and therefore incredibly difficult to deal with.

Starting with real n-space, one can form the Clifford algebra, which essentially gives a method of multiplying vectors which "knows" something about the length and angle of vectors. The even subalgebra of the Clifford algebra gives a very convenient way of encoding rotations on real n-space. Furthermore, the Clifford algebra is always associative, and works for any n.

If you apply this construction for n=1, 2, 3, you get back the real numbers, complex numbers, and quaternions respectively. If you apply this for n=4, you get back an 8-dimensional associative algebra encoding rotations in 4-space.
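
A tiny hand-rolled illustration of the n = 2 case, with Cl(2) elements stored as coefficients on the basis (1, e1, e2, e1e2) (no GA library needed):

    import numpy as np

    def gp(a, b):
        # Geometric product in Cl(2), where e1^2 = e2^2 = 1.
        a0, a1, a2, a3 = a
        b0, b1, b2, b3 = b
        return np.array([a0*b0 + a1*b1 + a2*b2 - a3*b3,
                         a0*b1 + a1*b0 - a2*b3 + a3*b2,
                         a0*b2 + a2*b0 + a1*b3 - a3*b1,
                         a0*b3 + a3*b0 + a1*b2 - a2*b1])

    e12 = np.array([0.0, 0.0, 0.0, 1.0])
    print(gp(e12, e12))  # [-1, 0, 0, 0]: the bivector e1e2 squares to -1

    # The even part (scalar + e1e2) is closed under the product and behaves
    # exactly like the complex numbers:
    z = np.array([0.0, 0.0, 0.0, 1.0])  # plays the role of i
    w = np.array([3.0, 0.0, 0.0, 4.0])  # plays the role of 3 + 4i
    print(gp(z, w))                     # [-4, 0, 0, 3], i.e. i*(3 + 4i) = -4 + 3i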


Sounds quite quantum... negative time?


phase shift, a shift in phase space.


The lack of associativity might suck.


Quaternions are associative! Anyway, you can cleverly encode all the algebraic operations of quaternions into matrices and go with that!
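
For example, via the standard left-multiplication representation as real 4x4 matrices:

    import numpy as np

    def left_mult_matrix(q):
        # 4x4 real matrix whose action on (w, x, y, z) is left-multiplication by q.
        w, x, y, z = q
        return np.array([[w, -x, -y, -z],
                         [x,  w, -z,  y],
                         [y,  z,  w, -x],
                         [z, -y,  x,  w]])

    i = np.array([0.0, 1.0, 0.0, 0.0])
    j = np.array([0.0, 0.0, 1.0, 0.0])
    print(left_mult_matrix(i) @ j)  # [0, 0, 0, 1] == k, so i*j = k as a matrix-vector product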


Yeah, but octonions are not associative. Which is what the poster was saying. (By the way, that also means that there isn't a matrix representation for octonions, since matrix multiplication is associative.)


Perhaps, but I would think that if you threw another tool (a different way to relate some inputs) at a neural net, it might figure out some way to exploit it, even if it's not clear to you or me. Just like an RL agent sometimes finds and exploits bugs in the environment.


The lack of commutativity for quaternion multiplication might suck, too.


It doesn't because neural networks are already (matrices + nonlinearity) over the ground field and therefore all but the simplest possible neural networks are non-commutative anyway. Non-associativity is much more troublesome because it sabotages the most important property of neural networks for deep learning, that you can compose them by feeding outputs of one as inputs into another.

Take out the associativity and you've taken out the "deep" in "deep learning".

Edit: On further reflection, the non-commutativity of neural networks is also a crucial component of machine learning. Without it, a neural network can't make a decision at one layer that depends on its decisions at a previous level!


I'm not sure I buy this.

If

- your network structure is fixed, and

- you always evaluate the matrix in a given order, and

- you are careful/smart about how you train the weights (Ok, I haven't thought through the ramifications here...)

then I'm not sure you care much about either commutativity or associativity. Maybe a lack of associativity makes backprop impossible, and maybe commutativity makes "Google deep dream" impossible (no idea), but I don't quite agree with the "composability" objection to a lack of associativity, and I don't understand the objection to a lack of commutativity, sorry.


The problem is with gradient propagation, which requires approximately associative mathematics. Otherwise you will be training your network to do something other than what you wanted.

You evaluate the network in two orders: forward (use) and backward (training).


Floating point math itself is commutative, but it's not associative as implemented in current processors, so you've already lost associativity from the get-go.
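
E.g., in IEEE-754 doubles:

    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 -- same operands, different grouping, different answer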


Floating point's non-associativity is certainly a problem for composability and stability of neural networks, you're right. However this non-associativity is well behaved in the sense that there's an ideal arithmetic we wish to approximate and we have techniques for mitigating the discrepancy between floating point and that arithmetic.

The non-associativity of the octonions is fundamental to their structure, not something to be worked around. In particular, there's no way to consider an octonion-valued network as comprising several layers plugged in serial.


and to that - why not N-nions for large values of N?


Because after octonions there isn't anything more.


https://en.wikipedia.org/wiki/Sedenion

But they're pretty weird; very few of the rules you're used to for "numbers" apply. Ditto for the further constructions.


Many references itemize what properties one gives up with each successive application of the Cayley-Dickson construction:

Complex numbers - lose the self-conjugate identity, but satisfy the fundamental theorem of algebra (and can represent 2D points or vectors)

Quaternions - lose commutativity (but can represent 3D rotation, which isn’t commutative)

Octonions - lose associativity, except for each of aab and abb

Sedenions - lose associativity of aab and abb

John Baez’s “This Week's Finds in Mathematical Physics (Week 59)” http://math.ucr.edu/home/baez/week59.html concludes with a letter by Toby Bartels explaining why. An excerpt:

I will prove below that the 2^n onions are a division algebra only if the 2^(n-1) onions are associative. So, the question becomes: why aren't the octonions associative? Well, I've found a proof that 2^n onions are associative only if 2^(n-1) onions are commutative. So, why aren't the quaternions commutative? Again, I have a proof that 2^n onions are commutative only if 2^(n-1) onions equal their own conjugates. So, why don't the complex numbers equal their own conjugates? I have a proof that 2^n onions do equal their own conjugates, but it works only if the 2^(n-1) onions are of characteristic 2. The real numbers are not of characteristic 2, so the complex numbers don't equal their own conjugates, so the quaternions aren't commutative, so the octonions aren't associative, so the hexadecanions aren't a division algebra.
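
The doubling step itself is short enough to play with (one common sign convention; others exist):

    def conj(a):
        # Cayley-Dickson conjugate of a 2^n-dimensional element (flat list of reals).
        if len(a) == 1:
            return a[:]
        h = len(a) // 2
        return conj(a[:h]) + [-x for x in a[h:]]

    def cd_mul(a, b):
        # One doubling step: (p, q)(r, s) = (pr - s*q, sp + qr*), where * is conjugation.
        if len(a) == 1:
            return [a[0] * b[0]]
        h = len(a) // 2
        p, q, r, s = a[:h], a[h:], b[:h], b[h:]
        left  = [x - y for x, y in zip(cd_mul(p, r), cd_mul(conj(s), q))]
        right = [x + y for x, y in zip(cd_mul(s, p), cd_mul(q, conj(r)))]
        return left + right

    def e(i, dim):
        return [1.0 if n == i else 0.0 for n in range(dim)]

    # Quaternions (dim 4): commutativity is gone...
    print(cd_mul(e(1, 4), e(2, 4)), cd_mul(e(2, 4), e(1, 4)))  # k and -k
    # ...octonions (dim 8): associativity is gone too.
    print(cd_mul(cd_mul(e(1, 8), e(2, 8)), e(4, 8)))  # e7
    print(cd_mul(e(1, 8), cd_mul(e(2, 8), e(4, 8))))  # -e7: (e1 e2) e4 != e1 (e2 e4)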


This is great, thanks!


For more on what's different after the octonions, see: https://en.m.wikipedia.org/wiki/Hurwitz%27s_theorem_(composi...

It isn't explicitly stated there, but the statement I recall from algebraic topology is that the octonions are the last normed division algebra.


The Cayley–Dickson construction can be carried on to infinity.

https://en.wikipedia.org/wiki/Cayley%E2%80%93Dickson_constru...


Not OP, but...

Awww, I was kinda hoping there'd be ununun-ions.


After octonions come sedenions.


You had me on "The field of deep learning...". Sounds seriously scientific to me. What's the next big flashy field? "Deep thought"? Oh, nope, Douglas Adams already covered that one.



