Deep Quaternion Networks (2017) [pdf] (arxiv.org)
151 points by adamnemecek on March 17, 2018 | 73 comments



There's a lot of space to be explored at the intersection of ML and hypercomplex numbers. There's a Clifford SVM that, unlike a regular SVM which learns a hyperplane, learns any manifold.


That's also what I thought. This is a dissertation on multilayer perceptrons using backprop over Clifford type neurons by Sven Buchholz: https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertati...


Look into work by Bayro-Corrochano. I can't find a public link.



The fine article generalizes complex numbers to quaternions. Okay.

But quaternions are themselves generalized by Geometric Algebra. And there is plenty of information about the use of GA in the field of neural computing: https://arxiv.org/pdf/1305.5663.pdf (page 3). For example, a universal approximation theorem for GA is presented at https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertati...

I think that fine article is a step back.


Thanks for sharing this. There's a lot to digest in there, but there were a few highlights that stood out as possibly relevant to the OP paper.

> Theorem 6.4 ([2]) Complex FCMLPs having (6.9) as activation function are only universal approximators in L∞ for the class of analytic functions, but not for the class of complex continuous functions.

> ... the complex numbers (C0,1) are a subalgebra of the quaternions (C0,2). Hence the quaternionic logistic function is also unbounded. Neither could it give rise to universal approximation (w.r.t. L∞) since this does not hold for the complex case. One may argue that such things become more and more less important when proceeding to higher dimensional algebras since less and less components are affected. This is somehow true, but it hardly justify the efforts.

> ... Summarising all of the above the case of Complex FCMLPs looks settled down in a negative way. ... Hence Complex FCMLPs remain not very promising.

Unless I'm misreading, it seems already known that you _can_ use complex numbers (or quaternions) in neural networks...but you don't really gain anything from doing it.


One of the authors here. One thing about quaternion convolution is that you can write a color image into quaternion space by considering each channel as an imaginary axis. This lets the convolution act on the entire color space in a different way compared to real-valued networks, which may make it do better for things like segmentation where you need to be more sensitive to changes in the color space.
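
Concretely, the encoding is something like this (a minimal numpy sketch; the array layout is just illustrative, not necessarily what the released Keras layers use):

    import numpy as np

    # A hypothetical H x W x 3 RGB image with values in [0, 1].
    h, w = 4, 4
    rgb = np.random.rand(h, w, 3)

    # Each pixel becomes a pure quaternion 0 + R*i + G*j + B*k,
    # stored as an H x W x 4 array of (real, i, j, k) components.
    quat_img = np.concatenate([np.zeros((h, w, 1)), rgb], axis=-1)

    print(quat_img.shape)  # (4, 4, 4): real part zero, RGB on the three imaginary axes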


Geometric algebra is the Rust of mathematics. Can't you guys simply let people do their work without trying to evangelize?


In my opinion, GA is the Haskell of mathematics. Just like Haskell subsumes Rust as a library (OCaml does [1], thus Haskell does too), GA subsumes quaternions (which are the Rust here). Using a subpar tool is not good practice when a better tool is readily available.

And it is better tooling that I am trying to advertise here.

[1] https://news.ycombinator.com/item?id=16597329


Less general does not necessarily mean subpar. On the contrary.


I skipped to the table at the end. The gains don't seem enormous. Is there a kind of problem where we would expect quaternions to perform dramatically better than other kinds of numbers?


Not to mention that it appears they're comparing against networks of the same architecture. If you build your quaternion components with the same number types as your reals, you effectively have 4 times the number of parameters, which could be most of the benefit. They should also benchmark against similar architectures with equivalent parameter counts.


Hi, I'm one of the authors of this paper. Sorry if it is unclear, but we reduce the number of filters per layer to account for this. The quaternion networks actually have fewer parameters.


Can you provide some technical details on what you do? Do you divide the number of channels in each layer of the real-valued network by 4? I don't see anything describing this in the paper.


Yes that is exactly what we do.
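
Back-of-the-envelope, it works out like this for a 3x3 conv layer (ignoring biases; the exact bookkeeping in the paper may differ slightly):

    def real_conv_params(c_in, c_out, k):
        # Standard real-valued conv: one weight per (in-channel, out-channel, kx, ky).
        return c_in * c_out * k * k

    def quaternion_conv_params(c_in, c_out, k):
        # Each quaternion weight holds 4 real numbers, but a quaternion channel
        # already carries 4 real components, so channel counts are divided by 4.
        return 4 * (c_in // 4) * (c_out // 4) * k * k

    # A layer that is 64 real units wide, with 3x3 kernels:
    print(real_conv_params(64, 64, 3))        # 36864
    print(quaternion_conv_params(64, 64, 3))  # 9216 -> 4x fewer parameters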


This paper was mostly to lay out the framework and give out the Keras layers for others to use. We expect that the biggest improvements will come from segmentation where the gains may come from treating each color channel as an imaginary axis. And from architectures like PointNet https://arxiv.org/abs/1612.00593.


I’m able to follow neither the article nor the discussion. What would I have to learn in order to be able to?

Even if it was, like, years of studying. I’m just curious how deep this rabbit hole is.


Quaternions are an extension of the idea of complex numbers. Complex numbers have a real and an imaginary part, while quaternions have a real part and three imaginary parts. So the basic idea is that these richer types of number, when used to build a network (instead of plain real numbers), have benefits.

So to get started with reading this paper you just need to learn about deep learning, and then also the very basics of quaternions, which would be taught in, for example, a first course on abstract algebra.


Quaternions are used to describe rotations in 3D space. For a unit quaternion, the three imaginary components give the rotation axis and, together with the real part, encode the rotation angle. They are used instead of Euler angles because they don't have a singularity problem (gimbal lock). I don't know if this plays a role here. Could these neural networks represent 3D transformations really well?
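
For concreteness, rotating a vector by conjugation with a unit quaternion (a minimal numpy sketch, nothing specific to the paper):

    import numpy as np

    def quat_mult(a, b):
        # Hamilton product of quaternions given as (w, x, y, z).
        w1, x1, y1, z1 = a
        w2, x2, y2, z2 = b
        return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2])

    def rotate(v, axis, angle):
        # Unit quaternion for a rotation of `angle` radians about `axis`,
        # applied by conjugation: v' = q (0, v) q^-1.
        axis = axis / np.linalg.norm(axis)
        q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
        q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
        return quat_mult(quat_mult(q, np.concatenate([[0.0], v])), q_conj)[1:]

    # Rotate the x axis 90 degrees about z: ~[0, 1, 0], with no gimbal lock anywhere.
    print(rotate(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.pi / 2))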


Geoffrey Hinton's idea of "capsules", which I don't really know anything concrete about, tries to address the recognition of objects subject to rotations, etc. That's a topological/structural strategy within the neural network, though, so quite removed from an idea like quaternions.

It's worth observing that what distinguishes complex numbers from 2-vectors like (x,y) is that there's a multiplication rule that corresponds to rotation around the origin. Similarly with quaternions. But you can also just use them as glorified vectors of 2 or 4 elements.
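
E.g.:

    import cmath

    p = complex(1.0, 0.0)               # the point (1, 0)
    rot = cmath.exp(1j * cmath.pi / 2)  # multiplying by e^(i*theta) rotates about the origin
    print(p * rot)                      # ~1j, i.e. the point (0, 1)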

See this https://www.technologyreview.com/s/610278/why-even-a-moths-b... for a recent interesting finding on dimensionality.


I believe you missed a step there. After learning deep learning, GP would want to learn about how complex numbers are an improvement. Only then would one want to consider how quaternions are an improvement.


I don’t think quaternions would be taught in a typical first course of abstract algebra. Do you know of a textbook where they are featured prominently?


I don't know of any book which features them "prominently", but I also don't think you'd really need one. They are taught in various abstract algebra books, they're just taught in the fashion of, "Here's an exercise that introduces a peripheral topic it's useful to know about." For example, groups and rings of quaternions show up in MacLane & Birkhoff's Algebra (62, 426; 282) and Lang's Algebra (9, 545, 723, 758).

Edit: In an effort to find more applied information I put down my math books and picked up the information theoretic ones. You can find more information about the use of quaternions in the two volume Handbook of Digital Signal Processing and Salomon's Data Compression. More generally, when quaternions aren't explicitly referred to it's helpful to look up the coverage of complex rotations, especially with respect to the Discrete Fourier Transform.

For a discussion of rotations with quaternions in the context of animation, this is a reasonably short paper: http://www.cs.cmu.edu/~kiranb/animation/p245-shoemake.pdf.


https://en.wikibooks.org/wiki/Abstract_Algebra/Quaternions or https://books.google.ie/books?id=ouvZKQiykf4C&pg=PA98&lpg=PA...

I am from Dublin, where quaternions were invented, so they get mentioned a lot by mathematicians and physicists here, maybe getting a higher billing than they do elsewhere. Computer graphics is obviously a place to go for introductions also, but it is typically going to be a more applied and less rigorous treatment.


When I took abstract algebra at Berkeley years ago, they were taught but they weren't the focus of the course. Basically, they're an example of a skew field (division ring), so they have some interesting properties that were briefly studied. But obviously, one has to study more to understand their applications.


Probably some of the introductory-level computer graphics books. The field uses a lot of quaternions, so they might have a good explanation of them.


In summary: Just as the complex numbers are defined by adding a new element i such that i^2 = -1, the quaternions are defined by adding elements i, j, k such that i^2 = j^2 = k^2 = ijk = -1.
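
Those relations pin down the whole multiplication table, which is easy to check numerically (quick sketch):

    import numpy as np

    i = np.array([0, 1, 0, 0])  # quaternions stored as (1, i, j, k) coefficients
    j = np.array([0, 0, 1, 0])
    k = np.array([0, 0, 0, 1])

    def mul(a, b):
        # Hamilton product derived from i^2 = j^2 = k^2 = ijk = -1.
        w1, x1, y1, z1 = a
        w2, x2, y2, z2 = b
        return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2])

    print(mul(i, i), mul(j, j), mul(k, k))  # each is (-1, 0, 0, 0)
    print(mul(mul(i, j), k))                # ijk = -1
    print(mul(i, j), mul(j, i))             # ij = k but ji = -k: not commutative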



Turns out we can approximate many-body wavefunctions using networks with complex weights: https://arxiv.org/abs/1606.02318.


Does the use of complex numbers really provide an improvement? How does this work? (Other than cramming 2 numbers into 1... which itself is suspect... the complex plane has the same cardinality as the reals...)


Cardinality is a complete red herring here, we don't care about the set theoretic structure, and ultimately we're taking finite approximations anyway. The structures we care about are the metric which tells us which solutions (neural nets in this case) are nearby to each other and the algebra which tells us how to compose solutions.

The algebra of real numbers is simply less structured than the complex numbers. One of the key properties of the complex numbers is that they naturally have both a magnitude and a phase. This lets them capture phenomena that have a notion of superposition and interference.

As you correctly pointed out, you can simulate a complex number with two real numbers. The key is to exploit the particular geometric and algebraic properties of the complexes. One example in neural networks is the phenomenon of synchronization, where the outputs of neurons depending on the presence of a particular stimulus all have the same phase. This can be exploited for applications such as object segmentation.
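
As a toy picture of synchronization (made-up numbers, just to show the magnitude/phase split):

    import numpy as np

    # Complex activations: units responding to the same object share a phase.
    acts = np.array([0.9 * np.exp(1j * 0.3),   # object A
                     0.8 * np.exp(1j * 0.3),   # object A
                     0.7 * np.exp(1j * 2.1)])  # object B
    print(np.abs(acts))    # magnitudes: is a feature present, and how strongly?
    print(np.angle(acts))  # phases: which object does the feature belong to?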

So the widest possible view of this line of research is that putting more algebraic structure on your parameters can improve the behavior of your learning algorithms. My extremely hot take on how far this can go is a full fledged integration of harmonic analysis and representation theory into the theory of deep learning.


Ah, fantastic explanation, thanks. =)

I'm coming from a signal processing background, so thinking in terms of magnitude and phase is comfortable to me. Does synchronization, in the sense you're describing, really happen in deep learning (ANN) systems? I'd love a link or reference.



Any suggestions for readable introductions to how complex numbers help neural networks?


https://arxiv.org/pdf/1312.6115.pdf

There aren't really any introductions, just research papers. If you have some understanding of real valued neural networks you'll probably be able to work your way through the literature.


Cardinality has nothing to do with this discussion. Quaternions have a different structure than the real numbers. Over any set (of the right size) you can put any (reasonable) structure you want, but that's not a very interesting observation. For example, Q and Z have the same cardinality, but they look nothing like each other, other than the fact that Q is the fraction field of the Euclidean domain Z.


It isn't just the size of the space that matters, but how smooth and connected it is.


Are quaternions more smooth and connected than complex numbers? My understanding was that higher-dimensional hypercomplex numbers tend to lose useful structure. I'm also curious what being connected in this context means.


I'm still having trouble wrapping my head around how complex numbers are an improvement over the reals. >.<


Maybe it would help to think of the Turing machine as an analogy. Many programming languages are Turing complete; you can express any computation in any of them, but some languages are more expressive than others and let you reach and work with ideas you wouldn't conceive of in a less expressive language.

Lots of things in math are similar. Simon Altmann's Icons and Symmetries makes a case that using representations with insufficient symmetry impeded our learning of the laws of magnetism.


Complex numbers are a particular 2d slice of 2x2 matrices that happen to capture rotation and other periodic phenomena very well. If you are trying to solve some problem that you suspect to involve periodicity, focusing on complex numbers helps you get there faster.
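
Concretely, the slice is the matrices of the form [[a, -b], [b, a]], and matrix multiplication reproduces complex multiplication exactly:

    import numpy as np

    def as_matrix(z):
        # a + bi  <->  [[a, -b], [b, a]]
        return np.array([[z.real, -z.imag], [z.imag, z.real]])

    z1, z2 = 2 + 3j, 1 - 1j
    print(z1 * z2)                        # (5+1j)
    print(as_matrix(z1) @ as_matrix(z2))  # [[5, -1], [1, 5]], i.e. the same number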


You can use complex numbers to represent higher-dimensional objects using only primitive operations, scalar values, and an imaginary number for each dimension. However, computing with these values is significantly more challenging than with real vectors. This book on 'Geometric Algebra' starts to explain: http://www2.montgomerycollege.edu/departments/planet/planet/...



They have some pretty useful properties. For example, every polynomial of degree n has exactly n complex roots (counted with multiplicity), and if a complex-valued function is differentiable with respect to a complex variable then it's also infinitely differentiable and analytic.
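
E.g.:

    import numpy as np

    # x^3 + 1 has only one real root but exactly three complex roots.
    print(np.roots([1, 0, 0, 1]))  # -1 and 0.5 +/- 0.866j (in some order)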


I remember being introduced to quaternions recently by this post[0] which recommended this book[1].

[0]: https://www.haroldserrano.com/blog/best-books-to-develop-a-g...

[1]: https://www.amazon.com/Quaternions-Computer-Graphics-John-Vi...


This assumes numerics is free at very large scale, which is not reasonable if you want to create efficient biologically inspired AI.


Mathoma videos on geometric algebra: https://youtu.be/ERpcSJzX448


May I ask what its applications are?


if quaternions, why not octonions?


Quaternions are known to represent spatial transforms, and there is a little bit of prior work that demonstrates quaternion filters 'make sense'.

However, octonions are the obvious next step here: if you look at Appendix Figure 1 of "Deep Complex Networks" [1], the authors used (Real + Complex), and Figure 1 of our paper [2] with quaternions uses (Real + Complex + Complex + Complex)!

[1] https://arxiv.org/pdf/1705.09792.pdf [2] https://arxiv.org/pdf/1712.04604.pdf


I don't think that octonions are the obvious next step here. They are a non-associative algebra, and therefore incredibly difficult to deal with.

Starting with real n-space, one can form the Clifford algebra, which essentially gives a method of multiplying vectors which "knows" something about the length and angle of vectors. The even subalgebra of the Clifford algebra gives a very convenient way of encoding rotations on real n-space. Furthermore, the Clifford algebra is always associative, and works for any n.

If you apply this construction for n=1, 2, 3, you get back the real numbers, complex numbers, and quaternions respectively. If you apply this for n=4, you get back an 8-dimensional associative algebra encoding rotations in 4-space.
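
A tiny hand-rolled illustration of the n = 2 case, with Cl(2) elements stored as coefficients on the basis (1, e1, e2, e1e2) (no GA library needed):

    import numpy as np

    def gp(a, b):
        # Geometric product in Cl(2), where e1^2 = e2^2 = 1.
        a0, a1, a2, a3 = a
        b0, b1, b2, b3 = b
        return np.array([a0*b0 + a1*b1 + a2*b2 - a3*b3,
                         a0*b1 + a1*b0 - a2*b3 + a3*b2,
                         a0*b2 + a2*b0 + a1*b3 - a3*b1,
                         a0*b3 + a3*b0 + a1*b2 - a2*b1])

    e12 = np.array([0.0, 0.0, 0.0, 1.0])
    print(gp(e12, e12))  # [-1, 0, 0, 0]: the bivector e1e2 squares to -1

    # The even part (scalar + e1e2) is closed under the product and behaves
    # exactly like the complex numbers:
    z = np.array([0.0, 0.0, 0.0, 1.0])  # plays the role of i
    w = np.array([3.0, 0.0, 0.0, 4.0])  # plays the role of 3 + 4i
    print(gp(z, w))                     # [-4, 0, 0, 3], i.e. i*(3 + 4i) = -4 + 3i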


Sounds quite quantum... negative time?


phase shift, a shift in phase space.


The lack of associativity might suck.


Quaternions are associative! Anyway, you can cleverly encode all the algebraic operations of quaternions into matrices and go with that!
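
For example, via the standard left-multiplication representation as real 4x4 matrices:

    import numpy as np

    def left_mult_matrix(q):
        # 4x4 real matrix whose action on (w, x, y, z) is left-multiplication by q.
        w, x, y, z = q
        return np.array([[w, -x, -y, -z],
                         [x,  w, -z,  y],
                         [y,  z,  w, -x],
                         [z, -y,  x,  w]])

    i = np.array([0.0, 1.0, 0.0, 0.0])
    j = np.array([0.0, 0.0, 1.0, 0.0])
    print(left_mult_matrix(i) @ j)  # [0, 0, 0, 1] == k, so i*j = k as a matrix-vector product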


Yeah, but octonions are not associative. Which is what the poster was saying. (By the way, that also means that there isn't a matrix representation for octonions, since matrix multiplication is associative.)


Perhaps, but I would think that if you threw another tool (a different way to relate some inputs) at a neural net, it might figure out some way to exploit it, even if it's not clear to you or me. Just like an RL agent sometimes finds and exploits bugs in the environment.


The lack of commutativity for quaternion multiplication might suck, too.


It doesn't because neural networks are already (matrices + nonlinearity) over the ground field and therefore all but the simplest possible neural networks are non-commutative anyway. Non-associativity is much more troublesome because it sabotages the most important property of neural networks for deep learning, that you can compose them by feeding outputs of one as inputs into another.

Take out the associativity and you've taken out the "deep" in "deep learning".

Edit: On further reflection, the non-commutativity of neural networks is also a crucial component of machine learning. Without it, a neural network can't make a decision at one layer that depends on its decisions at a previous level!


I'm not sure I buy this.

If

- your network structure is fixed, and

- you always evaluate the matrix in a given order, and

- you are careful/smart about how you train the weights (Ok, I haven't thought through the ramifications here...)

then I'm not sure you care much about either commutativity or associativity. Maybe a lack of associativity makes backprop impossible, and maybe commutativity makes "Google deep dream" impossible (no idea), but I don't quite agree with the "composability" objection to a lack of associativity, and I don't understand the objection to a lack of commutativity, sorry.


The problem is with gradient propagation, which requires approximately associative mathematics. Otherwise you will be training your network to do something other than what you wanted.

You evaluate the network in two orders: forward (use) and backward (training).


Floating point math itself is commutative, but it's not associative as implemented in current processors, so you've already lost associativity from the get-go.
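
E.g., in IEEE-754 doubles:

    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 -- same operands, different grouping, different answer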


Floating point's non-associativity is certainly a problem for composability and stability of neural networks, you're right. However this non-associativity is well behaved in the sense that there's an ideal arithmetic we wish to approximate and we have techniques for mitigating the discrepancy between floating point and that arithmetic.

The non-associativity of the octonions is fundamental to their structure, not something to be worked around. In particular, there's no way to consider an octonion-valued network as comprising several layers plugged in serial.


and to that - why not N-nions for large values of N?


Because after octonions there isn't anything more.


https://en.wikipedia.org/wiki/Sedenion

But they're pretty weird; very few of the rules you're used to for "numbers" apply. Ditto for the further constructions.


Many references itemize what properties one gives up with each successive application of the Cayley-Dickson construction:

Complex numbers - lose the self-conjugate identity, but satisfy the fundamental theorem of algebra (and can represent 2D points or vectors)

Quaternions - lose commutativity (but can represent 3D rotation, which isn’t commutative)

Octonions - lose associativity, except for each of aab and abb

Sedenions - lose associativity of aab and abb

John Baez’s “This Week's Finds in Mathematical Physics (Week 59)” http://math.ucr.edu/home/baez/week59.html concludes with a letter by Toby Bartels explaining why. An excerpt:

I will prove below that the 2^n onions are a division algebra only if the 2^(n-1) onions are associative. So, the question becomes: why aren't the octonions associative? Well, I've found a proof that 2^n onions are associative only if 2^(n-1) onions are commutative. So, why aren't the quaternions commutative? Again, I have a proof that 2^n onions are commutative only if 2^(n-1) onions equal their own conjugates. So, why don't the complex numbers equal their own conjugates? I have a proof that 2^n onions do equal their own conjugates, but it works only if the 2^(n-1) onions are of characteristic 2. The real numbers are not of characteristic 2, so the complex numbers don't equal their own conjugates, so the quaternions aren't commutative, so the octonions aren't associative, so the hexadecanions aren't a division algebra.
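
The doubling step itself is short enough to play with (one common sign convention; others exist):

    def conj(a):
        # Cayley-Dickson conjugate of a 2^n-dimensional element (flat list of reals).
        if len(a) == 1:
            return a[:]
        h = len(a) // 2
        return conj(a[:h]) + [-x for x in a[h:]]

    def cd_mul(a, b):
        # One doubling step: (p, q)(r, s) = (pr - s*q, sp + qr*), where * is conjugation.
        if len(a) == 1:
            return [a[0] * b[0]]
        h = len(a) // 2
        p, q, r, s = a[:h], a[h:], b[:h], b[h:]
        left  = [x - y for x, y in zip(cd_mul(p, r), cd_mul(conj(s), q))]
        right = [x + y for x, y in zip(cd_mul(s, p), cd_mul(q, conj(r)))]
        return left + right

    def e(i, dim):
        return [1.0 if n == i else 0.0 for n in range(dim)]

    # Quaternions (dim 4): commutativity is gone...
    print(cd_mul(e(1, 4), e(2, 4)), cd_mul(e(2, 4), e(1, 4)))  # k and -k
    # ...octonions (dim 8): associativity is gone too.
    print(cd_mul(cd_mul(e(1, 8), e(2, 8)), e(4, 8)))  # e7
    print(cd_mul(e(1, 8), cd_mul(e(2, 8), e(4, 8))))  # -e7: (e1 e2) e4 != e1 (e2 e4)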


This is great, thanks!


For more on what's different after the octonions, see: https://en.m.wikipedia.org/wiki/Hurwitz%27s_theorem_(composi...

It isn't explicitly stated there, but the statement I recall from algebraic topology is that the octonions are the last normed division algebra.


The Cayley–Dickson construction can be carried on to infinity.

https://en.wikipedia.org/wiki/Cayley%E2%80%93Dickson_constru...


Not OP, but...

Awww, I was kinda hoping there'd be ununun-ions.


After octonions come sedenions.


You had me on "The field of deep learning...". Sounds seriously scientific to me. What's the next big flashy field? "Deep thought"? Oh, nope, Douglas Adams already covered that one.



