Quaternions are known to represent spatial transforms, and there is a little bit of prior work that demonstrates quaternion filters 'make sense'.
However, octonions are the obvious next step here: if you look at Appendix Figure 1 of "Deep Complex Networks" [1], the authors used (Real + Complex), and Figure 1 of our paper [2] with quaternions uses (Real + Complex + Complex + Complex)!
I don't think that octonions are the obvious next step here. They are a non-associative algebra, and therefore incredibly difficult to deal with.
Starting with real n-space, one can form the Clifford algebra, which essentially gives a method of multiplying vectors which "knows" something about the length and angle of vectors. The even subalgebra of the Clifford algebra gives a very convenient way of encoding rotations on real n-space. Furthermore, the Clifford algebra is always associative, and works for any n.
If you apply this construction (taking the even subalgebra) for n=1, 2, 3, you get back the real numbers, complex numbers, and quaternions respectively. For n=4, you get an 8-dimensional associative algebra encoding rotations in 4-space.
Yeah, but octonions are not associative, which is what the poster was saying. (By the way, that also means that there isn't a matrix representation for octonions, since matrix multiplication is associative.)
perhaps, but i would think that if you threw another tool (a different way to relate some inputs) at a neural net, it might figure out some way to exploit it, even if it's not clear to you or me. just like an RL agent sometimes finds + exploits bugs in the environment.
It doesn't because neural networks are already (matrices + nonlinearity) over the ground field and therefore all but the simplest possible neural networks are non-commutative anyway. Non-associativity is much more troublesome because it sabotages the most important property of neural networks for deep learning, that you can compose them by feeding outputs of one as inputs into another.
Take out the associativity and you've taken out the "deep" in "deep learning".
Edit: On further reflection, the non-commutativity of neural networks is also a crucial component of machine learning. Without it, a neural network can't make a decision at one layer that depends on its decisions at a previous level!
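To make the distinction between the two properties concrete, here is a minimal sketch with 2x2 integer matrices. The matrices and the `matmul` helper are made up for illustration; the point is only that matrix products depend on order (non-commutative) but not on grouping (associative), which is what lets you stack layers.

```python
def matmul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Illustrative matrices; integers keep the arithmetic exact.
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
C = [[2, 0], [0, 3]]

# Order matters: A*B and B*A differ.
print(matmul(A, B) == matmul(B, A))

# Grouping does not matter: (A*B)*C equals A*(B*C).
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))
```

The first comparison comes out false and the second true, so a stack of linear layers can be grouped any way you like without changing the result.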
If
- you always evaluate the matrix in a given order, and
- you are careful/smart about how you train the weights (OK, I haven't thought through the ramifications here...)
then I'm not sure you care much about either commutativity or associativity. Maybe a lack of associativity makes backprop impossible, and maybe commutativity makes "Google deep dream" impossible (no idea), but I don't quite agree with the "composability" objection to a lack of associativity, and I don't understand the objection to a lack of commutativity, sorry.
The problem is with gradient propagation, which requires approximately associative mathematics. You will end up training your network toward something other than what you wanted.
You evaluate the network in two orders - forward (use) and backward (training).
Floating point math itself is commutative, but it's not associative as it is implemented in current processors so you've already lost associativity from the get-go.
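A quick way to see this for yourself, assuming standard IEEE 754 double precision (which is what Python's `float` uses on common platforms); the specific values are just a convenient illustration:

```python
# Same three numbers, two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Grouping changes the rounded result, so addition is not associative.
print(left == right)

# Swapping the operands of a single addition does not change the result.
print(0.1 + 0.2 == 0.2 + 0.1)
```

The first comparison is false and the second is true: the order of operands is harmless, but the grouping is not.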
Floating point's non-associativity is certainly a problem for composability and stability of neural networks, you're right. However this non-associativity is well behaved in the sense that there's an ideal arithmetic we wish to approximate and we have techniques for mitigating the discrepancy between floating point and that arithmetic.
The non-associativity of the octonions is fundamental to their structure, not something to be worked around. In particular, there's no way to consider an octonion-valued network as comprising several layers plugged in serial.
Many references itemize what properties one gives up with each successive application of the Cayley-Dickson construction:
Complex numbers - lose self-conjugacy (an element no longer equals its own conjugate), but satisfy the fundamental theorem of algebra (and can represent 2D points or vectors)
Quaternions - lose commutativity (but can represent 3D rotation, which isn’t commutative)
Octonions - lose associativity, but stay alternative: (aa)b = a(ab) and (ab)b = a(bb) still hold
Sedenions - lose alternativity too: (aa)b = a(ab) and (ab)b = a(bb) fail in general
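If you want to check the list above rather than take it on faith, here is a rough sketch that builds each algebra by repeated Cayley-Dickson doubling and tests the properties. Assumptions to be aware of: it uses the product convention (a, b)(c, d) = (ac - d*b, da + bc*), which is one of several equivalent ones; the function names are mine; commutativity and associativity are checked exactly on basis elements (enough, since the product is linear in each argument), while the aab/abb identities and the norm test are only sampled on random integer elements, so a "True" there is evidence rather than proof.

```python
import itertools
import random

def conj(x):
    """Cayley-Dickson conjugate: keep the real part, negate the rest."""
    return [x[0]] + [-v for v in x[1:]]

def mul(x, y):
    """Multiply two elements stored as flat integer lists of length 2^n.

    Convention (an assumption): (a, b)(c, d) = (ac - d*b, da + bc*).
    """
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    first = [p - q for p, q in zip(mul(a, c), mul(conj(d), b))]
    second = [p + q for p, q in zip(mul(d, a), mul(b, conj(c)))]
    return first + second

def norm2(x):
    """Squared norm: sum of squared coordinates."""
    return sum(v * v for v in x)

def properties(dim, trials=20, seed=0):
    """Report which algebraic properties hold for the 2^n-dimensional algebra."""
    rng = random.Random(seed)
    e = [[int(i == k) for i in range(dim)] for k in range(dim)]  # basis
    samples = [([rng.randint(-5, 5) for _ in range(dim)],
                [rng.randint(-5, 5) for _ in range(dim)])
               for _ in range(trials)]
    return {
        "self_conjugate": all(conj(u) == u for u in e),
        "commutative": all(mul(u, v) == mul(v, u)
                           for u, v in itertools.product(e, repeat=2)),
        "associative": all(mul(mul(u, v), w) == mul(u, mul(v, w))
                           for u, v, w in itertools.product(e, repeat=3)),
        # (aa)b = a(ab) and (ab)b = a(bb), sampled on random elements
        "alternative": all(mul(mul(a, a), b) == mul(a, mul(a, b))
                           and mul(mul(a, b), b) == mul(a, mul(b, b))
                           for a, b in samples),
        # |ab|^2 = |a|^2 |b|^2, sampled on random elements
        "norm_multiplicative": all(norm2(mul(a, b)) == norm2(a) * norm2(b)
                                   for a, b in samples),
    }

for dim, name in [(1, "reals"), (2, "complex"), (4, "quaternions"),
                  (8, "octonions"), (16, "sedenions")]:
    print(name, properties(dim))
```

Each doubling should flip exactly one more property to False, in the order of the list. The norm test is a stand-in for the division-algebra property: a multiplicative norm rules out zero divisors, and it stops holding at the sedenions.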
John Baez’s “This Week's Finds in Mathematical Physics (Week 59)” http://math.ucr.edu/home/baez/week59.html concludes with a letter by Toby Bartels explaining why. An excerpt:
I will prove below that the 2^n onions are a division algebra
only if the 2^(n-1) onions are associative.
So, the question becomes: why aren't the octonions associative?
Well, I've found a proof that 2^n onions are associative
only if 2^(n-1) onions are commutative.
So, why aren't the quaternions commutative?
Again, I have a proof that 2^n onions are commutative
only if 2^(n-1) onions equal their own conjugates.
So, why don't the complex numbers equal their own conjugates?
I have a proof that 2^n onions do equal their own conjugates,
but it works only if the 2^(n-1) onions are of characteristic 2.
The real numbers are not of characteristic 2,
so the complex numbers don't equal their own conjugates,
so the quaternions aren't commutative,
so the octonions aren't associative,
so the hexadecanions aren't a division algebra.