Yeah, but octonions are not associative, which is what the poster was saying. (By the way, that also means that there isn't a matrix representation for octonions, since matrix multiplication is associative.)
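If you want to see the non-associativity concretely, here's a rough Python sketch via the Cayley-Dickson construction (the construction itself is standard; cd_mul, the exact sign convention, and the random test values are just my choices for illustration):

    import numpy as np

    def conj(x):
        # Hypercomplex conjugate: keep the real part, negate the rest.
        out = -x.copy()
        out[0] = x[0]
        return out

    def cd_mul(x, y):
        # Cayley-Dickson product for arrays of length 1, 2, 4, 8, ...
        # Convention: (a,b)(c,d) = (ac - conj(d)b, da + b conj(c)).
        n = len(x)
        if n == 1:
            return x * y
        h = n // 2
        a, b = x[:h], x[h:]
        c, d = y[:h], y[h:]
        return np.concatenate([
            cd_mul(a, c) - cd_mul(conj(d), b),
            cd_mul(d, a) + cd_mul(b, conj(c)),
        ])

    rng = np.random.default_rng(0)
    x, y, z = rng.standard_normal((3, 8))        # three random octonions
    print(np.allclose(cd_mul(cd_mul(x, y), z),
                      cd_mul(x, cd_mul(y, z))))  # False: (xy)z != x(yz)

    q = rng.standard_normal((3, 4))              # quaternions are still associative:
    print(np.allclose(cd_mul(cd_mul(q[0], q[1]), q[2]),
                      cd_mul(q[0], cd_mul(q[1], q[2]))))  # True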
Perhaps, but I would think that if you threw another tool (a different way to relate some inputs) at a neural net, it might figure out some way to exploit it, even if it's not clear to you or me. Just like an RL agent sometimes finds and exploits bugs in the environment.
It doesn't, because neural networks are already (matrices + nonlinearity) over the ground field, and therefore all but the simplest possible neural networks are non-commutative anyway. Non-associativity is much more troublesome, because it sabotages the most important property of neural networks for deep learning: that you can compose them by feeding the outputs of one as inputs into another.
Take out the associativity and you've taken out the "deep" in "deep learning".
Edit: On further reflection, the non-commutativity of neural networks is also a crucial component of machine learning. Without it, a neural network couldn't make a decision at one layer that depends on its decisions at a previous layer!
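Both properties are easy to check with a toy numpy sketch (illustrative only; the 4x4 shapes and the omitted nonlinearities are my simplifications):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((2, 4, 4))   # two "layers", nonlinearities omitted
    x = rng.standard_normal(4)

    # Associativity is what lets layers stack: applying W1 then W2
    # is the same linear map as the single collapsed matrix W2 @ W1.
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True

    # Non-commutativity is what makes layer order matter:
    print(np.allclose(W2 @ W1, W1 @ W2))               # False (generically)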
If:
- you always evaluate the matrices in a given order, and
- you are careful/smart about how you train the weights (OK, I haven't thought through the ramifications here...),
then I'm not sure you care much about either commutativity or associativity. Maybe a lack of associativity makes backprop impossible, and maybe a lack of commutativity makes "Google Deep Dream" impossible (no idea), but I don't quite agree with the "composability" objection to a lack of associativity, and I don't understand the objection to a lack of commutativity, sorry.
The problem is with gradient propagation, which requires approximately associative mathematics. Otherwise you will end up training your network to do something other than what you wanted.
You evaluate the network in two orders: forward (use) and backward (training).
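Concretely, for a linear chain the backward pass walks the same product in the opposite grouping from the forward pass, and the two only agree because matrix multiplication is associative (a numpy sketch, just to illustrate the point):

    import numpy as np

    rng = np.random.default_rng(1)
    W1, W2 = rng.standard_normal((2, 4, 4))
    x = rng.standard_normal(4)
    g = rng.standard_normal(4)        # upstream gradient dL/dy for y = W2 @ W1 @ x

    y = W2 @ (W1 @ x)                 # forward: compute W1 @ x first, then W2
    dx_backprop = W1.T @ (W2.T @ g)   # backward: traverse in the opposite order
    dx_direct = (W2 @ W1).T @ g       # the collapsed Jacobian, transposed
    print(np.allclose(dx_backprop, dx_direct))   # True, thanks to associativity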
Floating point math itself is commutative, but it's not associative as implemented in current processors, so you've already lost associativity from the get-go.
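The classic demonstration, in plain Python with IEEE 754 doubles:

    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)   # 1.0
    print(a + (b + c))   # 0.0 -- the 1.0 is absorbed when added to -1e16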
Floating point's non-associativity is certainly a problem for the composability and stability of neural networks, you're right. However, this non-associativity is well behaved, in the sense that there's an ideal arithmetic we wish to approximate, and we have techniques for mitigating the discrepancy between floating point and that ideal arithmetic.
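For example, compensated (Kahan) summation tracks and re-injects the bits that naive summation throws away, and Python's standard library ships an exactly rounded version as math.fsum (kahan_sum below is my own toy implementation):

    import math

    def kahan_sum(xs):
        total = 0.0
        comp = 0.0                   # running compensation for lost low-order bits
        for x in xs:
            y = x - comp
            t = total + y            # low-order bits of y are lost here...
            comp = (t - total) - y   # ...and recovered here
            total = t
        return total

    xs = [0.1] * 10
    print(sum(xs))         # 0.9999999999999999 -- naive left-to-right summation
    print(kahan_sum(xs))   # 1.0
    print(math.fsum(xs))   # 1.0 -- exactly rounded sum from the standard library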
The non-associativity of the octonions is fundamental to their structure, not something to be worked around. In particular, there's no way to view an octonion-valued network as a composition of several layers plugged together in series.
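To make that concrete (reusing cd_mul and rng from the Cayley-Dickson sketch upthread): with matrices, two linear layers collapse into a single one, but two octonion-multiplication "layers" have no single-multiplication equivalent, because a(bx) != (ab)x in general:

    a, b, x = rng.standard_normal((3, 8))
    two_layers = cd_mul(a, cd_mul(b, x))    # apply b, then a
    collapsed = cd_mul(cd_mul(a, b), x)     # try to fuse them into one "layer"
    print(np.allclose(two_layers, collapsed))   # False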