The problem is with gradient propagation, which requires (at least approximately) associative mathematics. You would end up training your network to do something other than what you intended.
You evaluate the network in two orders - forward (use) and backward (training).
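To illustrate the point with ordinary matrices (my own sketch, not from the original comment): for a linear chain y = A·B·C·x with loss g·y, the forward pass groups the products one way and reverse-mode backpropagation groups them the other way, and the two readings agree only because matrix multiplication is associative.

```python
# Minimal sketch: forward and backward passes of a linear chain are the same
# product associated in opposite orders. Names and shapes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
x, g = rng.standard_normal(4), rng.standard_normal(4)

forward  = g @ (A @ (B @ (C @ x)))    # grouping used when running the net
backward = (((g @ A) @ B) @ C) @ x    # grouping used by backpropagation

print(forward, backward)        # agree only because matrix multiply is associative
print(forward - backward)       # tiny but often nonzero: float rounding depends on grouping
```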
Floating point math as implemented in current processors is commutative but not associative, so you've already lost associativity from the get-go.
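A standard one-liner makes the loss of associativity visible:

```python
# IEEE-754 double-precision addition is not associative:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```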
Floating point's non-associativity is certainly a problem for the composability and stability of neural networks, you're right. However, this non-associativity is well behaved in the sense that there is an ideal arithmetic we wish to approximate, and we have techniques for mitigating the discrepancy between floating point and that arithmetic.
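One such mitigation technique, as a sketch (my example, not anything proposed upthread), is Kahan compensated summation: it carries the rounding error of each addition along and feeds it back in, so the float result stays close to the ideal real-number sum regardless of how the additions are grouped.

```python
def kahan_sum(xs):
    """Compensated summation: approximates the exact real-number sum."""
    total = 0.0
    comp = 0.0                    # running compensation for lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y             # low-order bits of y are lost in this add...
        comp = (t - total) - y    # ...and recovered here
        total = t
    return total

vals = [1e16, 1.0, 1.0, -1e16]
print(sum(vals))        # 0.0 -- naive left-to-right sum loses both 1.0s
print(kahan_sum(vals))  # 2.0 -- matches the exact result
```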
The non-associativity of the octonions is fundamental to their structure, not something to be worked around. In particular, there's no way to treat an octonion-valued network as a composition of several layers plugged together in series.
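To see that this is exact algebraic structure rather than rounding noise, here is a small sketch (my own illustration) using the standard Cayley-Dickson construction, which builds the octonions as nested pairs of reals; the associator of three basis elements is nonzero no matter how precisely you compute.

```python
# Octonions via the Cayley-Dickson construction: an element is either a plain
# float or a pair (a, b), and pairs multiply by
#   (a, b) * (c, d) = (a*c - conj(d)*b, d*a + b*conj(c)).
# Doubling the reals three times gives the octonions (8 components).

def conj(x):
    return (conj(x[0]), neg(x[1])) if isinstance(x, tuple) else x

def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def add(x, y):
    return (add(x[0], y[0]), add(x[1], y[1])) if isinstance(x, tuple) else x + y

def sub(x, y):
    return add(x, neg(y))

def mul(x, y):
    if not isinstance(x, tuple):
        return x * y
    (a, b), (c, d) = x, y
    return (sub(mul(a, c), mul(conj(d), b)),
            add(mul(d, a), mul(b, conj(c))))

def unit(k):
    """Octonion basis element e_k (k = 0..7) as nested pairs of floats."""
    c = [1.0 if i == k else 0.0 for i in range(8)]
    return (((c[0], c[1]), (c[2], c[3])), ((c[4], c[5]), (c[6], c[7])))

e1, e2, e4 = unit(1), unit(2), unit(4)
print(mul(mul(e1, e2), e4))  # (e1*e2)*e4
print(mul(e1, mul(e2, e4)))  # e1*(e2*e4): differs by a sign, so the two are not equal
```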