The problem is with gradient propagation, which requires (at least approximately) associative mathematics. You would end up training your network to do something other than what you intended.
You evaluate the network in two orders - forward (use) and backward (training).
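To illustrate the point with ordinary matrices (my own sketch, not from the original comment): for a linear chain y = A·B·C·x with loss g·y, the forward pass groups the products one way and reverse-mode backpropagation groups them the other way, and the two readings agree only because matrix multiplication is associative.

```python
# Minimal sketch: forward and backward passes of a linear chain are the same
# product associated in opposite orders. Names and shapes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
x, g = rng.standard_normal(4), rng.standard_normal(4)

forward  = g @ (A @ (B @ (C @ x)))    # grouping used when running the net
backward = (((g @ A) @ B) @ C) @ x    # grouping used by backpropagation

print(forward, backward)        # agree only because matrix multiply is associative
print(forward - backward)       # tiny but often nonzero: float rounding depends on grouping
```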
Floating point math as implemented in current processors is commutative but not associative, so you've already lost associativity from the get-go.
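A standard one-liner makes the loss of associativity visible:

```python
# IEEE-754 double-precision addition is not associative:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```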
Floating point's non-associativity is certainly a problem for the composability and stability of neural networks, you're right. However, this non-associativity is well behaved in the sense that there is an ideal arithmetic we wish to approximate, and we have techniques for mitigating the discrepancy between floating point and that arithmetic.
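One such mitigation technique, as a sketch (my example, not anything proposed upthread), is Kahan compensated summation: it carries the rounding error of each addition along and feeds it back in, so the float result stays close to the ideal real-number sum regardless of how the additions are grouped.

```python
def kahan_sum(xs):
    """Compensated summation: approximates the exact real-number sum."""
    total = 0.0
    comp = 0.0                    # running compensation for lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y             # low-order bits of y are lost in this add...
        comp = (t - total) - y    # ...and recovered here
        total = t
    return total

vals = [1e16, 1.0, 1.0, -1e16]
print(sum(vals))        # 0.0 -- naive left-to-right sum loses both 1.0s
print(kahan_sum(vals))  # 2.0 -- matches the exact result
```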
The non-associativity of the octonions is fundamental to their structure, not something to be worked around. In particular, there's no way to treat an octonion-valued network as a composition of several layers plugged together in series.
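To see that this is exact algebraic structure rather than rounding noise, here is a small sketch (my own illustration) using the standard Cayley-Dickson construction, which builds the octonions as nested pairs of reals; the associator of three basis elements is nonzero no matter how precisely you compute.

```python
# Octonions via the Cayley-Dickson construction: an element is either a plain
# float or a pair (a, b), and pairs multiply by
#   (a, b) * (c, d) = (a*c - conj(d)*b, d*a + b*conj(c)).
# Doubling the reals three times gives the octonions (8 components).

def conj(x):
    return (conj(x[0]), neg(x[1])) if isinstance(x, tuple) else x

def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def add(x, y):
    return (add(x[0], y[0]), add(x[1], y[1])) if isinstance(x, tuple) else x + y

def sub(x, y):
    return add(x, neg(y))

def mul(x, y):
    if not isinstance(x, tuple):
        return x * y
    (a, b), (c, d) = x, y
    return (sub(mul(a, c), mul(conj(d), b)),
            add(mul(d, a), mul(b, conj(c))))

def unit(k):
    """Octonion basis element e_k (k = 0..7) as nested pairs of floats."""
    c = [1.0 if i == k else 0.0 for i in range(8)]
    return (((c[0], c[1]), (c[2], c[3])), ((c[4], c[5]), (c[6], c[7])))

e1, e2, e4 = unit(1), unit(2), unit(4)
print(mul(mul(e1, e2), e4))  # (e1*e2)*e4
print(mul(e1, mul(e2, e4)))  # e1*(e2*e4): differs by a sign, so the two are not equal
```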