People typically set the weights of a neural network using heuristic approximation algorithms: they take a large set of example inputs/outputs and try to find weights that perform the needed computation as accurately as possible. This approximation process is called training, and we resort to it because nobody really knows how to set the weights otherwise. It would be nice if we had "compilers" for neural networks, where you write an algorithm in a programming language and get back a neural network (architecture + weights) that performs the same computation.
TFA is a beautiful step in that direction. What I want is an automated way to do this, without having to hire vgel every time.
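To make "setting the weights by hand" concrete, here's a minimal sketch (plain NumPy; the weights are picked by hand for illustration, not taken from the article): a two-layer ReLU network whose weights are chosen so that it computes XOR exactly, no training involved:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    # Hidden layer: h1 = x1 + x2 (the total input), h2 = the excess over 1.
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([0.0, -1.0])

    # Readout: y = relu(x1 + x2) - 2*relu(x1 + x2 - 1), which on {0, 1}
    # inputs is exactly XOR: sum 0 -> 0, sum 1 -> 1, sum 2 -> 2 - 2 = 0.
    w2 = np.array([1.0, -2.0])

    def xor_net(x):
        return w2 @ relu(W1 @ x + b1)

    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(int(a), int(b), xor_net(np.array([a, b])))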
A Turing-complete system isn't necessarily useful; it just means the system is computationally equivalent to a Turing machine. The ability to describe any possible algorithm is not that powerful in itself.
As an example, algebraic type systems are often Turing-complete simply because they allow general recursion.
Feed-forward networks are effectively DAGs, and while you may be able to express arbitrary algorithms with them, they are also piecewise linear with respect to their inputs.
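To make the piecewise-linearity point concrete, a quick sketch (plain NumPy, random weights, a toy setup of my own): the output of a one-hidden-layer ReLU net of a scalar input changes slope only at finitely many breakpoints, at most one per hidden unit:

    import numpy as np

    rng = np.random.default_rng(0)
    w1, b1 = rng.normal(size=5), rng.normal(size=5)  # 5 hidden ReLU units
    w2 = rng.normal(size=5)

    def f(x):
        return w2 @ np.maximum(w1 * x + b1, 0.0)

    # Kinks can only occur where a hidden unit's pre-activation crosses
    # zero, i.e. at x = -b1/w1: at most 5 breakpoints, so at most 6 linear
    # regions. Between breakpoints the numerical slope is constant.
    xs = np.linspace(-5.0, 5.0, 2001)
    ys = np.array([f(x) for x in xs])
    slopes = np.diff(ys) / np.diff(xs)
    print("breakpoints:", np.sort(-b1 / w1))
    print("distinct slopes seen:", np.unique(np.round(slopes, 6)))
    # Expect a handful of plateaus of constant slope, plus a few stray
    # values from sample intervals that straddle a kink.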
Statistical learning is powerful at finding and matching patterns, but graph rewriting, which is effectively what you're doing when you start from random weights and train, is not trivial.
More importantly, it doesn't make issues like the halting problem decidable.
I don't see why the same limits that were explored in graph-rewriting languages in the 90s won't also hit feed-forward networks used as computation systems, short of applying nation-state-scale computing power.
The point of training is to create computer programs through optimization, because there are many problems (like understanding language) that we just don't know how to write programs to do.
It's not that we don't know how to set the weights - neural networks are only parameterized with weights in the first place because that makes them easy to optimize.
There is no reason to use them if you plan to write your own code for them. You won't be able to do anything that you couldn't do in a normal programming language, because what makes NNs special is the training process.
Why would you do that when it's better to do the opposite? Given a model, quantize it and compile it into direct code objects that do the same thing much, much faster.
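Roughly what I mean, as a toy sketch (the layer, the weights, and the generated function are all made up for illustration; a real toolchain would do far more):

    # Stand-in for the trained (already-quantized) weights of one small layer.
    W = [[0.25, -1.5], [2.0, 0.5]]
    b = [0.1, -0.2]

    # Bake the weights into generated straight-line code: no tensor
    # library, no weight lookups at runtime, just arithmetic.
    lines = ["def layer(x0, x1):"]
    for i, (row, bias) in enumerate(zip(W, b)):
        terms = " + ".join(f"({w})*x{j}" for j, w in enumerate(row))
        lines.append(f"    y{i} = max({terms} + ({bias}), 0.0)  # fused ReLU")
    lines.append("    return (" + ", ".join(f"y{i}" for i in range(len(W))) + ")")

    src = "\n".join(lines)
    print(src)   # inspect the generated code
    exec(src)    # defines layer()
    print(layer(1.0, 2.0))  # (0.0, 2.8)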
The generality of the approach [NNs] implies that they are effectively a union of all programs that may be represented. As such, they need the capacity to represent all of them, and that capacity comes in the form of size, which makes them wasteful for exact solutions.
It is fairly trivial to create FFNNs that behave as decision trees using just ReLUs, if you can encode your problem as a continuous problem with a finite set of inputs (see the sketch below). Then you can very well say that this decision tree is, well, a program, and there you have it.
The actual problem is the encoding, which is why NNs are so powerful: they learn the encodings themselves through gradient descent and its variants.
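Here's a minimal sketch of what I mean (the tree, the thresholds, and the leaf values are a toy example I made up): steep ReLU ramps stand in for step functions, and leaf indicators are ANDs of those steps:

    def relu(t):
        return max(t, 0.0)

    K = 1e4  # ramp steepness; the step becomes exact as K -> infinity

    def step(t):
        # relu(K*t) - relu(K*t - 1) ramps from 0 to 1 over a width of
        # 1/K, approximating the indicator function 1[t > 0].
        return relu(K * t) - relu(K * t - 1.0)

    def tree_as_net(x0, x1):
        # Layer 1: one threshold unit per internal node of the tree.
        s0 = step(x0 - 0.5)   # root:        x0 > 0.5 ?
        s1 = step(x1 - 0.3)   # if x0 > 0.5: x1 > 0.3 ?
        s2 = step(x1 - 0.7)   # else:        x1 > 0.7 ?
        # Layer 2: one ReLU unit per leaf (ANDs of binary indicators),
        # then a linear readout of the leaf values.
        return (3.0 * relu(s0 + s1 - 1.0)   # x0>0.5 and x1>0.3  -> 3.0
              + 1.0 * relu(s0 - s1)         # x0>0.5 and x1<=0.3 -> 1.0
              + 2.0 * relu(s2 - s0))        # x0<=0.5 and x1>0.7 -> 2.0
                                            # remaining leaf     -> 0.0

    print(tree_as_net(0.9, 0.6))  # 3.0
    print(tree_as_net(0.9, 0.1))  # 1.0
    print(tree_as_net(0.1, 0.9))  # 2.0
    print(tree_as_net(0.1, 0.1))  # 0.0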
That makes no sense from an entropy or cybernetics standpoint. You would just get a neural network that encodes the exact formula or algorithm. If you compiled a sine, for instance, it would be a Taylor series encoded into neurons.
It's like going from computing pi as a constant to computing it as a gigantic float.
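What I mean, in miniature (a toy illustration of my own, nothing from the article): a "compiled" sine is just a linear readout over fixed polynomial features, with the Taylor coefficients stored as weights:

    import math

    # Weights: the first six Taylor coefficients of sin(x) around 0.
    coeffs = [(-1) ** n / math.factorial(2 * n + 1) for n in range(6)]

    def sine_net(x):
        feats = [x ** (2 * n + 1) for n in range(6)]      # fixed feature "layer"
        return sum(w * f for w, f in zip(coeffs, feats))  # linear readout

    print(sine_net(1.0), math.sin(1.0))  # agree to ~10 decimal places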