
So how is this more powerful than normal programming?



My read is that this is a technique that can make it easier to write certain types of machine learning applications - which means less coding and more giving the application data so that it can fine-tune its behavior. There are also applications we don't know how to write directly, but we can train a program (a machine learning model) to behave that way.

There would be domains where this would be a major advantage and (probably) applications where it wouldn't be particularly helpful.

Best to think of it as another tool in the engineering toolbox.


Obviously "more powerful" doesn't imply the capability of solving the halting problem. What exactly do you want to say? You are not impressed with "differentiable programming"?


It is misnamed. It makes this sound like some kind of Turing-complete approach, like when DeepMind invented differentiable Turing machines. This post just describes differentiating mathematical functions, which has been around forever, and which you can find in MATLAB, Mathematica, Python libs, explained in SICP, etc. Why doesn't Google just buy a MATLAB license and be done with it? Why bother embedding it in a whole new programming language?


Technically, RNNs, including LSTMs, are already Turing complete. Neural Turing machines mostly decoupled the memory from the hidden layers, so the memory size can grow without a quadratic increase in the number of parameters, which helps with the unbounded-memory part of Turing machines; they also helped inspire many of the attention-based models that came after. Also, MATLAB isn't really relevant here, and automatic differentiation is different from symbolic differentiation; the direct comparison is with those Python libs like TensorFlow, PyTorch, and JAX.
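To illustrate that distinction: automatic differentiation computes derivatives alongside the running program rather than manipulating a symbolic expression. A toy forward-mode sketch in Julia using dual numbers (an illustration only, not how Zygote or the Python libs are actually implemented):

    # Carry a value and its derivative together through the program.
    struct Dual
        val::Float64   # value of f(x)
        der::Float64   # value of f'(x)
    end

    # Propagate derivatives through the basic operations (sum, product, chain rules).
    Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
    Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
    Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

    # Differentiate an ordinary Julia function by running it on Dual numbers.
    derivative(f, x) = f(Dual(x, 1.0)).der

    f(x) = x * x + sin(x)
    derivative(f, 2.0)   # 2*2 + cos(2) ≈ 3.5839

No symbolic expression for f is ever built; the derivative falls out of executing the program itself, control flow and all.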

Differentiable programming, in the end, is just a way of doing something you can already do, but better (just as you could create neural networks long before Theano/TensorFlow/Torch, it just wasn't as streamlined). With a differentiable programming approach you get something as dynamic as PyTorch, with the performance optimizations and deployment capabilities of a TensorFlow graph, and with an easy way to plug in any new operation and its gradient by writing it in the same host language (so no need to learn or restrict yourself to the TensorFlow/PyTorch-defined methods/DSL).

You don't even need to change the compiler or define a new language for it. Julia's Zygote [1] is just a 100% Julia library you can import, to which you can add custom gradients at any point, even if the library creators never added them, and then run on either CPU or GPU (which you can also fully extend using just pure Julia [2]). And of course, you can also use a higher-level framework like Flux [3], which is likewise high-level Julia code.
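A minimal sketch of what that looks like, using Zygote's exported gradient function and its @adjoint macro (the blackbox function here is made up purely for illustration):

    using Zygote

    # Any plain Julia function can be differentiated directly.
    f(x) = 3x^2 + 2x + 1
    gradient(f, 2.0)            # (14.0,)

    # A user-defined op with a hand-written gradient, added without touching the library.
    blackbox(x) = x^3
    Zygote.@adjoint blackbox(x) = blackbox(x), Δ -> (Δ * 3x^2,)

    g(x) = blackbox(x) + sin(x)
    gradient(g, 1.0)            # (3 + cos(1) ≈ 3.5403,)

The custom gradient is ordinary Julia, defined in user code, and Zygote composes it with everything else it differentiates.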

I think the heart of differentiable programming is just another step in the evolution: from early (Lua) Torch-like libraries that gave you high-level blocks to compose, to autodiff libraries that gave you easy access to low-level math operators for building those blocks, to the point where you can easily create your own operators to build the high-level blocks (see the short Flux sketch after the links below).

[1] https://github.com/FluxML/Zygote.jl

[2] https://github.com/JuliaGPU/CUDAnative.jl

[3] https://github.com/FluxML/Flux.jl
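For the Flux side, a minimal sketch (the model shape, loss, and data are invented for illustration; Chain, Dense, relu, and gradient are part of Flux's exported API, though argument conventions vary a bit between Flux versions):

    using Flux

    # High-level blocks, themselves written in plain Julia.
    model = Chain(Dense(2, 16, relu), Dense(16, 1))

    x = rand(Float32, 2, 8)   # a batch of 8 two-dimensional inputs
    y = rand(Float32, 1, 8)   # matching targets

    # A hand-written mean-squared-error loss; nothing framework-specific.
    loss(m, x, y) = sum(abs2, m(x) .- y) / length(y)

    # Gradients with respect to the model's parameters, computed by Zygote under the hood.
    grads = gradient(m -> loss(m, x, y), model)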



