Lots of interesting things I'd want to try here. Eg, the formulation of thinking of your probability distribution as a charge distribution opens up the possibility of adding negative charges. So you could have regular positive data examples, but also some negative examples that the generative model tries extra hard to avoid.
I'd also be interested to see what happens when you "add the fields" for two different distributions over the same space. Seems like it would be interestingly different than regular superposition, and I'd really like to see what happens when you add together a dog image field to a cat image field.
I guess the current formulation where you only get the direction of the electric field and not the magnitude would cause problems there. Seems like an improvement might be to predict the actual potential instead of the field and then take the gradient. That would also guarantee that the field is curl-free, which it should be, but it doesn't seem to be enforced in the current paper. I think they don't try this because it's hard to get NNs to produce numbers across a wide range of orders of magnitude. Maybe separately predicting the normalized direction and something like the log of the magnitude would help here.
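To illustrate the point about predicting the potential: a field obtained as the gradient of a scalar potential is curl-free by construction, so nothing extra needs to be enforced. Here's a minimal numerical sketch, where the toy `phi` is a hypothetical stand-in for what the network would output:

```python
import numpy as np

# Toy "learned" scalar potential on 2D; in the paper's setting this would be
# a neural network's scalar output. phi here is purely hypothetical.
def phi(x, y):
    return np.exp(-((x - 0.5) ** 2 + (y + 0.3) ** 2))

# Field as the negative gradient of the potential (central finite differences).
def field(x, y, eps=1e-5):
    fx = -(phi(x + eps, y) - phi(x - eps, y)) / (2 * eps)
    fy = -(phi(x, y + eps) - phi(x, y - eps)) / (2 * eps)
    return fx, fy

# Curl in 2D is dFy/dx - dFx/dy; for a gradient field it vanishes identically
# (up to finite-difference noise), with no extra constraint needed.
def curl(x, y, eps=1e-4):
    _, fy_r = field(x + eps, y)
    _, fy_l = field(x - eps, y)
    fx_u, _ = field(x, y + eps)
    fx_d, _ = field(x, y - eps)
    return (fy_r - fy_l) / (2 * eps) - (fx_u - fx_d) / (2 * eps)

print(abs(curl(0.2, -0.1)))  # ~0: gradient fields have no curl
```

In a real model you'd use autodiff rather than finite differences, but the property is the same: predict one scalar, differentiate, and curl-freeness comes for free.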
I had this thought earlier when thinking about negative prompts in SD. Also, it would be really cool to plug in a PFGM as the base in Imagen while maintaining Diffusion Models for the super-resolution chain
As for the addition, in principle it would be as simple as regular superposition, given that Maxwell's equations in free space are a linear system! Would be very interesting indeed though!
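A tiny sketch of that linearity, using analytic point-charge fields as stand-ins for the two learned fields (the names `E_dog`/`E_cat` are just illustrative, not anything from the paper):

```python
import numpy as np

# Inverse-square field of a single point charge; a proxy for the field a
# trained model would predict for one data distribution.
def point_charge_field(x, source):
    d = x - source
    return d / (np.linalg.norm(d) ** 3 + 1e-12)

x = np.array([1.0, 2.0, 0.5])
E_dog = point_charge_field(x, np.array([0.0, 0.0, 0.0]))
E_cat = point_charge_field(x, np.array([3.0, 1.0, 0.0]))

# By linearity of the Poisson equation, the field of the combined charge
# distribution is exactly the sum of the individual fields.
E_combined = E_dog + E_cat
print(E_combined)
```

The catch mentioned above still applies: if the model only outputs normalized directions, the relative magnitudes needed for a meaningful sum are lost.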
All the novelty appears to boil down to "this is a new sampler for diffusion-style models".
Apparently, they are inspired by a different physical process. But that doesn't change the fact that they start with noise and then iteratively solve a differential equation to get to its end point. 2nd-order samplers like Heun already massively increased sampling speed over the original diffusion models, but sadly the article doesn't compare to that but merely to the original 1st-order samplers. So it might be that this one doesn't even create a speedup in practice.
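For context on why 2nd-order samplers matter: Heun's method spends one extra derivative evaluation per step to get a much better local error, so far fewer steps are needed for the same accuracy. A toy sketch on an ODE with a known solution (`drift` is just a stand-in for the learned field/score network):

```python
import numpy as np

# Toy ODE dx/dt = -x with exact solution x(t) = x0 * exp(-t); drift() stands
# in for the expensive neural-network evaluation in a real sampler.
def drift(x, t):
    return -x

def euler_step(x, t, dt):
    return x + dt * drift(x, t)

def heun_step(x, t, dt):
    d1 = drift(x, t)
    x_pred = x + dt * d1               # Euler predictor
    d2 = drift(x_pred, t + dt)         # slope at the predicted endpoint
    return x + dt * 0.5 * (d1 + d2)    # average the two slopes

def integrate(step, x0, n_steps):
    x, t = x0, 0.0
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = step(x, t, dt)
        t += dt
    return x

exact = np.exp(-1.0)
err_euler = abs(integrate(euler_step, 1.0, 20) - exact)
err_heun = abs(integrate(heun_step, 1.0, 20) - exact)
print(err_euler, err_heun)  # Heun's error is orders of magnitude smaller
```

Heun costs two drift evaluations per step versus Euler's one, but the error drops quadratically instead of linearly in the step size, which is where the big sampling speedups come from. That's why a comparison against a 2nd-order baseline would be the informative one here.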
Does anyone else have more info on how this is different from "traditional" numerical differential-equation-based de-noising?
The big thing seems to be that their noising/denoising process is invertible (to some extent), so it can be used to generate uncertainties, which is something the article says diffusion models can't do. The intuition is that the "flow" process used here is reversible (imagine going backwards in time and watching a cloud of charged particles coalescing into a bunch), which gives some nice properties that the random blurring used in more common diffusion models does not.
I think it's the reversible flow part that's important for the results, not the connection to the physical electrostatics system. It just happens to be that electrostatics have this nice flow behavior too and are a pretty approachable analogy for what the model's doing.
I also noticed that they present it as if other diffusion models were not invertible, but they are. Also, they appear to be using DDPM++, the exact same neural network architecture as Stable Diffusion?
Article says: "to train the neural network, which in this case is a U-Net (DDPM++ backbone)."
Also it's kinda weird that they show videos generating images from noise (like Imagen / Stable Diffusion) but don't cite any recent diffusion paper.
EDIT: In fact they don't cite ANY 2022 paper. So my guess would be that they submitted this last year and now it's been made public because now the NeurIPS 2022 conference is taking place. But most likely, the actual research here predates the Stable Diffusion release.
There is a stochastic element to diffusion models which is not present in PFGMs. Further, recall that Diffusion Models and PFGMs alike are "metamodels" - mathematical frameworks in which any function approximator can, in theory, be placed. They are not an architecture. In this case, the authors use the DDPM++ backbone which is the specific function approximator that they implement in their PFGM framework.
> PFGMs constitute an exciting foundation for new avenues of research, especially given that they are 10-20 times faster than Diffusion Models on image generation tasks, with comparable performance.
hard to imagine anyone sticking with diffusion models if this is completely true