One thing people might not realize (I'm not sure how obvious it is) is that these renders depend strongly on the statistics of the training data used for the ConvNet. In particular, you're seeing a lot of dog faces because there is a large number of dog classes in the ImageNet dataset (roughly 120 of the 1000 classes are dog breeds), so the ConvNet allocates a lot of its capacity to worrying about their fine-grained features.
In particular, if you train ConvNets on other data you will get very different hallucinations. It might be interesting to train (or even fine-tune) the networks on different data and see how the results vary. For example, different medical datasets, or datasets made entirely of faces (e.g. Faces in the Wild data), galaxies, etc.
It's also possible to take Image Captioning models and use the same idea to hallucinate images that are very likely for some specific sentence. There are a lot of fun ideas to play with.
Since you have this all set up, can you make some feedback loop animations for example with zooming? Or apply this to each frame of a movie? For example something famous like Charlie Bit My Finger. Hopefully using the deeper more horrifying setting.
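For anyone who wants to try the zooming feedback loop, here is a minimal sketch of the loop itself. It assumes you already have some function that runs one dream pass on a frame; `dream_step` below is a hypothetical placeholder for that, not part of the released code, and `zoom_center` and `feedback_frames` are names made up for illustration. The zoom is a plain nearest-neighbor center crop in NumPy, so it will look blockier than a proper interpolated resize.

```python
import numpy as np

def zoom_center(frame, scale=1.05):
    """Crop the central 1/scale region and stretch it back to the
    original size using nearest-neighbor sampling."""
    h, w = frame.shape[:2]
    ch, cw = int(round(h / scale)), int(round(w / scale))
    top, left = (h - ch) // 2, (w - cw) // 2
    # Map each output row/column back to a source row/column in the crop.
    rows = top + (np.arange(h) * ch // h)
    cols = left + (np.arange(w) * cw // w)
    return frame[rows[:, None], cols[None, :]]

def feedback_frames(image, dream_step, n_frames=10, scale=1.05):
    """Feed each dreamed frame back in as the next input, zooming slightly
    between passes. `dream_step` is a hypothetical callable: frame -> frame."""
    frames = []
    frame = image
    for _ in range(n_frames):
        frame = dream_step(frame)
        frames.append(frame)
        frame = zoom_center(frame, scale)
    return frames
```

Passing an identity function as `dream_step` gives a plain zoom animation, which is a cheap way to check the loop before plugging in the slow part. The same loop applied to the frames of a video, without the zoom, would cover the movie case.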
The visuals generated by the neural network remind me of visuals experienced under the influence of psilocybin or LSD. I wonder if I am making an unjust leap or if there is a similar organic process (searching for familiar patterns) taking place in the mind? Fascinating, thanks for sharing.
No hypothesis is unjust! It could also be related to some of the experiences people have in sensory deprivation tanks: your brain attempts to see structure in noise and hallucinates. One hypothesis would be that on LSD, and other psychoactive substances, this feedback loop is somehow enhanced. There might be a few doctorates to be earned in testing these hypotheses.
"Be careful running the code above, it can bring you into very strange realms!"
Reminds me of Charlie Stross's new novel,
"A brief recap: magic is the name given to the practice of manipulating the ultrastructure of reality by carrying out mathematical operations. We live in a multiverse, and certain operators trigger echoes in the Platonic realm of mathematical truth, echoes which can be amplified and fed back into our (and other) realities. Computers, being machines for executing mathematical operations at very high speed, are useful to us as occult engines. Likewise, some of us have the ability to carry out magical operations in our own heads, albeit at terrible cost."
You might also like Shadowfist (http://shadowfist.com), a card game that used to have the Purists, a playable faction powered by esoteric, math-centric magic.
Great, I got the dependencies installed on OSX and I'm already monsterifying a head shot for LinkedIn. Now, to find a way to get this working in real time with a webcam...
That guide is mostly correct, assuming your reply here means that you're also using OS X. It should get you all the way to a working Caffe install. The one thing it doesn't get right is that your PYTHON_INCLUDE and PYTHON_LIB variables should both point to the relevant folders from your Homebrew Python install (I had no luck attempting to compile pycaffe against either Anaconda or system Python; both would just segfault when I imported the module). In my case, that was (assuming you've already installed numpy with Homebrew pip):
PYTHON_LIB is exactly as it is in the example Makefile.config on that page, except adjusted for version number if you've installed Python via Homebrew since 2.7.10 was released.
Amazing that it easily runs on consumer hardware. This dispels suspicions that a Google cluster was necessary for these results.
I'm wondering if it's possible to use this with a model that was trained on a database without labels, just pictures. Is such a thing even possible? For this particular application, labeling and categories are ultimately superfluous, but are they required in order to get there?
A simpler version of this idea (making an image A out of matching pieces of a set of images B) was implemented in the early 90s and released as open source: http://draves.org/fuse/
I always wonder why sometimes the system finds faces and other elements in essentially untextured / homogeneous parts of images. Wouldn't there be some sort of "data term" in the energy functional that would suppress these results and/or move them to other parts of the image?
Perhaps this is working entirely differently and I'm thinking too much in the classical computer vision realm. Would love some explanation though.
Does anyone know if this technique can be used to slurp up a database and produce "typical" records for populating a test database? This is a problem that I struggled with a few years ago and still haven't found a good automated solution.
Could you refine your question? This is a post about image processing via neural network. Do you mean take an existing database, learn via neural network, and populate a fresh one with "learned" attributes?
Yes, that sounds correct. I'm thinking of something where I take an existing db and train a NN on it, then populate a test db with things like "typical account", "typical delinquent account", etc.
This db could then be used for automated testing.
I have seen approaches like Factory Girl in Rails, but the new rows just get incrementing field values somehow.
Another approach would be to model a column statistically then generate random values that conform to that model.
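A rough sketch of that per-column approach, using only the standard library. The modelling choices here are assumptions for illustration: numeric columns are treated as Gaussian (mean and standard deviation), and everything else is sampled by observed frequency. `fit_column` and `sample_column` are made-up names.

```python
import random
import statistics
from collections import Counter

def fit_column(values):
    """Fit a very simple model to one column of existing data."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Assumption: a normal distribution is a good enough fit.
        return {
            "kind": "numeric",
            "mean": statistics.mean(values),
            "stdev": statistics.pstdev(values),
        }
    # Otherwise remember how often each distinct value occurred.
    counts = Counter(values)
    return {
        "kind": "categorical",
        "values": list(counts),
        "weights": list(counts.values()),
    }

def sample_column(model, n, rng):
    """Draw n synthetic values that follow the fitted model."""
    if model["kind"] == "numeric":
        return [rng.gauss(model["mean"], model["stdev"]) for _ in range(n)]
    return rng.choices(model["values"], weights=model["weights"], k=n)

rng = random.Random(0)
status_model = fit_column(["active", "active", "delinquent", "closed", "active"])
balance_model = fit_column([120.0, 80.5, 300.0, 45.25])
fake_statuses = sample_column(status_model, 10, rng)
fake_balances = sample_column(balance_model, 10, rng)
```

Each column is modelled on its own, so nothing ties a generated status to a plausible balance; any relationship between columns is lost.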
I'm thinking of something that is so general it can find and model relationships in a db. For example, it should be able to see that most people have 2 or 3 credit cards on file and generate test data that is like that. This may not be a problem for a NN, but the idea of running the networks backwards and "imagining" things it has learned seems like a good fit.
I have played around with Markov chains trained on first + last names that could generate made-up names, but that is as far as I got with it.
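For reference, the name generator idea fits in a few lines. This is a minimal character-level sketch, not the code I actually used: the order-2 chain, the `^`/`$` start and end markers, the length cap, and the function names are all choices made for this example.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # assumed not to occur in real names

def train(names, order=2):
    """Record, for every `order`-character context, which characters follow it."""
    table = defaultdict(list)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table

def generate(table, order=2, rng=None, max_len=12):
    """Walk the chain from the start context until END or max_len characters."""
    rng = rng or random.Random()
    state = START * order
    out = []
    while len(out) < max_len:
        nxt = rng.choice(table[state])
        if nxt == END:
            break
        out.append(nxt)
        state = (state + nxt)[-order:]
    return "".join(out).capitalize()

training_names = ["anna", "annette", "andrew", "brian", "brianna", "diana"]
table = train(training_names)
made_up = [generate(table, rng=random.Random(seed)) for seed in range(5)]
```

Repeated entries in each list act as the transition weights, so common letter sequences come out more often. With a training set this small the output will mostly reproduce or splice the input names; it needs a much longer list to produce anything that looks new.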