It's showing a visualization of all the intermediate activations of the style transfer network. The intermediate activations are 4D tensors (batch, height, width, channels), so each channel is drawn as its own tile and the tiles are laid out side by side.
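To make the tiling concrete, here's a rough sketch of how one activation could be laid out as a grid of per-channel tiles. The function name `tile_layout` and the near-square grid are my own assumptions for illustration; the actual demo may arrange its tiles differently.

```python
import math

def tile_layout(height, width, channels):
    """Plan a near-square grid with one tile per channel (hypothetical helper).

    Returns (grid_rows, grid_cols, offsets), where offsets[c] is the
    (row, col) pixel position of the top-left corner of channel c's tile.
    """
    cols = math.ceil(math.sqrt(channels))
    rows = math.ceil(channels / cols)
    offsets = [((c // cols) * height, (c % cols) * width) for c in range(channels)]
    return rows, cols, offsets

# Example: a 64x64 activation with 128 channels.
rows, cols, offsets = tile_layout(height=64, width=64, channels=128)
print(rows, cols, offsets[:3])
```

With 128 channels this gives an 11 by 12 grid, with the last row only partly filled.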
A sequence of 9x9 and 3x3 convolutions transforms the one big input image into a bunch of smaller feature maps. Those are processed by a sequence of residual blocks. Finally, a few deconvolution (transposed convolution) operations merge these tiny tiles back into a stylized image of the same size as the original input.
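Here's a small shape trace of that shrink, process, grow pattern. The channel counts, strides, number of residual blocks, and "same" padding are assumptions based on the usual Johnson et al. style layout, not something I read out of the linked file, so treat the numbers as illustrative.

```python
import math

def conv_same(h, w, stride):
    # With "same" padding, a strided convolution divides the spatial size (rounding up).
    return math.ceil(h / stride), math.ceil(w / stride)

def deconv_same(h, w, stride):
    # A transposed convolution with "same" padding multiplies the spatial size.
    return h * stride, w * stride

def trace_shapes(h, w):
    """Return (layer, height, width, channels) per stage. Layer list is assumed."""
    layers = [
        ("conv 9x9", 32, 1, conv_same),
        ("conv 3x3", 64, 2, conv_same),
        ("conv 3x3", 128, 2, conv_same),
    ]
    layers += [("residual block", 128, 1, conv_same)] * 5
    layers += [
        ("transposed conv 3x3", 64, 2, deconv_same),
        ("transposed conv 3x3", 32, 2, deconv_same),
        ("conv 9x9", 3, 1, conv_same),
    ]
    shapes = [("input", h, w, 3)]
    for name, channels, stride, op in layers:
        h, w = op(h, w, stride)
        shapes.append((name, h, w, channels))
    return shapes

for name, h, w, c in trace_shapes(256, 256):
    print(f"{name:<20} {h}x{w}x{c}")
```

Under these assumptions a 256x256 input shrinks to 64x64 with 128 channels in the middle and comes back out at 256x256x3. The round trip only lands exactly on the input size when the height and width are divisible by 4.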
The network being run is defined here: https://github.com/lengstrom/fast-style-transfer/blob/master...
This post provides a pretty good explanation of what's happening: https://shafeentejani.github.io/2017-01-03/fast-style-transf...