Using Deep Learning to model personal visual aesthetics

nefitty · on March 31, 2017

Whoa, figures 7 and 8 blew my mind. This reminds me of the various Pinterest boards I've curated in the past. I would set a theme and try to collect visual items in that theme.

Sometimes I couldn't clearly describe the theme I had in mind, for example "foresty-earthy suburban adolescent feelings with little-to-no ruggedness but with a bit of a punk edge". Of course, no single image could capture all of that (probably), although another person given a description like that might still be able to filter images to match it. It's fascinating to wonder how aesthetic preferences like these emerge in the mind.

Are we combining various specific preferences (the color green, for example), or are we driven by the emotional flavor of a whole aesthetic object (a haze-covered mountain range evoking nostalgia for childhood hikes with siblings, which leads to a specific preference for pine trees, which in turn leads to a specific preference for the color green, etc.)? In other words, is it top-down, bottom-up, or a combination? Just some thoughts...

bbctol · on March 31, 2017

Yeah, this is applicable to huge swathes of Internet culture. Tons of Pinterest boards, tumblrs, and subreddits are just collections of images that fit a particular aesthetic. What will the web look like when almost all curation can be automated?

erichocean · on April 1, 2017

> What will the web look like when almost all curation can be automated?

Better?

gallerdude · on March 31, 2017

Man, this is crazy awesome stuff. I wonder if you could use something like DeepDream to make an image look more like your own style. That'd be next level.

aantix · on March 31, 2017

Total side note: did anyone check out the EyeEm website (the apparent authors of the article)?

Their curation algorithms are doing a pretty good job! Their "selected" Galaxy photos look amazing. https://www.eyeem.com/en/pictures/galaxy

andreyk · on March 31, 2017

Summary of approach: they embed the photos (convert photo -> vector of numbers using t-SNE or CNNs; the details are actually here: https://devblogs.nvidia.com/parallelforall/understanding-aes...) and then train a small-ish classifier (three fully connected layers) on top of those vectors to capture a user's preference. It's a pretty obvious approach, and a basic version should be doable in a hackathon, but it's a cool result nonetheless.
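For anyone wanting to try the hackathon version, here's a minimal sketch of the classifier half, assuming the photos have already been turned into fixed-length embedding vectors (random vectors stand in for them below). It's plain NumPy on the CPU rather than a GPU framework, and the layer sizes, learning rate and function names (`init_mlp`, `train`, `score`) are my own illustrative choices, not what EyeEm used.

```python
import numpy as np

def init_mlp(dims, rng):
    """Weights for an MLP with len(dims) - 1 fully connected layers (He init)."""
    return [[rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out)]
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, X):
    """Return every layer's activations; the last entry is the output logit."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        # ReLU on hidden layers, identity on the output layer
        acts.append(np.maximum(z, 0.0) if i < len(params) - 1 else z)
    return acts

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def train(params, X, y, lr=0.05, epochs=300):
    """Full-batch gradient descent on mean binary cross-entropy."""
    n = len(X)
    for _ in range(epochs):
        acts = forward(params, X)
        p = sigmoid(acts[-1][:, 0])
        grad = ((p - y) / n)[:, None]  # dLoss/dlogit
        for i in reversed(range(len(params))):
            W, b = params[i]
            gW, gb = acts[i].T @ grad, grad.sum(axis=0)
            if i > 0:
                # backprop through this layer and the previous ReLU (uses the old W)
                grad = (grad @ W.T) * (acts[i] > 0)
            params[i][0] = W - lr * gW
            params[i][1] = b - lr * gb
    return params

def score(params, X):
    """Probability-like 'user would like this' score for each embedding."""
    return sigmoid(forward(params, X)[-1][:, 0])

rng = np.random.default_rng(0)
neg = rng.normal(size=(500, 64))          # stand-in for generic photo embeddings
pos = rng.normal(loc=1.0, size=(50, 64))  # stand-in for the user's chosen photos
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
params = train(init_mlp([64, 32, 16, 1], rng), X, y)  # three fully connected layers
print(score(params, pos).mean(), score(params, neg).mean())
```

Since the expensive part (the CNN embedding) is done once up front, retraining this little head per user is cheap, which is presumably why it feels near-real-time in their interface.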

"We chose a three-layer multilayer perceptron (MLP) network as a good ranker, since it is able to capture the inherent non-linear shift in distribution between the user’s choices and the original training set. Notably, an MLP can be trained rapidly by leveraging GPU computation to obtain near-real-time results. This is important because it enables us to build [an] interactive interface, as we’ll explain. We typically precompute a set of negative features (about 40,000 negative samples) and extract the positive features from the user-provided input."
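The data side of that quote might look something like the sketch below: a negative bank of about 40,000 feature vectors is computed once and reused, and only the handful of user positives changes per session. The 128-dim features, the function names, and especially the class weighting (giving the few positives the same total weight as the 40,000 negatives) are my assumptions; the quote doesn't say how they handle the imbalance.

```python
import numpy as np

def build_training_set(neg_bank, pos_feats):
    """Stack the precomputed negatives with the user's positives.

    Returns features X, labels y (1 = user's pick), and per-sample weights
    (an assumed balancing scheme) so positives and negatives carry equal total weight.
    """
    n_neg, n_pos = len(neg_bank), len(pos_feats)
    X = np.vstack([pos_feats, neg_bank])
    y = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    w = np.concatenate([np.full(n_pos, n_neg / n_pos), np.ones(n_neg)])
    return X, y, w

def top_k(scores, k):
    """Indices of the k highest-scoring candidates, best first."""
    return np.argsort(-scores, kind="stable")[:k]

rng = np.random.default_rng(0)
neg_bank = rng.normal(size=(40_000, 128)).astype(np.float32)  # computed once, reused
pos_feats = rng.normal(size=(10, 128)).astype(np.float32)     # the user's example photos
X, y, w = build_training_set(neg_bank, pos_feats)
print(X.shape, y.sum(), w[y == 1].sum(), w[y == 0].sum())
```

The weights would go into whatever loss the ranker uses (e.g. a weighted cross-entropy), and `top_k` is how you'd pull the best candidates out of a scored photo pool once the model is trained.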