Using Deep Learning to model personal visual aesthetics (nvidia.com)
144 points by shackenberg on March 31, 2017 | 6 comments



Woah, figures 7 and 8 blew my mind. This reminds me of the various Pinterest boards I've curated in the past. I would set a theme and try to collect visual items in that theme.

Sometimes I could not clearly describe the theme I had in mind, for example "foresty-earthy suburban adolescent feelings with little-to-no ruggedness but with a bit of a punk edge". Of course, no single image could (probably) fulfill the entirety of that theme, so it's fascinating to wonder how aesthetic preferences emerge in the mind, though it's possible that, given a description like that, another person could filter images to match it.

Are we combining various specific preferences (the color green, for example), or are we driven by the emotional flavor of a whole aesthetic object (a haze-covered mountain range evoking nostalgia for childhood hikes with siblings, which leads to a preference for pine trees, which leads to a preference for the color green)? Basically top-down, bottom-up, or a combo? Just some thoughts...


Yeah, this is applicable to huge swathes of Internet culture. Tons of Pinterest boards, tumblrs, and subreddits are just collections of images that fit a particular aesthetic. What will the web look like when almost all curation can be automated?


> What will the web look like when almost all curation can be automated?

Better?


Man, this is crazy awesome stuff. I wonder if you could use a deep-dream-type thing to make an image more like your own style; that'd be next level.
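
You probably could: run gradient ascent on the pixels against a differentiable aesthetic scorer, which is basically the deep dream trick. Rough PyTorch sketch, not from the article (score_model is a hypothetical image -> scalar scorer, e.g. a CNN feature extractor plus an MLP ranker like the one the article describes; step count and learning rate are made up):

    # Hypothetical sketch: nudge an image toward a learned personal-style
    # score via gradient ascent on the pixels (the deep dream trick).
    # `score_model` is an assumed differentiable image -> scalar scorer.
    import torch

    def stylize(image, score_model, steps=100, lr=0.05):
        """image: (1, 3, H, W) tensor with pixel values in [0, 1]."""
        x = image.clone().requires_grad_(True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = -score_model(x).mean()   # ascend the aesthetic score
            loss.backward()
            opt.step()
            with torch.no_grad():
                x.clamp_(0.0, 1.0)          # keep pixels in a valid range
        return x.detach()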


Total side note: did anyone check out the EyeEm website (apparently the authors of the article)?

Their curation algorithms are doing a pretty good job! Their "selected" Galaxy photos look amazing. https://www.eyeem.com/en/pictures/galaxy


Summary of approach: they embed the photos (convert photo -> vector of numbers using t-SNE or CNNs) [the details are actually here: https://devblogs.nvidia.com/parallelforall/understanding-aes...] and then train a small-ish classifier (three fully connected layers) on top of it to capture a user's preference. A pretty obvious approach (a basic version should be doable in a hackathon), but a cool result nonetheless.

"We chose a three-layer multilayer perceptron (MLP) network as a good ranker, since it is able to capture the inherent non-linear shift in distribution between the user’s choices and the original training set. Notably, an MLP can be trained rapidly by leveraging GPU computation to obtain near-real-time results. This is important because it enables us to build interactive interface, as we’ll explain. We typically precompute a set of negative features (about 40,000 negative samples) and extract the positive features from the user-provided input."



