Woah, figures 7 and 8 blew my mind. This reminds me of the various Pinterest boards I've curated in the past. I would set a theme and try to collect visual items in that theme.
Sometimes I could not clearly describe the theme I had in mind, for example "foresty-earthy suburban adolescent feelings with little-to-no ruggedness but with a bit of a punk edge". Of course, no single image could (probably) fulfill the entirety of that theme, so it's fascinating to wonder how aesthetic preferences emerge in the mind. That said, it's possible that with a description like that, another person could filter images to match it.
Are we combining various specific preferences (the color green, for example), or are we driven by the emotional flavor of a whole aesthetic object (a haze-covered mountain range evoking nostalgia for childhood hikes with siblings, leading to a specific preference for pine trees, leading to a specific preference for the color green, etc.)? Basically: top-down, bottom-up, or a combo? Just some thoughts...
Yeah, this is applicable to huge swathes of Internet culture. Tons of Pinterest boards, tumblrs, and subreddits are just collections of images that fit a particular aesthetic. What will the web look like when almost all curation can be automated?
Summary of approach: they embed the photos (convert photo -> vector of numbers using t-SNE or CNNs; the details are actually here: https://devblogs.nvidia.com/parallelforall/understanding-aes...) and then train a small-ish classifier (three fully connected layers) on top of the embeddings to capture a user's preference. A pretty obvious approach, and a basic version should be doable in a hackathon, but a cool result nonetheless.
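For anyone who wants to try the hackathon version, here is a rough sketch of the classifier half of that summary. It is not the article's code. It assumes the image embeddings are already computed (random vectors stand in for them here), and the names `train_preference_mlp` and `score`, the hidden size, the learning rate, and the epoch count are all made up for illustration. It trains three fully connected layers with plain gradient descent in NumPy to separate a handful of "liked" vectors from a larger pool of negatives, then ranks unseen candidates by predicted preference.

```python
import numpy as np

def _sigmoid(z):
    # Clip logits so exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_preference_mlp(pos, neg, hidden=32, epochs=300, lr=0.1, seed=0):
    """Fit three fully connected layers on precomputed embeddings.

    pos: (n_pos, d) embeddings the user liked (label 1)
    neg: (n_neg, d) embeddings from a generic pool (label 0)
    Hyperparameters are illustrative guesses, not the article's values.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])[:, None]
    dims = [X.shape[1], hidden, hidden, 1]
    W = [rng.normal(0.0, np.sqrt(2.0 / dims[i]), (dims[i], dims[i + 1]))
         for i in range(3)]
    b = [np.zeros(dims[i + 1]) for i in range(3)]

    for _ in range(epochs):
        # Forward pass: two ReLU layers, then a sigmoid output.
        h1 = np.maximum(0.0, X @ W[0] + b[0])
        h2 = np.maximum(0.0, h1 @ W[1] + b[1])
        p = _sigmoid(h2 @ W[2] + b[2])

        # Backward pass for mean binary cross-entropy.
        d3 = (p - y) / len(X)
        d2 = (d3 @ W[2].T) * (h2 > 0)
        d1 = (d2 @ W[1].T) * (h1 > 0)
        grads_W = [X.T @ d1, h1.T @ d2, h2.T @ d3]
        grads_b = [d1.sum(axis=0), d2.sum(axis=0), d3.sum(axis=0)]

        # Full-batch gradient descent step.
        for i in range(3):
            W[i] -= lr * grads_W[i]
            b[i] -= lr * grads_b[i]
    return W, b

def score(params, feats):
    """Return a preference score in [0, 1] for each embedding row."""
    W, b = params
    h = feats
    for i in range(2):
        h = np.maximum(0.0, h @ W[i] + b[i])
    return _sigmoid(h @ W[2] + b[2]).ravel()

# Toy usage with synthetic stand-ins for real image embeddings.
demo_rng = np.random.default_rng(42)
liked = demo_rng.normal(1.5, 1.0, (40, 16))       # user-provided positives
pool = demo_rng.normal(0.0, 1.0, (400, 16))       # precomputed negatives
candidates = demo_rng.normal(0.5, 1.5, (100, 16)) # unseen images to rank
model = train_preference_mlp(liked, pool)
ranking = np.argsort(-score(model, candidates))   # best match first
print("top 5 candidate indices:", ranking[:5])
```

In a real version the embeddings would come from a pretrained CNN, and since the negative pool never changes between users it only has to be embedded once, which is presumably why retraining per user can feel interactive.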
"We chose a three-layer multilayer perceptron (MLP) network as a good ranker, since it is able to capture the inherent non-linear shift in distribution between the user’s choices and the original training set. Notably, an MLP can be trained rapidly by leveraging GPU computation to obtain near-real-time results. This is important because it enables us to build interactive interface, as we’ll explain. We typically precompute a set of negative features (about 40,000 negative samples) and extract the positive features from the user-provided input."