
>Be female. Women are consistently ranked higher than men. In particular, notice that there is not a single guy in the top 100.

This sounds true, but it can't be the real reason—selfies are ranked relative to the other images by the same user. So unless users are taking a lot of #selfies of people of different genders, we can assume the dataset is already controlled for the gender of the person in the image, no? Unless there's some confounding factor at play, such as some demographic segment being more likely to optimize for good selfies occasionally but have boring feeds the rest of the time.

It would be super interesting, if the data is available, to normalize this by exposure: of the people who saw an image, how many clicked "like"?




Well, two of the other factors are long hair and the tendency to oversaturate the face. Those factors don't seem independent of gender to me: men are less likely to sport long hair, and they're also less likely to oversaturate the face to measure up to some skin perfection standard (think of it as the photographic equivalent of makeup).

> but it can't be the real reason

Can't? On top of the aspects listed above, it is entirely possible that there is a bias in which both sexes find female appearance somewhat more aesthetically pleasing.

Similar to how focus group testing for computer voices tends to result in female voices being chosen (at least that's what I often hear, couldn't find a solid source).

Even if the bias is small, the correlated factors would amplify it when you're optimizing for a maximum, i.e. for the top selection.


Neither of those explains why it would rank above the average of other female faces, in general.

Discussion about this with the author reveals that I was misinterpreting how they were collecting averages. I was assuming the "like" count was coming from each photo collected, but instead they collected the photos and the average likes in separate steps, where the average likes were across recent posts by that user, rather than across the selfies by that user.


I screwed up on this point, by the way. I had done this part of the experiment a few months ago and I incorrectly remembered the details. I went back, looked through the code, and updated the post with more detail on this important point. In particular:

"Now it is time to decide which ones of those selfies are good or bad. Intuitively, we want to calculate a proxy for how many people have seen the selfie, and then look at the number of likes as a function of the audience size. I took all the users and sorted them by their number of followers. I gave a small bonus for each additional tag on the image, assuming that extra tags bring more eyes. Then I marched down this sorted list in groups of 100, and sorted those 100 selfies based on their number of likes. I only used selfies that were online for more than a month to ensure a near-stable like count. I took the top 50 selfies and assigned them as positive selfies, and I took the bottom 50 and assigned those to negatives. We therefore end up with a binary split of the data into two halves, where we tried to normalize by the number of people who have probably seen each selfie. In this process I also filtered people with too few followers or too many followers, and also people who used too many tags on the image."
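For anyone who wants to poke at the procedure quoted above, here is a minimal sketch of it in Python. This is not the author's code. The `Selfie` record, the multiplicative form of the tag bonus, the default thresholds, and the choice to drop an incomplete trailing group are all assumptions made for illustration; the one-month age filter is assumed to have been applied before the data reaches this function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Selfie:
    # Hypothetical record; field names are not from the original post.
    selfie_id: str
    followers: int  # follower count of the posting user
    num_tags: int   # number of tags on the image
    likes: int      # like count once it has had time to stabilize


def audience_proxy(s: Selfie, tag_bonus: float) -> float:
    # Assumed form of the "small bonus" per tag: a multiplicative bump.
    return s.followers * (1.0 + tag_bonus * s.num_tags)


def label_selfies(
    selfies: List[Selfie],
    group_size: int = 100,
    tag_bonus: float = 0.01,          # assumed value
    min_followers: int = 0,           # assumed thresholds
    max_followers: int = 10**9,
    max_tags: int = 10**9,
) -> List[Tuple[Selfie, int]]:
    # Filter out users with too few or too many followers, and over-tagged images.
    kept = [
        s for s in selfies
        if min_followers <= s.followers <= max_followers and s.num_tags <= max_tags
    ]
    # Sort by the estimated audience size so neighbours had similar exposure.
    kept.sort(key=lambda s: audience_proxy(s, tag_bonus))

    labeled: List[Tuple[Selfie, int]] = []
    half = group_size // 2
    # Walk the sorted list in full groups; an incomplete trailing group is dropped.
    for start in range(0, len(kept) - group_size + 1, group_size):
        group = sorted(
            kept[start:start + group_size], key=lambda s: s.likes, reverse=True
        )
        labeled += [(s, 1) for s in group[:half]]  # top half by likes: positive
        labeled += [(s, 0) for s in group[half:]]  # bottom half: negative
    return labeled
```

Note that the label is relative to the group: a selfie with 50 likes from a large account can end up negative while one with 5 likes from a small account ends up positive.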


Still no men in the top 100? There must be something deep to learn about the difference between the sexes there; I am just not sure what it is.


> focus group testing for computer voices tends to result in female voices being chosen

I personally prefer the Alex voice from Mac OS to female voices. It has nice intonation. If only I could make it correct some of the mistakes it makes, for example not being able to distinguish "read" in the past tense from "read" in the present tense, which makes it sound silly. Another error it makes is confusing "live" as in "live concert" with "live" as in "live in the USA" (these are called heteronyms and are a special case in TTS).


You can fix this by misspelling your input text. Use 'red' for the past tense of "read". Use 'laif' and 'lif' for the latter.


Yeah, female users probably post more pictures and also probably have more friends.


This would also be controlled for by the tools the blog author used, though: if a woman has more friends, then she would also probably get more likes on all the rest of her images. Not sure if posting more photos would drive the average up or down, but it would probably drive the "above the baseline" selfies in the same way.


More friends, yes, but likes don't grow uniformly as the number of friends increases. Also, more pictures means the "best picture" could be more of an outlier.

So the best pictures might rise further above the baseline for that person. That is, the top picture gets 1000 likes, but most pictures get zero. Sort of like a Zipfian distribution of words.
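A quick way to see the outlier effect is a toy simulation. This is only a sketch under loud assumptions: likes per picture are drawn from a Pareto distribution (a heavy-tailed stand-in for the Zipf-like shape, not anything measured from real data), and the function name, `alpha`, the trial count, and the seed are made up for illustration.

```python
import random
import statistics


def median_best_picture(num_pictures: int, trials: int = 1000,
                        alpha: float = 1.5, seed: int = 0) -> float:
    """Median, over simulated users, of the like score of their best picture.

    Every user draws from the same heavy-tailed distribution, so the typical
    (baseline) picture is identical across users; only the number of pictures
    posted changes.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = [
        max(rng.paretovariate(alpha) for _ in range(num_pictures))
        for _ in range(trials)
    ]
    return statistics.median(best)


few = median_best_picture(num_pictures=10)
many = median_best_picture(num_pictures=200)
print(f"best of 10 pictures:  {few:.1f}")
print(f"best of 200 pictures: {many:.1f}")
```

With the same per-picture distribution, the user who posts 200 pictures has a best picture far above the one who posts 10, even though their typical picture is the same.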

Anyway, these things are actually really hard to control for particularly because different types of friends/people have different effects on the likes. Now add to this cultural differences between countries/states/universities/rural-urban, etc.

I think the best method that is actually practical was the one OkCupid did at some point with "my best face", where you rate a bunch of people's pictures and they rate yours. Then you figure out which pictures are good from the data.

If they kept the data for all these contests, it would be much easier to interpret in aggregate.



