Hacker News
Ask HN: flirt140, how are they determining the gender?
27 points by vijayr on May 14, 2009 | 36 comments
http://www.techcrunch.com/2009/05/12/fked-company-and-adbrite-founder-launches-twitter-dating-site-flirt140/

flirt140 - Yet another twitter app :-(

One line caught my attention, though:

"Kaplan says a proprietary algorithm is used to determine gender of Twitter users and claims that it’s pretty accurate"

Any idea how they are determining the gender?




I don't know about Twitter, but determining gender (on average) from English language text is pretty much a solved problem in natural language processing.

http://www.hackerfactor.com/GenderGuesser.html

See this paper:

http://u.cs.biu.ac.il/~koppel/papers/male-female-text-final....

Sidenote: this was an epic win for me in Gender Studies class, when I said that not only was the professor's claim that the genders communicate "essentially identically but with some individual variation" wrong, but I would constructively demonstrate it by blind-classifying the author's gender for any stack of papers she cared to give me. (I also gave her a copy of the above paper, which she read with interest.) The class then retreated from the scary notion of binary decisions made on the basis of the scientific method to the more comfortable one of endlessly arguing whether the differences were socially constructed or biological.
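For a rough idea of what "determining gender from text" looks like in code, here is a minimal multinomial naive Bayes sketch over word counts. It is a generic textbook classifier, not the method from the linked paper or whatever flirt140 actually uses; the class name, the crude tokenizer and the two-document toy training set are all illustrative assumptions. Both labels need at least one training document before `predict` is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; deliberately crude."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesGender:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"male": Counter(), "female": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def predict(self, text):
        vocab = set(self.word_counts["male"]) | set(self.word_counts["female"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one smoothing keeps an unseen word from zeroing out a class.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data, purely hypothetical.
clf = NaiveBayesGender()
clf.train("the game last night was great, fixed the engine", "male")
clf.train("love this dress, so cute, xoxo", "female")
print(clf.predict("cute dress"))   # female
print(clf.predict("the engine"))   # male
```

With real data you would train on thousands of labeled texts; two toy documents only show the mechanics.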


There's a similar service at genderanalyzer.com, based on UClassify's API - http://blog.uclassify.com/gender-text-analysis/.

It thinks Sergey Brin writes like a woman: http://www.genderanalyzer.com/?url=too.blogspot.com


Maybe that ad for 23andme was actually written by Sergey's wife.


What a load of crap. According to this tool, Linus Torvalds writes like a woman, and so do Zed Shaw and Steve Yegge.


But does this work the same way with a 140-character limit, since people are often forced to express themselves differently?


As long as you have an appropriate training set, I don't see why it would be significantly different. You might run into problems if you train the classifier on some radically different dataset but then try to run it against twitter accounts.


The Twitter API gives access to a large number of a user's latest tweets, so you are classifying a whole series of 140-character messages rather than a single one, which makes the gender classifier more accurate.
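One way to see why many tweets help: suppose, purely hypothetically, that a single tweet is classified correctly 60% of the time and that each tweet's guess is independent of the others. A majority vote over n tweets is then right with the binomial probability computed below. Independence is a strong assumption (one person's tweets are correlated), so treat this as an optimistic intuition rather than a prediction.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Chance that a majority of n independent per-tweet guesses is correct.

    p is the assumed accuracy on a single tweet; n must be odd so there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    # Sum the binomial probabilities of getting more than half the guesses right.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```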


Yeah, but the API doesn't give you access to those users' gender (since Twitter doesn't ask/store that info), which means you have no way to tell the classifier "Here are 500 males' tweets, here are 500 females' tweets."

I suppose you could manually find males and females and train based on their body of works, but it won't be great; you'll likely run into a selection bias.

But if you seeded with that approach, and then used a SpamAssassin-style auto-learner...maybe you'd have a chance?

I suppose this is a case where you don't want "Perfect" to get in the way of "Good enough", especially since it will never be perfect...
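The seed-then-auto-learn idea can be sketched as generic self-training: train on a small hand-labeled seed, label only the unlabeled texts the model is very confident about, add those to the training set, and repeat. The function names, the 0.9 threshold and the naive Bayes scorer are assumptions for illustration; this is not SpamAssassin's code. Note the risk mentioned above: if the seed is biased, confident mistakes get fed back in and reinforced.

```python
import math
from collections import Counter


def train(labeled):
    """Count words per label from (text, label) pairs."""
    counts = {"male": Counter(), "female": Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts


def p_female(counts, text):
    """Naive Bayes posterior with equal priors and add-one smoothing."""
    vocab = len(set(counts["male"]) | set(counts["female"])) or 1
    total_f = sum(counts["female"].values())
    total_m = sum(counts["male"].values())
    log_odds = 0.0
    for word in text.lower().split():
        pf = (counts["female"][word] + 1) / (total_f + vocab)
        pm = (counts["male"][word] + 1) / (total_m + vocab)
        log_odds += math.log(pf / pm)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


def self_train(seed, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly move confidently classified texts into the training set."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_rounds):
        counts = train(labeled)
        newly_labeled, still_unsure = [], []
        for text in pool:
            p = p_female(counts, text)
            if p >= threshold:
                newly_labeled.append((text, "female"))
            elif p <= 1 - threshold:
                newly_labeled.append((text, "male"))
            else:
                still_unsure.append(text)
        if not newly_labeled:
            break  # nothing new to learn from; stop early
        labeled.extend(newly_labeled)
        pool = still_unsure
    return train(labeled), pool
```

Texts the model stays unsure about are returned in `pool` rather than being forced into a class.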


That was great. Gender Studies class, huh?

By the way, Gender Guesser gave me a good laugh when I ran it on a comment from HN. It said, "Verdict: Weak MALE Weak emphasis could indicate European."

A scientific basis for the effeminate European stereotype??

Edit: I was joking about the stereotype.


Gender Studies class, huh?

It was a requirement for graduation with an Arts & Sciences degree (I dual-degreed with CS from Engineering) that I demonstrate an interest in subjects covering things other than dead white males. For reasons which were never really clear to me, "I'm an East Asian Studies major!" was not sufficient, so my choices were either Gender Studies or The History of Jazz.

I will say this for required common curricula: if you only take courses which you're interested in or whose built-in intellectual biases are flattering to yours, then you're missing out on a good deal of the educational experience.

I will also say this against required common curricula: there were no gender studies majors in AI class.


Oh, I wasn't criticizing your choice. But surely your experience led you to question certain things.


Thanks for the links!

I find it pretty cool that the first site only does the guessing based on a pretty small list of words - you can see it by viewing the source.

Some words are more commonly male and others are more commonly female - looks as if that's all it is doing.
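A word-list guesser of the kind described above reduces to summing per-word weights for each side. The lists and weights below are hypothetical placeholders, not the ones Gender Guesser actually uses (view its page source for those).

```python
import re

# Hypothetical weights, for illustration only; not the list any real site uses.
MALE_WORDS = {"the": 1, "it": 2, "is": 1, "what": 2, "around": 3}
FEMALE_WORDS = {"with": 1, "if": 2, "where": 2, "she": 5, "her": 3}


def guess_gender(text):
    """Sum the weights of listed words and report the heavier side."""
    words = re.findall(r"[a-z']+", text.lower())
    male = sum(MALE_WORDS.get(w, 0) for w in words)
    female = sum(FEMALE_WORDS.get(w, 0) for w in words)
    if male == female:
        return "unknown", 0.0  # no listed words, or a dead heat
    label = "male" if male > female else "female"
    # Confidence: the winning side's share of all matched weight.
    return label, max(male, female) / (male + female)


print(guess_gender("What is it?"))  # ('male', 1.0)
```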


I submitted this HN post and it says you're male for both the formal and informal genres.

So, what's the verdict, patio11: are you male or female?

Great link nonetheless.


Gender Guesser is less accurate on HN (60%-70%) than a Perl one-liner (>95%):

    perl -E 'say "Male"'
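The joke makes a real point about baselines: a "classifier" that always answers the same class is exactly as accurate as that class's share of the population, so any gender guesser has to beat the base rate to be useful. The 95/5 split below is just the commenter's guess about HN turned into a hypothetical list, not a measurement.

```python
def constant_guess_accuracy(labels, guess="male"):
    """Accuracy of a 'classifier' that ignores its input and always says `guess`."""
    return sum(label == guess for label in labels) / len(labels)


# Hypothetical audience: 95 men, 5 women.
audience = ["male"] * 95 + ["female"] * 5
print(constant_guess_accuracy(audience))  # 0.95
```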


Maybe they integrated that bias into their system?


patio11 is male. I submitted a few texts too; it's quite accurate.


I have always found it kind of... (weird? An amusing quirk of the culture?) that my HN account is tied very closely with my real identity and yet I'm always addressed by login name rather than real name here. I guess you can file that away as yet another example of the power of defaults.

On other business-related forums, like the Business of Software board, everyone either calls me Patrick or "that bingo guy". That might be because of the display name alone.


Oddly enough, given the post subject, I knew who you were immediately after reading a comment of yours about a week ago (and it had nothing to do with bingo, though it may have been entrepreneurial). I read it and immediately thought it sounded like your writing style, but didn't recognize the username. Clicked through, and lo and behold, there you were. Just had to register and mention that after your comment.


I figured you had the choice of username, so even though you're Patrick in your profile and if I ever met / emailed you I would use that, you had chosen to be patio11 in this forum.

I also find it odd at times - my username is my full name, yet people responding to me will call me JacobAldridge not Jacob.


Just looked at a few dozen tweets, the use of exclamation points seems to be a dead giveaway.
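If exclamation points really are a strong cue, the simplest way to use them is as an extra numeric feature alongside word counts. This sketch computes two hypothetical features over a user's tweets; whether they actually separate men from women would have to be checked on labeled data, since a few dozen tweets is an anecdote, not an evaluation.

```python
def style_features(tweets):
    """Hypothetical punctuation features for one user's tweets."""
    n = len(tweets) or 1  # avoid dividing by zero for users with no tweets
    return {
        # Average number of '!' characters per tweet.
        "exclaims_per_tweet": sum(t.count("!") for t in tweets) / n,
        # Fraction of tweets containing at least one '!'.
        "share_with_exclaim": sum("!" in t for t in tweets) / n,
    }


print(style_features(["great game tonight", "OMG!!! so excited!!", "ok"]))
```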


I hate that!!!!!! It's one of my myriad pet peeves!!!!!!! Along wif spellen like ur a morun, wevver it's on perpose or not!!!!! It makes you look like a drama queen!!!!!!! Which, to me at least, is a huge insult!!!!!!!!!!!!!! And it's some inane piece of misspelled, new-age bullshit far too often!!!!!!! Shock!!! Horror!!!!!!! Shock!!!!! Horror!!!!!!!! <slap>

Sorry. Has anyone built the total perspective vortex yet?


that is a neat link. thank you for posting it.


want to know the secret? you're going to love it.

when you login, you have to run a search. it asks whether you're a guy/girl looking for a guy/girl. i'm guessing they might just save that first parameter, no?

that said, that natural language processing link is really sweet.


Ugh, I logged in through OAuth and it automatically made me follow their Twitter account. There should be more fine-grained control over what applications can do without asking/telling me.


What was their twitter account name?


flirt140


My first step would be with first names. Michael = Boy and Amy = Girl. You'd be able to cover most of the users like this.

Look for he/she in messages directed at @User.
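Both heuristics are easy to sketch. The name table here is a hypothetical four-entry stand-in, and the pronoun tally only looks at other people's tweets that mention `@username`; a pronoun in such a tweet may of course refer to someone else, so it is a weak signal.

```python
import re

# Hypothetical, tiny name table; a real one would hold thousands of entries.
FIRST_NAMES = {"michael": "male", "john": "male", "amy": "female", "sarah": "female"}

MALE_PRONOUNS = {"he", "him", "his"}
FEMALE_PRONOUNS = {"she", "her", "hers"}


def gender_from_name(display_name):
    """Look up the first word of the display name, or return None."""
    parts = display_name.split()
    return FIRST_NAMES.get(parts[0].lower()) if parts else None


def gender_from_mentions(username, tweets):
    """Tally he/she-style pronouns in tweets that mention @username."""
    # \b stops "@amy" from matching "@amyjones".
    mention = re.compile(r"@" + re.escape(username) + r"\b", re.IGNORECASE)
    male = female = 0
    for tweet in tweets:
        if not mention.search(tweet):
            continue
        words = re.findall(r"[a-z]+", tweet.lower())
        male += sum(w in MALE_PRONOUNS for w in words)
        female += sum(w in FEMALE_PRONOUNS for w in words)
    if male == female:
        return None  # no evidence, or evenly split
    return "male" if male > female else "female"
```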


The company I work for can determine gender from first names with 80 percent accuracy. I never bothered asking how, but most names are gender-specific. As mentioned in other comments on this page, there are significant differences between the written expression of men and women, although that gap may narrow when you are limited to 140 characters.


Yes, I thought about it. But how many names can you save/check like that? Won't it be a tedious process? Also, what about names from Europe, Muslim names, etc.?


You build a database and program to match the names to a list. The database would contain only a few thousand records which would be easy to manage and speedy. Very simple process for a programmer. None of this work should be done manually.

For names you don't know you keep using other means to dig further. But no matter what tools I would use, matching names to gender would be highest weighted procedure and the first thing I'd try. Of course, there would have to be second and third things to try too.

Remember, these guys didn't say they were 100%. Who even knows what their definition of "pretty accurate" is.
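The "names first, then other means" strategy is essentially a priority cascade: run the highest-weighted procedure first and fall through to the next one only when it has no answer. Everything below (the tiny name table, the `xoxo` text cue, the function names) is a hypothetical stand-in; the point is the control flow, and returning "unknown" instead of forcing a guess.

```python
from typing import Callable, Optional, Sequence, Tuple

Guesser = Callable[[dict], Optional[str]]

# Hypothetical stand-ins for the real procedures.
NAMES = {"michael": "male", "amy": "female"}


def by_name(user: dict) -> Optional[str]:
    parts = user.get("name", "").split()
    return NAMES.get(parts[0].lower()) if parts else None


def by_text(user: dict) -> Optional[str]:
    text = " ".join(user.get("tweets", [])).lower()
    return "female" if "xoxo" in text else None


def cascade(user: dict, guessers: Sequence[Tuple[str, Guesser]]) -> Tuple[Optional[str], Optional[str]]:
    """Try guessers in priority order; return (gender, method) from the first that answers."""
    for method, guess in guessers:
        result = guess(user)
        if result is not None:
            return result, method
    return None, None  # every procedure came up empty


GUESSERS = [("name", by_name), ("text", by_text)]
print(cascade({"name": "Amy Lee"}, GUESSERS))  # ('female', 'name')
```

Recording which method answered makes it easy to measure each procedure's accuracy separately later.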


Guessing randomly you have a 50% chance of being right. Weight the odds based on a few rules of thumb and voila?
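One way to make "weight the odds" concrete is Bayes' rule in odds form: start from 50/50 odds and multiply by a likelihood ratio, P(cue | female) / P(cue | male), for each rule of thumb that fires. The ratios in the example are made up, and multiplying them assumes the rules are independent, which real cues rarely are.

```python
def combine(prior_female, likelihood_ratios):
    """Update a prior with independent evidence using Bayes' rule in odds form.

    Each ratio is P(evidence | female) / P(evidence | male).
    """
    odds = prior_female / (1 - prior_female)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)  # convert odds back to a probability


print(round(combine(0.5, [3.0]), 3))       # 0.75
print(round(combine(0.5, [3.0, 2.0]), 3))  # 0.857
```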


Perhaps they're using something like this method of guessing gender based on browsing history:

http://www.mikeonads.com/2008/07/13/using-your-browser-url-h...


They are probably doing simple stuff like checking the kinds of words used, e.g. if a tweet has "xoxoxo" then it's a female, or checking the user's friends for references to the user in their tweets, e.g. "vijayr posted that [he] created a new thread on hackernews"


Hmm. That much isn't difficult to do. I checked out the site and didn't find a single mistake in 7-8 searches. Does this mean that they also identify businesses/organizations and drop them from their list?


Why does "yet another twitter app" make you :(


nothing against twitter, just that there is too much twitter news these days. it is a great, fun and useful service, but not exactly world changing



