Ask HN: flirt140, how are they determining the gender?

patio11 · on May 14, 2009

I don't know about Twitter, but determining gender (on average) from English language text is pretty much a solved problem in natural language processing.

http://www.hackerfactor.com/GenderGuesser.html

See this paper:

http://u.cs.biu.ac.il/~koppel/papers/male-female-text-final....

Sidenote: this was an epic win for me in Gender Studies class, when I said that not only was the professor's claim that the genders communicate "essentially identically but with some individual variation" wrong, but I would constructively demonstrate it by blind-assigning a stack of any papers she cared to give me. (I also gave her a copy of the above paper, which she read with interest.) The class then retreated from the scary notion of binary decisions on the basis of the scientific method to the more comfortable one of endless arguing whether the differences were socially constructed or biological.

aneesh · on May 14, 2009

There's a similar service at genderanalyzer.com, based on UClassify's API - http://blog.uclassify.com/gender-text-analysis/.

It thinks Sergey Brin writes like a woman: http://www.genderanalyzer.com/?url=too.blogspot.com

jdale27 · on May 14, 2009

Maybe that ad for 23andme was actually written by Sergey's wife.

blurry · on May 14, 2009

What a load of crap. According to this tool, Linus Torvalds writes like a woman, and so do Zed Shaw and Steve Yegge.

kwamenum86 · on May 14, 2009

But does this work the same way with a 140 character limit, since people are often forced to express themselves in a different way?

jey · on May 14, 2009

As long as you have an appropriate training set, I don't see why it would be significantly different. You might run into problems if you train the classifier on some radically different dataset but then try to run it against twitter accounts.

daveying99 · on May 14, 2009

The Twitter API gives access to a large number of a user's latest tweets. So the classifier sees a whole series of 140-character messages rather than a single one, which makes it more accurate.

kngspook · on May 14, 2009

Yeah, but the API doesn't give you access to those users' gender (since Twitter doesn't ask/store that info), which means you have no way to tell the classifier "Here are 500 males' tweets, here are 500 females' tweets."

I suppose you could manually find males and females and train based on their body of works, but it won't be great; you'll likely run into a selection bias.

But if you seeded with that approach, and then used a SpamAssassin-style auto-learner...maybe you'd have a chance?

I suppose this is a case where you don't want "Perfect" to get in the way of "Good enough", especially since it will never be perfect...
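A minimal sketch of the seed-then-auto-learn idea above, assuming a tiny hand-labeled seed set and a plain naive Bayes model with add-one smoothing. The class name, the `self_train` helper, the 0.9 confidence threshold, and the placeholder tokens are all invented for illustration; this is not how flirt140 or SpamAssassin is actually implemented.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny word-count naive Bayes classifier with add-one smoothing."""

    def __init__(self, labels=("male", "female")):
        self.word_counts = {label: Counter() for label in labels}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def probabilities(self, text):
        """Return {label: probability} for the given text."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        n_labels = len(self.word_counts)
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log((self.doc_counts[label] + 1) / (total_docs + n_labels))
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


def self_train(model, unlabeled, threshold=0.9):
    """Feed confident predictions back in as training data; return how many were learned."""
    learned = 0
    for text in unlabeled:
        probs = model.probabilities(text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            model.train(text, label)
            learned += 1
    return learned


# Placeholder tokens stand in for real hand-labeled seed tweets.
model = NaiveBayes()
model.train("alpha alpha beta", "male")
model.train("gamma gamma delta", "female")
learned = self_train(model, ["alpha alpha alpha", "epsilon"])
```

The selection-bias worry still applies: the auto-learner only reinforces whatever the seed set already believes, so a skewed seed yields a confidently skewed classifier.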

johnnybgoode · on May 14, 2009

That was great. Gender Studies class, huh?

By the way, Gender Guesser gave me a good laugh when I ran it on a comment from HN. It said, "Verdict: Weak MALE Weak emphasis could indicate European."

A scientific basis for the effeminate European stereotype??

Edit: I was joking about the stereotype.

patio11 · on May 14, 2009

Gender Studies class, huh?

It was a requirement for graduation with an Arts & Sciences degree (I dual-degreed with CS from Engineering) that I demonstrated an interest in subjects covering things other than dead white males. For reasons which were never really clear to me, "I'm an East Asian Studies major!" was not sufficient, so my choices were either Gender Studies or The History of Jazz.

I will say this for required common curricula: if you only take courses which you're interested in or whose built-in intellectual biases are flattering to yours, then you're missing out on a good deal of the educational experience.

I will also say this against required common curricula: there were no gender studies majors in AI class.

johnnybgoode · on May 14, 2009

Oh, I wasn't criticizing your choice. But surely your experience led you to question certain things.

dangoldin · on May 14, 2009

Thanks for the links!

I find it pretty cool that the first site only does the guessing based on a pretty small list of words - you can see it by viewing the source.

Some words are more commonly male and others are more commonly female - looks as if that's all it is doing.
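The word-list approach described above could be sketched roughly like this. The weights below are invented for illustration and are not the list from the Gender Guesser page source; `guess_gender` is a hypothetical name.

```python
import re

# Hypothetical weights: positive leans "male", negative leans "female".
# These numbers are made up and are NOT Gender Guesser's actual list.
WORD_WEIGHTS = {
    "the": 2, "a": 1, "as": 1,
    "with": -2, "she": -3, "not": -1,
}


def guess_gender(text):
    """Sum the weights of known words and return (label, score)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(WORD_WEIGHTS.get(word, 0) for word in words)
    if score > 0:
        return "male", score
    if score < 0:
        return "female", score
    return "unknown", score
```

With so few signal words, short inputs such as a single tweet will often contain none of them and come back as "unknown", which is one reason a longer sample helps.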

MichaelApproved · on May 14, 2009

I submitted this HN post and it says you're Male for both formal and informal.

So, what's the verdict Patio11, are you Male or Female?

Great link nonetheless.

d0mine · on May 14, 2009

Gender Guesser is less accurate (60%-70%) for HN than a Perl one-liner (>95%):

    perl -E 'say "Male"'

wlievens · on May 14, 2009

Maybe they integrated that bias into their system?

vijayr · on May 14, 2009

patio11 is male. I submitted a few texts too; it's quite accurate.

patio11 · on May 14, 2009

I have always found it kind of... (weird? An amusing quirk of the culture?) that my HN account is tied very closely with my real identity and yet I'm always addressed by login name rather than real name here. I guess you can file that away as yet another example of the power of defaults.

On other business-related forums, like the Business of Software board, everyone either calls me Patrick or "that bingo guy". That might be because of the display name alone.

Poiesis · on May 14, 2009

Oddly enough, given the post subject, I knew who you were immediately after reading a comment of yours about a week ago (and it had nothing to do with bingo, though it may have been entrepreneurial). I read it and immediately thought it sounded like your writing style but didn't recognize the username. Clicked through, and lo and behold, there you were. Just had to register and mention that after your comment.

JacobAldridge · on May 14, 2009

I figured you had the choice of username, so even though you're Patrick in your profile and if I ever met / emailed you I would use that, you had chosen to be patio11 in this forum.

I also find it odd at times - my username is my full name, yet people responding to me will call me JacobAldridge not Jacob.

jhamburger · on May 14, 2009

Just looked at a few dozen tweets, the use of exclamation points seems to be a dead giveaway.

badger7 · on May 14, 2009

I hate that!!!!!! It's one of my myriad pet peeves!!!!!!! Along wif spellen like ur a morun, wevver it's on perpose or not!!!!! It makes you look like a drama queen!!!!!!! Which, to me at least, is a huge insult!!!!!!!!!!!!!! And it's some inane piece of misspelled, new-age bullshit far too often!!!!!!! Shock!!! Horror!!!!!!! Shock!!!!! Horror!!!!!!!! <slap>

Sorry. Has anyone built the total perspective vortex yet?

vijayr · on May 14, 2009

that is a neat link. thank you for posting it.

ashishk · on May 14, 2009

want to know the secret? you're going to love it.

when you login, you have to run a search. it asks whether you're a guy/girl looking for a guy/girl. i'm guessing they might just save that first parameter, no?

that said, that natural language processing link is really sweet.

tlrobinson · on May 14, 2009

Ugh, I logged in through OAuth and it automatically made me follow their Twitter account. There should be more fine-grained control over what applications can do without asking/telling me.

MichaelApproved · on May 14, 2009

What was their twitter account name?

tlrobinson · on May 14, 2009

flirt140

MichaelApproved · on May 14, 2009

My first step would be with first names. Michael = Boy and Amy = Girl. You'd be able to cover most of the users like this.

Look for He/She in messages directed @User.

kwamenum86 · on May 14, 2009

The company I work for can determine gender based on first name with 80 percent accuracy. I never bothered asking how, but most names are gender-specific. As mentioned in other comments on this page, there are significant differences between the written expression of men and women, although that gap may be narrowed when you are limited to 140 characters.

vijayr · on May 14, 2009

yes, I thought about it. but how many names can you save/check like that? won't it be a tedious process? Also, what about names from Europe, Muslim names etc?

MichaelApproved · on May 14, 2009

You build a database and program to match the names to a list. The database would contain only a few thousand records which would be easy to manage and speedy. Very simple process for a programmer. None of this work should be done manually.

For names you don't know, you keep using other means to dig further. But no matter what tools I would use, matching names to gender would be the highest-weighted procedure and the first thing I'd try. Of course, there would have to be second and third things to try too.

Remember, these guys didn't say they were 100%. Who even knows what their definition of "pretty accurate" is.
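A rough sketch of the name-table-first approach described above, with later procedures tried only when the name is unknown. The three-entry table and the function names are hypothetical; a real table would hold the few thousand records mentioned, and the fallbacks would be text-based guessers rather than the stub shown in the usage line.

```python
# Hypothetical name table; a real one would hold a few thousand entries.
NAME_GENDER = {"michael": "male", "amy": "female", "patrick": "male"}


def gender_from_name(display_name):
    """Look up the first token of a display name; return None if unknown."""
    parts = display_name.strip().lower().split()
    if not parts:
        return None
    return NAME_GENDER.get(parts[0])


def guess(display_name, fallbacks=()):
    """Try the name table first, then each fallback callable in order."""
    result = gender_from_name(display_name)
    if result is not None:
        return result
    for fallback in fallbacks:
        result = fallback(display_name)
        if result is not None:
            return result
    return None


# Usage: the lambda is a stand-in for a second procedure, e.g. a text classifier.
answer = guess("Acme Corp", fallbacks=[lambda name: None])
```

Returning None rather than a coin flip keeps "don't know" visible, so accounts such as businesses can be dropped instead of mislabeled.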

jaydub · on May 14, 2009

Guessing randomly you have a 50% chance of being right. Weight the odds based on a few rules of thumb and voila?
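One way to read "weight the odds" is to start from even odds and multiply in a likelihood ratio for each rule of thumb that fires. The sketch below assumes the rules are independent, which is the usual naive Bayes simplification, and the ratios in the usage line are made-up numbers; `combine_odds` is a hypothetical helper.

```python
def combine_odds(likelihood_ratios, prior=0.5):
    """Return P(male) after multiplying the prior odds by each likelihood ratio.

    A ratio above 1 favors "male", below 1 favors "female".
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Usage with invented ratios: one rule favoring male 4:1, one favoring female 2:1.
p_male = combine_odds([4.0, 0.5])
```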

ryanwaggoner · on May 14, 2009

Perhaps they're using something like this method of guessing gender based on browsing history:

http://www.mikeonads.com/2008/07/13/using-your-browser-url-h...

vaksel · on May 14, 2009

they are probably doing simple stuff like checking the kinds of words used. Like if a tweet has "xoxoxo" then it's a female. Or they check the user's friends and look for references to the user in their tweets, e.g. "vijayr posted that [he] created a new thread on hackernews"
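The "look for references to the user" idea could be sketched as a pronoun count over tweets that mention the account. This is a crude illustration under the assumption that a pronoun in the same tweet refers to the mentioned user, which will often be wrong; `pronoun_votes` and the pronoun table are hypothetical.

```python
import re
from collections import Counter

PRONOUNS = {
    "he": "male", "his": "male", "him": "male",
    "she": "female", "her": "female", "hers": "female",
}


def pronoun_votes(username, tweets):
    """Count gendered pronouns in tweets that mention the given username."""
    mention = re.compile(r"(?<!\w)@?" + re.escape(username) + r"\b", re.IGNORECASE)
    votes = Counter()
    for tweet in tweets:
        if not mention.search(tweet):
            continue
        # Drop the mention itself so a username containing "he" or "she" is not counted.
        text = mention.sub(" ", tweet).lower()
        for word in re.findall(r"[a-z]+", text):
            if word in PRONOUNS:
                votes[PRONOUNS[word]] += 1
    return votes


votes = pronoun_votes("vijayr", [
    "vijayr posted that he created a new thread on hackernews",
    "@vijayr said his app is live",
    "she went home",  # no mention of the user, so it is ignored
])
```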

vijayr · on May 14, 2009

hmm. that much isn't difficult to do - I checked out the site, didn't find a single mistake in 7-8 search attempts. Does this mean that they also detect businesses/organizations and drop them from their list?

MichaelApproved · on May 14, 2009

Why does "yet another twitter app" make you :(

vijayr · on May 14, 2009

nothing against twitter, just that there is too much twitter news these days. it is a great, fun and useful service, but not exactly world changing