Amount of profanity in git commit messages per programming language (andrewvos.com)
285 points by AndrewVos on Feb 22, 2011 | 184 comments



Out of 929857 commit messages, I found 210 swear words (using George Carlin's Seven dirty words).

That set only includes [shit, piss, fuck, cunt, cocksucker, motherfucker, tits], so these are probably not meaningful results.

I have personally commented "asshole forgot to increment the counter" 527 times in 4 different languages.

[EDIT: 528 times in 5 different languages. Sorry, bitches.]


Yeah, any viable list of swear words has to include "damn" (and derivatives), "hell", and "ass" (and derivatives). I'd even go so far as to say that "crap" and "retard" (and derivatives) are sufficiently unprofessional that they belong on the list.


Also was fuck searched as a word on its own? Because if you include compound words with fuck you are going to catch a long tail of interesting profanity, especially amongst programmers.

You'd want to whitelist against the Scunthorpe Problem though.
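A rough Python sketch of the whole-word versus substring distinction raised here. The word list is just the milder terms suggested elsewhere in this thread, and the function names are made up for illustration; this is not the method the original post used.

```python
import re

# Illustrative list only: the milder terms suggested in this thread.
WORDS = ["damn", "hell", "ass", "crap"]

_ALTERNATION = "|".join(re.escape(w) for w in WORDS)

# \b anchors each term to word boundaries, so "class" or "shell" do not match.
WHOLE_WORD = re.compile(r"\b(?:" + _ALTERNATION + r")\b", re.IGNORECASE)

# No boundaries: this is the variant that runs into the Scunthorpe problem.
SUBSTRING = re.compile(_ALTERNATION, re.IGNORECASE)


def count_whole_words(message: str) -> int:
    """Count list terms that appear as standalone words."""
    return len(WHOLE_WORD.findall(message))


def count_substrings(message: str) -> int:
    """Count list terms anywhere, including inside innocent words."""
    return len(SUBSTRING.findall(message))
```

On a message like "Fix class assignment in shell script" the substring version reports three hits ("class", "assignment", "shell") while the whole-word version reports none. The trade-off is that whole-word matching also misses the compound words the parent comment is curious about, so a real scan would probably need substring matching plus an explicit whitelist.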


Add "wtf" to this list and you match more than 50% of all commit messages ;-)


Oh no, wtf is professional terminology, cannot remove.


And if one goes that far, one would have to include variant spellings, because in the heat of the moment of cursing an opaque or broken piece of code, a programmer is likely to froth at the keyboard.


we may need to implement Levenshtein Distance for this


then you run into the Duck Hunt problem.
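For anyone who wants to try the edit-distance idea, here is a small Python sketch of Levenshtein distance using the usual dynamic-programming recurrence. The `near_matches` helper and its threshold are assumptions for illustration, and the example shows the false-positive risk the reply above is joking about.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def near_matches(word, targets, max_distance=1):
    """Hypothetical helper: list terms within max_distance edits of word."""
    return [t for t in targets if levenshtein(word.lower(), t) <= max_distance]


targets = ["damn", "hell"]
print(near_matches("darn", targets))   # ['damn']  (a variant spelling)
print(near_matches("shell", targets))  # ['hell']  (an innocent word)
```

A distance threshold of 1 already flags ordinary words, so this would likely need a whitelist or a dictionary check on top.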


Swearing and cursing aren't the same things.

Damn is a curse, but not actually considered a swear word in most parts of the world.


Hmm, that's not very comprehensive for this purpose.

Years ago at a former employer, we discovered just after shipping a large quantity of quite sensitive demo materials that an outside contractor had managed to slip a hidden profanity into it. Oh, the joy that caused....

The immediate reaction though was that a few of the less polite and proper members of the organisation were tasked with producing as close to an exhaustive profanity list as they could so we could do a relatively thorough sweep. From memory that list was pushing 40 terms - I think I might still have a copy somewhere but the last thing this discussion needs is more swearing!


Years ago I developed a custom CRM for a client that included reminders. While testing the production version of the system pre-launch, I added many fake reminders, for users who happened to be real, along with a lot of other test data. I then cleared all of the test data...except I accidentally skipped the reminders table.

For weeks afterwards, my clients would receive profanity-laden reminders, helpfully labeled as "from Adrian".

Luckily, they had a good sense of humour about it. But since then, I've never used profanity in any kind of test data.


I never use anything remotely non-serious in my test data. I even try to make the comments typed into forms under test seem at least close to real. Silly names are out too. You never know when some fool is going to show a visiting client a development system instead of one of the official demos, and I've been stung by someone not being amused by the existence of Don Kiddick and Mike Hunt in my sample dataset.


or at least not used your real name...


We have a table full of unsavory words due to generating more than one password that upset someone. At one point someone decided we needed "pronounceable" passwords for our new users without considering the consequences.


I had a brief play once for this sort of thing with doing stem-based pronounceable word generation based off a starting dictionary; the idea being that it broke them into fragments marked as beginning, middle or end and used intersections of these to build words that followed letter use rules of known words but yet weren't known words themselves. You start getting distance problems though - a lot of words it generates are too close to or are trivial encapsulations of existing words, so that'd need taking account of. One day I'll have a proper play and do something more comprehensive for this.

Tip for anyone caching unsavoury words - from memory the bulk of what we ended up checking for were racial slurs, of which there's an astonishing (and unsurprisingly heavily localised) variety.
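A much-simplified Python sketch of the fragment idea described above, combined with the blocklist check from the parent comment. The fixed two-letter chunking, the function names, and the sample dictionary are all assumptions made for illustration; the stem-based splitting the comment describes would be more involved.

```python
import random


def split_fragments(word, size=2):
    """Split a word into (beginning, middle chunks, end) of `size` letters."""
    chunks = [word[i:i + size] for i in range(0, len(word), size)]
    return chunks[0], chunks[1:-1], chunks[-1]


def build_pools(dictionary, size=2):
    """Collect beginning, middle, and end fragments from the dictionary."""
    beginnings, middles, ends = set(), set(), set()
    for word in dictionary:
        if len(word) <= 2 * size:
            continue  # too short to have a middle fragment
        begin, middle, end = split_fragments(word, size)
        beginnings.add(begin)
        middles.update(middle)
        ends.add(end)
    return sorted(beginnings), sorted(middles), sorted(ends)


def generate(dictionary, rng, blocklist=(), attempts=1000):
    """Build a word from known fragments that is neither known nor blocked."""
    beginnings, middles, ends = build_pools(dictionary)
    if not (beginnings and middles and ends):
        raise ValueError("dictionary has no words long enough to split")
    known = set(dictionary)
    for _ in range(attempts):
        candidate = rng.choice(beginnings) + rng.choice(middles) + rng.choice(ends)
        if candidate in known:
            continue  # reject real dictionary words
        if any(bad in candidate for bad in blocklist):
            continue  # reject anything containing a blocked term
        return candidate
    raise RuntimeError("no acceptable candidate found")


words = ["garden", "window", "planet", "butter"]
print(generate(words, random.Random(0), blocklist=["an"]))
```

The blocklist test is a plain substring check on the assembled candidate, so it also catches blocked terms that only appear across a fragment boundary. It does nothing about the "too close to an existing word" problem the comment mentions.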


Random numbers can also get you in trouble in this way. E.g. anything with 666. The hard part is cross-cultural references that you may be unaware of.


bitches

I cannot explain how much the injection of this in current use irks me.

Is this lack of civility, and appalling misogyny, really needed for satisfying self-expression nowadays?


Is this overreaction and appalling misuse of the word "misogyny" really needed for satisfying self-expression nowadays?


Why is it not misogyny? Would you drop the phrase "Sorry, niggers." so casually into conversation? Probably not (after all, it's racist), but it still uses a segment of society in derogatory fashion.


It's not misogyny because, according to Merriam-Webster's 11th edition, misogyny means "a hatred of women," and the original comment wasn't about that at all.

And it has nothing to do with a segment of society. You're complaining about someone half-jokingly calling us all whiny babies.


The meanings of words change over time, get the fuck over it.

People like you were complaining about the common usage of "damn" years ago too.


Nice ad-hominem there.

"damn" is no longer an issue because most people aren't terribly religious nowadays, at least not enough for people to censor themselves. That's not the case for "bitch".


You can only take offense, not give it. Nobody can be responsible for your feelings but yourself.


What your argument tries to do is sneak a false dichotomy onto the table.

It's absolutely true that people are responsible for their feelings. But that doesn't mean you aren't also responsible for your choice of words. By saying that I am responsible for how I feel when I hear the word "bitches," you are trying to imply that you aren't also responsible for choosing to say it. It's perfectly valid to say that we are BOTH responsible for our choices. I should choose to ignore you, and you should choose another word. There is no need to say that one or the other but not both of us should be responsible.

I find your arguments along these lines to be passive-aggressive. If you want to hurt other people with words, own up to wanting to hurt other people with words. Don't pretend that it's everyone else's fault. Because surely, if nobody took offense to these particular words, you would hunt around until you could find words that would cause offense.

You choose to use these words and phrases precisely because they have shock value. It's not like you use the word and are surprised it carries some special meaning that offends people. I see elsewhere you have told people to "fuck off." Are you seriously suggesting you weren't trying to give offense? Because if it isn't possible to give offense, why are you trying so hard to offend people??


Greeting groups of people as "sup bitches" is a popular culture phenomenon. It's not used for shock value or to give offence (at least in my social circles....).

My use of both "profane" and non-profane offensive comments in my comments is to drive home my point. It is not intended to hurt others, but merely make them think to themselves "frack it, I'm ignoring him." "What he says would normally offend me, but I'm going to have a nice evening with my wife and kids instead" ...or something along those lines.

If I get anyone to that point then maybe they'll know that in the future, if some anonymous dude dares use the dreadful word "bitch" on the internet, it's best for the sanity of everyone to let it slide.

"Because if it isn't possible to give offense, why are you trying so hard to offend people??"

-gets back in character...- Well I'd say there is a keen difference between "giving offence", which I am of course not doing, and "presenting others the opportunity to take offence". The choice is theirs.


Well I'd say there is a keen difference between "giving offence", which I am of course not doing, and "presenting others the opportunity to take offence". The choice is theirs.

That's the most cowardly thing I've heard all day. But it's only lunch time, so we'll see how it goes.


So much for my attempts to discuss this intelligently, instead of emotionally. I'm going to consider this one more point of data in support of my hypothesis...


That's hair splitting terminology. If you say or do something which you know the majority of people will find offensive, then that's effectively giving offense.

You didn't answer the original question, either: would you drop the phrase "Sorry, nigger." into casual conversation? If not, why not?


Your question is irrelevant. The word being discussed is "bitch" or "bitches", which I most certainly do use often and casually.

You are in charge of if you get offended. It is a decision you and only you can make.


Hardly irrelevant - it's called reductio ad absurdum. If you limit yourself by not using certain words in conversation, then you're confirming that those words are generally offensive, and that using them would give offense.

It's why the legal system has the concept of the "reasonable person", but I prefer "Do you talk to your mother with that mouth?"


The only reason I do not casually use the word nigger in public is because unlike god, HR departments appear to be omnipresent.

"Do you talk to your mother with that mouth?"

Yes, I do. And if that bothers you, kindly fuck off. You have no place telling me what I may or may not say in the presence of my mother.

If an adult loses their temper in public, they are rightfully looked down upon. If an adult publicly becomes offended, I similarly look down upon them.


Right - so you're a racist, misogynistic idiot who'd rather put up pithy one liners and tell me to fuck off than address the point. Glad we got that cleared up.


Says the poster who was previously complaining about ad hominems...

No, I'm just an adult who recognizes that words only have the power that you give them. If you wish to continue to allow your emotions to be a slave to the language of others, that is your prerogative; but don't expect others to follow you.

And yes, I addressed your question. Read the parent comment of your post again.


You didn't address the question, you dropped a pithy one liner about HR departments. And you're still avoiding the issue by playing silly philosophical games. Of course words have meaning and power, particularly ones which are derogatory towards sections of society.

And that's all I'm going to say - this thread has gone on long enough.


The meaning of my "pithy one line", which you seem to be completely blinded to for some reason, is that I do not say the word "nigger" casually, but not out of concern for anybodies feelings. Rather, I avoid the word solely because I am paranoid of HR departments, and all things associated. This meaning seemed clear enough to other observers/participants.

Says a lot about you that the inclusion of minorly offensive words or implications can cloud your ability to critically interpret what others are saying. When you allow somebody else to offend you, or make you angry, you are allowing them to impair your thought processes. You are giving others the power to control you.

In order to defend yourself, you must realize that although somebody says something mean, it is up to you if you get angry, and although somebody may say something offensive, it is up to you if you become offended. In the real world, people are going to say shit you don't like. Trying to "correct" their behaviour is the wrong approach.


Not sure the HR department point is a 'pithy one liner'. If they didn't have their power I'd be singing along with Kanye too (there's a difference between using the word 'nigger' which is not racist, and calling someone a 'nigger', which is).


I love the implication here that were more people religious these days, it would be improper to say "damn".


Well, wouldn't it? "Improper" means "will offend people." If nobody is offended by something, it's not improper. If lots of people are, it is.

Whether you care is a different matter. If you don't, that makes you a rude person. Which you're free to be, if you wish.


It's improper to not adhere to the religion of the majority? Are you nuts?


You're showing your disconnection with popular vernacular with your assertion. I'm 22 and girls around me call each other bitches casually (in a friendly, fun way) all the time.

You certainly don't speak on the behalf of any women I know.

Anyways, holla out to my HN bitches.


Why can't you explain it?

Is it because your post is a bunch of poorly conceived bullshit? Is it because you don't really give any time to consider the intent of what's being said and the social context of the terms employed? Is it cause you're looking to be offended?

If you can't provide some analysis of the origins of thoughts and feelings which are presumably your own, maybe you should keep them to yourself till you can.


If you're gonna disagree, have the courage to do so in writing.


I never curse in my commit messages. That doesn't mean I don't want to! Cursing is a vice of mine, acquired through summers of cleaning bathrooms and picking up trash at a state park in high school. I use euphemisms when coding professionally, but it's easy to map my commit messages at old companies back to my original swear.

"Blameless" bug:

   Original: Now recalculates the height of the container element after repopulating
       the content.
   Translation: Did Bob test this fucking thing ONCE before he committed this?

Fixing my own mistake:

   Original: Tweaks the NUM_PATHS config value.
   Translation: Wow, I apparently have shit-for-brains. I hope nobody ran a build in
       the past 20 minutes.

Overdesigning:

   Original: Updates the object creation code per Bob's feedback.
   Translation: Another Goddamn FactoryFactoryBuilder?! I officially don't
       understand this codebase.

Major cleanup needed:

   Original: Style tweaks needed for GCC compilation.
   Translation: OMFG. This isn't even valid C++. It doesn't even compile.

OK, I'm not perfect:

   Original: Fuck IE7.
   Translation: No seriously, fuck IE7.


I'd kinda like to see which swear words appear most often in commit messages. I'm guessing that "shit" and "fuck" are much more common than "cocksucker" and "motherfucker", and if that's not true, I want to know which language has the most cocksuckers and motherfuckers.


Yeah, the pie chart doesn't quite cover it - I'd like to see both swear words per commit per language (if, say, Java has 10% of the swear words but 3% of the commits) and complexity of the swear words - a simple "Fuck" implies far less frustration than a "Motherfucking Cocksucker!"

Could develop quite a nice Programming Language Pain Index…
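The per-language normalization asked for here could be sketched roughly like this in Python. The 210 and 929857 figures are the totals quoted at the top of the thread; the per-language counts are invented placeholders, not real data, and the function name is hypothetical.

```python
def swears_per_million(swear_count: int, commit_count: int) -> float:
    """Swear words per million commit messages."""
    if commit_count <= 0:
        raise ValueError("commit_count must be positive")
    return swear_count * 1_000_000 / commit_count


# Overall rate from the totals quoted in the post: roughly 226 per million.
overall = swears_per_million(210, 929857)

# Placeholder counts (NOT real data): same number of swear words,
# different number of commits, so very different rates.
counts = {"LangA": (30, 100_000), "LangB": (30, 300_000)}
rates = {lang: swears_per_million(s, c) for lang, (s, c) in counts.items()}
# rates == {"LangA": 300.0, "LangB": 100.0}
```

Plotting `rates` as a bar chart would show frustration per commit rather than each language's share of a pie. Weighting individual words by severity, as suggested above, would need a separate score per term.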


From what I remember there was only one or two "motherfuckers".

I will post up some more data if anyone is interested.


I run a slang dictionary website which lets users assign an offensiveness score to each term. That would be an interesting bit of data to add: not only the raw word count, but how offensive the swearing is for each language.

(For sample data, the 100 most vulgar words on the site are in a table here: http://onlineslangdictionary.com/lists/most-vulgar-words/ )


If it's legal, it would be awesome of you to make this commit message dataset available on Infochimps or something.


i'd say alter the list of swear words in general to a list possibly more tuned to programming. agreeing with above, word breakdown would be nice as well.

actually, maybe just a github swear browser.


How to offend members of 3 different programmer communities in 9 different ways with just one sentence: "It somehow makes sense that C++, Ruby, and JavaScript are all equally profane."


Or that they are so relevant in 2011 :-)


If you think that's clever, then perhaps you're lacking a few bits of programming language history. If you think it's only partly clever, you're perhaps lacking fewer pieces. If the parent comment just makes you roll your eyes and think it's probably not worth explaining, you probably understand what I'm getting at.


Given that there were only 210 total swear words, the accuracy of this seems pretty questionable. It's possible that one guy could be responsible for a large percentage of swearing for a given language.


It's code comments vs. commit messages, but the prevalence of profanity in the Linux kernel tree suggests developers' use of blue speech is pretty widespread.


> It's possible that one guy could be responsible for a large percentage of swearing for a given language.

Sorry, Ruby!


I want to know the proximity of the curse to 'IE' in the Javascript commits


Pie chart? I have no idea how to interpret this...

http://www.flickr.com/photos/amit-agarwal/3196386402/sizes/l...


Pie charts have a lot of drawbacks, sure, but it's ridiculous that we're at the point now where the first (and highest rated) response to a pie chart is always a negative comment about pie charts, regardless how good or bad the pie chart is.

This one in particular is very clear:

C++, Ruby and Javascript have the most profanity. They're relatively equal to each other and collectively account for more than 50% of the swearing in commit messages.

C is next, with significantly less swearing.

C# and Java are roughly tied a bit below C.

Python and PHP have, comparatively, almost no swearing.

Was that really so hard? When the data is already subjective (what is and isn't a swear word) and intended almost solely for humor, do we really need more precision than a pie chart offers?

It is at best hyperbolic and at worst dishonest to say you "have no idea" how to interpret this. You have an idea. You just don't have precision.


> Python and PHP have, comparatively, almost no swearing.

Of course. Python users are happy people.

I wonder what happens with PHP... ;-)


For PHP, you'd have to search for "ass_hole", "fuckmother", "cock_sucker", and so on to be fair.


Reading the ones with the underscores makes me think this mode of swearing deserves its very own accent for when it's read aloud.


thanks, I almost spit tea on my computer :)


Complete lack of comments?


> I wonder what happens with PHP... ;-)

Ignorance is bliss?

> Of course. Python users are happy people.

Heh, I've been using Python lately and have felt lots of urges to swear, but I can't get myself to commit it. Shoot.


PHP code gets delivered to the customer :)


So does Python.


much fewer lines of it.


Did they check swear words in other languages? ;-)


maybe they don't know about github?


The same number of commits were taken from each language.


> C++, Ruby and Javascript have the most profanity. They're relatively equal to each other and collectively account for more than 50% of the swearing in commit messages.

this is the problem. In the pie chart it's almost impossible to determine which of those three has the most. In the bar chart, it's fairly obvious to my eye that C++ wins, though JS/Ruby are very close.


Rather than being organized by language names, the items in the pie graph should have been grouped by size (largest at 12, proceeding clockwise to the smallest at 11:59, for example). What relationship is there to show between the grouped names of programs that outweighs making this clear?


Dude, no. I think he was talking about how you can't tell how the size of the user base of a language is affecting the ranking. So, for example, only 1% of all projects could be in Java, but the swearing could be frequent enough to make it have ~15% of all curse words.


> Note that I ripped an equal amount of commit messages per language so the results aren't based on how many projects there are per language.

All the languages are equally represented by commit count.


but his total number is 929857, which is not divisible by 8


What a bummer... The percentages might be off by a fraction of a fraction of a percent...


I see no reason to believe, given that his process for ripping an "equal" number of commit messages per language was broken, that anything else even approaches validity. It's simple arithmetic; a grade schooler who notices that the last number is 7 would realize something's off.


What about the process is broken? Did you read the code and find bugs? With a total commit count of 929857, being a single commit away from a perfectly even number of commits in each language is insignificant.
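The arithmetic behind this exchange is quick to check in Python. The count of 8 languages is taken from the comment above about divisibility; the variable names are just for illustration.

```python
TOTAL_COMMITS = 929857  # total quoted in the post
LANGUAGES = 8           # number of languages, per the comment above

per_language, remainder = divmod(TOTAL_COMMITS, LANGUAGES)
# per_language == 116232, remainder == 1

# Worst case: the one leftover commit lands in a single language's sample.
relative_imbalance = remainder / per_language  # on the order of 1e-5
```

So the total is one commit over an even split of 116232 per language, assuming the per-language samples were meant to be exactly equal in the first place.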


Or he had 929857 commits and then he randomly sampled an equal number for each language. Thus, no division etc.


Then, of course, you have problems of sample size. Nearly a million commits is a pretty good sample size; a hundred, not so much.


Sorry, my bad.


[deleted]


Ahem, "Note that I ripped an equal amount of commit messages per language". I do think this is a bad place for a pie graph, but your specific criticism here is misplaced.


Yes, and? The amount of messages is equal, the amount of profanity per a set of commit messages (which is what is measured here) is not.


This was in response to a now deleted comment that claimed that more popular languages would show up as having more profanity because they have more commits, even if the profanity per commit was constant.


I get it now. Sorry, my bad.


What are you getting at?


But he said he sampled an equal number of commit messages from each language.



A good weekend project would be to take an existing graphing library and make a wizard for it that would create a correct type of graph based on the data and your stated intentions with the data, as shown in the flowchart above.


Thanks for reminding me again why I don't bother reading these forums. One day I'll quit clicking links too.


Note that I ripped an equal amount of commit messages per language so the results aren't based on how many projects there are per language.

I like how he had to tweak the data collection process to make the visualization method fit.


That is not the case. He wanted to compare curse words across languages independent of language popularity. If he did not collect the same amount of data per language, then he would have two variables: number of curse words and number of commits. Then there would be the danger that a more popular language would have far more curse words simply because it has far more commits.


I wouldn't call it 'tweaking' the data collection. He is simply normalizing the results to ignore the differences in language distribution.

This is normal and has nothing to do with how you choose to represent it.

It would have been meaningless to show any graph or table saying 'Python has the most messages with profanity' if the amount of Python projects is 80% of all the projects out there.


He is right to normalize the results, but parent's point is that he is wrong to do that by modifying his data collection.

He should just collect as many commit messages as possible, then divide the profanity count for each language by the commit message count. Because that has lower standard error [and no more bias] than what he did.
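The standard-error claim can be illustrated with the usual binomial approximation for a proportion. This is a sketch under the assumption that each commit message is an independent trial, which other comments in this thread question (one prolific swearer could dominate a language); the function name is made up.

```python
import math


def proportion_standard_error(hits: int, n: int) -> float:
    """Approximate standard error of hits/n under a binomial model."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    return math.sqrt(p * (1 - p) / n)


# Same underlying rate (1%), but four times as many commit messages:
small = proportion_standard_error(10, 1_000)
large = proportion_standard_error(40, 4_000)
# large is about half of small, since the error shrinks like 1 / sqrt(n)
```

Under that model, capping every language at the size of the smallest sample throws away precision for the larger ones, while dividing by each language's own commit count still removes the popularity effect.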


That's not the case, that's just his personal choice. He could just as well have gone with %age of swear words per commit which would have made the number of commits per language irrelevant (as long as that number was kept above statistical nonsense) and would have yielded the same result.


What are you talking about?


That flow chart there is helpful, thanks.


I added a pie chart :)


A neat idea, although I think the pie chart isn't really the right format. I'd prefer to see a bar graph, with the y-axis as (swears/million messages) or similar.


And please, the colors. Why the two greens and two blues? Either use two colors (to differentiate alternating elements), which you would not need if this were a bar chart (which it should be), or use 8 markedly different colors.


I wrote a post on my blog 4 years ago (!) with lots of examples of profanity in code comments.

It took a half-hour to write and has consistently gotten more traffic than the rest of my blog.

Ah well, give the people what they want: http://codeulate.com/2007/12/fcking-programming/


To be fair, some of those are hysterical.

In particular:

    # no, no, no, no, no, no, no, no
    # no. fuck no. I am a fucking
    # moron.

I think it's the punctuation. It's like angry Japanese poetry.


My favourite that I have seen in a codebase is

    # This is hideous
    # I want to poke my eyes out with a fork!


I'm completely stunned that PHP is on the bottom here.


PHP is very international - I wonder if it would rank higher if, say, German swearing was counted?


That's the most intellectual point anybody has made in this anti-php thread.


A small child doesn't think a crayon is badly designed until he has used a pencil or a pen. Without a frame of reference, a PHP developer has little reason to swear at the code.


Am I the only one on HN that's about sick to death of the "PHP developers are children/idiots/bad programmers" meme? I can be (and have been) paid to code in all of the following languages:

C, Perl, JavaScript, PHP

I also occasionally code in Python and assembly for fun.

When it comes to web projects PHP is my goto language because it works in all web environments without a bunch of bullshit surrounding installation, versioning or deployment. It's 2011 people, are you telling me developers are still trying to establish some sense of shiny self importance based on language choice? Sad.


> Am I the only one on HN that's about sick to death of the "PHP developers are children/idiots/bad programmers" meme?

Most of the developers are bad. Assuming a normal distribution, pretty much half of them are below the average and average is not usually something to brag about.

That said, languages that stand a higher chance of generating employment should attract those that don't want to learn new programming languages (something good programmers tend to do) and want to make their bets on sure technologies. I expect programmers interested in Lisp, Haskell, Erlang, Python, Ruby, Lua, Scala, Forth, Smalltalk to be, on average, better than those who choose Java, C#, PHP or Visual Basic. Of course, not all of those who choose these languages are bad programmers (and the opposite is also true), but I bet you will find more mediocre programmers in the second group.

(burn, karma, burn)


I'm sorry but you're full of shit. "Most of the developers are bad" is a faith-based unquantifiable statement. In all the project work I've done (open source and proprietary) over the last four years I've encountered two (2) truly miserable PHP developers. One works for a newspaper company (I guess they couldn't afford talent), the other is so new he's still learning the implications of for() vs while() and since the guy's actually really smart just a smidge of mentoring and some experience and he'll train up nicely.

"languages that stand a higher chance of generating employment should attract those that don't want to learn a new programming language" I think you're confusing PHP and Visual Basic/.NET here. Based on hit counts of job posts there are way more job opportunities for MS developers than PHP developers and based on median salary data coding .NET is worth about 20k a year more, so if they're rational actors the "just in it for a paycheck" crowd should be migrating to Windows development.

Your language salad comparisons are likewise faith-based. How about we trade anecdotes instead? I've met a total of six Ruby developers during my professional career. To a man they were immature, arrogant and totally in love with the hype surrounding their language of choice. Four of the six could be fingered for running multi-million dollar projects into the ground courtesy of a year and a half long series of hipster love-ins masquerading as SCRUM meetings. Meanwhile during the same time period myself and one other developer launched 65 sites on four platforms with a combined total of around 1 million in annual ad revenue generated for the company. You just keep singing your song, man. I'll keep shipping shit.

Edited to add: just keep on downvoting, truth hurts.


Calm down. I agree with your anecdotes and have observed similar events and people.

Still, if the highly abstract programmer competency follows a normal distribution (and I admit a leap of faith there), you'll agree with me that half of the programmers out there will end up below the average. You'll also agree with me that just above the average doesn't make a good programmer, so, most of the programmers around us will be bad ones.

I don't believe you were downvoted for saying painful truths (which I don't think you did), but more on the form of your statement. I believe most downvoters didn't pass the first sentence. I believe this is a good thing - poor form is bad for discussion. We shouldn't get emotional.


> Calm down.

It's hard to remember this is HN when walking into threads like these. As I mentioned elsewhere, if PHP is mentioned, HN maturity takes a nose dive as people make what amounts to dick and fart jokes. It's embarrassing.

> I don't believe you were downvoted for saying painful truths (which I don't think you did), but more on the form of your statement. I believe most downvoters didn't pass the first sentence. I believe this is a good thing - poor form is bad for discussion. We shouldn't get emotional.

It's not because of his tone. It's because he's defending PHP. Look at other posts in this thread. Look at the bitches thread. Littered with immaturity and a far worse tone than the parent here. Sure, the tone didn't help here, but if his tone had been the same, but defending Ruby, or Lisp, or some other fashionable language, he wouldn't have been. Even your post smells like this.

One more thing...

> I expect programmers interested in Lisp, Haskell, Erlang, Python, Ruby, Lua, Scala, Forth, Smalltalk to be, on average, better than those who choose Java, C#, PHP or Visual Basic.

Ruby is the language people go to for work now. Why? Because it's the in language, especially in the context of HN. And it's not just Ruby, but RoR specifically.


> Ruby is the language people go to for work now

There are more jobs for Java, C# and PHP than for RoR. RoR jobs may be more frequent at sexy startups (the type that appears here).


Which is why I remarked about HN. =)

I'm not suggesting RoR has a lot of jobs compared to Java or C#, but rather, it's not exactly devoid of job offers either. Especially within the context of HN.

So, apologies for not being clear about the context.


That's assuming that most profanity is directed at the language. It might be the case, but one may also be annoyed by a codebase, by an algorithm, by a bug, by a co-worker, by the business reason for the commit, or by something totally unrelated.

There can also be humorous swearing, and swearing for emphasis (see DHH), neither of which convey annoyance at all.

Hypothesis A: PHP projects tend to be smaller and simpler, with fewer interactions with other code/requirements/people, and therefore fewer opportunities for annoyance.

Hypothesis B: PHP is actually a very simple and close fit to its use-case, of unambitious webapps, and so often the tool just does its job, and disappears. There's no space to swear at a tool that you don't notice.


At what point does a webapp become "ambitious" and not suited for PHP? What about PHP with a framework such as, e.g., Cake? I'm curious because I'm embarking on a project and probably choosing Rails over Cake, but only because the other developer is a Rails guy.


They could be swearing in joy.

    # This new PHP library is fucking awesome!


I observe lots of groups in this situation. It took me eight years from first contact to first real project in C++ because I learned Smalltalk first. Had I not seen Smalltalk before, I would have thought C++ was just fine, instead of the abomination, the gigantic leap backwards it really is, and would happily use it like so many people do.

A lot of people love their tools just because they don't know better ones.


yawn

Speaking of frame of reference, people belittling PHP devs are mostly showing their own ignorance.


Your point holds true for all languages in the graph.


The only PHP devs using git are all super professional with the patience of saints?


Perhaps they don't write meaningful commit messages?


How is swearing more meaningful than not swearing in a commit message? Perhaps some developers don't like the idea of the entire world seeing them swear at - basically - text.


Naah. My joke - you got it was a joke, right? - was:

They don't write meaningful comments at all. Including those with swears, meaning at least something.

Joke #2: What would they have to put into commit messages, if you consider so much of PHP code is not written but copypasted? What to write then - "I copypasted it from X"?


I didn't get it.

But your point being that php code doesn't get commented much is right on.


There is not much PHP on Github.

It's the same reason there is no profanity in FORTRAN or COBOL in this graph.

The author makes a terrible mistake: he does not normalize the graph relative to the total amount of code in each language.

There's a lot of Ruby on Github, which is why you can find plenty of Ruby profanity.

The graph is just about useless without this normalization.


That's rendered irrelevant by the author using a sample with the same number of commits in each language.

Although I think it's fair to say the proportion of beginners and casual projects using source control and Github is probably lower for PHP than, say, Ruby.


Yep, my bad, I did not carefully read the article. :-(


The author said he ripped the same number of commits from each language, so it's naturally normalized.

The number of different projects ripped, on the other hand, could help skew the results of the unpopular languages.


There are maybe 10 to 15 ways the normalization doesn't manage to do what it's supposed to. Certain committers happen to use profanity more, for instance. Commit style and frequency. Tendency of programmers of language X to be acculturated somewhere that dampens their natural tendency to curse when frustrated.

For instance, my style of coding means I branch off, commit early and often, with flippant and meaningless messages, and then I squash and rewrite my entire change as a single, formal entity before releasing it for coworker consumption. I curse up a storm in my commit messages, but you won't see any of them unless you look inside my computer before I delete the working branch.

This is fun and funny. Nothing about it even starts approaching validity except for one statistic: "210 out of 929857", and even that gets shot down. The entire thing would have been better if he had included "hate" as a curse word.


I think you've forgotten what was being discussed, because your objection isn't relevant. The claim in dispute is: "he does not normalize the graph relative to the total amount of code in each language".


Gaming companies, start-ups, anyone looking for a "Ninja" or "Rock Star" all seem more likely to tolerate swearing and less likely to be using PHP. Additionally, I'd wager that PHP projects on GitHub tend to get fewer commits from hobbyists and other non-professionals.


My interpretation is that anyone still working in PHP is long resigned to its frustrations.


While it's fun to mock PHP for the uninitiated, it's actually probably something a bit more mundane.

Those working on public GitHub projects are probably doing so willingly. After all, if you hate PHP and are forced to use it at your day job, do you really want to go home and commit to PHP projects? Probably not.

So, what you're seeing are people not merely resigned, but who enjoy working in PHP, for all its warts. They aren't just resigned.


But that should apply equally to all languages. One would expect the same to be true of, say, Ruby or C++, but PHP's slice of the pie is minuscule by comparison. So your mundane answer is probably correct, but it's answering the wrong question. The question isn't, "Why don't PHP developers swear at their language all the time?" It's actually, "Why do PHP developers swear at their language less than most others?"

And I posit that it's because PHP programmers at that level are less likely to be surprised or scandalized by the language's well-known quirks than somebody working with Ruby or C++ (i.e. they are resigned to its frustrations).


Good point. Good question.

Honestly, and I don't mean to stereotype here, but I think it's because the quality of PHP developer proportionate to the number of PHP developers on github is higher than in other languages. Granted, there are fewer total. But the reason for this is pretty simple. PHP is not the popular language, so most new developers taking up programming in the past 2-3 years see a different landscape than a decade ago. Github is SourceForge and Ruby is PHP. Poor PHP developers aren't flocking to github. They are still on SourceForge (not suggesting that all SF users are bad).

Basically, the PHP developers on github are probably the more experienced developers who have been doing this for a while. On the flip side, most Ruby developers are still comparatively new to the language.

This is a lot of assuming, and I don't mean to disparage anyone.

That all being said, while I equated the quality of the programmer with the amount of swearing being done, I fully realize it's not a fair comparison. Granted, this entire thing is based on numbers that, I feel, are fairly meaningless.

Finally, what I find more interesting is how HN's quality takes a nose dive when the population is allowed to bash a particular population: in this case, people who enjoy using PHP.

Still, good follow up. =)


I hope it didn't look like I was bashing anyone. I think for the most part we're violently in agreement here.

Besides PHP Githubbers maybe being at a higher level on average, I think PHP's gotchas are just generally better known, so people go into it with eyes open. It doesn't take as high a level of experience to realize, say, that the standard library has wonky argument ordering as it does to find all the weird edge cases that will bite you in Ruby and horrific error messages that await in C++. Ask anyone with a decent knowledge of a few languages to criticize PHP and he'll have something memorized. Ask a Ruby programmer with five years' experience to compare and contrast "reduce(&:+)", "reduce(:+)" and "reduce('+')" and it's 50/50 whether he'll even get two right (even though the truth is that there's no difference!).


> I hope it didn't look like I was bashing anyone.

No, not you. It's why you got a reply from me that at least tried to sound intelligent. =)

But it does get tiring hearing the same useless tripe repeated, especially here on HN. PHP's worst quality is that whenever it's mentioned, a bunch of know-nothings start making what amounts to the technical equivalent of dick and fart jokes. I'll just stop there though. =)


If you're not deep into complex projects, PHP isn't very frustrating. It's pretty simple and to-the-point.

There are a lot of people out there who think it's truly fabulous, also. Usually, PHP is their first language and they've never seriously tried anything else. They like the accomplishment of programming, and connect that with PHP.


[deleted]


Or maybe they don't curse as much?


[deleted]


True.

Though, I wonder why Ruby programmers are immature and careless?

Stereotyping is fun!


No Perl? I fucking swear all the time...


I ran it for Javascript, Ruby, and Perl, and I got this:

{"JavaScript"=>48, "Ruby"=>46, "Perl"=>28}


Is it OK statistically to take, for example, all Ruby commits and 25% of the C++ ones and compare them? Another kind of chart would be nice... also some other params.


Why not? As long as the sample is random and equal in size.
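A minimal Python sketch of what "random and equal in size" could look like in practice. The function name, the seed and the commit lists are hypothetical, made up for illustration; this is not the article's actual sampling code.

```python
import random

def equal_size_samples(commits_by_language, seed=0):
    """Draw the same number of commits at random from every language."""
    rng = random.Random(seed)
    # The smallest pool caps the sample size, so no language is over-drawn.
    n = min(len(commits) for commits in commits_by_language.values())
    return {
        language: rng.sample(commits, n)  # sampling without replacement
        for language, commits in commits_by_language.items()
    }

# Hypothetical data: a small Ruby pool versus a larger C++ pool.
commits_by_language = {
    "Ruby": [f"ruby commit {i}" for i in range(1000)],
    "C++": [f"c++ commit {i}" for i in range(4000)],
}
samples = equal_size_samples(commits_by_language)
print({language: len(sample) for language, sample in samples.items()})
```

Taking every Ruby commit and a random quarter of the C++ ones gives the same sizes here, which is why the comparison can still be fair as long as the C++ quarter really is drawn at random.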


Well played!

I wonder if this has anything to do with: http://news.ycombinator.com/item?id=2247962

I know I plan to comment my python code a little differently now! Maybe that will help balance the numbers?

I know I'd be pretty vulgar if I programmed in C++/Javascript all day!


I'm surprised there are so few commit messages with curse words in them. 210 out of 929857, that's like 0.02%. I would have thought that developers were more vulgar than that (I know I am).

Maybe if we look at comments in source code we would get a better representation of the vulgarity of developers.
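For reference, the arithmetic behind that figure as a quick Python sketch. The two counts are the ones quoted from the article; the variable names are only illustrative. Since the article counts swear words rather than messages, this is at best an upper bound on the share of messages that contain a swear.

```python
# Counts quoted from the article: 210 swear words in 929857 commit messages.
swear_count = 210
commit_count = 929857

rate = swear_count / commit_count               # swear words per commit message
percent = rate * 100                            # the same value as a percentage
commits_per_swear = commit_count / swear_count  # messages per swear word

print(f"{percent:.4f}% of commit messages")
print(f"roughly one swear word per {commits_per_swear:.0f} commit messages")
```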


Interesting. Of course I am thinking of the many ways that the results might not be representative, but that doesn't make it any less of a cool weekend project.

Would be great to see some context around where the most _profanities_ occur by language, and the kind used.


One that came to mind is certain languages being more popular in English vs. non-English speaking countries.


Most swear words are international.

Especially the F bomb.


Yeah, this is funny, there is novelty here, etc. A story counting profanities in source code/commits/etc. pops up every now and then.

I've found that the only real profanity in a source code comment is "HACK".

My swear jar overflows with quarters.


Next time they do a test they should include "git". Let's see what happens.


My favorite:

- fuck it. let's release



Relevant:

https://gist.github.com/198320

A one-liner I wrote that uses git blame to seek out who swears the most in a given codebase. Pretty fun.
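For anyone curious how such a tally might work, here is a rough Python sketch. It is not the gist's code, which isn't reproduced in this thread. It assumes the text comes from `git blame --line-porcelain`, where each source line is preceded by an `author NAME` header and the line itself is prefixed with a tab. The word list and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Illustrative word list; the gist's actual list is not shown in the thread.
SWEARS = {"shit", "fuck", "damn"}

def swears_by_author(porcelain_text):
    """Tally swear words per author from `git blame --line-porcelain` text."""
    counts = Counter()
    author = None
    for line in porcelain_text.splitlines():
        if line.startswith("author "):
            # Header naming the author of the next source line.
            author = line[len("author "):]
        elif line.startswith("\t") and author is not None:
            # Source lines are tab-prefixed; match whole words only.
            words = re.findall(r"[a-z]+", line.lower())
            counts[author] += sum(word in SWEARS for word in words)
    return counts

# Hypothetical blame output for a two-line file.
sample = "\n".join([
    "1111111111111111111111111111111111111111 1 1 1",
    "author Alice",
    "\t# fuck it, let's release",
    "2222222222222222222222222222222222222222 2 2 1",
    "author Bob",
    "\treturn counter + 1",
])
print(swears_by_author(sample).most_common())
```

Matching whole words keeps innocent substrings from counting, at the cost of missing compound or misspelled variants.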


When I ran it just said "5" :S


This is an example of poor graphical representation. The proper way to do this would be to take the swear words per word for each language and then map this to a bar graph, then you could easily see which has the highest vs the lowest.

A pie chart is good for things that add to 100%. The number of swear words that occur in something is not appropriate for this type of graphic.


Considering that there were an equal number of commit messages for each language, this is a perfectly adequate representation. In this case, the pie chart's total is the number of swears, with each slice representing a language's share of that.
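A small Python sketch of that point, using the JavaScript/Ruby/Perl counts posted earlier in the thread. The per-language commit count is a hypothetical placeholder, since that run's sample size isn't stated. With equal sample sizes, a language's slice of the pie and its swears-per-commit rate differ only by a constant factor, so the pie and a bar chart of rates rank the languages identically.

```python
# Swear counts from the JavaScript/Ruby/Perl run posted earlier in the thread.
swears = {"JavaScript": 48, "Ruby": 46, "Perl": 28}
# Hypothetical: the same number of commit messages sampled for every language.
commits_per_language = 10_000

total = sum(swears.values())
for language, count in swears.items():
    pie_share = count / total            # this language's slice of the pie
    rate = count / commits_per_language  # swear words per commit message
    print(f"{language}: {pie_share:.1%} of the pie, {rate:.4f} per commit")

# With equal sample sizes, share / rate is the same constant for every language.
ratios = {
    language: (count / total) / (count / commits_per_language)
    for language, count in swears.items()
}
```

If the sample sizes were unequal, the ratios would differ per language and the pie would no longer reflect the rates.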


You’re entirely correct, except for this: The pie chart applies here because the author made no attempt to correlate number of profane commits with number of commits, total, per language. This post means literally nothing, so he could use whatever graph he damn well felt like.

This is not HN worthy.


Not statistically significant, but interesting idea.


Hah, maybe I should add that to http://langpop.com - are you interested, Andrew?


This would be more interesting if it scanned comments from source files for profanity.


This is per commit message, which is mandatory in git, not occurrences inside the source of the project.

Perhaps PHP and Python have fewer occurrences because the employers of people that use Python or PHP are less likely to tolerate inappropriate language in the code.


> Perhaps PHP and Python have fewer occurrences

Maybe Python programmers are just happy and well adjusted people...


Or we swear in comments not commits :D


Or, perhaps people developing in these languages are in a more zen like state? :)


Seems that python devs are practicing the Zen of Python :-)


[deleted]


PHP is a fine language once you understand the warts and accept the few that exist. While it's not everyone's cup of tea, what it does, it does well. Far too many people program PHP for a couple years (or not even) and proclaim it's a horrid language. Ruby is the it language, and Python is close behind. These are the Pop languages of the day. But PHP developers who are still using the language know it. They know it really well. And just consider how long PHP has been around and been popular?

I think a lot of people that would whine and complain about PHP aren't doing so on gh. Instead, they are using python, rails, or other languages. The ones using PHP, however, know it, and probably aren't swearing because they are probably just writing good software.


github.


Even ignoring that which other commenters have pointed out, I simply don't buy it for one reason: that PHP slice is way too small.


This goes along with my observations, having read a lot of PHP code.

There's a certain level of professionalism in PHP projects, believe it or not, that is different from what you see in Ruby, Python, C++, etc. The difference? Many PHP programmers are not really hackers, they're just trying to get something done. They're not programming for the joy of it, for the most part, they're just trying to get some plug-in done for a CMS for some client.

The code and comments often end up structured and informational, in a Java kind of way.


Yeah it's oddly disturbing. My thoughts are that they just don't care that much about life anymore to swear in their commits?


> Yeah it's oddly disturbing. My thoughts are that they just don't care that much about life anymore to swear in their commits?

Or maybe we're all just mature enough to realise swearing in comments is stupid, and trying to read anything from it is equally stupid?


It'd be interesting to see the original dataset. At the risk of stereotyping, what are the chances PHP programmers simply don't leave useful commits? (and Perl programmers don't leave anything parsable ..)


Hmm, what's the % of PHP code on github?


No Objective-C love for the iPhone/iPad crowd?


"Shit, I missed a ]"


Sample size is too fucking small. Also, as others have pointed out, you've allowed far too few swearwords.

Good job though :-)


One might assume that more profanity in a language = more frustration with that language. But I'd bet that proportion of business use has something to do with it, too.

If you're hacking on a personal project, you might feel freer to swear in your commits. And my guess is that you're more likely to code Ruby for personal projects than C#. But I could be wrong.


Numbers of committers sampled per language might be helpful in identifying potential bias.
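A rough Python sketch of such a check, assuming each sampled commit is available as a (language, author) pair. The function name and the data are hypothetical. A language whose sample is dominated by one prolific author would show a high top share.

```python
from collections import Counter, defaultdict

def committer_summary(commits):
    """Summarise how many distinct authors back each language's sample.

    `commits` is an iterable of (language, author) pairs, one per commit.
    """
    authors = defaultdict(Counter)
    for language, author in commits:
        authors[language][author] += 1
    summary = {}
    for language, counts in authors.items():
        total = sum(counts.values())
        top_author, top_count = counts.most_common(1)[0]
        summary[language] = {
            "committers": len(counts),       # distinct authors in the sample
            "top_author": top_author,        # most frequent author
            "top_share": top_count / total,  # fraction of commits by that author
        }
    return summary

# Hypothetical sample: one prolific author dominates the C++ commits.
commits = [
    ("Ruby", "alice"), ("Ruby", "bob"), ("Ruby", "carol"), ("Ruby", "dave"),
    ("C++", "erin"), ("C++", "erin"), ("C++", "erin"), ("C++", "frank"),
]
summary = committer_summary(commits)
print(summary)
```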


Perl might beat C++ if it was included.


i'd like to see this same comparison, except comparing different version control systems


how about 'chainsaw'?


Go Ruby go!! ;D



