Yeah, any viable list of swear words has to include "damn" (and derivatives), "hell", and "ass" (and derivatives). I'd even go so far as to say that "crap" and "retard" (and derivatives) are sufficiently unprofessional that they belong on the list.
Also was fuck searched as a word on its own? Because if you include compound words with fuck you are going to catch a long tail of interesting profanity, especially amongst programmers.
You'd want to whitelist against the Scunthorpe Problem though.
And if one goes that far, one would have to include variant spellings, because in the heat of the moment of cursing an opaque or broken piece of code, a programmer is likely to froth at the keyboard.
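For what it's worth, the points above (derivatives and compound words, a Scunthorpe whitelist, variant spellings) can be sketched in a few lines of Python. This is only a rough illustration: the word lists, the substitution table, and the names `BLOCKLIST`, `ALLOWLIST`, `normalize` and `find_profanity` are all made up for the example and are not what the original analysis used.

```python
import re

# Hypothetical lists for illustration; a real sweep would need far more terms.
BLOCKLIST = ["damn", "hell", "ass", "crap"]
# Innocent words that contain a blocked term (the Scunthorpe Problem).
ALLOWLIST = {"hello", "shell", "class", "assert", "pass", "scrap"}

# Assumed character substitutions, mapped back to letters to catch variant spellings.
LEET = str.maketrans({"@": "a", "4": "a", "3": "e", "1": "i", "0": "o", "$": "s", "5": "s"})


def normalize(token: str) -> str:
    """Lower-case, undo simple substitutions, collapse runs of 3+ repeated characters."""
    token = token.lower().translate(LEET)
    return re.sub(r"(.)\1{2,}", r"\1", token)


def find_profanity(text: str) -> list[str]:
    """Return the raw tokens that contain a blocked term and are not allowlisted."""
    hits = []
    for raw in re.findall(r"[\w@$]+", text):
        token = normalize(raw)
        if token in ALLOWLIST:
            continue
        # Substring match, so compounds such as "goddamn" are caught too.
        if any(bad in token for bad in BLOCKLIST):
            hits.append(raw)
    return hits


print(find_profanity("Fix the damn parser in class Shell"))
print(find_profanity("goddamn h3ll daaaamn"))
```

Substring matching is what catches the long tail of compounds, and it is also what makes the allowlist necessary: without one, "class" and "shell" would be flagged, and a word like "password" still would be here.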
Hmm, that's not very comprehensive for this purpose.
Years ago at a former employer, we discovered just after shipping a large quantity of quite sensitive demo materials that an outside contractor had managed to slip a hidden profanity into it. Oh, the joy that caused....
The immediate reaction though was that a few of the less polite and proper members of the organisation were tasked with producing as close to an exhaustive profanity list as they could so we could do a relatively thorough sweep. From memory that list was pushing 40 terms - I think I might still have a copy somewhere but the last thing this discussion needs is more swearing!
Years ago I developed a custom CRM for a client that included reminders. While testing the production version of the system pre-launch, I added many fake reminders, for users who happened to be real, along with a lot of other test data. I then cleared all of the test data...except I accidentally skipped the reminders table.
For weeks afterwards, my clients would receive profanity-laden reminders, helpfully labeled as "from Adrian".
Luckily, they had a good sense of humour about it. But since then, I've never used profanity in any kind of test data.
I never use anything remotely non-serious in my test data. I even try to make the comments I type into forms during testing seem at least close to real. Silly names are out too. You never know when some fool is going to show a visiting client a development system instead of one of the official demos, and I've been stung by someone not being amused by the existence of Don Kiddick and Mike Hunt in my sample dataset.
We have a table full of unsavory words due to generating more than one password that upset someone. At one point someone decided we needed "pronounceable" passwords for our new users without considering the consequences.
I had a brief play once for this sort of thing with doing stem-based pronounceable word generation based off a starting dictionary; the idea being that it broke them into fragments marked as beginning, middle or end and used intersections of these to build words that followed letter use rules of known words but yet weren't known words themselves. You start getting distance problems though - a lot of words it generates are too close to or are trivial encapsulations of existing words, so that'd need taking account of. One day I'll have a proper play and do something more comprehensive for this.
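A much simplified sketch of that fragment idea, in Python, might look like the following. It assumes fixed two-letter fragments and a plain substring check for the "too close" problem, which is cruder than what is described above; the function names and the tiny sample dictionary are hypothetical.

```python
import random


def build_pools(dictionary, size=2):
    """Collect beginning, middle and end fragments from the known words."""
    begins, middles, ends = set(), set(), set()
    for word in dictionary:
        chunks = [word[i:i + size] for i in range(0, len(word), size)]
        if len(chunks) < 3:
            continue  # too short to contribute all three fragment kinds
        begins.add(chunks[0])
        middles.update(chunks[1:-1])
        ends.add(chunks[-1])
    return sorted(begins), sorted(middles), sorted(ends)


def acceptable(candidate, dictionary, blocked):
    """Reject known words, blocked substrings, and words that wrap a known word."""
    if candidate in dictionary:
        return False
    if any(bad in candidate for bad in blocked):
        return False
    return not any(len(w) >= 4 and w in candidate for w in dictionary)


def generate(dictionary, blocked, rng, max_tries=1000):
    """Build beginning + middle + end candidates until one passes the checks."""
    begins, middles, ends = build_pools(dictionary)
    if not (begins and middles and ends):
        return None
    for _ in range(max_tries):
        candidate = rng.choice(begins) + rng.choice(middles) + rng.choice(ends)
        if acceptable(candidate, dictionary, blocked):
            return candidate
    return None


words = {"garden", "window", "planet", "silver", "market"}
print(generate(words, {"damn"}, random.Random()))
```

The substring test only handles the trivial-encapsulation case; catching words that are merely close to a known or unsavoury word would need a proper edit-distance check on top.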
Tip for anyone caching unsavoury words - from memory the bulk of what we ended up checking for were racial slurs, of which there's an astonishing (and unsurprisingly heavily localised) variety.
Why is it not misogyny? Would you drop the phrase "Sorry, niggers." so casually into conversation? Probably not (after all, it's racist), but it still uses a segment of society in derogatory fashion.
It's not misogyny because, according to Merriam-Webster's 11th edition, misogyny means "a hatred of women," and the original comment wasn't about that at all.
And it has nothing to do with a segment of society. You're complaining about someone half-jokingly calling us all whiny babies.
"damn" is no longer an issue because most people aren't terribly religious nowadays, at least not enough for people to censor themselves. That's not the case for "bitch".
What your argument tries to do is sneak a false dichotomy onto the table.
It's absolutely true that people are responsible for their feelings. But that doesn't mean you aren't also responsible for your choice of words. By saying that I am responsible for how I feel when I hear the word "bitches," you are trying to imply that you aren't also responsible for choosing to say it. It's perfectly valid to say that we are BOTH responsible for our choices. I should choose to ignore you, and you should choose another word. There is no need to say that one or the other but not both of us should be responsible.
I find your arguments along these lines to be passive-aggressive. If you want to hurt other people with words, own up to wanting to hurt other people with words. Don't pretend that it's everyone else's fault. Because surely, if nobody took offense to these particular words, you would hunt around until you could find words that would cause offense.
You choose to use these words and phrases precisely because they have shock value. It's not like you use the word and are surprised it carries some special meaning that offends people. I see elsewhere you have told people to "fuck off." Are you seriously suggesting you weren't trying to give offense? Because if it isn't possible to give offense, why are you trying so hard to offend people??
Greeting groups of people as "sup bitches" is a popular culture phenomenon. It's not used for shock value or to give offence (at least in my social circles....).
My use of both "profane" and non-profane offensive comments in my comments is to drive home my point. It is not intended to hurt others, but merely make them think to themselves "frack it, I'm ignoring him." "What he says would normally offend me, but I'm going to have a nice evening with my wife and kids instead" ...or something along those lines.
If I get anyone to that point then maybe they'll know that in the future, if some anonymous dude dares use the dreadful word "bitch" on the internet, it's best for the sanity of everyone to let it slide.
"Because if it isn't possible to give offense, why are you trying so hard to offend people??"
-gets back in character...- Well I'd say there is a keen difference between "giving offence", which I am of course not doing, and "presenting others the opportunity to take offence". The choice is theirs.
Well I'd say there is a keen difference between "giving offence", which I am of course not doing, and "presenting others the opportunity to take offence". The choice is theirs.
That's the most cowardly thing I've heard all day. But it's only lunch time, so we'll see how it goes.
So much for my attempts to discuss this intelligently, instead of emotionally. I'm going to consider this one more point of data in support of my hypothesis...
That's hair splitting terminology. If you say or do something which you know the majority of people will find offensive, then that's effectively giving offense.
You didn't answer the original question, either: would you drop the phrase "Sorry, nigger." into casual conversation? If not, why not?
Hardly irrelevant - it's called reductio ad absurdum. If you limit yourself by not using certain words in conversation, then you're confirming that those words are generally offensive, and that using them would give offense.
It's why the legal system has the concept of the "reasonable person", but I prefer "Do you talk to your mother with that mouth?"
The only reason I do not casually use the word nigger in public is because unlike god, HR departments appear to be omnipresent.
"Do you talk to your mother with that mouth?"
Yes, I do. And if that bothers you, kindly fuck off. You have no place telling me what I may or may not say in the presence of my mother.
If an adult loses their temper in public, they are rightfully looked down upon. If an adult publicly becomes offended, I similarly look down upon them.
Right - so you're a racist, misogynistic idiot who'd rather put up pithy one liners and tell me to fuck off than address the point. Glad we got that cleared up.
Says the poster who was previously complaining about ad hominems...
No, I'm just an adult who recognizes that words only have the power that you give them. If you wish to continue to allow your emotions to be a slave to the language of others, that is your prerogative; but don't expect others to follow you.
And yes, I addressed your question. Read the parent comment of your post again.
You didn't address the question, you dropped a pithy one liner about HR departments. And you're still avoiding the issue by playing silly philosophical games. Of course words have meaning and power, particularly ones which are derogatory towards sections of society.
And that's all I'm going to say - this thread has gone on long enough.
The meaning of my "pithy one liner", which you seem to be completely blinded to for some reason, is that I do not say the word "nigger" casually, but not out of concern for anybody's feelings. Rather, I avoid the word solely because I am paranoid of HR departments, and all things associated. This meaning seemed clear enough to other observers/participants.
Says a lot about you that the inclusion of minorly offensive words or implications can cloud your ability to critically interpret what others are saying. When you allow somebody else to offend you, or make you angry, you are allowing them to impair your thought processes. You are giving others the power to control you.
In order to defend yourself, you must realize that although somebody says something mean, it is up to you if you get angry, and although somebody may say something offensive, it is up to you if you become offended. In the real world, people are going to say shit you don't like. Trying to "correct" their behaviour is the wrong approach.
Not sure the HR department point is a 'pithy one liner'. If they didn't have their power I'd be singing along with Kanye too (there's a difference between using the word 'nigger' which is not racist, and calling someone a 'nigger', which is).
You're showing your disconnection with popular vernacular with your assertion. I'm 22 and girls around me call each other bitches casually (in a friendly, fun way) all the time.
You certainly don't speak on the behalf of any women I know.
Is it because your post is a bunch of poorly conceived bullshit? Is it because you don't really give any time to consider the intent of what's being said and the social context of the terms employed? Is it cause you're looking to be offended?
If you can't provide some analysis of the origins of thoughts and feelings which are presumably your own, maybe you should keep them to yourself till you can.
I never curse in my commit messages. That doesn't mean I don't want to! Cursing is a vice of mine, acquired through summers of cleaning bathrooms and picking up trash at a state park in high school. I use euphemisms when coding professionally, but it's easy to map my commit messages at old companies back to my original swear.
"Blameless" bug:
Original: Now recalculates the height of the container element after repopulating
the content.
Translation: Did Bob test this fucking thing ONCE before he committed this?
Fixing my own mistake:
Original: Tweaks the NUM_PATHS config value.
Translation: Wow, I apparently have shit-for-brains. I hope nobody ran a build in
the past 20 minutes.
Overdesigning:
Original: Updates the object creation code per Bob's feedback.
Translation: Another Goddamn FactoryFactoryBuilder?! I officially don't
understand this codebase.
Major cleanup needed:
Original: Style tweaks needed for GCC compilation.
Translation: OMFG. This isn't even valid C++. It doesn't even compile.
OK, I'm not perfect:
Original: Fuck IE7.
Translation: No seriously, fuck IE7.
I'd kinda like to see which swear words appear most often in commit messages. I'm guessing that "shit" and "fuck" are much more common than "cocksucker" and "motherfucker", and if that's not true, I want to know which language has the most cocksuckers and motherfuckers.
Yeah, the pie chart doesn't quite cover it - I'd like to see both swear words per commit per language (if, say, Java has 10% of the swear words but 3% of the commits) and complexity of the swear words - a simple "Fuck" implies far less frustration than a "Motherfucking Cocksucker!"
Could develop quite a nice Programming Language Pain Index…
I run a slang dictionary website which lets users assign an offensiveness score to each term. That would be an interesting bit of data to add: not only the raw word count, but how offensive the swearing is for each language.
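A "Pain Index" along those lines could be as simple as weighting each swear by an offensiveness score and dividing by the number of commit messages. The sketch below is Python; the scores in `OFFENSIVENESS` are invented for the example (they do not come from any real dictionary or dataset), and `pain_index` is a hypothetical name.

```python
import re

# Hypothetical offensiveness scores (1 = mild, 5 = severe), made up for illustration.
OFFENSIVENESS = {"damn": 1, "crap": 1, "shit": 3, "fuck": 4}


def pain_index(messages, weights=OFFENSIVENESS):
    """Return the weighted profanity score per 1,000 commit messages."""
    if not messages:
        return 0.0
    total = 0
    for message in messages:
        for word in re.findall(r"[a-z']+", message.lower()):
            total += weights.get(word, 0)
    return 1000 * total / len(messages)


# Invented sample messages, one list per language.
sample = ["fix typo", "damn this crap", "fuck"]
print(pain_index(sample))
```

Dividing by the message count is what makes languages with different numbers of commits comparable; the weights are the subjective part.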
i'd say alter the list of swear words in general to a list possibly more tuned to programming. agreeing with above, word breakdown would be nice as well.
How to offend members of 3 different programmer communities in 9 different ways with just one sentence: "It somehow makes sense that C++, Ruby, and JavaScript are all equally profane."
If you think that's clever, then perhaps you're lacking a few bits of programming language history. If you think it's only partly clever, you're perhaps lacking fewer pieces. If the parent comment just makes you roll your eyes and think it's probably not worth explaining, you probably understand what I'm getting at.
Given that there were only 210 total swear words, the accuracy of this seems pretty questionable. It's possible that one guy could be responsible for a large percentage of swearing for a given language.
It's code comments vs. commit messages, but the prevalence of profanity in the Linux kernel tree suggests developers' use of blue speech is pretty widespread.
Pie charts have a lot of drawbacks, sure, but it's ridiculous that we're at the point now where the first (and highest rated) response to a pie chart is always a negative comment about pie charts, regardless how good or bad the pie chart is.
This one in particular is very clear:
C++, Ruby and Javascript have the most profanity. They're relatively equal to each other and collectively account for more than 50% of the swearing in commit messages.
C is next, with significantly less swearing.
C# and Java are roughly tied a bit below C.
Python and PHP have, comparatively, almost no swearing.
Was that really so hard? When the data is already subjective (what is and isn't a swear word) and intended almost solely for humor, do we really need more precision than a pie chart offers?
It is at best hyperbolic and at worst dishonest to say you "have no idea" how to interpret this. You have an idea. You just don't have precision.
> C++, Ruby and Javascript have the most profanity. They're relatively equal to each other and collectively account for more than 50% of the swearing in commit messages.
this is the problem. In the pie chart it's almost impossible to determine which of those three has the most. In the bar chart, it's fairly obvious to my eye that C++ wins, though JS/Ruby are very close.
Rather than being organized by language names, the items in the pie graph should have been grouped by size (largest at 12, proceeding clockwise to the smallest at 11:59, for example). What relationship is there to show between the grouped names of programs that outweighs making this clear?
Dude, no. I think he was talking about how you can't tell how the size of the user base of a language is affecting the ranking. So, for example, only 1% of all projects could be in Java, but the swearing could be frequent enough to make it have ~15% of all curse words.
I see no reason to believe that anything else even approaches validity, given that his process for ripping an "equal" number of commit messages per language was broken. It's simple arithmetic; a grade schooler who notices that the last number is 7 would realize something's off.
What about the process is broken? Did you read the code and find bugs? With a total commit count of 929857, missing a single commit to round out to a perfectly even number of commits in each language is insignificant.
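The arithmetic being argued over is quick to check. This snippet assumes eight languages, which is the number named in the pie chart discussion (C++, Ruby, JavaScript, C, C#, Java, Python, PHP); the variable names are just for illustration.

```python
TOTAL_COMMITS = 929857  # figure quoted in the thread
LANGUAGES = 8           # assumption: the eight languages named in the chart discussion

per_language, remainder = divmod(TOTAL_COMMITS, LANGUAGES)
print(per_language, remainder)  # 116232 1
```

So the total is a single commit off an exact multiple of eight, out of roughly 116 thousand commits per language.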
Ahem, "Note that I ripped an equal amount of commit messages per language". I do think this is a bad place for a pie graph, but your specific criticism here is misplaced.
This was in response to a now deleted comment that claimed that more popular languages would show up as having more profanity because they have more commits, even if the profanity per commit was constant.
A good weekend project would be to take an existing graphing library and make a wizard for it that would create a correct type of graph based on the data and your stated intentions with the data, as shown in the flowchart above.
That is not the case. He wanted to compare curse words across languages independent of language popularity. If he did not collect the same amount of data per language, then he would have two variables: number of curse words and number of commits. Then there would be the danger that a more popular language would have far more curse words simply because it has far more commits.
I wouldn't call it 'tweaking' the data collection. He is simply normalizing the results to ignore the differences in language distribution.
This is normal and has nothing to do with how you choose to represent it.
It would have been meaningless to show any graph or table saying "Python has the most messages with profanity" if the amount of Python projects is 80% of all the projects out there.
He is right to normalize the results, but parent's point is that he is wrong to do that by modifying his data collection.
He should just collect as many commit messages as possible, then divide the profanity count for each language by the commit message count. Because that has lower standard error [and no more bias] than what he did.
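To make the standard error point concrete, here is a small Python sketch. It assumes each commit message is simply "profane" or "not profane" and that messages are independent, so the usual binomial formula applies; the counts are invented purely to show the effect of sample size, and `swear_rate` is a hypothetical helper.

```python
import math


def swear_rate(profane: int, commits: int) -> tuple[float, float]:
    """Return (rate, standard error) for the share of profane commit messages."""
    rate = profane / commits
    stderr = math.sqrt(rate * (1 - rate) / commits)
    return rate, stderr


# Invented counts with the same underlying rate but ten times the sample.
small = swear_rate(25, 100_000)
large = swear_rate(250, 1_000_000)

print(small)
print(large)
```

Both calls give the same rate, but the larger sample's standard error is smaller by a factor of about the square root of ten, which is the argument for keeping every commit message and dividing afterwards rather than capping the sample per language.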
That's not the case, that's just his personal choice. He could just as well have gone with %age of swear words per commit which would have made the number of commits per language irrelevant (as long as that number was kept above statistical nonsense) and would have yielded the same result.
A neat idea, although I think the pie chart isn't really the right format. I'd prefer to see a bar graph, with the y-axis as (swears/million messages) or similar.
and please the colors. Why the two greens and two blues? Use either two colors (to differentiate alternative elements) - you would not need this if this were a bar chart (which it should be) or use 8 markedly different colors.
A small child doesn't think a crayon is badly designed until he has used a pencil or a pen. Without a frame of reference, a PHP developer has little reason to swear at the code.
Am I the only one on HN that's about sick to death of the "PHP developers are children/idiots/bad programmers" meme? I have been paid to code in all of the following languages:
C
Perl
JavaScript
PHP
I also occasionally code in Python and assembly for fun.
When it comes to web projects PHP is my goto language because it works in all web environments without a bunch of bullshit surrounding installation, versioning or deployment. It's 2011 people, are you telling me developers are still trying to establish some sense of shiny self importance based on language choice? Sad.
> Am I the only one on HN that's about sick to death of the "PHP developers are children/idiots/bad programmers" meme?
Most of the developers are bad. Assuming a normal distribution, pretty much half of them are below the average and average is not usually something to brag about.
That said, languages that stand a higher chance of generating employment should attract those that don't want to learn new programming languages (something good programmers tend to do) and want to make their bets on sure technologies. I expect programmers interested in Lisp, Haskell, Erlang, Python, Ruby, Lua, Scala, Forth, Smalltalk to be, on average, better than those who choose Java, C#, PHP or Visual Basic. Of course, not all of those who choose these languages are bad programmers (and the opposite is also true), but I bet you will find more mediocre programmers in the second group.
I'm sorry but you're full of shit. "Most of the developers are bad" is a faith-based unquantifiable statement. In all the project work I've done (open source and proprietary) over the last four years I've encountered two (2) truly miserable PHP developers. One works for a newspaper company (I guess they couldn't afford talent), the other is so new he's still learning the implications of for() vs while() and since the guy's actually really smart just a smidge of mentoring and some experience and he'll train up nicely.
"languages that stand a higher chance of generating employment should attract those that don't want to learn a new programming language" I think you're confusing PHP and Visual Basic/.NET here. Based on hit counts of job posts there are way more job opportunities for MS developers than PHP developers and based on median salary data coding .NET is worth about 20k a year more, so if they're rational actors the "just in it for a paycheck" crowd should be migrating to Windows development.
Your language salad comparisons are likewise faith-based. How about we trade anecdotes instead? I've met a total of six Ruby developers during my professional career. To a man they were immature, arrogant and totally in love with the hype surrounding their language of choice. Four of the six could be fingered for running multi-million dollar projects into the ground courtesy of a year and a half long series of hipster love-ins masquerading as SCRUM meetings. Meanwhile during the same time period myself and one other developer launched 65 sites on four platforms with a combined total of around 1 million in annual ad revenue generated for the company. You just keep singing your song, man. I'll keep shipping shit.
Edited to add: just keep on downvoting, truth hurts.
Calm down. I agree with your anecdotes and have observed similar events and people.
Still, if the highly abstract programmer competency follows a normal distribution (and I admit a leap of faith there), you'll agree with me that half of the programmers out there will end up below the average. You'll also agree with me that just above the average doesn't make a good programmer, so, most of the programmers around us will be bad ones.
I don't believe you were downvoted for saying painful truths (which I don't think you did), but more on the form of your statement. I believe most downvoters didn't pass the first sentence. I believe this is a good thing - poor form is bad for discussion. We shouldn't get emotional.
It's hard to remember this is HN when walking into threads like these. As I mentioned elsewhere, if PHP is mentioned, HN maturity takes a nose dive as people make what amounts to dick and fart jokes. It's embarrassing.
> I don't believe you were downvoted for saying painful truths (which I don't think you did), but more on the form of your statement. I believe most downvoters didn't pass the first sentence. I believe this is a good thing - poor form is bad for discussion. We shouldn't get emotional.
It's not because of his tone. It's because he's defending PHP. Look at other posts in this thread. Look at the bitches thread. Littered with immaturity and a far worse tone than the parent here. Sure, the tone didn't help here, but if his tone had been the same, but defending Ruby, or Lisp, or some other fashionable language, he wouldn't have been. Even your post smells like this.
One more thing...
> I expect programmers interested in Lisp, Haskell, Erlang, Python, Ruby, Lua, Scala, Forth, Smalltalk to be, on average, better than those who choose Java, C#, PHP or Visual Basic.
Ruby is the language people go to for work now. Why? Because it's the in language, especially in the context of HN. And it's not just Ruby, but RoR specifically.
I'm not suggesting RoR has a lot of jobs compared to the Java or C#, but rather, it's not exactly devoid of job offers either. Especially within the context of HN.
So, apologies for not being clear about the context.
That's assuming that most profanity is directed at the language. It might be the case, but one may also be annoyed by a codebase, by an algorithm, by a bug, by a co-worker, by the business reason for the commit, or by something totally unrelated.
There can also be humorous swearing, and swearing for emphasis (see DHH), neither of which conveys annoyance at all.
Hypothesis A: PHP projects tend to be smaller and simpler, with fewer interactions with other code/requirements/people, and therefore fewer opportunities for annoyance.
Hypothesis B: PHP is actually a very simple and close fit to its use-case, of unambitious webapps, and so often the tool just does its job, and disappears. There's no space to swear at a tool that you don't notice.
At what point does a webapp become "ambitious" and not suited for PHP? What about PHP with a framework such as, e.g., Cake? I'm curious because I'm embarking on a project and probably choosing Rails over Cake, but only because the other developer is a Rails guy.
I observe lots of groups in this situation. It took me eight years from first contact to first real project in C++ because I learned Smalltalk first. Had I not seen Smalltalk before, I would have thought C++ was just fine, instead of the abomination, the gigantic leap backwards it really is, and would happily use it like so many people do.
A lot of people love their tools just because they don't know better ones.
How is swearing more meaningful than not swearing in a commit message? Perhaps some developers don't like the idea of the entire world seeing them swear at - basically - text.
Naah. My joke - you got it was a joke, right? - was:
They don't write meaningful comments at all. Including those with swears, meaning at least something.
Joke #2:
What would they have to put into commit messages, if you consider so much of PHP code is not written but copypasted? What to write then - "I copypasted it from X"?
That's rendered irrelevant by the author using a sample with the same number of commits in each language.
Although I think it's fair to say the proportion of beginners and casual projects using source control and Github is probably lower for PHP than, say, Ruby.
There are maybe 10 to 15 ways the normalization doesn't manage to do what it's supposed to. Certain committers happen to use profanity more, for instance. Commit style and frequency. Tendency of programmers of language X to be acculturated somewhere that dampens their natural tendency to curse when frustrated.
For instance, my style of coding means I branch off, commit early and often, with flippant and meaningless messages, and then I squash and rewrite my entire change as a single, formal entity before releasing it for coworker consumption. I curse up a storm in my commit messages, but you won't see any of them unless you look inside my computer before I delete the working branch.
This is fun and funny. Nothing about it even starts approaching validity except for one statistic: "210 out of 929857", and even that gets shot down. The entire thing would have been better if he had included "hate" as a curse word.
I think you've forgotten what was being discussed, because your objection isn't relevant. The claim in dispute is: "he does not normalize the graph relative to the total amount of code in each language".
Gaming companies, start-ups, anyone looking for a "Ninja" or "Rock Star" all seem more likely to tolerate swearing and less likely to be using PHP. Additionally, I'd wager that PHP projects on GitHub tend to get fewer commits from hobbyists and other non-professionals.
While it's fun to mock PHP for the uninitiated, it's actually probably something a bit more mundane.
Those working on public GitHub projects are probably doing so willingly. After all, if you hate PHP and are forced to use it at your day job, do you really want to go home and commit to PHP projects? Probably not.
So, what you're seeing are people who aren't merely resigned to PHP but who enjoy working in it, for all its warts.
But that should apply equally to all languages. One would expect the same to be true of, say, Ruby or C++, but PHP's slice of the pie is minuscule by comparison. So your mundane answer is probably correct, but it's answering the wrong question. The question isn't, "Why don't PHP developers swear at their language all the time?" It's actually, "Why do PHP developers swear at their language less than most others?"
And I posit that it's because PHP programmers at that level are less likely to be surprised or scandalized by the language's well-known quirks than somebody working with Ruby or C++ (i.e. they are resigned to its frustrations).
Honestly, and I don't mean to stereotype here, but I think it's because the quality of PHP developer proportionate to the number of PHP developers on github is higher than in other languages. Granted, there are fewer total. But the reason for this is pretty simple. PHP is not the popular language, so most new developers taking up programming in the past 2-3 years see a different landscape than a decade ago. Github is SourceForge and Ruby is PHP. Poor PHP developers aren't flocking to github. They are still on SourceForge (not suggesting that all SF users are bad).
Basically, the PHP developers on github are probably the more experienced developers that have been doing this for a while. On the flip side, most Ruby developers are still comparatively new to the language.
This is a lot of assuming, and I don't mean to disparage anyone.
That all being said, while I equated the quality of the programmer with the amount of swearing being done, I fully realize it's not a fair comparison. Granted, this entire thing is based on numbers that, I feel, are fairly meaningless.
Finally, what I find more interesting is how HN's quality takes a nose dive when the population is allowed to bash a particular population: in this case, people who enjoy using PHP.
I hope it didn't look like I was bashing anyone. I think for the most part we're violently in agreement here.
Besides PHP Githubbers maybe being at a higher level on average, I think PHP's gotchas are just generally better known, so people go into it with eyes open. It doesn't take as high a level of experience to realize, say, that the standard library has wonky argument ordering as it does to find all the weird edge cases that will bite you in Ruby and horrific error messages that await in C++. Ask anyone with a decent knowledge of a few languages to criticize PHP and he'll have something memorized. Ask a Ruby programmer with five years' experience to compare and contrast "reduce(&:+)", "reduce(:+)" and "reduce('+')" and it's 50/50 whether he'll even get two right (even though the truth is that there's no difference!).
> I hope it didn't look like I was bashing anyone.
No, not you. It's why you got a reply from me that at least tried to sound intelligent. =)
But it does get tiring hearing the same useless tripe repeated, especially here on HN. PHP's worst quality is whenever it's mentioned, a bunch of know-nothings start making what amounts to the technical equivalent of dick and fart jokes. I'll just stop there though. =)
If you're not deep into complex projects, PHP isn't very frustrating. It's pretty simple and to-the-point.
There are a lot of people out there who think it's truly fabulous, also. Usually, PHP is their first language and they've never seriously tried anything else. They like the accomplishment of programming, and connect that with PHP.
Is it ok statistically to get for example all Ruby commits and 25% of C++ ones and compare them ?
Another kind of chart would be nice... also some other params.
I'm surprised there are so few commit messages with curse words in them. 210 out of 929857, that's like 0.02%. I would have thought that developers were more vulgar than that (I know I am).
Maybe if we look at comments in source code we would get a better representation of the vulgarness of developers.
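For anyone who wants to check the 210 out of 929857 figure, the arithmetic in Python is below (both numbers are the ones quoted in the thread; the variable names are just for illustration).

```python
profane, total = 210, 929857  # figures quoted in the thread

rate = profane / total
print(f"{rate:.4%}")           # 0.0226%
print(round(total / profane))  # 4428
```

That works out to roughly one profane commit message in every 4,400.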
Interesting. Of course I am thinking of the many ways that the results might not be representative, but that doesn't make it any less of a cool weekend project.
Would be great to see some context around where the most _profanities_ occur by language, and the kind used.
This is an example of poor graphical representation. The proper way to do this would be to take the swear words per word for each language and then map this to a bar graph, then you could easily see which has the highest vs the lowest.
A pie chart is good for things that add to 100%. The number of swear words that occur in something is not appropriate for this type of graphic.
Considering that there were an equal number of commit messages for each language, this is a perfectly adequate representation. In this case, the pie chart's total is the number of swears, with each slice representing a language's share of that.
You’re entirely correct, except for this: The pie chart applies here because the author made no attempt to correlate number of profane commits with number of commits, total, per language. This post means literally nothing, so he could use whatever graph he damn well felt like.
This is per commit message, which is mandatory in git, not occurrences inside the source of the project.
Perhaps PHP and Python have fewer occurrences because the employers of people that use Python or PHP are less likely to tolerate inappropriate language in the code.
PHP is a fine language once you understand the warts and accept the few that exist. While it's not everyone's cup of tea, what it does, it does well. Far too many people program PHP for a couple years (or not even) and proclaim it's a horrid language. Ruby is the it language, and Python is close behind. These are the Pop languages of the day. But PHP developers who are still using the language know it. They know it really well. And just consider how long PHP has been around and been popular?
I think a lot of people that would whine and complain about PHP aren't doing so on gh. Instead, they are using python, rails, or other languages. The ones using PHP, however, know it, and probably aren't swearing because they are probably just writing good software.
This goes along with my observations, having read a lot of PHP code.
There's a certain level of professionalism in PHP projects, believe it or not, that is different from what you see in Ruby, Python, C++, etc. The difference? Many PHP programmers are not really hackers, they're just trying to get something done. They're not programming for the joy of it, for the most part, they're just trying to get some plug-in done for a CMS for some client.
The code and comments often end up structured and informational, in a Java kind of way.
It'd be interesting to see the original dataset. At the risk of stereotyping, what are the chances php programmers simply don't leave useful commits? (and perl programmers don't leave anything parsable ..)
One might assume that more profanity in a language = more frustration with that language. But I'd bet that proportion of business use has something to do with it, too.
If you're hacking on a personal project, you might feel freer to swear in your commits. And my guess is that you're more likely to code Ruby for personal projects than C#. But I could be wrong.
That set only includes [shit, piss, fuck, cunt, cocksucker, motherfucker, tits], so these are probably not meaningful results.
I have personally commented "asshole forgot to increment the counter" 527 times in 4 different languages.
[EDIT: 528 times in 5 different languages. Sorry, bitches.]