Excuse me for not being objective, but some more interesting statistics on Stack Overflow tags are here:
https://www.kaggle.com/c/predict-closed-questions-on-stack-o... (average and median reputation of people posting questions, % of non-closed questions, average number of additional tags, median body length, freshness of the topic).
That comment is about a totally different study. This study only includes users with an age in their Stack Exchange profile. It has nothing to do with whether they have a social media account.
Under "Statistical Analysis" the expected age for the "Weighted-Mean" tag is 72. Under "Mathematics" the expected age for the "Non-Standard Analysis" tag is 19. I suspect a sample of 1 for each of these tags.
This is fun to look at but I wouldn't base any conclusions on these averages. Providing histograms for each tag along with some summary statistics such as sample size, min, max, and median would give a lot more transparency into the dataset.
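A rough sketch of what such a per-tag summary could look like. It assumes the data is available as (tag, age) pairs, which the post does not confirm; the function name `summarize_by_tag` and the records are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median

def summarize_by_tag(records):
    """Group (tag, age) pairs by tag and report basic summary statistics."""
    ages_by_tag = defaultdict(list)
    for tag, age in records:
        ages_by_tag[tag].append(age)
    return {
        tag: {
            "n": len(ages),          # sample size
            "min": min(ages),
            "max": max(ages),
            "median": median(ages),
            "mean": mean(ages),
        }
        for tag, ages in ages_by_tag.items()
    }

# Hypothetical records, invented purely for illustration (not from the data dump).
records = [
    ("weighted-mean", 72),
    ("php", 22), ("php", 25), ("php", 31),
]
summary = summarize_by_tag(records)
for tag, stats in summary.items():
    print(tag, stats)
```

With a sample size of 1, the mean is just that one user's age, which is how a tag could end up with an average of 72.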
The absence of a detailed methodology makes me doubt the worth of this number dump. As a start it would be good to know the number of users behind each average, and anything about how representative the sample is.
Yeah I too would be curious to see how the age is distributed among ALL the SO users and tags rather than the "top users" and their social media profiles.
Actually you are seeing what you want to see. The social media stuff is about a related study that mined the social media links of top users, not the expected age study. The expected age study covered all users that have an age in their StackExchange profile.
Does anybody know how he actually knows the age of the SO members? He wrote "I call the statistic the Expected Age because it is calculated using Expected Value from statistics" but that doesn't mean much to me.
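For what it's worth, "expected value" most likely just means a probability-weighted average here. A minimal sketch, assuming each tag comes with a list of user ages and that the probabilities are the observed frequencies; the ages below are invented, and this is not necessarily how the author computed it.

```python
from collections import Counter

def expected_age(ages):
    """E[age] = sum over distinct ages of age * P(age),
    with P(age) taken from the observed frequencies."""
    counts = Counter(ages)
    total = len(ages)
    return sum(age * count / total for age, count in counts.items())

ages = [20, 20, 30, 50]  # hypothetical ages, not real data
result = expected_age(ages)
print(result)
```

With empirical frequencies as the probabilities, this is the same number as the plain arithmetic mean of the ages.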
"I mined the Stack Overflow (and related sites) data dumps to extract the social networking accounts of the top users. Each list includes only the users who have social networking links in their profile ..."
Hm. I would guess younger people are more likely to have their age on their Facebook profile. To make this more than just a random collection of data, you'd need to check that as well. Also, is there a correlation between age and whether you include your social networking profile on your Stack Overflow profile in the first place?
Would be interesting to know how he did it in detail.
Twitter does not include age, and neither does LinkedIn (or at least it does not show it).
And on Facebook you need to jump through hoops like user_birthday and friends_birthday[1]. I don't know how to get the birthday of 3rd parties.
I think that's an issue in his study. It means that the age is restricted to the age range of Facebook users (and maybe to the younger FB users, who are more likely to give their date of birth).
That was what I was looking for - PHP and Node.js seem most popular with younger people, but not by much. What is surprising is that game development is much more popular with younger people than web development.
That it is mostly the younger generation that uses Stack Overflow, and thus age by language cannot be extracted from this data analysis.
We are just learning statistics and probability, and I think this happens if the sample data is incorrect or not diverse enough.
I would freakin' love a "Wikipedia: age of people rapidly using automatic tools (such as rollback, Twinkle, etc.)", and then extend that to "How many of those were good changes vs. how many were poor changes vs. how many are debatable changes".
And my full project, Tag Graph Maps of Stack Exchange: https://github.com/stared/tag-graph-map-of-stackexchange/wik...