I'm very curious about this...how did they collect their data? Did they spider/screen scrape HN, or were they able to get a data feed/dump?
I've mentioned before that I'm interested in making some better algorithms for social news sites (I'm learning Scheme/Arc so that I can play around with news.arc); I think one very neat feature that could be added to social news sites is a sort of 'tracker' -- just like torrent trackers have a standard URL that clients can reach certain data at, it would be very cool for social news sites to decide on a data format whereby spiders and the like can get data feeds on stories, user "karma" and so forth.
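As a rough sketch of what such a feed might look like from the spider's side: the JSON layout, the field names and the site name "news.example" below are all made up for illustration, not an existing standard or anything news.arc actually exposes.

```python
import json

# Hypothetical feed document a site might serve at a well-known URL.
# Every field name here is an assumption made for this example.
SAMPLE_FEED = """
{
  "site": "news.example",
  "stories": [
    {"id": 1, "title": "Show: a stats page", "points": 42, "by": "alice"},
    {"id": 2, "title": "Ask: feed formats?", "points": 7, "by": "bob"}
  ],
  "users": [
    {"name": "alice", "karma": 1200},
    {"name": "bob", "karma": 85}
  ]
}
"""


def parse_feed(text):
    """Return (site name, {user: karma}, list of story dicts) from a feed document."""
    doc = json.loads(text)
    karma = {u["name"]: u["karma"] for u in doc["users"]}
    return doc["site"], karma, doc["stories"]


site, karma, stories = parse_feed(SAMPLE_FEED)
```

A spider would fetch this text from each site instead of using the inline sample; the point is only that every site agrees on the same fields.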
To expand on that, you could even normalize the scores from each site, and build a social news "karma aggregator" that monitors lots of these feeds, and ranks your username across all of the sites!
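One way the normalization could work, as a minimal sketch: convert each site's raw karma to a percentile rank within that site, then average a user's percentiles across the sites they appear on. Percentile ranking is just one possible choice of normalization, and the site names, usernames and numbers below are invented for the example.

```python
from bisect import bisect_right


def percentile_ranks(karma):
    """Map each user to the fraction of users on the site with karma <= theirs."""
    values = sorted(karma.values())
    n = len(values)
    return {user: bisect_right(values, k) / n for user, k in karma.items()}


def aggregate(sites):
    """sites maps site name -> {user: karma}.

    Returns a list of (user, mean percentile) pairs, best first.
    A user is averaged only over the sites where they have an account.
    """
    per_user = {}
    for karma in sites.values():
        for user, p in percentile_ranks(karma).items():
            per_user.setdefault(user, []).append(p)
    scores = {user: sum(ps) / len(ps) for user, ps in per_user.items()}
    # Sort by score descending, breaking ties alphabetically by username.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented sample data: two sites with very different karma scales.
sites = {
    "site-a": {"alice": 1200, "bob": 85, "carol": 300},
    "site-b": {"alice": 40, "bob": 50, "carol": 30},
}
ranking = aggregate(sites)
```

Using ranks rather than raw scores means a site where karma runs into the thousands does not drown out a site where 50 is a high score.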
EDIT: When I get up to speed on Scheme/Arc, I think I'm going to write a 'stats.arc' page to link into news.arc somehow. Is that a feature that anyone else thinks is useful? I'm talking detailed stats analysis/heuristics here, not just the 'top XX users' or whatever.
This was submitted 561 days ago (http://news.ycombinator.com/item?id=109654) but I think it deserves a repost. I have been lurking here a while and only just discovered it. There are some real gems (stories/discussions) to be explored this way.
I started looking at 'Users with Most Points per Submission' and lost an hour before I knew it. Oh, and I guess if you want to rig the system, then 'Top Domains by Points per Submission' could be of help.
Maybe some of you haven't seen this either so I thought I'd share.
I agree with this sentiment. Some would argue that reposts are redundant, or even a bug on sites that aggregate news in a similar way, but when it comes down to it, I think reposts highlight a key attribute of 'what we're all about' or 'what kind of products/stories describe us best'. Reposts serve as an interesting indicator of a community's core principles and values (at least for communities that measure their users by karma).
If anything, Top Reposts is certainly worth adding as another list for top.searchyc, although scraping it could be tricky.
BTW, it's a bit misleading how many comments my essays have, because for a while I used links to HN comment threads as my "blog comments." Most of those comments aren't from regular HN readers.