Amazing work chidog12,
I've tried using TextRank for text summarization before, but the results have mostly been hit or miss.
I've found that page elements like "Follow this reader" and "Share on Twitter" / "Share on Facebook" frequently occur in these articles, and because TextRank ranks sentences by the votes they receive from other sentences, those elements get picked up as highly ranked sentences.
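To make the voting problem concrete, here is a minimal TextRank sketch in Python. It is an illustration only, not code from the site being discussed. It uses the word-overlap similarity from the original TextRank paper and a damping factor of 0.85. Each sentence's score is built from the similarity-weighted "votes" of the sentences it overlaps with, so repeated widget text can vote itself up. The `strip_boilerplate` pre-filter and its `BOILERPLATE` patterns are hypothetical, a crude keyword filter picked for this example.

```python
import math
import re

# Hypothetical patterns for share/follow widgets; tune these for your own sources.
BOILERPLATE = re.compile(r"follow this|share on|twitter|facebook|subscribe", re.IGNORECASE)


def tokenize(sentence):
    # Lowercased word tokens (no stopword removal, to keep the sketch short).
    return re.findall(r"[a-z0-9']+", sentence.lower())


def similarity(a, b):
    # Word-overlap similarity from the TextRank paper:
    # shared words / (log|A| + log|B|).
    wa, wb = tokenize(a), tokenize(b)
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom <= 0:  # both sentences are a single word
        return 0.0
    return len(set(wa) & set(wb)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence j "votes" for sentence i in proportion to their
        # similarity and to j's own current score.
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def strip_boilerplate(sentences):
    # Hypothetical pre-filter: drop sentences that look like share/follow widgets.
    return [s for s in sentences if not BOILERPLATE.search(s)]


def summarize(sentences, k=3, filter_boilerplate=True):
    if filter_boilerplate:
        sentences = strip_boilerplate(sentences)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original article order.
    return [sentences[i] for i in sorted(top)]


article = [
    "Share on Twitter and share on Facebook.",
    "Follow this reader to share on Twitter.",
    "The central bank raised interest rates by half a point.",
    "Markets fell after the central bank raised rates.",
    "Analysts expect rates to stay high this year.",
]
print(summarize(article, k=2))
```

A keyword filter like this will also drop legitimate sentences that mention Twitter or Facebook, so removing those page elements at the HTML level, before sentence splitting, is probably the more robust option.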
Which of the algorithms mentioned on your site were most effective at extracting useful summaries?
I'm curious about your sentiment analysis process. Do you employ a particular threshold for neutrality, or does the team provide input?
No doubt neutral reporting is advantageous; personally, I'd also find it helpful to see both a highly positive and a highly negative take. Coming at a topic from both the high and the low end may, imho, improve the accuracy of comprehension.
Hmm, I really like that last point you made... seeing both a highly positive and a highly negative take. I'll look into that.
For sentiment analysis, we're using Microsoft's API, which returns scores from 0 to 1. For the most part it's pretty good, judged against our own views on neutrality (which of course have some inherent biases).
A score of 0.5 is considered pretty much indifferent; however, we treat anything from 0.35 to 0.75 as neutral and specify its lean on our end.
On Reddit, we've tested our summaries as tl;dr comments, and we indicate the article's lean, for example: "This article is neutral with a negative lean on the topic".
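The neutral band and lean labeling described here could look roughly like the minimal Python sketch below. It assumes the sentiment score has already been fetched (no Microsoft API call is shown), that 0 means most negative and 1 most positive, and that the 0.35 and 0.75 bounds are inclusive. It also treats "lean" as simply which side of 0.5 the score falls on. `describe_lean` and the wording of the non-neutral labels are hypothetical.

```python
def describe_lean(score, neutral_low=0.35, neutral_high=0.75, midpoint=0.5):
    """Turn a 0-1 sentiment score into a one-line lean description.

    Assumption: 0 is most negative, 1 is most positive. Scores inside the
    inclusive band [neutral_low, neutral_high] count as neutral and are
    described as leaning toward whichever side of `midpoint` they fall on.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < neutral_low:
        return "This article is negative on the topic"
    if score > neutral_high:
        return "This article is positive on the topic"
    if score < midpoint:
        return "This article is neutral with a negative lean on the topic"
    if score > midpoint:
        return "This article is neutral with a positive lean on the topic"
    return "This article is neutral on the topic"


print(describe_lean(0.42))  # This article is neutral with a negative lean on the topic
```

Keeping the thresholds as parameters makes it easy to tighten the neutral band later, for example for more politically charged topics.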
In finance, top publications do a great job of maintaining neutrality. However, if we ever get into politics... we'll need to take this a little more seriously!