
I'd be curious to see how this does compared to models trained on more professional datasets than Reddit TL;DR.

For example, train one or more models on every single article (including paywalled ones, via cache replacement) linked from https://www.techmeme.com/river, https://www.mediagazer.com/river, https://www.memeorandum.com/river, https://www.wesmirch.com/river and https://ballbug.com/river, then compare each model's output to the summary headline.
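The comment doesn't say how the output would be "compared" to the headline. One common choice in summarization work is a ROUGE-style overlap score. The snippet below is a simplified sketch of unigram-overlap F1 (ROUGE-1). It assumes lowercase whitespace tokenization with no stemming or punctuation handling, so it is not a substitute for a full ROUGE implementation. The function name `rouge1_f1` and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference headline.

    Simplified sketch: lowercase whitespace tokenization only; no stemming,
    stopword removal, or punctuation handling.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps min(count) per token, i.e. clipped matches.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical model summary vs. a made-up rewritten headline:
# 4 of 5 tokens match in each direction, so F1 is about 0.8.
print(rouge1_f1("Apple buys startup for $1B", "Apple acquires startup for $1B"))
```

Averaging this score over every (article, headline) pair would give one rough number per model, which is the kind of side-by-side comparison the comment is asking about.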




https://github.com/CurationCorp/curation-corpus

There's a dataset that would help there.


I’d still be curious how that compares to Techmeme’s headline rewriting since 2013/2015.


What are these websites?


https://www.cnbc.com/amp/2017/03/22/meet-the-man-whose-site-...

https://www.businessinsider.com/techmeme-growth-2014-3

The river is the reverse-chronological feed, similar to https://hckrnews.com.

If you go back to Techmeme's main page and hover over a story, an arrow appears on the left. Click it to see follow-up stories to the first story.

The first two, Techmeme and Mediagazer at least, rewrite headlines to de-BuzzFeed/de-Upworthy clickbait headlines. https://finance.yahoo.com/news/aggregators-attack-techmeme-h... https://www.poynter.org/reporting-editing/2015/techmeme-is-p...



