
Yeah, what is the algorithm?

What would be cool is if you implemented something like Netflix's recommendation engine: something that works from your own tastes and preferences and also uses the preferences of users whose tastes are similar to yours. Based on that, you can recommend what's "worth reading." :)
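To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. It is only an illustration under assumptions, not what Netflix or this site actually does: the `likes` data, the story ids, and the function names are all made up, and Jaccard overlap is just one simple choice of similarity measure.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of story ids, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, top_n=3):
    """Rank stories the target has not seen, weighted by how similar their fans are."""
    seen = likes[target]
    scores = defaultdict(float)
    for user, stories in likes.items():
        if user == target:
            continue
        sim = jaccard(seen, stories)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for story in stories - seen:
            scores[story] += sim
    # Highest score first; ties broken by story id so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [story for story, _ in ranked[:top_n]]


# Hypothetical data: each user maps to the set of stories they upvoted or read.
likes = {
    "alice": {"s1", "s2", "s3"},
    "bob": {"s1", "s2", "s3", "s4"},
    "carol": {"s2", "s3", "s5"},
    "dave": {"s9"},
}

print(recommend("alice", likes))
```

Here bob overlaps with alice more than carol does, so bob's unseen story ranks ahead of carol's, and dave, who shares nothing with alice, is ignored.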




And you can figure out what people have read using the CSS history hack (soon to be closed by Mozilla). You can easily index every story on HN by pinging and scraping newest every 20 or so minutes. Don't worry, they won't block you ;)
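The polling part could look roughly like the sketch below. It only covers the bookkeeping (fetch, skip what you have already seen, wait, repeat). `fetch_ids` is a hypothetical callable you would write yourself to return the ids currently listed on newest; the actual scraping is not shown, and the 20-minute default is just the interval mentioned above.

```python
import time


def poll_new_items(fetch_ids, seen, interval_s=20 * 60, rounds=1, sleep=time.sleep):
    """Call fetch_ids() `rounds` times and return ids not seen before, in arrival order.

    fetch_ids: hypothetical callable returning the ids currently on the newest page.
    seen: set of ids already indexed; it is updated in place.
    """
    new_items = []
    for i in range(rounds):
        for item_id in fetch_ids():
            if item_id not in seen:
                seen.add(item_id)
                new_items.append(item_id)
        if i < rounds - 1:
            sleep(interval_s)  # wait before the next pass over newest
    return new_items


# Usage with canned pages instead of a real scraper, and no real sleeping.
pages = iter([[3, 2, 1], [5, 4, 3]])
seen_ids = set()
found = poll_new_items(lambda: next(pages), seen_ids, rounds=2, sleep=lambda s: None)
print(found)
```

Passing the fetcher and the sleep function in as arguments keeps the loop easy to try out without touching the network.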


The problem when indexing isn't getting the new content; it is getting the ~1,240,000 older posts that aren't on newest anymore.


You can grab that from searchyc.com, or just ignore it. You can get reasonable recommendations based on the latest articles alone. This would not be a general-purpose recommendation engine, so you would actually need much less data. Attempting to analyze 1 million+ articles seems like overkill in this case. You could just scrape for a week. If you are building this, I actually have several months' worth of articles indexed, and would be happy to provide the db dump of 80,000+ articles.

I'm actually thinking about hacking this up this weekend as well.



