
Awesome stuff. Did you write a script to find all the Ask HN posts?



I used a couple of different scripts for all of this. The first intelligently crawled HN, trying to minimize the total number of hits to the server while still grabbing every post. It also dumped most of its internal state to a file every few seconds, which let me kill and restart it whenever I wanted and easily grab only the new posts when I update the archive.
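The checkpoint-and-resume part of that first script might look something like the minimal sketch below. It is not the actual script. It assumes posts can be walked by sequential integer IDs, and `fetch` is a hypothetical callable that returns a post dict or `None`. The sketch also leaves out the "minimize hits" logic and uses only a fixed politeness delay between requests.

```python
import json
import os
import tempfile
import time
from typing import Callable, Optional


def load_state(path: str) -> dict:
    """Load crawler state from the checkpoint file, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {"next_id": 1, "saved_ids": []}


def save_state(state: dict, path: str) -> None:
    """Write state atomically: dump to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def crawl(fetch: Callable[[int], Optional[dict]], max_id: int, out_dir: str,
          state_path: str, checkpoint_every: float = 5.0, delay: float = 0.0) -> dict:
    """Walk item IDs up to max_id, saving each post to disk and checkpointing periodically.

    `fetch` is a hypothetical stand-in for whatever actually downloads an item.
    """
    state = load_state(state_path)  # resume where the last run stopped
    last_save = time.monotonic()
    while state["next_id"] <= max_id:
        item_id = state["next_id"]
        item = fetch(item_id)
        if item is not None:
            # Every post is written to its own file so later scripts can reread it.
            with open(os.path.join(out_dir, f"{item_id}.json"), "w", encoding="utf-8") as f:
                json.dump(item, f)
            state["saved_ids"].append(item_id)
        state["next_id"] = item_id + 1
        if time.monotonic() - last_save >= checkpoint_every:
            save_state(state, state_path)  # safe to kill the process after this point
            last_save = time.monotonic()
        time.sleep(delay)  # be polite to the server
    save_state(state, state_path)
    return state
```

Because `next_id` is persisted, rerunning `crawl` with a larger `max_id` fetches only posts newer than the last run. If the process is killed between checkpoints, at most a few seconds' worth of items are fetched again.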

The second took each post on HN (all of which were saved to disk), found the Ask HN posts, and then generated a document with the title of each post, its score, and the number of comments it received (which let me rule out looking at 90% of posts, since they had fewer than 3 comments on average), plus markdown ready for copying and pasting into the final document.
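The second step could be sketched roughly like this. It is an illustration, not the original script. It assumes each saved post is a dict with hypothetical `id`, `title`, `score` and `comments` fields, treats a post as an Ask post if its title starts with "Ask HN", drops posts with fewer than 3 comments, and emits one markdown list item per remaining post, most-discussed first.

```python
from typing import Iterable, List


def is_ask_post(post: dict) -> bool:
    """Assumed rule: Ask posts are the ones whose title starts with 'Ask HN' (any case)."""
    return post.get("title", "").lower().startswith("ask hn")


def ask_digest(posts: Iterable[dict], min_comments: int = 3) -> List[str]:
    """Filter Ask posts, skip lightly discussed ones, and format them as markdown links."""
    asks = [p for p in posts if is_ask_post(p) and p["comments"] >= min_comments]
    # Most-commented first, with score as a tie-breaker.
    asks.sort(key=lambda p: (p["comments"], p["score"]), reverse=True)
    lines = []
    for p in asks:
        link = f"https://news.ycombinator.com/item?id={p['id']}"
        lines.append(f"- [{p['title']}]({link}) ({p['score']} points, {p['comments']} comments)")
    return lines


if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "Ask HN: Best books?", "score": 50, "comments": 40},
        {"id": 2, "title": "Show HN: My app", "score": 99, "comments": 80},
        {"id": 3, "title": "Ask HN: Anyone here?", "score": 2, "comments": 1},
    ]
    print("\n".join(ask_digest(sample)))
```

The comment threshold is a parameter, so it is easy to loosen or tighten it depending on how many posts you want to review by hand.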


SearchYC has a page that shows all of the Ask HN posts: http://ask.searchyc.com/, so you wouldn't have to write a script to do this, although filtering out ones that weren't upvoted very much might have made this job a lot easier.



