HN2JSON: A ruby gem for HackerNews (rubygems.org)
43 points by jcla1 on Oct 7, 2012 | 13 comments



Be careful not to hammer the site. Your IP could be added to the blocklist if you are too aggressive:

"Yes, we block IPs that seem to be crawlers ignoring robots.txt. We've always blocked abusive IPs, but I tightened up the blocking a few weeks ago. A lot of people were crawling HN, most of them unnecessarily because they were doing things they could have done more efficiently through HNSearch's API[1]." --pg[2]

[1] http://www.hnsearch.com/api

[2] http://news.ycombinator.com/item?id=3196298
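To make the "don't hammer the site" advice concrete, here is a minimal sketch of a throttle that enforces a gap between requests. The `Throttle` class, its parameters, and the 30-second interval in the usage comment are assumptions for illustration, not part of HN2JSON, and no particular interval is known to be safe.

```ruby
# Minimal throttle: guarantees at least `min_interval` seconds between calls.
# The clock and sleeper are injectable so the behaviour can be checked without waiting.
class Throttle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_call = nil
  end

  # Runs the block, sleeping first if the previous call was too recent.
  # Returns whatever the block returns.
  def call
    if @last_call
      wait = @min_interval - (@clock.call - @last_call)
      @sleeper.call(wait) if wait > 0
    end
    @last_call = @clock.call
    yield
  end
end

# Usage (interval is an assumed value):
#   throttle = Throttle.new(30)
#   ids.each { |id| throttle.call { HN2JSON.find(id) } }
```

Checking robots.txt and preferring an API where one exists still applies; a throttle only limits how fast you go.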


I've written a script that extracts HN, which anyone is welcome to use. I use it for the Hacker News iPhone app:

http://api.thequeue.org/hn/frontpage.xml

http://api.thequeue.org/hn/new.xml

http://api.thequeue.org/hn/best.xml


Going through the code on GitHub to see how an HN page is parsed was informative. I may use this to create one using Node.js. My interest is in building an intelligent agent that filters content based on my interests (for example: coding, customer acquisition, hiring) and notifies me on a daily or weekly basis.
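As a rough starting point for that kind of filter, plain keyword matching on titles might look like the sketch below. This is far simpler than an "intelligent agent"; the `matching_stories` helper and the hash shape (`:title`) are hypothetical, not something the gem provides.

```ruby
# Returns the stories whose title contains any of the interest keywords
# as a whole word, ignoring case.
def matching_stories(stories, interests)
  patterns = interests.map { |word| /\b#{Regexp.escape(word)}\b/i }
  stories.select do |story|
    patterns.any? { |pattern| story[:title] =~ pattern }
  end
end

# Usage (made-up sample data):
#   stories = [{ title: "Hiring your first engineer" }, { title: "A chess engine" }]
#   matching_stories(stories, ["hiring", "coding"])
```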


I have written a program similar to what you just described, also on GitHub: https://github.com/jcla1/hackernews


item = HN2JSON.find 4623690

NoMethodError: undefined method `url=' for #<HN2JSON::Entity:0x007fb84cd63a88>

    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/parser.rb:92:in `block in get_attrs_post'
    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/entity.rb:92:in `add_attrs'
    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/parser.rb:91:in `get_attrs_post'
    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/entity.rb:71:in `get_attrs'
    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/entity.rb:56:in `initialize'
    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json.rb:35:in `new'
    from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json.rb:35:in `find'


Sorry, I forgot to update the gem on rubygems.org. Just install the gem again now.


Cool thanks. Might be nice to override the inspect method to display something nicer.


Yeah! The idea is to return the object in JSON
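For anyone curious what the `inspect` suggestion could look like next to JSON output, here is a small sketch. The `Item` class and its fields are a hypothetical stand-in, not the gem's actual `HN2JSON::Entity`.

```ruby
require "json"

# Hypothetical stand-in for an HN item; not the gem's real entity class.
class Item
  attr_reader :id, :title, :url

  def initialize(id:, title:, url:)
    @id = id
    @title = title
    @url = url
  end

  def to_h
    { id: id, title: title, url: url }
  end

  # Serialises the item's fields as a JSON object.
  def to_json(*args)
    to_h.to_json(*args)
  end

  # Compact one-line form used by irb and `p` instead of the default dump.
  def inspect
    "#<Item id=#{id} title=#{title.inspect}>"
  end
end

# Usage:
#   item = Item.new(id: 1, title: "Hi", url: "http://example.com")
#   p item              # short form from #inspect
#   puts item.to_json   # full JSON
```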


Check out apify - http://apify.heroku.com/resources & scrapify - https://github.com/sathish316/scrapify, a library to scrape HTML content as JSON data.


I wrote a small Scrapy-based HN crawler, available at http://github.com/mvanveen/hncrawl in case anyone is interested.


Excellent! I know I'm biased but I also know you've put a lot of effort into this. Well done Joseph.


Nice work. Does Chronic have to be a runtime dependency?


Not really, but at the time I didn't want to have to write my own date parser. (HN doesn't show the date, just things like "x days ago")
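If someone did want to drop the dependency, a hand-rolled parser for the "x days ago" style could be sketched roughly like this. It is not how the gem does it; the `parse_relative_time` name and the assumption that only minutes, hours, and days appear are mine.

```ruby
# Converts a phrase such as "3 days ago" into an approximate Time.
# Returns nil when the text does not match the expected pattern.
# Assumes only minute/hour/day units; anything else is ignored.
def parse_relative_time(text, now = Time.now)
  unit_seconds = { "minute" => 60, "hour" => 3600, "day" => 86_400 }
  match = text.match(/(\d+)\s+(minute|hour|day)s?\s+ago/)
  return nil unless match

  now - match[1].to_i * unit_seconds.fetch(match[2])
end

# Usage:
#   parse_relative_time("3 days ago")   # a Time three days before now
#   parse_relative_time("no date")      # nil
```

The result is only as precise as the phrase itself, so "2 days ago" can be off by up to a day.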



