
Personally, as someone who has submitted something to HN, I like this. I don't mind my content being used this way, since a link points back to the main site and it's obvious where the main site is. If it downloaded every post I made, though, I might take issue.

However, some website owners may not like that you're downloading the content of their articles and hosting it elsewhere. There is a great army of spam sites that just watch http://pingomatic.com/ and scrape each new entry. They then host it on a splog and stick adverts up, which annoys website owners.

The trouble is that Google might not realize which is the main site, so the original won't get the page rank, or visitors will end up on the alternative instead of the original site. Owners get annoyed because they can't monetize those visitors or see who is reading the page through analytics.

So you may want to set up a DMCA page and an abuse email address to stay on the right side of the law. Also add a robots.txt that stops Google and Bing from crawling the pages you downloaded.




Thanks for the suggestions. robots.txt should now be blocking /db/, which holds all saved content, and a link to the DMCA page has been added to every generated page (putting it at the bottom would make it obscure, since the pages can get so long).
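For anyone wanting to sanity-check a rule like that, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt body, the `is_allowed` helper, and the example.com URLs are assumptions for illustration; only the /db/ path comes from this thread, and real crawlers may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; only the /db/ path is taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /db/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the actual site.
    for agent in ("Googlebot", "bingbot"):
        for path in ("/db/page.html", "/index.html"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

A wildcard `User-agent: *` rule covers Google and Bing along with everything else; listing each crawler by name would be the narrower alternative.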

I'm not planning on copying any of the actual HN content, and I don't present a copy at all if the item is on news.yc. At some point I'll hook into the API to grab comments and points every so often, update the index pages with them, and probably allow voting from the pages.


Great! Thank you for doing that.



