
Projects like this should recognize that if a site's robots.txt contains a long list of Disallow entries for other AI scrapers, they are probably not welcome to scrape either.

(of course this project doesn't do that)
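For what it's worth, a rough version of that check is only a few lines of Python's stdlib. This is a sketch, assuming you only want a courtesy signal; the user-agent list and the "majority blocked" threshold are illustrative, not anything this project actually does:

    # Sketch: if a site's robots.txt blocks us or most well-known AI crawlers,
    # treat that as a signal that this scraper isn't welcome either.
    import urllib.robotparser
    from urllib.parse import urlparse

    # Illustrative list of well-known AI crawler user agents; extend as needed.
    KNOWN_AI_AGENTS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended", "Bytespider"]

    def site_discourages_ai_scraping(page_url: str, own_agent: str = "ExampleScraper") -> bool:
        """True if robots.txt blocks our agent, or most known AI crawlers, for this URL."""
        parts = urlparse(page_url)
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        try:
            rp.read()
        except OSError:
            return False  # robots.txt unreachable; fall back to other policies

        # An explicit block on our own user agent is the clearest signal.
        if not rp.can_fetch(own_agent, page_url):
            return True

        # Heuristic: if most known AI crawlers are disallowed, assume we're unwelcome too.
        blocked = sum(1 for ua in KNOWN_AI_AGENTS if not rp.can_fetch(ua, page_url))
        return blocked > len(KNOWN_AI_AGENTS) // 2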




Good point! Thanks for the feedback.

Let me add that to my todos.


It boggles my mind that you would launch without that as a prime directive.


OP just graciously accepted that feedback, no need to be condescending :)


Let me translate what OP wrote:

> Good point! Thanks for the feedback.

> Nothing like this will be added to the product. Money comes from scraping content, so content will be scraped regardless of any non-scraping hints, and we will be actively working on countering anti-scraping measures.


Well... while true, how do you reconcile this with OP's statements in another thread?

> It has an extensive proxy IP and retry system in place to bypass bot detection.

Seems like a bit of "talking out of both sides of your mouth".


It's kind of tone-deaf to launch a tool like this without taking robots.txt into account, given the current climate. Not a popular take on Hacker News, but everyone outside the tech space is pretty pissed about this stuff.


And proxy farms exist solely to get around this problem. If you believe the rights of content creators are the end-all be-all, don't complain the next time Disney tries to extend IP expiration dates.


Using the behavior of one bad actor to excuse abuse by everyone else is a pretty weak argument.


I was recently on a project where, out of the 10+ devs on it, I was the only one who really knew about robots.txt, or at least the only one who said, hey, that robots.txt needs to handle internationalized routes; the default routes we disallow are all in English (see the example below).

I'm not saying that makes them bad devs; they just knew other things. So it doesn't boggle my mind that someone launched a product like this without taking robots.txt into consideration and then added it to the todos when someone complained.
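To make that concrete, here's a hypothetical robots.txt (the paths are invented for illustration). Matching is on literal path prefixes, so every localized route needs its own Disallow line:

    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/
    # Localized prefixes are separate paths and need their own entries:
    Disallow: /fr/admin/
    Disallow: /fr/paiement/
    Disallow: /de/admin/
    Disallow: /de/kasse/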


Programmers have no institutional memory.


Agreed. I often had to explain why simple lower-level stuff was there, because they didn't happen to know about it and were surprised.



