Projects like this should recognize that if a site's robots.txt contains a long list of Disallow entries for other AI scrapers, they are probably not welcome to scrape it either.
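To be concrete, here's a minimal sketch (Python, not from this project) of the kind of check I mean: count how many well-known AI crawlers a site already blocks and treat a long block list as a signal to stay away. The bot names and the threshold are just illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of well-known AI-scraper user agents; extend as needed.
AI_BOTS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai", "Bytespider"]

def ai_scrapers_discouraged(base_url: str, threshold: int = 3) -> bool:
    """Return True if the site's robots.txt blocks several known AI bots from the root."""
    rp = RobotFileParser()
    rp.set_url(base_url.rstrip("/") + "/robots.txt")
    rp.read()
    blocked = sum(1 for bot in AI_BOTS if not rp.can_fetch(bot, base_url))
    return blocked >= threshold

if __name__ == "__main__":
    if ai_scrapers_discouraged("https://example.com"):
        print("Site blocks several AI crawlers; treat that as a no and skip it.")
```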
> Nothing like this will be added to the product. Money comes from scraping content, and thus content will be scraped regardless of any non-scraping hints, and we will be actively working on countering anti-scraping measures.
It's kind of tone-deaf to launch a tool like this in the current climate without considering robots.txt at all. Not a popular take on Hacker News, but everyone outside the tech space is pretty pissed about this stuff.
And proxy farms exist solely to get around this problem. If you believe the rights of content creators are the be-all and end-all, don't complain the next time Disney tries to extend IP expiration dates.
I was recently on a project with 10+ devs, and I was the only one who really knew about robots.txt, or at least the only one who said, hey, that robots.txt needs to handle internationalized routes, because the default paths we disallow are all in English (something like the sketch below).
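A rough sketch of the fix I was arguing for, assuming locale-prefixed routes; the locales and paths here are made up for illustration, and routes with translated path segments would need their own entries on top of this.

```python
# Generate Disallow rules for every locale prefix, not just the English defaults.
LOCALES = ["", "de", "fr", "es", "ja"]          # "" = default English routes
PRIVATE_PATHS = ["/account/", "/checkout/", "/search"]

def robots_txt() -> str:
    lines = ["User-agent: *"]
    for locale in LOCALES:
        prefix = f"/{locale}" if locale else ""
        lines += [f"Disallow: {prefix}{path}" for path in PRIVATE_PATHS]
    return "\n".join(lines) + "\n"

print(robots_txt())
```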
I'm not saying that makes them bad devs; they just knew other things. So it doesn't boggle my mind that someone launched a product like this without taking robots.txt into consideration and only added it to the todos when someone complained.
(of course this project doesn't do that)