
Known knowledge doesn't disappear. Once it knows how to apply an FFT and when, it doesn't need to continue to read about it. It's not a human needing continuing education. Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.

Sure, if something new happens, it's not like SO is the only place it's scraping for new information. If you honestly think that you/we will get to a place where all scraping can be blocked, I will just politely disagree.




> Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.

That's actually incorrect; it needs to constantly ingest new data. If it ingests enough bad data (from other LLMs that are hallucinating, for example), it'll suddenly start telling you that Henry VIII was a famous video game on the Sony 64.

It has no concept of 'truthfulness' beyond the amount of data that it can draw correlations from. And by nature LLMs have to ingest as much data as possible in order to draw accurate results from new things. LLMs cannot function without parasitizing user-generated content, and if user-generated content vanishes then they collapse in on themselves.


So fill the entire internet with factually incorrect, useless knowledge? This would be a good thing?


Well, that's already happening. Google search has become increasingly useless thanks to SEO-focused AI-generated schlock. It's the inevitable outcome of LLMs. Sites have an incentive to hide that they're AI generated and LLMs have no real way to filter for ingested data made from other LLMs. The only difference is how long the ruse can be kept up.


So you want to pollute the commons just as the people filling the web with SEO-focused AI-generated schlock? Do you feel justified in polluting the commons to serve the ends you desire?


Do you actually have a solution to the problem of companies using LLMs to steal from other people and repurpose their work as their own, other than figuring out ways to ensure that LLMs suffer for doing so? And frankly, as I mentioned, LLMs are already polluting the commons; you're not offering any solution on that front either, other than asking people to keep supplying them with fresh data so that they don't poison themselves.


Do you realize that your stance is merely your opinion? Does everyone agree that training ANNs is stealing?


Scorched-earth policies are always in vogue, and easy to offer as a knee-jerk reaction. They do nothing for actually making forward progress in the conversation, though.*

*However...there are times where the best solution is a match and some gasoline.



