Hacker News

there are several nice libraries that allow you to generate plausible-sounding gibberish

this one is particularly nice and easy to use: https://github.com/jsvine/markovify/

you give it a file of existing text and it generates complete rubbish that would pass most automatic filters
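markovify works by building a word-level Markov chain from the input text. As a rough sketch of the underlying idea (not markovify's actual API; `build_chain`, `generate`, and the state size of 2 are my own choices for illustration):

```python
import random
from collections import defaultdict

def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words that can follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain

def generate(chain, length=15, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:
            break  # dead end: no word ever followed this state in the corpus
        out.append(rng.choice(followers))
    return " ".join(out)
```

every word in the output comes straight from the source text, which is exactly why it passes naive filters. the library itself layers sentence splitting and quality checks on top of the same idea, so use it rather than this toy for anything real.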




These are far more likely to come to moderator attention via user flags on the edited posts.


for the AI to be useful it has to be continuously updated with new good data

so add small bits of rubbish slowly over time, and don't ever contribute again

it'll take a while to completely destroy the AI business model, but we'll get there


> but we'll get there

at some point, it'll be too late. the horse has already left the barn.

besides, if the site owner makes a deal with the devil, there's nothing you can do other than quit using the site. people are still using social platforms more than ever, so stopping isn't going to happen.

the more likely outcome is that accounts deemed to be polluting the waters will just get suspended, with no recourse to have them reinstated.


> at some point, it'll be too late. the horse has already left the barn.

I don't think this is true: the technology is useless unless it parasitises new knowledge continuously

it sows the seeds of its own destruction by reducing the value of past and future contributions to zero

> the more likely outcome is that accounts deemed to be polluting the waters will just get suspended, with no recourse to have them reinstated.

so this is also perfectly acceptable: once they've banned the top 20% of contributors, the site effectively becomes read-only, and the AI knowledge previously parasitised from it atrophies with no replacement


Knowledge, once acquired, doesn't disappear. Once it knows how to apply an FFT and when, it doesn't need to continue to read about it. It's not a human needing continuing education. Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.

Sure, if something new happens, it's not like SO is the only place being scraped for new information. If you honestly think we will ever get to a place where all scraping is blocked, I will just politely disagree.


> Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.

That's actually incorrect: it needs to constantly ingest new data. If it ingests enough bad data (from other LLMs that are hallucinating, for example), it'll suddenly start telling you that Henry VIII was a famous video game on the Sony 64.

It has no concept of 'truthfulness' beyond the amount of data it can draw correlations from. And by nature, LLMs have to ingest as much data as possible in order to produce accurate results about new things. LLMs cannot function without parasitizing user-generated content, and if user-generated content vanishes, the whole thing collapses in on itself.


So fill the entire internet with factually incorrect, useless knowledge? This would be a good thing?


Well, that's already happening. Google search has become increasingly useless thanks to SEO-focused AI-generated schlock. It's the inevitable outcome of LLMs. Sites have an incentive to hide that they're AI-generated, and LLMs have no real way to filter out ingested data produced by other LLMs. The only difference is how long the ruse can be kept up.


So you want to pollute the commons, just like the people filling the web with SEO-focused AI-generated schlock? Do you feel justified in polluting the commons to serve the ends you desire?


Do you actually have a solution to the problem of companies using LLMs to steal other people's work and repurpose it as their own, other than finding ways to make LLMs suffer for doing so? And frankly, as I mentioned, LLMs are already polluting the commons; you're not offering any solution on that front either, other than asking people to keep supplying fresh data so the models don't poison themselves.


Do you realize that your stance is merely your opinion? Does everyone agree that training ANNs is stealing?


Scorched-earth policies are always en vogue, and easy to offer as a knee-jerk reaction. They do nothing to actually move the conversation forward, though.*

*However...there are times where the best solution is a match and some gasoline.


What's your stance on a future open source model that is as capable as any commercial models?

Also, I'm curious, do you consider LLMs to be incredibly error prone and untrustworthy?

Or do you think they are going to replace software developers?


Sounds about as successful as people trying to destroy social media by removing or editing their posts. Only a tiny minority ever actually does anything like that.



