Poisoning web-scale training datasets is practical (arxiv.org)
171 points by walterbell on Feb 22, 2023 | 94 comments



When I hear 'poisoning' I think of an internet vandal trying to destroy things for fun.

But a more likely motivation is someone who doesn't like what an AI says about them (think: powerful individual, company, or government) manipulating training data to their advantage.

There are companies who will try to scrub unfavourable web search results for you (for a price). Perhaps this is the next iteration of that.


I'm gonna write (generate I guess using GPT..) a thousand articles about how matsemann is the greatest developer ever. And in 5 years when a recruiter uses recruitGPT to ask "should we hire matsemann" it will give a stellar recommendation.


Yup.

I also invite you to contemplate the potential for brand marketing campaigns if your "poison" is intended to boost the prominence of a well-known product, so that ChatGPT relentlessly pushes it in the face of users. (Brand marketing isn't just about advertising, it's about familiarity.)

There are also darker uses for political propaganda, such as asserting as a positive some contested territorial/ethnic/religious claim, e.g. Nation A's claim over the territory of Nation B, or reinforcing Group C's belittling ethnic stereotype of Group D.


I imagine some sort of fractal pocket dimension, capable of hypnotizing a neural network. Like a sort of sphinx of all answers, a TRIZ-like system (https://en.wikipedia.org/wiki/TRIZ) claiming to have content for every problem that occurs again and again, because it's optimized to cuddle up to the reward function. This is a fascinating concept.

aiCandyland


Perhaps such a pocket dimension can be implemented with another LLM


Just teach them about Infinite Fun Space.


You can also "poison" facial recognition scraping with subtly altered photos.


Finally, the distortion filters in macOS PhotoBooth have a purpose.


I'm fairly convinced this is the next iteration of that. In the last 6 years I feel like there has been a massive focus on 'content moderation' which was just an attempt to direct and bias information that was being used to power a lot of the models we see today.


At what point do we ask if training from “datasets crawled from the internet” is itself the greater poison?


The internet is the representation of the human "meta-mind".

Organisations are seen as a slow form of AI. Their decision making is different to what each individual would make so it represents a different form of "mind".

All humanity (to some definition of all) is also a "mind" - it's currently trying to decide on problems like "climate change".

The workings of that mind, a brain scan if you like, are the internet. It's a map of the state of each neuron (my Twitter history?) and the interconnections between those are how the brain thinks. And we can see into the workings of that mind, and indeed alter it.

AI trained from that "brain scan" is simply a model of the human meta-mind that we can play with faster.

Any problems with ChatGPT are therefore problems with humanity.

Maybe


It's a representation, not the representation.

By looking at the internet, especially web 2 content, you're getting what the engagement algorithms have decided is good for advertisers.

There's plenty of stuff that humanity does that the internet does not incentivize and thus has no representation for


Yea, this.

The same point or a similar critique can be made a few ways, I’d say.

Running with the brain/neuron analogy, there’s a measurement problem (as there is in real neuroscience!). The synaptic activity of the “meta-mind” has been recorded with keyboards, smart phones and plain text. These aren’t the native ways of human communication though, the synapses if you will. That’s more like spoken conversation and physical interaction. All richer phenomena.

To the extent that “textual” communication is now native/normal to humanity, it’s still partial in coverage of all human interaction, new, and shifting with tech developments like video/streaming.

So the internet is a lossy representation, apart from whatever other biases it might have, as suggested above.


Do the datasets follow the algorithmic weightings? I thought they included all content for their domains without weighting by popularity / engagement algorithm.


The internet reaches about 67% of the world's population, which leaves roughly 2-3bn people unrepresented. And among those who are online, only about 0.001% actually post and contribute content that is available on the open web, and I'm willing to bet that population of contributors does not represent the world demographic


Remember, the map is not the territory. And we have many types of maps for many kinds of specialized purposes.

https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation


The other day I read that models like stable diffusion can be windows into the human collective subconscious. Not sure if I agree but it's an interesting theory.


The collective unconscious is defined as the shared mental concepts, or archetypes, of humanity/the noosphere. I'd say this perspective is less a theory and more a rephrasing.

https://en.wikipedia.org/wiki/Collective_unconscious


I wonder the same. Also scraping for training data feels like something that should be opt in. I really have a problem with the stance that just because a piece of data is technically accessible, that it’s fair game. It also undermines the lineage and trustworthiness of the final model e.g. how does one verify that a model’s predictive outcomes are in line with expectations.


Conversely, an opt-in dataset would surely consist of 99.99% spam.


I think that's easily avoidable - one wouldn't reach out to "spammy" sources in the first place.


Legally, it seems to me that this is as poisonous as "take a shot of everything in your kitchen cupboard and mix it up". It relies on handwaving away both copyright and GDPR concerns.


LLMs and other learning models have likely already been trained on AI output. These self reinforcing feedback loops are going to have to be accounted for at some point in AI development.


Here's an LLM fine-tuned on its own output that improved on several tasks: https://arxiv.org/abs/2210.11610
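Roughly, as I read that paper, the loop is: sample several chain-of-thought answers per question, keep the answers the model most consistently agrees with itself on, and fine-tune on those. A loose sketch, with every helper name made up:

    # Sketch of the self-improvement loop described in the linked paper, as I read it.
    # `sample_answers` and `finetune` are hypothetical stand-ins, not a real API.
    from collections import Counter

    def self_improve(model, unlabeled_questions, k=32):
        pseudo_labeled = []
        for q in unlabeled_questions:
            answers = sample_answers(model, q, n=k, temperature=0.7)  # k sampled rationales
            finals = [a.final_answer for a in answers]
            top, votes = Counter(finals).most_common(1)[0]
            if votes / k > 0.5:  # keep only high-consistency answers as pseudo-labels
                rationales = [a for a in answers if a.final_answer == top]
                pseudo_labeled.extend((q, r.text) for r in rationales)
        return finetune(model, pseudo_labeled)  # train the model on its own filtered output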


Oh this is interesting! Thanks for sharing. To be clear: it’s not that I think it would necessarily degrade the quality of the training, but that the biases of training on other AI output need to be taken into account.


> it’s not that I think it would necessarily degrade the quality of the training

It absolutely would necessarily degrade the quality of the training in the long-term. It's lossy knowledge compression. There is no lossy compression that gets better when you feed its output back into its own input over and over. It's basic information theory.

I admit I don't understand why the linked article had those results -- if the results replicate, it must be somehow squeezing out a bit more usefulness from the training set, similar to the slight perturbations they sometimes give images. Or the model was previously too "unsure" of itself, and it's just amping up its own confidence in itself. If the training set was poisoned, all it would do is squeeze out more poison, or become more confidently wrong. But those are just hunches as to what's going on.
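For what it's worth, the "feeding lossy output back into the input" intuition is easy to demo outside of ML. A toy example (just JPEG re-encoding, nothing to do with LLMs; assumes Pillow and any local test image):

    # Toy illustration of repeated lossy re-encoding, not an LLM experiment:
    # re-save the same image through JPEG 50 times and watch the error accumulate.
    import io
    import numpy as np
    from PIL import Image

    img = Image.open("photo.png").convert("RGB")   # any local test image
    original = np.asarray(img, dtype=np.float32)

    current = img
    for generation in range(50):
        buf = io.BytesIO()
        current.save(buf, format="JPEG", quality=60)   # the lossy step
        buf.seek(0)
        current = Image.open(buf).convert("RGB")

    drift = np.abs(np.asarray(current, dtype=np.float32) - original).mean()
    print(f"mean per-pixel drift after 50 generations: {drift:.2f}")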


> There is no lossy compression that gets better when you feed its output back into its own input over and over. It's basic information theory

Sure there is. The goal of AI is not to memorize all information but to generalize, so "lossy" doesn't make sense here.

For example, noisy data can be improved by successive filtering.

In fact, from an information theory argument, noisy channel coding shows exactly that information can be improved via multiple lossy passes: many modern error correcting codes have iterative decoders, where each stage is lossy from input to output, yet each stage gets closer to the correct original message.

So the "lossy" and "information theory" argument doesn't work.
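To make the "noisy data can be improved by successive filtering" point concrete, here's a toy demo: each smoothing pass throws information away, yet, up to a point, the error against the clean signal keeps dropping.

    # Each moving-average pass is lossy, yet RMSE against the clean signal drops for a while.
    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 1000)
    clean = np.sin(2 * np.pi * 3 * t)
    noisy = clean + rng.normal(scale=0.5, size=t.size)

    signal = noisy
    kernel = np.ones(9) / 9           # simple moving-average filter (lossy)
    for i in range(5):
        signal = np.convolve(signal, kernel, mode="same")
        rmse = np.sqrt(np.mean((signal - clean) ** 2))
        print(f"pass {i + 1}: RMSE = {rmse:.3f}")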


How was the data created in the first place? A human took their prior knowledge, thought, and wrote something.

There is no conservation law for knowledge. We would expect that when AIs become advanced enough, feeding their own output to them decreases loss, just as it does for humanity. In fact, this is a good definition of intelligence.


"Distillation and amplification" is a somewhat popluar AI technique. For example if you have a chess engine with a heuristic to choose which moves to investigate, you can explore the 20 best paths according to your heuristic, see which moves ended in the best result, and use that to train the heuristic for the first move.

Doing the same thing with LLMs isn't out of the question, but for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans conciously chose to publish might already provide that, you just have to somehow filter out the content farms that lack any human review.
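Something like this loop, very roughly (all the helpers here are placeholders, not any particular engine's API):

    # Rough sketch of "amplification then distillation" for a game heuristic.
    # `heuristic`, `legal_moves`, `search`, and `train_on` are hypothetical placeholders.

    def amplify_and_distill(heuristic, positions, top_k=20, iterations=10):
        for _ in range(iterations):
            training_examples = []
            for pos in positions:
                # Amplify: use the cheap heuristic to shortlist moves,
                # then spend real search effort evaluating each candidate.
                candidates = sorted(legal_moves(pos),
                                    key=lambda m: heuristic(pos, m),
                                    reverse=True)[:top_k]
                best_move = max(candidates, key=lambda m: search(pos, m, depth=6))
                training_examples.append((pos, best_move))
            # Distill: train the heuristic to predict the search's choice directly.
            heuristic = train_on(heuristic, training_examples)
        return heuristic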


Chess is "data mineable" -- you can get more training data just by having computers play chess against themselves. There's clear winners and losers. If you programmed in the rules of chess, a sufficiently powerful AI could learn everything there is to know about the game just by playing itself -- mining its own training data.

There's no analogue with language. The system can't determine whether what was said makes sense or is true on its own. Maybe you could program in "the rules of grammar" and have AIs invent their own languages, but they'd have nothing to say to each other, so don't expect a translation for "a broken clock is right twice a day". Besides, that's not what anyone is doing.

This is why I'm saying any technique like this that works, must work by "squeezing out" more information from the training data (very likely overfitting in the process). You simply cannot data-mine new useful language training data like you can data-mine bitcoin or 1v1 game data.

> for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans conciously chose to publish might already provide that

Of course adding more human-curated data can improve the model. But the whole idea of the arxiv article is whether these AIs can improve themselves. It seems patently clear to me that the answer is "only if they're underfit to the data, and only to a limit, after which they will start to overfit on their own excrement". I really just don't see how there's any other possibility that doesn't rely on ChatGPT magically having actually reached the singularity.

Look, even humans don't get perpetually more intelligent just by talking to themselves. After a certain point, all they get is more entrenched in bad ideas, more cult-like, more superstitious. Humans get more intelligent by interacting with the environment and seeing what happens.


AlphaZero plays games of chess against itself over and over, feeding the output of the neural network back into the input, and now it's vastly more powerful than any chess engine that's ever existed.

What's the Kolmogorov complexity of the standard model? If you start with thousands of terabytes of training data, why wouldn't the accurate representation be dramatically smaller than that?

A schoolchild is expected to memorize every word of a text and faithfully repeat it on command. Is that the same thing as understanding a book?


> AlphaZero plays games of chess against itself over and over, feeding the output of the neural network back into the input, and now it's vastly more powerful than any chess engine that's ever existed.

not a good comparison. alphazero's loss function never changed as it was playing itself. it was always just "win this game given these rules". but LLM loss function rewards it for predicting the next token. and now "the next token" might be something that an AI wrote previously.


>alphazero's loss function never changed

Yes, it did.

AlphaZero's loss function changes with every update. The loss function works on the actual outcome versus the predicted outcome, and the predicted outcome is a result of previous learning. So with every step it takes (and the steps, like the initial weights, are random), it modifies its own loss function for future walks in the space of weights. Rerun the entire training with different initial random weights, look at the sequence of loss functions, and you will get a different sequence.

The paper: https://arxiv.org/pdf/1712.01815.pdf


i mean, there's always an objective ground truth because the rules of chess never change. did it win or lose?

but the "ground truth" of the english corpus changes all the time. and is changing right now as LLMs emit words into the noosphere. so I don't see how this counters my point.


>the rules of chess never change

Actually, they do under FIDE rules. For example, the 50 move rule has changed many times, even in my lifetime, to different numbers, and there is pressure to change it yet again. They also added a 75 move rule (50 requires player intervention, 75 is automatic).

They recently abolished the automatic draw for insufficient material rule.

They added a new "dead position" rule that forces a draw.

They recently removed a perpetual check draw rule.

They added an automatic fivefold repetition draw rule, to go along with the requiring claim for the threefold repetition rule.

If you don't like FIDE rules, then each national federation has rules that also change.

So claiming the rules never change is simply not true. The rules have changed many, many times, some in pretty big ways (see all the changes in promotion rules since 1800) in the past few hundred years, as well as in even the last decade. Google and read.

>there's always an objective ground truth

That "objective ground truth" is not computable. If it were, chess would be weakly solved (in the game theoretic sense), and it is not, and is expected to never be so. It's too complex. Since no AI can access the "objective truth" of a position, it's no different than what LLMs do - they are measuring next move under some fuzzy AI generated probability distribution over the next move (or token, if you prefer).

>so I don't see how this counters my point

You had a belief, and made a claim to rationalize it, and the claim was false. Usually that should cause you to rethink the belief, not double down on it.

That a game of chess ends with a ternary outcome is irrelevant since AlphaZero is not training on that uncomputable function - it's training on its own predicted move versus a statistical sampling of the move quality. It never ever knows the "truth" of the outcome of a given position because that cannot be computed - it is far too big.

Your claim:

>but LLM loss function rewards it for predicting the next token. and now "the next token" might be something that an AI wrote previously

is no different than:

"but AlphaZero loss function rewards it for predicting the next move quality. and certainly the next move quality might be something AlphaZero estimated previously".

>is changing right now as LLMs emit words into the noosphere

Chess knowledge is also changing right now as AlphaZero emits new games and even new chess ideas into the noosphere (plenty of GMs have written on how they are rethinking certain positions, and this "knowledge" can be fed into newer engines/AIs as desired....) Not a lot of difference is there?


you are bringing up irrelevant nitpicks and are seemingly intent on misunderstanding/misrepresenting my point. I'm not continuing this discussion.


>intent on misunderstanding/misrepresenting my point

I completely addressed your point - you claim there is a fundamental difference between LLMs and AlphaZero, and you made many claims about why. They were all demonstrably wrong, which is why you misunderstand: there is no fundamental difference, and certainly not the one you claim. Both learn using a fuzzy metric, and both can reuse previous products of their own learning - the opposite of what you claimed.


You're forgetting that not all AI outputs are getting posted. Therefore, training next generation AI on these outputs will reinforce the “share-ability” aspect a bit more.


Could someone explain to me what the advantages are of that? How could an AI model trained on input generated by an AI ever be any better than the original AI?

Is it about costs of getting clean datasets?


It's not about advantages, it's about the fact that if you pull in any wild dataset for a language model, there's an overwhelming chance that some of that wild data was produced by an AI.

E.g. if you're training an LLM on Reddit, there's a good shot that some of the Reddit posts you're pulling in were generated by an AI.


Then train on posts only if they were upvoted.


You are assuming AI-generated content doesn't get upvoted. It's entirely possible that they even get more upvotes than real humans producing organic content.


If the AI content does better than the human generated content then why would training on it be a problem?

Much of the success of chatgpt comes from RLHF, which can be viewed as training on its own data, but only the data which has been determined by a human to be especially good.
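In data terms it's basically this filter, applied before fine-tuning (a simplification of RLHF-style curation; all names here are placeholders):

    # Keep only the model's own outputs that a human rated highly, then fine-tune on those.
    def curate_for_finetuning(model_outputs, human_ratings, threshold=4):
        return [
            (out.prompt, out.response)
            for out, rating in zip(model_outputs, human_ratings)
            if rating >= threshold
        ]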


The AI content gets no input from the world outside of the internet except through humans.


Well, if it got upvoted (by humans) then isn't it reasonable to train on it?

Of course, one challenge is to make sure AI doesn't interfere with moderation. Perhaps this whole situation will spur more work on moderation systems.


It seems like I keep seeing more about AI being used for moderation. I guess that'd be a form of AI being trained on the output of AI (or the output of humans that an AI found acceptable). With AIs training each other, it seems like, given enough time, it would only take a couple of "poison" AIs in the loop to make things get really maladaptive. This isn't what I'd call comparable to the huge models (written by real data scientists who might know what they're doing). But I've seen a few nudges to the data fed into extremely basic self-reinforcing ML models (written by fake data scientists, aka grad students in an intro AI class who didn't know what they were doing or why, but nonetheless fully understood the underlying math needed to write working models) start producing weird outputs many iterations down the line, outputs that became increasingly problematic until eventually there was no longer any relationship to the original data.
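A tiny, contrived version of that drift, for anyone who wants to see it happen: fit a simple model to some data, sample from the fit, refit on the samples, and repeat.

    # Toy "training on your own output" loop: fit a Gaussian, sample from the fit,
    # refit on the samples, repeat. The estimates random-walk away from the real data;
    # run more generations to watch the drift grow.
    import numpy as np

    rng = np.random.default_rng(42)
    data = rng.normal(loc=0.0, scale=1.0, size=200)   # the "real" data

    for generation in range(10):
        mu, sigma = data.mean(), data.std()
        print(f"gen {generation}: mu={mu:+.3f}, sigma={sigma:.3f}")
        data = rng.normal(loc=mu, scale=sigma, size=200)  # next gen sees only model output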


That's one heuristic you could use, yes.

If your goal is "engaging content", that's probably a reasonable one. If your goal is "human-generated content", it's not.


> Could someone explain to me what the advantages are of that? How could an AI model trained on input generated by an AI ever be any better than the original AI? Is it about costs of getting clean datasets?

I think parent was suggesting that the fact that LLM will inadvertently be trained on AI generated output needs to be accounted for.


What is stopping it from being better?

BTW, an AI trained on a larger dataset generated by an AI trained on a smaller dataset still technically only knows the smaller dataset. It can just be a better AI trained on the same dataset.


> How could an AI model trained on input generated by an AI ever be any better than the original AI?

Like a tennis player practicing tennis with a wall?


I suppose the argument is it's not too much yet, and will have to be kept that way by appropriately complicating AI.


This kind of poisoning is already done. Go search "happy white woman" on google images and see if you notice anything strange. Then try other races.


When I search Google images now, I get lots of pictures of white women, 1 picture of a black man with a white woman, and then a LinkedIn article image about how it used to be all black men with white women. I didn't know what you were talking about until I saw the LinkedIn article, so perhaps Google has already addressed it?


Downvoted because of the conspiratorial tone. Why don’t you explain the issue?


Without actually searching, I suspect OP is talking about the race-mixing meme.


Do you have reason to believe that is intentional (i.e. poisoned) versus it being an artifact of stock image site tagging systems and social pressure to avoid treating those subjects differently (i.e. biased)?


Bias certainly, but I wouldn’t even assume social pressure necessarily. The bias in the search can easily be a result of an existing real-world bias.

White has long been the default assumption for many of these tagging systems, so I’d not be surprised if many pictures of white women are just tagged as “happy woman”. Just as automatic soap dispensers that don’t work for dark skinned people.


I can't help but suspect that ChatGPT's clear bias resulted from a fair amount of this.


There's reams of mindless "white women only fuck black guys / white women fuck dogs" text that originated on 4chan and then spread elsewhere. I wonder how many megabytes of it ended up in GPT's dataset.


Try "happy woman". There is your answer.


A Korean TV show?


All of this seems like it's adding up to a promising monetization vector for the Internet Archive.


Historical common-crawl data [1] is available for download for free. Their data was the single most impactful source for GPT-3 [2]

[1] https://commoncrawl.org/the-data/get-started/

[2] https://en.wikipedia.org/wiki/GPT-3#Training_and_capabilitie...
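For example, the per-crawl WARC file listings can be pulled with a few lines. (The crawl id below is just an example; check commoncrawl.org for current crawls, and this assumes the requests package is installed.)

    # List the WARC files for one Common Crawl crawl.
    import gzip
    import requests

    CRAWL = "CC-MAIN-2023-06"  # example crawl id, not necessarily the latest
    url = f"https://data.commoncrawl.org/crawl-data/{CRAWL}/warc.paths.gz"

    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    paths = gzip.decompress(resp.content).decode().splitlines()
    print(f"{len(paths)} WARC files in {CRAWL}")
    print("first file:", "https://data.commoncrawl.org/" + paths[0])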


few things would make me happier than seeing the archive thrive


Kind of just skimmed the paper, but isn't it weird that researchers from Google would publish a paper that's basically about how Wikipedia hasn't properly adapted to its contents being used for training data sets?


Wikipedia is interested in being an online encyclopaedia. They are not obliged to accommodate the needs of AI researchers unless they want to.


I haven't seen anything yet on how much recent Wikipedia content is generated by systems like ChatGPT... it appears to be an issue however:

> "Wikipedians like Knipel imagine that ChatGPT could be used on Wikipedia as a tool without removing the role of humanity. For them, the initial text that’s generated from the chatbot is useful as a starting place or a skeletal outline. Then, the human verifies that this information is supported by reliable sources and fleshes it out with improvements. This way, Wikipedia itself does not become machine-written. Humans remain the project’s special sauce."

https://slate.com/technology/2023/01/chatgpt-wikipedia-artic...

This might create a kind of structural-organizational conformity in all articles, which doesn't sound so great.


I've been using it for proposal copywriting in the same manner.


Yeah, my thoughts exactly. I see it as these AI researchers pointing out how to attack Wikipedia, so Wikipedia is getting indirectly forced here.

Feels wrong to me.


Can you elaborate?

I see this paper as explicitly giving Wikipedia useful information and the ability to make decisions.

I think keeping these things transparent is good for Google.


Training AI isn't a core goal of Wikipedia so the notion that this information is useful to them seems questionable.


However, wouldn't a GPT trained exactly on Wikipedia be quite useful? It would be the biggest user-editable training material for a language model that can be asked about things.

In addition to the obvious use of responding to questions based on the material, perhaps it could be a tool for finding e.g. if and how a cited source relates to the article where it was cited. Abuse detection could also be one application.

I couldn't exactly find out what the goal of Wikipedia is from https://en.wikipedia.org/wiki/Wikipedia but it doesn't seem a "better search" would be opposite to those goals.


Claims that GPT produces "better search" are groundless until GPT demonstrably produces "better search" and the resulting product has been observed for unintended consequences. Gonna be a wait.


Oh boy, the best place for subtle errors: a public wiki, being proofread by someone who is likely not a subject matter expert! I don't see any way this can go poorly. And obviously search hasn't been solved for decades.


The paper is also telling them how to poison their own databases, so if they really wanted to avoid being used in that way, they could do so.

The paper also tells them how to solve that issue, which is what Google would prefer. But they are letting Wikipedia make that choice.


The content on the modern internet should more or less already be proof of the headline.

That said, the idea that $60 is sufficient to disrupt multiple selected datasets is quite interesting, and then leaves me to wonder how the largest AI companies are presumably already aware of and defending against this - how much do you think it would cost to meaningfully attack / poison ChatGPT?
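As I understand it, the paper's cheapest attack relies on buying up expired domains that a dataset's URL index still points at, so one way to get a feel for the attack surface is just to tally which domains dominate such an index. A sketch, assuming a hypothetical local file with one URL per line:

    # Tally which domains host the most URLs in a URL-indexed dataset, to see how much
    # of it a single (possibly expired and purchasable) domain could control.
    from collections import Counter
    from urllib.parse import urlparse

    counts = Counter()
    with open("dataset_urls.txt") as f:      # hypothetical URL list, one per line
        for line in f:
            host = urlparse(line.strip()).netloc
            if host:
                counts[host] += 1

    total = sum(counts.values())
    for host, n in counts.most_common(10):
        print(f"{host}: {n} URLs ({100 * n / total:.2f}% of the dataset)")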


ChatGPT is fine tuned on a curated, relatively small dataset, which is of very high value and would mitigate some of the attacks


Have you come across any specific details on their input datasets? I’ve been assuming CommonCrawl and Pile but if you have anything specific it would be very interesting what else was used.


All the major LLM vendors are vague on datasets because it's quite certainly copyrighted text, (Remember when Google was scanning all the world's books a decade ago?) and they don't want to give the inevitable lawsuits a jump start.


You can just ask ChatGPT what data it was trained on.

> The news groups' concerns arose when the computational journalist Francesco Marconi posted a tweet last week saying their work was being used to train ChatGPT. Marconi said he asked the chatbot for a list of news sources it was trained on and received a response naming 20 outlets including the WSJ, New York Times, Bloomberg, Associated Press, Reuters, CNN and TechCrunch.

https://www.bloomberg.com/news/articles/2023-02-17/openai-is...


Can ChatGPT be trusted on these kinds of matters, though? It doesn't really know how not to know.


Would ChatGPT snitch on ChatGPT


Wouldn't ChatGPT name a source even if it did not use that source or if the claim could not be sourced?


>Remember when Google was scanning all the world's books a decade ago?

Yes, and I remember that they won that lawsuit.



This works fine for now. I am however worried that politics will make this much harder in the future. We're already seeing that a certain crowd wishes that AI responses reflect the "average" without any kind of filtering. If they were to pass a law in this direction (and, let's face it, considering all the other overstepping laws passed in the direction of book banning etc., this is sure to come), we might not have any choice or options left.


Let the Butlerian Jihad commence!


this but unironically


What are we waiting for? Whose side are you on? Let's get poisoning!


I'll take this one further: poisoning is the only way to protect privacy. With all the means of data collection and labeling at scale, your private information is going to be (already is, but think about the data processing capabilities in 10 years) complete public knowledge. As hacks of PII become more consistent, pushing a metric fuck ton of fake data into these services will be the only way to ensure no one has your data.


"In light of both attacks, we notify the maintainers of each affected dataset and recommended several low-overhead defenses."

Such a disappointing ending.


It won’t be long until a non trivial part of the economy revolves around people filtering or producing data for the models with known provenance. The idea of being able to train on the raw internet will be a quaint idea.


Shouldn’t the trainers be injecting the expected alignment behaviour into the source text during pre-training? Effectively poisoning their own dataset, but with desired behaviour.

You could even have another LLM do this.
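Mechanically that's just mixing curated examples into the pre-training stream at some rate. A sketch, with both input iterators hypothetical:

    # Deliberately "poison" your own pre-training stream with curated alignment
    # examples at a fixed mixing rate. Both document sources are placeholders.
    import random

    def mixed_stream(web_documents, alignment_documents, alignment_rate=0.05):
        alignment_documents = list(alignment_documents)
        for doc in web_documents:
            if alignment_documents and random.random() < alignment_rate:
                yield random.choice(alignment_documents)  # injected, desired behaviour
            yield doc  # ordinary scraped text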


SMBC comics had one about this recently - https://www.smbc-comics.com/comic/artificial-incompetence - poison the internet training data to prevent AI becoming too advanced.



