I don't think universal triggers exist; at that point they'd just be language features. But there are plenty of less universal triggers.
Let's imagine that in the brain everything goes through a series of models: first tokenization into words, then building something like an abstract syntax tree, then analysing meaning in context, and so on. Each time one of these steps reaches a nonsensical result, we start over with additional parsing time allocated. It's probably not literally how it works, but it's close enough to be a useful model.
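To make that concrete, here's a minimal sketch of the staged model in Python. Every stage function is a hypothetical stand-in (there is obviously no three-line tokenizer in the brain), and the garden-path failure is hard-coded purely for illustration; the point is the retry-with-more-effort loop:

```python
class ParseFailure(Exception):
    """Raised when a stage reaches a nonsensical result."""

def tokenize(text, effort):
    # Stand-in for real tokenization.
    return text.split()

def build_syntax_tree(tokens, effort):
    # Stand-in for syntactic parsing: at low effort only the most
    # frequent part-of-speech reading is tried, so a garden-path
    # opening like "The old man ..." leaves the sentence verbless.
    if effort < 2 and tokens[:3] == ["The", "old", "man"]:
        raise ParseFailure("'old man' as adjective+noun leaves no verb")
    return ("S", tokens)

def interpret_in_context(tree, effort):
    # Stand-in for semantic analysis.
    return f"made sense of {tree[1]} at effort level {effort}"

def understand(text, max_effort=3):
    for effort in range(1, max_effort + 1):
        try:
            tokens = tokenize(text, effort)
            tree = build_syntax_tree(tokens, effort)
            return interpret_in_context(tree, effort)
        except ParseFailure:
            continue  # nonsensical result: start over with more effort
    raise ParseFailure("no sensible reading found")

print(understand("The old man the boat."))  # fails at effort 1, succeeds at 2
```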
Now what you consider an adversarial example depends on how far down the stack it has to go until it's caught:
- "The old man the boat." fails in the early parsing steps. We reliably miscategorize old as adjective when it's a noun.
- "More people have been to Russia than I have, said Escher" goes a step further, it parses just fine but makes no sense. The tricky thing is that you might initially not notice that it makes no sense. This is about the level where AI is today.
- "Time flies like an arrow; fruit flies like a banana" makes perfect sense, but you could notice that the straight forward way to parse it leads to a non-sequitur and parsing it as "time-flies love eating arrows; fruit-flies love eating bananas" is probably a better way to parse it.
Of course, that's just the parsing steps. You can also trick human "sentiment analysis" by swapping words without changing the meaning: compare "this bag is made from fake leather" with "this bag is made from vegan leather". PR and marketing have made a science out of making bad things sound good. They are similarly good at finding adversarial examples for reading comprehension: statements that are nearly universally understood to mean something different from what they literally say (or to mean nothing at all; or that seem to mean nothing at all but actually mean something very significant).
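The fake/vegan swap shows up directly in a lexicon-based scorer like NLTK's VADER; treat the numbers as illustrative, since they depend on the lexicon version:

```python
from nltk.sentiment import SentimentIntensityAnalyzer  # needs nltk.download("vader_lexicon")

sia = SentimentIntensityAnalyzer()
for text in ("this bag is made from fake leather",
             "this bag is made from vegan leather"):
    print(f"{text!r}: {sia.polarity_scores(text)['compound']:+.3f}")
# "fake" carries a negative weight in the lexicon while "vegan" is
# (presumably) neutral, so the second phrasing scores higher even
# though it describes the exact same bag.
```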
Of course, we assume all text is targeted at humans; so when something is widely misunderstood by humans, we blame the sender for writing such a bad message, but when it's widely misunderstood by an AI, we blame the AI for being so bad at reading.