I wonder if we really need to have a paper for every way the technology can be subverted. We know what the problem is and we know it's an architecture shortcoming we have not solved yet.
Generalized: "We rely on a model's internal capabilities to separate data from instructions. The more powerful the model, the more ways exist to confuse the process'.
Not having a clear separation of instruction and data is the root cause for a fair share of computer security challenges we struggle with. From little bobby tables all the way to x86 architecture treating data and code as interchangeable (nevermind NX, other attempts at solving this later).
Autoregressive transformers likely are not capable of addressing this issue with our current knowledge. We need separate inputs and a non turing complete instruction language to address it. We don't know how to get there yet.
But none of this is the actual issue. The issue is that the entire public conversation is consumed by the bullshit details like this at the moment, the culture war is trying to get it's share too and everyone is recycling the same vomit over and over to drive engagement. Everyone is talking symptoms and projecting their hopes and fears into it and much less technically savy people writing regulation, etc are led astray about what the fundamental challenges are .
It's all PR posturing. It's not about security or safety. It's stupid
We discovered technology.
It has limitations.
We know what the problem is.
We know what causes it.
It has nothing to do with safety.
We don't know yet how to fix it.
We need to meet investor expections, so we create an entirely new level of Security Theatre that's a total diversion from the actual problem.
We drown the world in a cesspool of information waste.
We don't know how to fix it yet
If you think https://arxiv.org/abs/1801.01203 is a good paper, I am not sure why this is any different. Yes, we want a paper for every way the technology can be subverted.
… Wait, how is it not about security? Unfortunately, people are using these things in exploitable circumstances, so it would seem to be very much about security.
Of course we have to have these papers, otherwise how could we enumerate these and find solutions that we can show provides benefit against all of these
Enumeration might be endless, which sounds hard, so perhaps we should make a statistical model that generalises over all know examples and gives us the ability to forecast new and not-yet-known cases? :P
Generalized: "We rely on a model's internal capabilities to separate data from instructions. The more powerful the model, the more ways exist to confuse the process'.
Not having a clear separation of instruction and data is the root cause for a fair share of computer security challenges we struggle with. From little bobby tables all the way to x86 architecture treating data and code as interchangeable (nevermind NX, other attempts at solving this later).
Autoregressive transformers likely are not capable of addressing this issue with our current knowledge. We need separate inputs and a non turing complete instruction language to address it. We don't know how to get there yet.
But none of this is the actual issue. The issue is that the entire public conversation is consumed by the bullshit details like this at the moment, the culture war is trying to get it's share too and everyone is recycling the same vomit over and over to drive engagement. Everyone is talking symptoms and projecting their hopes and fears into it and much less technically savy people writing regulation, etc are led astray about what the fundamental challenges are .
It's all PR posturing. It's not about security or safety. It's stupid
We discovered technology. It has limitations. We know what the problem is. We know what causes it. It has nothing to do with safety. We don't know yet how to fix it. We need to meet investor expections, so we create an entirely new level of Security Theatre that's a total diversion from the actual problem. We drown the world in a cesspool of information waste. We don't know how to fix it yet