Imagine some code running in some walled-off privileged space (like the kernel) that looks something like this:
if(i < length) {
value = data[i]
doStuff(lookupTable[value])
}
Let's say you can run this code with an arbitrary value for `i` (maybe it's the parameter to a system call). From a high-level view, this code is perfectly safe. If `i` is out of bounds, nothing happens. If it's in bounds, the system does something with the data at i, presumably something you're allowed to do.
But what really happens on a modern CPU when `i` is out of bounds? It's possible that the value for `length` is not available and has to be loaded from memory, which can take several hundred CPU cycles in the worst case. It's also possible that the CPU will make a guess that the `if` branch will be taken. If that happens, then execution looks something like this:
try to compare i < length
can't, because length is not yet available
guess the branch will be taken, begin speculatively executing it:
initiate the loading of length
load data[i]
load lookupTable[value]
...possibly some more stuff...
eventually, the load of length completes
oops, turns out i >= length
roll back all of the speculative stuff above
Normally none of this is visible. But, the memory loads have side effects. Namely, they modify the CPU's caches. By attempting to load different locations in memory and looking at how long it takes, you can figure out which locations got cached during that speculative execution phase. Since one of those locations was determined by an out-of-bounds load of data[i], that means you can use this to figure out the data stored at an arbitrary location in this privileged space.
It requires speculative execution of a branch that ends up not being taken, because if the speculative execution was correct then you haven't gained access to anything you're not supposed to know, and if there was no speculative execution then you never load anything out of bounds.
This is simplified and probably wrong in some aspects, but that's the rough idea.
But what really happens on a modern CPU when `i` is out of bounds? It's possible that the value for `length` is not available and has to be loaded from memory, which can take several hundred CPU cycles in the worst case. It's also possible that the CPU will make a guess that the `if` branch will be taken. If that happens, then execution looks something like this:
Normally none of this is visible. But, the memory loads have side effects. Namely, they modify the CPU's caches. By attempting to load different locations in memory and looking at how long it takes, you can figure out which locations got cached during that speculative execution phase. Since one of those locations was determined by an out-of-bounds load of data[i], that means you can use this to figure out the data stored at an arbitrary location in this privileged space.It requires speculative execution of a branch that ends up not being taken, because if the speculative execution was correct then you haven't gained access to anything you're not supposed to know, and if there was no speculative execution then you never load anything out of bounds.
This is simplified and probably wrong in some aspects, but that's the rough idea.