It's not supposed to be a particularly valuable model; it's just to show how it could exist.
> the cost is that your release barriers, instead of being a cheap, purely core-local effect, now have to cross the coherence fabric. This can move the cost from a dozen cycles to hundreds.
That's not the intent.
The barrier pushes the lines in this new state to I, but under MESI they already would have been I.
There is no reason for any performance loss compared to MESI. If no lines are in this state, there is no extra work. And if it would be helpful, you can preemptively convert those lines to S in the background.
Edit: Oh wait, you said release barrier. No, if you're doing the simplest plan based on MESI, then there is no need for anything extra when releasing. Those considerations only apply if you want to look at MOSI, where it's already doing write broadcasts.
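For concreteness, here is a toy sketch of what I mean: MESI plus one extra hypothetical "Stale" state. All names are made up, and it ignores tags, data transfer, and the actual coherence fabric; it's only meant to show the transitions, not a real implementation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: MESI plus a Stale state (illustration only, not a real protocol).
enum class LineState { Modified, Exclusive, Shared, Stale, Invalid };

struct CacheLine {
    LineState state = LineState::Invalid;
    uint64_t  data  = 0;   // stand-in for the cached payload
};

// Loads may hit a Stale line and return possibly old data.
bool load_hit(const CacheLine& line) {
    return line.state != LineState::Invalid;   // M/E/S/Stale all hit
}

// A barrier on this core drops its own Stale lines to Invalid. Under plain
// MESI those lines would already have been Invalid, so relative to MESI this
// is free, and it is purely core-local: nothing crosses the fabric.
void on_barrier(std::vector<CacheLine>& lines) {
    for (auto& line : lines)
        if (line.state == LineState::Stale)
            line.state = LineState::Invalid;
}

// Optional background work: re-fetch a Stale line and promote it to Shared,
// so a later barrier finds nothing left to invalidate.
void background_refresh(CacheLine& line) {
    if (line.state == LineState::Stale) {
        // ...issue an ordinary read over the fabric; once fresh data arrives:
        line.state = LineState::Shared;
    }
}
```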
Sorry, I'm losing track of what you are proposing. I guess you mean MOESI plus an additional Stale state? Can you describe all the transitions and when they are performed?
It seems to be a state that can send stale data to loads, but will get invalidated if the CPU performs a barrier.
It doesn't work, of course, most obviously because there is no forward progress: CPU1 can order all its previous stores, then later store some flag or lock variable, and that store will never propagate to CPU2 spinning on it, waiting for the value.
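Concretely (standard C++ atomics are used only to name the accesses; the problem is in the proposed hardware model, not in the C++ memory model):

```cpp
#include <atomic>

std::atomic<int> data{0};
std::atomic<int> flag{0};

void cpu1() {
    data.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);   // ordered after the data store
}

void cpu2() {
    // No barrier in the loop. With the proposed Stale state, CPU2's cache can
    // keep satisfying this load from its old copy of `flag` forever, because
    // nothing ever forces the Stale line out. Under MESI, CPU1's write
    // invalidates the line and the loop terminates.
    while (flag.load(std::memory_order_relaxed) == 0) { /* spin */ }
    int v = data.load(std::memory_order_relaxed);
    (void)v;
}
```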
But also because CPU2 and CPU3 can see different values depending on the state of their caches. If one has no such line and the other has a valid-stale line, then they will end up seeing different values. And the writeback out of CPU1's cache needs to write back and invalidate all possible older such written-to lines. By the time you make it work, it's not a cache, it's a store queue, but worse, because it has to do all these coherency operations, and barriers have to walk it and flash-invalidate or flush it, etc.
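A litmus-style sketch of that second problem (again, C++ just names the accesses; the outcome shown is what the proposed model would permit at a single instant, not what coherent hardware does):

```cpp
#include <atomic>

std::atomic<int> x{0};   // assume CPU1 has since stored x = 1

// CPU2 still holds x in the hypothetical Stale state: the load is satisfied
// locally and can return the old value, 0.
int cpu2() { return x.load(std::memory_order_relaxed); }

// CPU3 holds no copy of x: the load misses, fetches the current value from
// CPU1 (or memory), and returns 1 -- at the same moment CPU2 reads 0.
int cpu3() { return x.load(std::memory_order_relaxed); }
```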
Can't I just say that spinning without a barrier is doing things wrong?
I guess if I need an emergency fix I can make those lines invalidate themselves every thousand cycles.
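To be concrete about the first point: the spin I'd call correct has a barrier in the loop, roughly like the sketch below, where the standard acquire fence just stands in for whatever barrier instruction this hypothetical hardware would expose.

```cpp
#include <atomic>

std::atomic<int> flag{0};

void waiter() {
    // With a barrier in every iteration, each pass drops the hypothetical
    // Stale line back to Invalid, so the next load has to re-fetch and
    // forward progress is restored. Without the fence you get the stuck
    // spin described above.
    while (flag.load(std::memory_order_relaxed) == 0) {
        std::atomic_thread_fence(std::memory_order_acquire);
    }
}
```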
> But also because CPU2 and CPU3 can see different values depending on the state of their caches. If one has no such line and the other has a valid-stale line, then they will end up seeing different values.
But only if there are no memory barriers, so it shouldn't be a big deal. And the valid-stale lines can't be used as the basis for new writes.
> And the writeback out of CPU1's cache needs to write back and invalidate all possible older such written-to lines.
The only such lines are in this new state that's halfway between S and I. They don't need to be marked any more invalid. Zero bus traffic there.
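To spell out the write side: for stores, the new state behaves exactly like I, so a write never merges into possibly-stale data and needs no invalidation beyond what MESI already does. Rough sketch with made-up names, not a real implementation:

```cpp
enum class LineState { Modified, Exclusive, Shared, Stale, Invalid };

struct CacheLine { LineState state = LineState::Invalid; /* tag, data... */ };

// Store path: Stale is treated exactly like Invalid.
void on_store(CacheLine& line) {
    switch (line.state) {
    case LineState::Modified:
    case LineState::Exclusive:
        line.state = LineState::Modified;   // write in place, as in MESI
        break;
    case LineState::Shared:
        // ...upgrade: invalidate other copies, keep the local data (as in MESI)
        line.state = LineState::Modified;
        break;
    case LineState::Stale:
    case LineState::Invalid:
        // ...full read-for-ownership, the same path as an ordinary miss
        line.state = LineState::Modified;
        break;
    }
}
```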
Yes, I saw that flaw, but I think in their model, barriers are always associated with stores (so atomic_thread_fence is not implementable but store_release is), which fixes your first example. I agree that in any case you end up doing more work than in the typical model to make other scenarios work.
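In C++ terms, the distinction I mean is roughly the following (a sketch; I'm guessing at how their barrier maps onto these two forms):

```cpp
#include <atomic>

std::atomic<int> data{0};
std::atomic<int> flag{0};

// Implementable in that model: the release barrier is attached to one
// specific store, so propagating that write can carry the "invalidate stale
// copies" work across the fabric -- which, as I understand it, is what fixes
// the spinning example.
void publish_with_store_release() {
    data.store(1, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);
}

// Not implementable in that model: a standalone fence orders everything
// before it against everything after it without naming any particular store,
// so there is no single write to hang the cross-fabric invalidation on.
void publish_with_standalone_fence() {
    data.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(1, std::memory_order_relaxed);
}
```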