> In that example I don't think anything really changes. With cache coherency a release barrier makes sure all previous stores are committed to L1, and by implication any old versions of the cache line have been invalidated.
That is not what a release barrier does.
> Without cache coherency a release barrier makes sure all previous stores are committed to L1, and explicitly says that any old versions of the cache line have been invalidated.
That is not a "normal barrier" though, writeback and invalidate operations are software coherency.
> If you had a design a lot like MESI, you wouldn't really need release semantics,
This is not the case. A release barrier can be required even if your cache coherency operations complete in FIFO order, because reordering can happen before cache coherency.
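Concretely, the classic message-passing case, as a minimal C++ sketch (the variable names are just for illustration):

```cpp
#include <atomic>

int data = 0;                 // plain payload
std::atomic<int> flag{0};     // publication flag

void producer() {
    data = 42;
    // Without the release (i.e. with memory_order_relaxed here), the compiler
    // or the core's store buffer may commit the flag store before the data
    // store, even if the coherency fabric delivers every cache line update
    // in FIFO order.
    flag.store(1, std::memory_order_release);
}

void consumer() {
    if (flag.load(std::memory_order_acquire) == 1) {
        int x = data;  // guaranteed to be 42 by the release/acquire pair,
                       // not by coherency
        (void)x;
    }
}
```

The reordering the release forbids happens in the core, before the stores ever reach the coherency protocol.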
> That is not a "normal barrier" though, writeback and invalidate operations are software coherency.
I think I was unclear when I said "the cache line". I meant the one containing a releasing store.
Let me try wording it a different way. A store_release requires all previous writes to be ordered before it. This obviously includes other memory addresses, but its own memory address isn't an exception. So even without cache consistency as a general rule, the nature of a release gives you all the ordering you need in this situation.
I'm sorry for mentioning the word "invalidate", because that's the implementation and not the semantics.
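In C++ terms, a minimal sketch of that point (variable names invented here):

```cpp
#include <atomic>

std::atomic<int> flag{0};
int payload = 0;

void writer() {
    flag.store(1, std::memory_order_relaxed);  // earlier write to the flag's own address
    payload = 7;                               // earlier write to a different address
    // The release below is ordered after both writes above; the store to
    // flag's own cache line is not an exception. An acquiring reader that
    // sees flag == 2 must also see payload == 7.
    flag.store(2, std::memory_order_release);
}
```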
> This is not the case. A release barrier can be required even if your cache coherency operations complete in FIFO order, because reordering can happen before cache coherency.
So I meant acquire but I think there's also a clear solution based on exactly what I said.
The lines that could possibly be affected by the FIFO are all in the "Shared but maybe stale" state. Consume turns those into Invalid. So any read that's from after the Consume, reordered before it, should see those lines as Invalid.
> I think I was unclear when I said "the cache line". I meant the one containing a releasing store.
Not sure what you mean by that.
> Let me try wording it a different way. A store_release requires all previous writes to be ordered before it. This obviously includes other memory addresses, but its own memory address isn't an exception. So even without cache consistency as a general rule, the nature of a release gives you all the ordering you need in this situation.
We're talking about cache coherency, not memory consistency. Coherency is not about ordering, it's about ensuring agents don't see stale data.
> So I meant acquire but I think there's also a clear solution based on exactly what I said.
The same goes for acquire though.
> The lines that could possibly be affected by the FIFO are all in the "Shared but maybe stale" state. Consume turns those into Invalid. So any read that's from after the Consume, reordered before it, should see those lines as Invalid.
The implementation of CPU memory pipelines and cache coherency isn't really something you can just get a bit of a feel for and then handwave about.
When I talk about how the cache protocol has to do XYZ to implement a barrier, you complain that that isn't what a barrier is.
When I talk about what memory barriers do in pure terms, you complain that I'm not mentioning the cache protocol.
When I give an example of a cache protocol in isolation, you start talking about which memory barriers I'm missing.
I don't know what you want.
> Coherency is not about ordering, it's about ensuring agents don't see stale data.
Well, if I go by "if it's part of the memory model then it's not stale", then you can allow a relaxed ordering on single addresses without having stale data.
When a core takes Exclusive control of a cache line, put all other Shared copies into the state "might be an old version, but that's allowed by the protocol and the memory model".
Some instructions can read "might be an old version, but that's allowed by the protocol and the memory model" values and some can't. The exact details of "some instructions" are flexible/irrelevant. See the memory model (not provided) for details.
There. Done. That establishes a minimal proof of concept: a design that doesn't always guarantee cache coherency, but can enforce as much cache coherency as you need. You don't need to add any explicit writebacks or flushes to manage it from software, and enforcing it doesn't take any significant effort beyond a normal CPU's coherency protocol.
Agents will never be confused. They know that Shared means everyone has the same value, and "might be an old version, but that's allowed by the protocol and the memory model" does not mean everyone has the same value. They know that transitioning from "might be an old version, but that's allowed by the protocol and the memory model" to Shared or Exclusive requires reading the data anew, just like transitioning from Invalid to Shared to Exclusive.
If agents want to always be as up to date as possible, they can simply not use this state. If an agent wants to be up to date some of the time, then it can allow this state but purge it at will.
This state only allows for "old" values going back to the most recent purge, so it's not a useless act to read from it. And this state can give you data faster than acquiring a Shared state, so there's a reason to use it.
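If it helps pin that down, here's a toy sketch of the state set in C++ (the MaybeStale name and the transition functions are mine, purely illustrative, not a verified protocol):

```cpp
// Toy sketch of the states described above; illustrative only. Handling of a
// local Modified/Exclusive copy when another core takes ownership is left
// exactly as in plain MESI.
enum class LineState {
    Modified,   // sole up-to-date copy, dirty
    Exclusive,  // sole copy, clean
    Shared,     // every holder has the same value
    MaybeStale, // "might be an old version, but that's allowed by the
                // protocol and the memory model": readable only by the
                // instructions the (unspecified) memory model permits
    Invalid,    // must be re-fetched before any use
};

// When another core takes Exclusive control of a line, other cores' Shared
// copies become MaybeStale rather than Invalid.
LineState on_remote_takes_exclusive(LineState s) {
    return s == LineState::Shared ? LineState::MaybeStale : s;
}

// A purge (e.g. at the Consume mentioned earlier) turns MaybeStale into
// Invalid, forcing a fresh read before the line can be used again.
LineState on_local_purge(LineState s) {
    return s == LineState::MaybeStale ? LineState::Invalid : s;
}
```

Leaving MaybeStale for Shared or Exclusive still means reading the data anew, exactly like leaving Invalid.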
> The same goes for acquire though.
I'm pretty sure the entire point of acquire is that you can't reorder reads from after it to before it.
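As a minimal C++ sketch of what I mean (names invented):

```cpp
#include <atomic>

std::atomic<int> ready{0};
int data = 0;

int reader() {
    // The acquire load is the ordering point: the read of data below cannot
    // be hoisted above it by the compiler or the core. If ready was set by a
    // release store, this read sees everything written before that release.
    if (ready.load(std::memory_order_acquire) == 1) {
        return data;
    }
    return -1;  // nothing published yet
}
```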
I don't want anything, I was correcting your misconceptions.
> Well, if I go by "if it's part of the memory model then it's not stale", then you can allow a relaxed ordering on single addresses without having stale data.
I don't know what you're talking about. Memory ordering is not about ordering of a single address. That's cache coherency.
[snip]
> I'm pretty sure the entire point of acquire is that you can't reorder reads from after it to before it.
And you're still wrong. An acquire barrier can be required even if you receive coherency updates in sequential order. Barriers in modern processors do not flush, invalidate, or perform coherency operations.
This is what I mean when I say you can't just handwave with a basic idea about the programming semantics (which aren't particularly complicated). It's easy to think up some vaguely plausible-sounding implementation of those things, but the reality is infinitely more complicated. Real cache coherency protocols are verified with formal proofs, and not because they are easy. I guarantee if you handwave a new coherency state or give up some property of coherency, you will have bugs.
> I don't know what you're talking about. Memory ordering is not about ordering of a single address. That's cache coherency.
The ordering of a single address is relevant to both the cache protocol and the memory model.
That section is describing a cache protocol.
> And you're still wrong. An acquire barrier can be required even if you receive coherency updates in sequential order.
I agree. How does that make my statement wrong in any way?
> Real cache coherency protocols are verified with formal proofs, and not because they are easy. I guarantee if you handwave a new coherency state or give up some property of coherency, you will have bugs.
Do you think my description is impossible to fix, or are you just trying to impress on me that it's hard?
I don't feel like spending hours finding and editing a concurrency simulator today.