You do not need a load barrier to observe stores from other cores. You need the load barrier so your own core does not reorder loads otherwise you could see wacky loads even if the stores are totally ordered.
As a example, suppose we have totally ordered stores, but loads can be reordered freely. Suppose X is 0 and Y is 0, then we store X to 1 then store Y to 1. Therefore, Y can only be 1 after X is 1. But, if you can load out of order, then you can load Y prior to the store, then load X after the store and see Y is 0 and X is 1 even though the store ordering guarantees Y is 1 only after X is 1.
> You do not need a load barrier to observe stores from other cores. You need the load barrier so your own core does not reorder loads otherwise you could see wacky loads even if the stores are totally ordered.
Yes. I mean, you need the load barrier to see stores safely. Also, in principle, since AFAIK there are no guarantees about cache invalidation request behaviour, in theory it could fail to be read forever - in which case you would in fact literally never see the writes on other cores (until you issued a read barrier).
As a example, suppose we have totally ordered stores, but loads can be reordered freely. Suppose X is 0 and Y is 0, then we store X to 1 then store Y to 1. Therefore, Y can only be 1 after X is 1. But, if you can load out of order, then you can load Y prior to the store, then load X after the store and see Y is 0 and X is 1 even though the store ordering guarantees Y is 1 only after X is 1.