What would be the fastest (read: lowest-latency) method for moving several KiB of data from one core to another on the same physical package?
Suppose core 0 writes some data that ends up in the L3 cache. My understanding from the Intel SDM Vol. 3, Section 11.4 is that core 0 gains ownership of the affected cache lines (if it did not own them already), even if they were previously shared. As I understand it, for core 1 to read the newly written data, the cache lines would normally have to be written back to RAM and then fetched by core 1 through a cache miss. Is it possible to transfer ownership of the L3 lines from one core to the other, so that core 1 can access the data without waiting for a store to and a load from RAM?
In particular, I'm targeting Xeon E7 processors in the v4 series.
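For concreteness, here is a minimal sketch of the kind of handoff I mean, assuming C11 atomics and POSIX threads. The names (`struct mailbox`, `transfer_once`, `PAYLOAD_BYTES`) and the 64-byte line size are made up for illustration, and pinning the two threads to specific cores is left out (on Linux that could be done with something like `pthread_setaffinity_np`).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_BYTES 4096 /* "several KiB"; illustrative size */

/* Hypothetical shared buffer. The alignment (assumed 64-byte lines)
 * keeps the flag on a different cache line from the payload. */
struct mailbox {
    _Alignas(64) unsigned char payload[PAYLOAD_BYTES];
    _Alignas(64) atomic_int ready;
};

struct consumer_arg {
    struct mailbox *mb;
    uint64_t sum;
};

/* Writer side: plain stores to the payload, then a release store to the flag. */
static void *producer(void *arg) {
    struct mailbox *mb = arg;
    memset(mb->payload, 0xAB, PAYLOAD_BYTES);
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
    return NULL;
}

/* Reader side: spin on the flag with acquire loads, then read the payload.
 * No explicit flush instruction is issued anywhere. */
static void *consumer(void *arg) {
    struct consumer_arg *ca = arg;
    while (!atomic_load_explicit(&ca->mb->ready, memory_order_acquire)) {
        /* busy-wait until the producer publishes */
    }
    uint64_t sum = 0;
    for (size_t i = 0; i < PAYLOAD_BYTES; i++) {
        sum += ca->mb->payload[i];
    }
    ca->sum = sum;
    return NULL;
}

/* Runs one producer/consumer handoff and returns the byte sum the consumer saw. */
uint64_t transfer_once(void) {
    static struct mailbox mb;
    struct consumer_arg ca = { &mb, 0 };
    pthread_t prod, cons;
    int rc;

    memset(mb.payload, 0, PAYLOAD_BYTES);
    atomic_store_explicit(&mb.ready, 0, memory_order_relaxed);

    rc = pthread_create(&cons, NULL, consumer, &ca);
    assert(rc == 0);
    rc = pthread_create(&prod, NULL, producer, &mb);
    assert(rc == 0);
    (void)rc;

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return ca.sum;
}
```

Calling `transfer_once()` from a `main` (compiled with `-pthread`) and timing the consumer's read loop would be one way to measure the latency I am asking about. The sketch only checks that the data arrives; it does not show which path the cache lines take through the hierarchy.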