The Performance Analysis Guide for Intel Core i7 Processor and Intel Xeon 5500 Processors states that reading a cache line that has been modified on another core takes ~75 cycles. That document dates from the Nehalem era (2008). On Skylake, does core-to-core communication still cost roughly twice as much as a regular L3 hit, or has Intel added a more sophisticated cache-to-cache communication mechanism?
Core-to-core communication latency determines how efficiently threads can communicate. For example, in a producer-consumer scenario a naive consumer might take a cache miss on every unit of work, wasting roughly 75 cycles each time. Is reading a modified cache line still a ~75 cycle penalty on the newest architectures?
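
For reference, here is a minimal sketch of how this latency could be measured directly with a two-thread ping-pong over a single cache line. The function name `measure_round_trip_ns` is hypothetical, the 64-byte line size is an assumption, and the code does not pin threads, so it only measures core-to-core transfers if the OS happens to schedule the two threads on different physical cores.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper (not taken from the Intel guide): bounces one cache line
// between two threads and returns the average round-trip time in nanoseconds.
// Assumption: the two threads run on different physical cores. Thread pinning
// is deliberately left out to keep the sketch portable.
double measure_round_trip_ns(std::uint64_t iterations) {
    assert(iterations > 0);

    // Assumption: 64-byte cache lines, so the flag sits on a line of its own.
    alignas(64) std::atomic<std::uint64_t> flag{0};

    // Responder: waits for each odd "ping" value and answers with the next
    // even value, which forces the line to migrate back to the first core.
    std::thread responder([&flag, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i) {
            const std::uint64_t ping = 2 * i + 1;
            while (flag.load(std::memory_order_acquire) != ping) {
                // Spin until the other thread's store becomes visible.
            }
            flag.store(ping + 1, std::memory_order_release);
        }
    });

    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        const std::uint64_t ping = 2 * i + 1;
        flag.store(ping, std::memory_order_release);
        while (flag.load(std::memory_order_acquire) != ping + 1) {
            // Spin until the responder's reply becomes visible.
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    responder.join();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Each round trip contains two transfers of a modified line, so the one-way figure is about half the returned value; multiplying nanoseconds by the core clock in GHz gives an approximate cycle count to compare against the ~75 cycles quoted above. Calling it with something like `measure_round_trip_ns(1000000)` after pinning the two threads to distinct cores should give the most stable numbers.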