We are exploring memory dependency prediction and have observed a confusing effect. Assume two processes, A and B, run on the two logical cores of the same physical core. Process A writes to addressA with something like "mov $4, (addressA)", while process B loads from addressB with "mov (addressB), %rax". If bits 2-11 of addressA and addressB are the same, we observe a drastic delay in process B's loads. Can someone kindly explain why there is a dependency here?
We are testing on an i7-6700K CPU, and we suspect the delay is not caused by a cache bank conflict.
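To make the address condition concrete, here is a minimal C sketch of the check "bits 2-11 of addressA and addressB are the same". The helper name same_bits_2_11 and the macro BITS_2_11_MASK are hypothetical names chosen for this illustration, and the sketch assumes bit 0 is the least significant address bit. It only tests the address relationship; it does not reproduce the timing measurement.

```c
#include <stdint.h>

/* Hypothetical mask for this illustration: selects address bits 2..11
 * inclusive, i.e. 0xFFF with the two lowest bits cleared. */
#define BITS_2_11_MASK ((uintptr_t)0xFFC)

/* Returns 1 if the two addresses agree in bits 2..11, 0 otherwise.
 * Bits 0..1 and all bits from 12 upward are ignored. */
static int same_bits_2_11(const void *a, const void *b)
{
    uintptr_t diff = (uintptr_t)a ^ (uintptr_t)b; /* set bits mark differences */
    return (diff & BITS_2_11_MASK) == 0;
}
```

For example, two addresses exactly 4096 bytes apart satisfy the condition, while shifting one of them by a further 64 bytes changes bit 6 and breaks it.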