I am running a single-threaded copy read program that is pinned to one core. The program is compiled with the -O0 -no-vec -no-opt-prefetch options.
static int a[STREAM_ARRAY_SIZE];
for (j=0; j<STREAM_ARRAY_SIZE; j++)
I use VTune to read the performance counters. When STREAM_ARRAY_SIZE is 1*10^6 or 2*10^6, both L2_DATA_READ_MISS_CACHE_FILL and L2_DATA_WRITE_MISS_CACHE_FILL are 0, but with 4*10^6 I see a value of 10000.
On the Xeon Phi, each core has a private 32 KB L1 cache and 512 KB of L2, and with 60 cores that gives 512 KB × 60 = 30 MB of L2 in total. If I read a static array that is bigger than 512 KB, does the extra data get filled into other cores' L2 space? Is this the reason for the L2_DATA_READ/WRITE_MISS_CACHE_FILL counts on L2 misses in the core the program is pinned to?