I am running a single-threaded copy/read program that is pinned to one core. The program is compiled with the options -O0 -no-vec -no-opt-prefetch.
static int a[STREAM_ARRAY_SIZE];
for (j=0; j<STREAM_ARRAY_SIZE; j++)
I use VTune to read the performance counters. When STREAM_ARRAY_SIZE is 1*10^6 or 2*10^6, both L2_DATA_READ_MISS_CACHE_FILL and L2_DATA_WRITE_MISS_CACHE_FILL are 0. With 4*10^6, I see a value of 10000.
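
To put the three sizes in context, here is a minimal sketch of how the memory footprint of the array can be computed. It assumes the array holds int elements, as declared above. The helper names array_footprint_bytes and print_footprints are hypothetical and are not part of my program. The sketch only does the size arithmetic and does not model the cache itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes occupied by an int array of n elements (hypothetical helper). */
size_t array_footprint_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(int)); /* guard against size_t overflow */
    return n * sizeof(int);
}

/* Print the footprint for the three STREAM_ARRAY_SIZE values tested. */
void print_footprints(void)
{
    const size_t sizes[] = { 1000000u, 2000000u, 4000000u };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        size_t bytes = array_footprint_bytes(sizes[i]);
        printf("%zu elements -> %zu bytes (%.2f MiB)\n",
               sizes[i], bytes, (double)bytes / (1024.0 * 1024.0));
    }
}
```

Calling print_footprints() from main prints one line per array size. With a 4-byte int, the three sizes correspond to 4, 8, and 16 million bytes.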