Compile the following program with 'icc -xAVX -std=c99':
    #define N 20000000
    #define M 200000
    double a[N / M], b[N / M], c[N / M];
    int main()
    {
        for (int j = 0; j < M; j++)
            for (int i = 0; i < N / M; i++)
                c[i] += a[i] * b[i];
    }
Then measure the SIMD_FP_256.PACKED_DOUBLE event with 'perf stat -r 100 -e r211' on Sandy Bridge. In theory, the event should be counted 10,000,000 times, where a triplet is one c[i] += a[i] * b[i] update: 2 FLOP/triplet * 20,000,000 triplets / (4 FLOP/instruction) = 10,000,000 instructions. However, the counts I actually get vary widely with the value of M. With M = 200,000 the count is very close to 10,000,000, but with M = 5 it reaches as high as 16,800,000, which is 68% above the expected value. How can I eliminate this fluctuation?
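
For reference, here is a minimal sketch of the expected-count arithmetic above. The function names expected_packed_double_count and excess_percent are hypothetical helpers, not part of the program being measured. The sketch assumes that each update costs one multiply and one add (no FMA), that each 256-bit instruction operates on 4 doubles, and that the loop is fully vectorized with no scalar remainder.

```c
#include <stdint.h>

/* Hypothetical helper: expected number of 256-bit packed-double
 * instructions for `triplets` updates of the form c[i] += a[i] * b[i].
 * Assumptions: 2 FLOP per update (one multiply, one add, no FMA),
 * 4 FLOP per AVX instruction (4 doubles per 256-bit register),
 * and no scalar remainder iterations. */
uint64_t expected_packed_double_count(uint64_t triplets)
{
    const uint64_t flop_per_triplet = 2;     /* one mul + one add */
    const uint64_t flop_per_instruction = 4; /* 4 doubles per instruction */
    return triplets * flop_per_triplet / flop_per_instruction;
}

/* Hypothetical helper: how far a measured count lies above the
 * expected count, as a percentage of the expected count. */
double excess_percent(uint64_t measured, uint64_t expected)
{
    return 100.0 * ((double)measured - (double)expected) / (double)expected;
}
```

With N = 20,000,000, expected_packed_double_count(20000000) gives 10,000,000 regardless of M, because M only changes how the same total number of updates is split between the outer and inner loops. excess_percent(16800000, 10000000) gives the 68% figure quoted for M = 5.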



