I have a small piece of code for which I am analyzing the clock cycles required.
The pseudocode is as follows:
a0 = random() % 256; a1 = random() % 256;
a2 = random() % 256; a3 = random() % 256;
b0 = random() % 256; b1 = random() % 256;
b2 = random() % 256; b3 = random() % 256;
start = timestamp();  // timestamp() reads the TSC via the rdtsc instruction
x0 = array1[a0]; y0 = array2[b0];
x1 = array1[a1]; y1 = array2[b1];
x2 = array1[b2]; y2 = array2[a2];
x3 = array1[b3]; y3 = array2[a3];
stop = timestamp();
totaltime += (stop - start);
1. array1 and array2 are global arrays of unsigned char, 256 bytes each.
2. I run this block in a loop 2^23 times and compute the average time per iteration (a runnable C version of the whole harness follows this list).
3. I then plot a graph with the value of a0 on the x-axis and the average time taken on the y-axis.
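
In actual C, the measured block looks roughly like this. I compile with GCC, so the timestamps come from the __rdtsc() intrinsic in <x86intrin.h>; the results are volatile so the loads are not optimized away, and to get the per-a0 averages for the plot I bucket the measurements by the value of a0:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang */

unsigned char array1[256];
unsigned char array2[256];

int main(void)
{
    /* volatile so the compiler cannot optimize the loads away */
    volatile unsigned char x0, x1, x2, x3, y0, y1, y2, y3;
    uint64_t totaltime[256] = {0};   /* cycle totals, bucketed by a0 */
    uint64_t count[256]     = {0};
    const long iters = 1L << 23;     /* 2^23 iterations */

    for (long i = 0; i < iters; i++) {
        unsigned a0 = random() % 256, a1 = random() % 256;
        unsigned a2 = random() % 256, a3 = random() % 256;
        unsigned b0 = random() % 256, b1 = random() % 256;
        unsigned b2 = random() % 256, b3 = random() % 256;

        uint64_t start = __rdtsc();
        x0 = array1[a0]; y0 = array2[b0];
        x1 = array1[a1]; y1 = array2[b1];
        x2 = array1[b2]; y2 = array2[a2];
        x3 = array1[b3]; y3 = array2[a3];
        uint64_t stop = __rdtsc();

        totaltime[a0] += stop - start;
        count[a0]++;
    }

    /* one (a0, average cycles) pair per line, ready to plot */
    for (int v = 0; v < 256; v++)
        if (count[v])
            printf("%d %f\n", v, (double)totaltime[v] / count[v]);
    return 0;
}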
I find that the average time taken is highly dependent on the value of a0, and
I am unable to find a reason for this. Does this mean that the cache access
time depends on the value of the address being accessed?
I am using an Intel Core 2 Duo with a 32 KB, 8-way set-associative L1 data
cache and a 64-byte line size.
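
For reference, the cache geometry works out as follows (my arithmetic, so please correct me if I have it wrong):

32 KB / 64 B per line = 512 lines in L1
512 lines / 8 ways    = 64 sets
256 B / 64 B per line = 4 lines per array

So each array spans only 4 consecutive cache lines, and both arrays together occupy at most 8 of the 512 lines. After the first few iterations everything should be L1-resident, which is why the dependence on a0 surprises me.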
Any help in this regard would be greatly appreciated.