Random and consecutively access and L2/L3 cache hit ratio

Random and consecutively access and L2/L3 cache hit ratio

Dear experts,
I am trying to understand the following phenomena. I have plain array of integers (500000 x 4B) which I access randomly (index generated by system rand call) (case 1) and consecutively (case 2). I wanted to see the L2/L3 cache access counters in this two cases. My naive expectations were that the array of that size (2MB) would fit in the L3 cache (3MB) and both cases would give comparable cache-misses counts. In fact I have expected that the counter getL3CacheHitRatio would be around 1.
Surprisingly it is close to 0.8 for the random access and close to 0.3 for iterative access.
I want to repeat this for bigger (200MB) array but before that step I need to know how to interpret this counters.

My hardware is:
Output of "uname -a":
Linux wm 3.5.7-gentoo #5 SMP PREEMPT Thu Dec 13 18:01:42 CET 2012 x86_64 Intel(R) Core(TM) i3-2370M CPU @ 2.40GHz GenuineIntel GNU/Linux

Results are following:

The results are from 100 execution of the code attached below.

And the code:

#include <iostream>
#include <vector>
#include <cpucounters.h>
#include <ctime>
#include <fstream>
#define SIZE 500000

int main()
{
std::ofstream out("data.txt",std::ofstream::app);

int *coll = new int[SIZE];
for (int i=0; i< SIZE; ++i)
{
coll[i] = i;
}
std::cout << "Size of Collection: " << SIZE*sizeof(int)/1000000 << "MB" << std::endl;

int tmp;
PCM * m = PCM::getInstance();
if (m->good()) m->program();
else return -1;
SystemCounterState before_sstate = getSystemCounterState();
tmp = 0;

//Consecutively reading
std::cout << std::endl << "Consecutively access to collection" << std::endl;
for (int i=0; i<SIZE; ++i)
{
tmp += coll[i];
}
std::cout << "Sum:" << tmp << std::endl;
SystemCounterState after_sstate = getSystemCounterState();
std::cout << "Instructions per clock: " << getIPC(before_sstate,after_sstate)
<< "\nNumber of L2 cache hits: " << getL2CacheHits(before_sstate,after_sstate)
<< "\nNumber of L3 cache hits: " << getL3CacheHits(before_sstate,after_sstate)
<< "\nL2 cache hit ratio: " << getL2CacheHitRatio(before_sstate,after_sstate)
<< "\nL3 cache hit ratio: " << getL3CacheHitRatio(before_sstate,after_sstate)
<< "\nBytes read: " << getBytesReadFromMC(before_sstate,after_sstate) << std::endl;
out << getL2CacheHitRatio(before_sstate,after_sstate) << "\t" << getL3CacheHitRatio(before_sstate,after_sstate) << "\t";

SystemCounterState before_sstate2 = getSystemCounterState();
tmp = 0;

//Random reading
std::cout << std::endl << "Random acces to collection" << std::endl;
srand(time(0));
for (int i=0; i<SIZE; ++i)
{
tmp += coll[rand()%SIZE];
}
std::cout << "Sum:" << tmp << std::endl;
SystemCounterState after_sstate2 = getSystemCounterState();
std::cout << "Instructions per clock: " << getIPC(before_sstate2,after_sstate2)
<< "\nNumber of L2 cache hits: " << getL2CacheHits(before_sstate2,after_sstate2)
<< "\nNumber of L3 cahe hits: " << getL3CacheHits(before_sstate2,after_sstate2)
<< "\nL2 cache hit ratio: " << getL2CacheHitRatio(before_sstate2,after_sstate2)
<< "\nL3 cache hit ratio: " << getL3CacheHitRatio(before_sstate2,after_sstate2)
<< "\nBytes read: " << getBytesReadFromMC(before_sstate2,after_sstate2) << std::endl;
out << getL2CacheHitRatio(before_sstate2,after_sstate2) << "\t" << getL3CacheHitRatio(before_sstate2,after_sstate2) << std::endl;

}

AttachmentSize
Downloadimage/png histogram.png0 bytes
Downloadtext/plain data.txt0 bytes
2 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Try to use VTune for more comprehensive analysis.

Leave a Comment

Please sign in to add a comment. Not a member? Join today