Hello, I am running a simple experiment: I access a constant number of addresses (4096) by chasing pointers while increasing the stride between the addresses. Initialization for a given stride:

    int** array_seq_f = NULL;
    size_t stride; // varied from 1 to 256k
    size_t size = stride*4097;
    posix_memalign((void**)&array_seq_f, 4096, sizeof(int*) * size);
    for (size_t k = 0; k < 4096; k++)
        array_seq_f[k*stride] = (int*)&(array_seq_f[(k+1)*stride]);
    array_seq_f[4096*stride] = NULL;

Measured execution:

    int* p = (int*)array_seq_f;
    for (size_t i = 0; i < 4096; i++)
        p = *((int**)p);

I measure the L1 (data), L2 (data), L3 (combined) and TLB misses with PAPI on an Intel Xeon X5650. As expected, the L1 misses are 1 per element at a stride of 8 (which equals 64 bytes, the cache line size). However, as the stride increases further, the L1 misses go up to 2 per element at a stride of 32KB. The L2 and L3 misses reach 2 per element at 128KB.

I am not sure why the misses go up to 2. My assumption is that it has to do with the TLB misses, and that the additional data cache misses are caused by accesses to the paging structures during the page walk. Is there a way to confirm this assumption? And why do the L2/L3 misses reach 2 misses per element only at 128KB, while the L1 misses do so at 32KB already?

Any help is very much appreciated.
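
For anyone who wants to reproduce this, the initialization and the measured loop can be combined into a self-contained sketch along the following lines. The function names `build_chain` and `chase` are made up for this illustration, `void**` replaces the `int**` casts, and the PAPI counter setup is left out entirely, so this is a starting point under those assumptions rather than the exact code that was measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate stride * (links + 1) pointer slots,
 * page-aligned, and link slot k*stride to slot (k+1)*stride for
 * k = 0 .. links-1. The final node holds NULL.
 * Returns NULL if the allocation fails. */
static void **build_chain(size_t stride, size_t links)
{
    void **base = NULL;
    size_t slots = stride * (links + 1);

    if (posix_memalign((void **)&base, 4096, sizeof(void *) * slots) != 0)
        return NULL;

    for (size_t k = 0; k < links; k++)
        base[k * stride] = &base[(k + 1) * stride];
    base[links * stride] = NULL;
    return base;
}

/* Hypothetical helper: follow the chain for at most max_steps dependent
 * loads, stopping early once a NULL pointer has been loaded.
 * Returns the number of loads performed. */
static size_t chase(void **start, size_t max_steps)
{
    void **p = start;
    size_t steps = 0;

    while (p != NULL && steps < max_steps) {
        p = (void **)*p; /* each load depends on the previous one */
        steps++;
    }
    return steps;
}
```

Calling `chase(build_chain(stride, 4096), 4096)` between starting and stopping the counters corresponds to the measured loop. The distance between two consecutive elements is `stride * sizeof(void *)` bytes, so a stride of 8 gives 64 bytes on a 64-bit system.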