Hello guys, I'm having performance problems due to DTLB misses, and I'm using the counter DTLB_LOAD_MISSES.WALK_DURATION to measure them. To reduce TLB pressure, I'm passing MAP_HUGETLB as a flag to mmap on Linux. I've created the pool of huge pages, and I can see in /proc/meminfo that these 2M pages are being allocated, but, surprisingly, the counter increases according to the VTune analysis. I'm analysing a specific part of the code that accesses arrays allocated from this huge page memory, and I can see a big increase in DTLB misses. This seems very strange to me. I would like to know if I am misunderstanding the behavior of this counter. Do you have any experience with MAP_HUGETLB? Any suggestions? Is it possible I'm doing something wrong?
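For context, here is a minimal sketch of the kind of allocation described above. It is not the actual code from the application. The helper names `alloc_array` and `round_up_huge` are made up for illustration, and the 2 MiB page size is an assumption taken from the 2M pages seen in /proc/meminfo. The sketch falls back to normal pages when the huge page mapping fails and reports which path was taken.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes MAP_HUGETLB and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: the default huge page size is 2 MiB, as on typical x86-64. */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Round len up to a multiple of the huge page size, since the length of a
 * huge page mapping should be a multiple of that size. */
static size_t round_up_huge(size_t len)
{
    return (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Hypothetical helper: try to map the array with huge pages and fall back
 * to normal pages if that fails. *used_huge is set to 1 only when the
 * MAP_HUGETLB mapping succeeded. Returns NULL if both attempts fail. */
static void *alloc_array(size_t len, int *used_huge)
{
    assert(used_huge != NULL);

    size_t size = round_up_huge(len);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    *used_huge = (p != MAP_FAILED);

    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* e.g. the huge page pool is empty */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
    }

    /* Touch every page now so that page faults happen here and not inside
     * the region being profiled. */
    memset(p, 0, size);
    return p;
}
```

Checking `used_huge` after the call is one way to rule out a silent fallback to 4K pages before looking at the counter values.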