Hello guys, I'm having performance problems due to DTLB misses, and I'm using the counter DTLB_LOAD_MISSES.WALK_DURATION to measure them. To reduce TLB pressure I'm passing MAP_HUGETLB as an mmap flag on Linux. I've created the pool of huge pages, and I can see in /proc/meminfo that these 2M pages are being allocated, but, surprisingly, this counter increases according to the VTune analysis. I'm analysing a specific part of the code that accesses arrays allocated from this huge page memory, and I can see a big increase in DTLB misses. That seems very strange to me. I would like to know if I am misunderstanding the behavior of this counter. Do you have any experience with MAP_HUGETLB? Any suggestions? Is it possible I'm doing something wrong?
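
For context, the allocation pattern described above can be sketched roughly as follows. This is only a minimal sketch, not the actual code: the helper names `round_up_huge` and `alloc_huge`, the hard-coded 2 MiB page size, and the fallback to normal pages when the huge page pool is unavailable are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, matching the 2M pages seen in /proc/meminfo. */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: round len up to a multiple of the huge page size,
 * since a MAP_HUGETLB mapping should be sized in whole huge pages. */
static size_t round_up_huge(size_t len)
{
    return (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Hypothetical helper: map an anonymous region backed by huge pages.
 * Sets *used_huge to 1 if MAP_HUGETLB succeeded, or to 0 if the call
 * fell back to normal pages. Returns NULL if both attempts fail. */
static void *alloc_huge(size_t len, int *used_huge)
{
    assert(used_huge != NULL);
    size_t size = round_up_huge(len);

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }

    /* Typically ENOMEM when the huge page pool is empty or not configured. */
    perror("mmap(MAP_HUGETLB)");

    /* Fallback (an illustrative design choice): use normal pages. */
    *used_huge = 0;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Reporting `used_huge` makes a silent fallback to normal pages visible, which is worth ruling out before interpreting the counter. The region would be released with `munmap(p, round_up_huge(len))`.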