Can one measure, on a per-thread basis, the actual cost of memory block accesses in multi-threaded code on a Nehalem-EP platform using VTune? For instance, can one measure, per thread, the percentage of memory accesses served by each of the different memories? Can the same information be broken down per level of the cache hierarchy?
Can one measure how often the same cache block ping-pongs among caches when threads running on different sockets compete for write access to it?
Can one measure the cache miss rate at each cache level, per thread, using VTune?
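
To make the ping-pong question concrete, here is a minimal sketch of the kind of write contention meant, assuming C++11 threads. The function name `contended_increments` is hypothetical, and pinning the two threads to cores on different sockets (for example with `pthread_setaffinity_np` on Linux) is omitted and left as an assumption about how the test would be run.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical reproducer, not taken from any real workload: two threads
// repeatedly write to the same atomic counter, so the cache line that holds
// it has to be transferred between the caches of the cores running them.
// Thread-to-socket pinning is NOT done here; it is assumed to be applied
// externally when profiling.
std::uint64_t contended_increments(std::uint64_t iters_per_thread) {
    std::atomic<std::uint64_t> counter{0};

    // Each worker performs the same number of atomic increments.
    auto worker = [&counter, iters_per_thread] {
        for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();

    // Both threads have finished, so this is the total number of increments.
    return counter.load();
}
```

Calling `contended_increments` with a large iteration count from a small `main` and profiling that run is the sort of case where a per-thread, per-cache-line breakdown would be useful.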