Hello, it seems I have some kind of misunderstanding. I expected PREFETCHNTA to prefetch data into the 2nd level cache without evicting anything from L1D. But in VTune I can clearly see that, in a function that contains only prefetchnta (as a microbenchmark), many L1D.REPLACEMENT events are attributed to every non-temporal prefetch instruction. So that means the prefetched data actually reaches the L1D cache, right?

What is wrong in my understanding, or what did I miss? My intention is to process a block of data where every piece is needed only once, which is why it would be better to avoid bringing it into L1D and to use non-temporal operations instead.
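To make the access pattern concrete, here is a minimal sketch of such a single-pass loop, not my exact microbenchmark. It assumes a 64-byte cache line and uses a made-up prefetch distance; the names sum_once_nta, PREFETCH_AHEAD_BYTES and CACHE_LINE_BYTES are only for illustration. The _mm_prefetch intrinsic with _MM_HINT_NTA is what normally compiles to PREFETCHNTA. The sketch only shows how the hint is issued; it says nothing about which cache level the line ends up in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Assumed values for illustration only; tune for the real platform. */
#define CACHE_LINE_BYTES     64
#define PREFETCH_AHEAD_BYTES 512

/* Sum a block whose bytes are each read exactly once, issuing a
 * non-temporal prefetch hint a fixed distance ahead of the reads. */
uint64_t sum_once_nta(const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += CACHE_LINE_BYTES) {
        /* Hint the line we will need soon; skip hints past the end. */
        if (i + PREFETCH_AHEAD_BYTES < n) {
            _mm_prefetch((const char *)(data + i + PREFETCH_AHEAD_BYTES),
                         _MM_HINT_NTA);
        }

        /* Consume the current line (or the shorter tail of the block). */
        size_t end = (i + CACHE_LINE_BYTES < n) ? i + CACHE_LINE_BYTES : n;
        for (size_t j = i; j < end; ++j) {
            sum += data[j];
        }
    }
    return sum;
}
```

Calling sum_once_nta(buf, len) on a large buffer and profiling it should show the same kind of per-instruction event attribution as the pure-prefetch microbenchmark, with the loads mixed in.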

Any recommendations for Sandy Bridge and newer Intel platforms?

BTW, is a non-temporal load into an AVX register available on SB (something like MOVNTDQA)?

Thanks in advance.

The AORM says:

"The non-temporal instruction is: PREFETCHNTA - Fetch the data into the second-level cache, minimizing cache pollution."


The event description is: L1D.REPLACEMENT - Replacements in the 1st level data cache.
