PREFETCHNTA causes L1D eviction (L1D.REPLACEMENT)

Posted by Alexander Alexeev

Hello, it seems I have some kind of misunderstanding. I expected PREFETCHNTA to prefetch data into the second-level cache without evicting anything from L1D. But in VTune I can clearly see that, in a function that contains only prefetchnta instructions (a microbenchmark), many L1D.REPLACEMENT events are attributed to every non-temporal prefetch instruction. So that means the prefetched data actually reaches the L1D cache, right?

What is wrong in my understanding, or what did I miss? My intention is to process a block of data where every piece is needed only once, which is why it would be better to avoid bringing it into L1D and to use non-temporal operations instead.
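To make the access pattern concrete, here is a rough sketch of the kind of loop I mean. It is only an illustration, not the actual microbenchmark: the function name `sum_once`, the 64-byte `CACHE_LINE`, and the `PREFETCH_AHEAD` distance are all assumptions chosen for the example, and the fallback to `__builtin_prefetch` is there only so the code builds on compilers or targets without `xmmintrin.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick a non-temporal prefetch primitive for the current compiler/target. */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#elif defined(__GNUC__)
#define PREFETCH_NTA(p) __builtin_prefetch((p), 0, 0) /* read, no temporal locality */
#else
#define PREFETCH_NTA(p) ((void)0)
#endif

/* Assumed values for illustration; the best distance has to be measured. */
enum { CACHE_LINE = 64, PREFETCH_AHEAD = 8 * CACHE_LINE };

/* Read every byte of the block exactly once, issuing a non-temporal
 * prefetch hint a fixed distance ahead of the current position. */
uint64_t sum_once(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i += CACHE_LINE) {
        /* Hint only for addresses that are still inside the block. */
        if (i + PREFETCH_AHEAD < len)
            PREFETCH_NTA(data + i + PREFETCH_AHEAD);
        size_t end = (i + CACHE_LINE < len) ? i + CACHE_LINE : len;
        for (size_t j = i; j < end; ++j)
            sum += data[j];
    }
    return sum;
}
```

The prefetch is only a hint, so the result of `sum_once` is the same with or without it; whether the hinted lines end up in L1D is exactly what I am trying to understand.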

Any recommendations for Sandy Bridge and newer Intel platforms?

BTW, is a non-temporal load into an AVX register available on SB (something like MOVNTDQA)?

Thanks in advance.

The AORM (Intel Architectures Optimization Reference Manual) says:

" The non-temporal instruction is:  PREFETCHNTA— Fetch the data into the second-level cache, minimizing cache pollution."

and 

L1D.REPLACEMENT - Replacements in the 1st level data cache.
