Prefetch instructions

I would be interested in information about the behavior of the prefetch hint instructions, such as prefetcht0, prefetchnta, prefetchw and so on, on modern processors such as Sandy Bridge and Ivy Bridge. I ask because there is apparently nothing about it in the optimization guide [1]. It would arguably be useful for developers to know to which cache level data is prefetched by each variant. I would be glad if someone could provide a pointer to a detailed explanation.

[1] Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-026, April 2012

 

>>...I'll be interested to have information about the behavior of prefetch hints instructions such as prefetcht0, prefetchnta, prefetchw,...
>>for modern processors such as Sandy Bridge and Ivy Bridge...

There are also some optimization tips in the Intel C++ Compiler User and Reference Guides; please take a look.

I recently ran into an issue using _mm_prefetch on a computer with an Intel Core i7-3840QM CPU ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 ).

A piece of code with prefetching that works perfectly on older computers, for example those with Pentium 4 or Atom N270 CPUs, doesn't provide any performance gain on the computer with the Ivy Bridge CPU. I think this is because of the significantly larger L3, L2 and L1 cache lines, and it is clear that I am not prefetching data properly in the for-loops. Unfortunately, I haven't had time to investigate it fully ( with VTune ), so _mm_prefetch is commented out for that configuration ( the code runs fast with and without prefetching ).
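For readers who have not seen the pattern being discussed, here is a minimal sketch of explicit prefetching in a for-loop. It is not the code described above: PREFETCH_T0, PREFETCH_DISTANCE, sum_with_prefetch and sum_plain are hypothetical names, and the distance of 64 elements is an arbitrary placeholder that would have to be tuned per CPU. On non-x86 targets the macro is assumed to fall back to a no-op.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>                 /* _mm_prefetch, _MM_HINT_T0 */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))     /* assumption: no-op elsewhere */
#endif

/* Hypothetical tuning knob: how many elements ahead of the current one
   the hint is issued. The right value depends on the CPU and the loop. */
#define PREFETCH_DISTANCE 64

/* Sums n doubles, issuing a prefetch hint PREFETCH_DISTANCE elements
   ahead. For simplicity a hint is issued on every iteration, although
   several consecutive doubles share one cache line. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_T0(&a[i + PREFETCH_DISTANCE]);
        sum += a[i];
    }
    return sum;
}

/* Same loop without any hint, as the baseline to time against. */
double sum_plain(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Both functions return the same result; the prefetch is only a hint and never changes program semantics. Whether sum_with_prefetch is any faster than sum_plain on a given CPU is something that has to be measured, since a simple sequential access pattern like this one may already be handled by the hardware prefetchers.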

Hi bronxzv,

sorry for the off-topic, but it is nice to see you again on the IDZ forums :)

Quote:

iliyapolak wrote:

Hi bronxzv,

sorry for the off-topic, but it is nice to see you again on the IDZ forums :)

Hi iliyapolak,

indeed it has been a while since I last came here, thanks for the warm welcome

hey, I see that in the meantime your black belt points have gone through the roof! 

Quote:

Sergey Kostrov wrote:

>>...I'll be interested to have information about the behavior of prefetch hints instructions such as prefetcht0, prefetchnta, prefetchw,...
>>for modern processors such as Sandy Bridge and Ivy Bridge...

There are also some optimization tips in the Intel C++ Compiler User and Reference Guides; please take a look.

thanks for your feedback Sergey,

I haven't found any processor-specific details in the C++ documentation so far. Basically, I have found:

- the documentation for the "prefetch insertion optimization" /Qopt-prefetch[:n]. I noticed that /Qopt-prefetch requires /O3, so I have to test it again; the last time I tried it I rushed my tests and compiled with /O2, with no visible change to my timings

- a minimal explanation of the "Cacheability Support Intrinsics" and the _MM_HINT_T0, etc. hints
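As a rough illustration of those hints, the sketch below issues each of the four _mm_prefetch variants declared in xmmintrin.h. The hint_kind enum, the ISSUE_PREFETCH macro and the prefetch_hint function are hypothetical names made up for this example. The comments only name the instruction each hint corresponds to; which cache level each one actually targets is implementation-specific and is exactly the open question of this thread.

```c
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>   /* _mm_prefetch and the _MM_HINT_* constants */
#define ISSUE_PREFETCH(addr, hint) _mm_prefetch((addr), (hint))
#else
#define ISSUE_PREFETCH(addr, hint) ((void)(addr))  /* assumption: no-op */
#endif

/* Hypothetical selector for the four hint variants. */
typedef enum { HINT_T0, HINT_T1, HINT_T2, HINT_NTA } hint_kind;

/* Issues the requested hint for the cache line containing p.
   Returns 1 if kind is one of the four known hints, 0 otherwise. */
int prefetch_hint(const void *p, hint_kind kind)
{
    const char *addr = (const char *)p;
    switch (kind) {
    case HINT_T0:  ISSUE_PREFETCH(addr, _MM_HINT_T0);  return 1; /* prefetcht0  */
    case HINT_T1:  ISSUE_PREFETCH(addr, _MM_HINT_T1);  return 1; /* prefetcht1  */
    case HINT_T2:  ISSUE_PREFETCH(addr, _MM_HINT_T2);  return 1; /* prefetcht2  */
    case HINT_NTA: ISSUE_PREFETCH(addr, _MM_HINT_NTA); return 1; /* prefetchnta */
    default:       return 0;                           /* unknown: no hint */
    }
}
```

The hint is passed as a compile-time constant in every branch because the intrinsic expects an immediate, which is why the selection is done with a switch instead of forwarding a runtime value.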

 Quote:

Sergey Kostrov wrote:

I recently ran into an issue using _mm_prefetch on a computer with an Intel Core i7-3840QM CPU ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 ).

A piece of code with prefetching that works perfectly on older computers, for example those with Pentium 4 or Atom N270 CPUs, doesn't provide any performance gain on the computer with the Ivy Bridge CPU. I think this is because of the significantly larger L3, L2 and L1 cache lines, and it is clear that I am not prefetching data properly in the for-loops. Unfortunately, I haven't had time to investigate it fully ( with VTune ), so _mm_prefetch is commented out for that configuration ( the code runs fast with and without prefetching ).

this is pretty much what I'm experiencing too on Ivy Bridge vs. P4 and older CPUs. A while ago I removed all explicit prefetches in loops (these are well handled by the hardware prefetchers), and I have only a very few cases left with explicit prefetch. There I see at best a 5% speedup in single-thread mode, down to 0% in multithread mode (8 threads with hyperthreading enabled on a Core i7 3770K).

btw, Linus Torvalds reports a serious slowdown in the Linux kernel due to explicit prefetch here: http://www.realworldtech.com/forum/?threadid=132668&curpostid=132772

Thanks bronxzv.

Yes I am spending a lot of time on this forum gaining knowledge and sharing my knowledge with other users.
