jimdempseyatthecove
June 24, 2009 10:20 AM PDT
determination of PREFETCH support
In the Intel Architecture Software Developer's Manual, the description of PREFETCHn states that some processors may simply ignore this instruction (treat it as a NOP). The CPUID tables do not seem to report whether the CPU honors PREFETCH or ignores it. Is there a programmatic way of determining this (other than running a benchmark at application initialization)?

The reason I ask is that in a test program on a Q6600, PREFETCHn (all variations of n) slows the program down, whereas replacing the PREFETCHn of FutureAddress with

      trash = *FutureAddress; // copy aligned __int64
      foo = expressionWithDoubleUsingCurrentAddress;

gives a marginal speedup.

Note that the integer load will eventually stall waiting for the read, whereas PREFETCHn will (should) not introduce a stall waiting for memory.
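For reference, a minimal sketch of the two variants being compared, assuming GCC or Clang. The function names, the prefetch distance, and the loop body are hypothetical stand-ins, not the actual test program. `__builtin_prefetch` is a compiler builtin that on x86 is normally lowered to one of the PREFETCH instructions; the pre-touch variant uses `memcpy` into a 64-bit integer so that the load stays within C aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prefetch distance in elements; a tuning assumption only. */
#define PREFETCH_DISTANCE 64

/* Volatile sink so the compiler cannot discard the pre-touch load. */
static volatile int64_t trash_sink;

/* Variant A: software prefetch hint for a future address. */
double sum_with_prefetch(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        sum += data[i] * 0.5;   /* stand-in for the double expression */
    }
    return sum;
}

/* Variant B: explicit 64-bit integer load of the future address. */
double sum_with_pretouch(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            int64_t tmp;
            memcpy(&tmp, &data[i + PREFETCH_DISTANCE], sizeof tmp);
            trash_sink = tmp;   /* the value itself is never used */
        }
        sum += data[i] * 0.5;   /* same stand-in expression */
    }
    return sum;
}
```

Both functions compute the same result, so timing them against each other on a large array is one way to see which form a given CPU prefers.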

Jim Dempsey


Shih Kuo (Intel)
June 29, 2009 8:26 AM PDT

Hi Jim

A prefetch hint that does not result in a fetch (i.e., is dropped) is not the same as a NOP, per se. For example, out-of-order (OOO) hardware has to honor a NOP as an instruction, not a hint, and retire it, but it has more latitude in how it treats a hint.

Nitpicking on the ISA definition aside, the thrust of your question is really whether the implementation-specific behavior of a prefetch hint should have an architecturally defined description that is reported via the CPUID instruction.

Whether a prefetch hint is issued sufficiently far ahead of the subsequent reference depends heavily on workload characteristics and on specific implementation techniques. The same is true of your example of pre-touching the memory location of a subsequent reference. Whether to rely on software prefetch hints or to use explicit pre-touches to trigger the hardware prefetchers is a question with neither easy answers nor universally applicable solutions.

My personal take on asking CPUID to provide additional definitions about the implementation-specific nature of prefetch hints tends to be negative.

For example, some CPU generations always dropped a prefetch hint if it requested an address beyond a page boundary. If this behavior had been codified in some hypothetical CPUID flag, it would not have been feasible to later allow a prefetch hint to be honored for fetching data across a page boundary. The latter was implemented in later generations and improves the software's ability to handle TLB misses.
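If you want to see how often your hints cross a page in an experiment, a small helper along these lines could be used. This is only a sketch: the helper name is hypothetical and the 4 KiB page size is an assumption that does not hold for large pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size; large pages would need a different value. */
#define ASSUMED_PAGE_SIZE 4096u

/* Returns true when both addresses fall in the same assumed page. */
bool same_page(const void *current, const void *future)
{
    uintptr_t current_page = (uintptr_t)current / ASSUMED_PAGE_SIZE;
    uintptr_t future_page = (uintptr_t)future / ASSUMED_PAGE_SIZE;
    return current_page == future_page;
}
```

Counting the cases where `same_page` returns false for the current and prefetched addresses gives a rough idea of how many hints an implementation with the older behavior might drop.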

My experience has been that issuing prefetch hints alters load uop scheduling. Using software prefetch extensively implies taking suitable precautions, because different CPU implementations (CPUID family/model) can exhibit different performance characteristics. In that sense, asking for new CPUID flags that report implementation-specific hardware behavior is not really different from using CPUID family and model combinations to dispatch code paths tuned to a specific implementation.
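A minimal sketch of that kind of family/model dispatch, assuming GCC or Clang for the `<cpuid.h>` helper. The decoding follows the usual rules for the CPUID leaf 1 EAX signature; the type names, `choose_path`, and its single table entry are hypothetical placeholders that you would fill in from your own measurements.

```c
#include <stdint.h>

typedef struct {
    unsigned family;
    unsigned model;
} cpu_signature;

/* Decode display family/model from the CPUID leaf 1 EAX value. */
cpu_signature decode_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model = (eax >> 4) & 0xF;
    unsigned ext_family = (eax >> 20) & 0xFF;
    unsigned ext_model = (eax >> 16) & 0xF;
    cpu_signature s;

    /* Extended family is added only when the base family is 0xF. */
    s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* Extended model applies only to base families 0x6 and 0xF. */
    s.model = (base_family == 0x6 || base_family == 0xF)
                  ? (ext_model << 4) | base_model
                  : base_model;
    return s;
}

typedef enum { PATH_NO_PREFETCH, PATH_SW_PREFETCH } prefetch_path;

/* Hypothetical dispatch table; the entry below is a placeholder,
   not a measured recommendation. */
prefetch_path choose_path(cpu_signature s)
{
    if (s.family == 6 && s.model == 0x0F)   /* 65 nm Core 2 generation */
        return PATH_NO_PREFETCH;
    return PATH_SW_PREFETCH;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Read the raw signature on x86; returns 0 if leaf 1 is unavailable. */
uint32_t read_signature(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return eax;
}
#endif
```

At startup one would call `choose_path(decode_signature(read_signature()))` once and select the tuned code path from the result.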

This may not be what you want to hear, but sometimes the fun of programming is experimentation :)

Shihjong




