determination of PREFETCH support

In the Intel Architecture Software Developer's Manual, the description of PREFETCHn states that some processors may simply ignore this instruction (treat it as a NOP). The CPUID tables do not seem to report whether the CPU supports PREFETCH or ignores it. Is there a programmatic way of determining this (other than running a benchmark test at application initialization)?

The reason I ask is that in a test program on a Q6600, PREFETCHn (all variations of n) slows down the program, whereas replacing the PREFETCHn FutureAddress with

trash = *FutureAddress; // copy aligned __int64
foo = expressionWithDoubleUsingCurrentAddress;

gives a marginal speedup.

Note, the integer load will eventually stall for the read, whereas PREFETCHn will (should) not introduce a stall waiting for memory.
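For anyone who wants to reproduce the comparison, here is a minimal sketch of the two variants side by side. It is not the original test program: the function names, the `LOOKAHEAD` distance, and the `* 0.5` stand-in for the real expression are all assumptions, and `__builtin_prefetch` is the GCC/Clang builtin rather than a specific PREFETCHn encoding. The pre-touch loads a double instead of an `__int64` only to stay clear of C aliasing rules.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance in elements; the right value is
 * workload and CPU dependent and has to be found by measurement. */
#define LOOKAHEAD 64

/* Variant A: software prefetch hint. The hardware is free to drop it. */
double sum_with_prefetch(const double *data, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + LOOKAHEAD < n)
            __builtin_prefetch(&data[i + LOOKAHEAD], 0, 3); /* read, keep in all cache levels */
        acc += data[i] * 0.5; /* stand-in for the expression using the current address */
    }
    return acc;
}

/* Variant B: pre-touch with a real load. The load cannot be dropped
 * and will eventually have to wait for the data to arrive. */
double sum_with_pretouch(const double *data, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + LOOKAHEAD < n)
            (void)*(const volatile double *)&data[i + LOOKAHEAD]; /* volatile keeps the load alive */
        acc += data[i] * 0.5;
    }
    return acc;
}
```

Both functions compute the same result, so any timing difference between them comes from how the CPU handles the hint versus the real load.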

Jim Dempsey


Hi Jim

A prefetch hint that does not result in a fetch (i.e. is dropped) is not the same as a NOP, per se. For example, out-of-order (OOO) hardware has to honor a NOP as an instruction, not a hint, and retire it, but it has more latitude in how it treats a hint.

But forgive my nitpicking on the ISA definition. The thrust of your question is really whether the implementation-specific behavior of a prefetch hint should have an architecturally defined description that is reported via the CPUID instruction.

Whether a prefetch hint is issued sufficiently ahead of the subsequent reference depends heavily on workload characteristics and specific implementation techniques. The same is true of your example of pre-touching the memory location of a subsequent reference. Whether to rely on software prefetch hints or to use explicit pre-touches to trigger hardware prefetchers is not a question with easy answers or universally applicable solutions.

My personal take on asking CPUID to provide additional definitions about the implementation-specific nature of prefetch hints tends to be negative.

For example, some CPU generations always dropped a prefetch hint if it requested an address beyond a page boundary. If this behavior had been codified in some hypothetical CPUID flag, it would not have been feasible to later allow a prefetch hint to be honored for fetching data across a page boundary. The latter was implemented in later generations and improves the software's ability to handle TLB misses.
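To make the page-boundary condition concrete, here is a small sketch of how software could test whether a prefetch target lies in the same page as the address currently being worked on. It assumes 4 KiB pages, which is not true for large-page mappings, and `same_page` and `PAGE_SIZE_ASSUMED` are hypothetical names, not part of any Intel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Large pages would need a different value. */
#define PAGE_SIZE_ASSUMED 4096u

/* Returns true when both addresses fall inside the same (assumed) page. */
bool same_page(const void *current, const void *target)
{
    return ((uintptr_t)current / PAGE_SIZE_ASSUMED) ==
           ((uintptr_t)target / PAGE_SIZE_ASSUMED);
}
```

A loop could use this to tell apart hints that stay within the current page from hints that cross into the next one, which is the case the older implementations dropped.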

My experience has been that issuing prefetch hints alters the load uop scheduling. Using software prefetch extensively implies taking suitable precautions to account for the fact that different CPU implementations (CPUID family/model) can exhibit different performance characteristics. In that sense, asking for new flags to be added to CPUID to report implementation-specific hardware behavior is not really different from using CPUID family and model combinations to dispatch code paths tuned to a specific implementation.
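As a rough sketch of what family/model dispatch can look like: the decoding of CPUID leaf 1 EAX follows the rule in the Software Developer's Manual, but the `choose_path` policy is purely an assumption for illustration. It sends family 6, model 15 (the Q6600 class) down the pre-touch path only because that is what Jim measured; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned family;
    unsigned model;
    unsigned stepping;
} cpu_signature;

/* Decode the display family/model/stepping from CPUID leaf 1 EAX. */
cpu_signature decode_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    cpu_signature s;

    s.stepping = eax & 0xF;
    /* Extended family is only added when the base family is 0xF. */
    s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* Extended model only applies to base family 6 or 0xF. */
    s.model = (base_family == 0x6 || base_family == 0xF)
                  ? (ext_model << 4) + base_model
                  : base_model;
    return s;
}

typedef enum { PATH_PREFETCH_HINT, PATH_PRETOUCH } prefetch_path;

/* Hypothetical policy table; real entries must come from measurement. */
prefetch_path choose_path(cpu_signature s)
{
    if (s.family == 6 && s.model == 15)
        return PATH_PRETOUCH;
    return PATH_PREFETCH_HINT;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header */

/* Read the signature of the CPU we are running on. */
cpu_signature read_signature(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    return decode_signature(eax);
}
#endif
```

At initialization, an application would call `choose_path(read_signature())` once and store a function pointer to the matching loop. The table still has to be filled in by experiment for each CPU of interest.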

This may not be what you want to hear, but sometimes the fun of programming is experimentation :)

