determination of PREFETCH support

Shih Kuo (Intel)
June 29, 2009 8:26 AM PDT

Hi Jim

A prefetch hint that is dropped (i.e. does not result in a fetch) is still not the same as a NOP, per se. For example, out-of-order hardware has to honor a NOP as an instruction, not a hint, and retire it, but it has more latitude in how it treats a hint.

But, my nitpicking on the ISA definition aside, the thrust of your question is really whether the implementation-specific behavior of a prefetch hint should have an architecturally defined description that is reported via the CPUID instruction.

Whether a prefetch hint is issued sufficiently far ahead of the subsequent reference is highly dependent on workload characteristics and on specific implementation techniques. The same is true of your example of pre-touching the memory location of a subsequent reference. Whether to rely on software prefetch hints or to use explicit pre-touches to trigger the hardware prefetchers is a question with neither easy answers nor universally applicable solutions.
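To make the two options concrete, here is a minimal sketch in C. It assumes GCC or Clang, since `__builtin_prefetch` is a compiler builtin rather than something from this thread, and the function names and the `distance` parameter are made up for illustration. The first loop issues a hint that the hardware is free to drop; the second performs a real load ahead of use, which cannot be dropped and so must stay inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while issuing a software prefetch hint `distance`
   elements ahead. The hint may be dropped by the hardware. */
long sum_prefetch_hint(const int *data, size_t n, size_t distance) {
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (distance < n - i) {            /* i + distance < n, no overflow */
            /* rw = 0 (read), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(&data[i + distance], 0, 3);
        }
        total += data[i];
    }
    return total;
}

/* Sum an array while pre-touching the element `distance` ahead with a
   real load. The volatile access keeps the compiler from removing it. */
long sum_pretouch(const int *data, size_t n, size_t distance) {
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (distance < n - i) {
            (void)*(const volatile int *)&data[i + distance];
        }
        total += data[i];
    }
    return total;
}
```

Both functions return the same sum as a plain loop; only the timing differs, and which one wins (if either) has to be measured for each workload and CPU with a range of `distance` values.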

My personal take on asking CPUID to provide additional definitions about the implementation-specific nature of prefetch hints tends to be negative.

For example, some CPU generations always dropped a prefetch hint if it requested an address beyond a page boundary. If this behavior had been codified in some hypothetical CPUID flag, it would not have been feasible to later allow a prefetch hint to be honored for fetching data across a page boundary. The latter was implemented in later generations and improves software's ability to handle TLB misses.
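If you want to see how often your own prefetch targets fall into that case, a small helper is enough. This is only a sketch: the function name is hypothetical, and the page size is passed in by the caller (4096 bytes is a common value on x86, but that is an assumption about your system, not something the hardware reports here).

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the two addresses lie on the same page, 0 otherwise.
   page_size is supplied by the caller (e.g. 4096) and must be nonzero. */
int same_page(uintptr_t current, uintptr_t target, uintptr_t page_size) {
    assert(page_size != 0);
    return (current / page_size) == (target / page_size);
}
```

Counting how many hints in a loop have `same_page(...) == 0` gives a rough idea of how exposed the code is to that implementation difference.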

My experience has been that issuing prefetch hints alters the load uop scheduling. Using software prefetch extensively implies taking suitable precautions to account for the fact that different CPU implementations (CPUID family/model) can exhibit different performance characteristics. In that sense, asking for new flags to be added to CPUID to report implementation-specific hardware behavior is not really different from using CPUID family and model combinations to dispatch code paths tuned to specific implementations.
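As a rough sketch of that kind of dispatch, the code below decodes the display family and model from the EAX value returned by CPUID leaf 1, using the documented rule (extended family is added only when the base family is 0xF; extended model is applied when the base family is 0x6 or 0xF). The type names, the `select_code_path` function and the single table entry are hypothetical placeholders, not a recommendation for any particular CPU.

```c
#include <stdint.h>

typedef struct {
    unsigned family;   /* display family */
    unsigned model;    /* display model  */
} cpu_signature;

/* Decode the display family/model from CPUID.01H:EAX. */
cpu_signature decode_signature(uint32_t eax) {
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    cpu_signature sig;
    sig.family = base_family;
    if (base_family == 0xF) {
        sig.family += ext_family;
    }
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xF) {
        sig.model += ext_model << 4;
    }
    return sig;
}

typedef enum { PATH_GENERIC = 0, PATH_TUNED_EXAMPLE = 1 } code_path;

/* Pick a code path by family/model. The single entry here is a
   hypothetical example; a real table would come from measurement. */
code_path select_code_path(cpu_signature sig) {
    if (sig.family == 6 && sig.model == 0x1A) {
        return PATH_TUNED_EXAMPLE;
    }
    return PATH_GENERIC;
}
```

With GCC or Clang on x86, the EAX value could be obtained with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; the decoding is kept separate here so it can be checked against known signatures without running on the target CPU.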

This may not be what you want to hear, but sometimes the fun of programming is experimentation :)

Shihjong


