Utilizing load latency event in performance monitoring to get line fill buffer breakdown

By Rajshree Chabukswar (Intel) on November 11, 2010 at 1:27 pm

Mike Chynoweth discussed using performance monitoring events to identify where in the memory hierarchy a load is satisfied in his blog post:
http://software.intel.com/en-us/blogs/2010/09/30/utilizing-performance-monitoring-events-to-find-problematic-loads-due-to-latency-in-the-memory-hierarchy/

In this post, we look at how the load latency capability offered by performance monitoring can be used to identify the latency seen at each data source. Specifically, we have experimented with using it to estimate how the load sources can be broken down further when a data request is satisfied from a Line Fill Buffer (LFB).
Load latency samples only a small fraction of the total loads. The loads to be sampled are selected by a complex internal mechanism.

The information that load latency offers includes the data source of each sampled load and the latency value observed for it. Using this information, we can estimate the equivalent data source (based on latency values) when a significant share of samples comes from line fill buffers. A load that hits in the LFB means that a previous hardware prefetch, load, or store has already missed the L1D on an address in the same cache line and has allocated a fill buffer for that line. The latency of our demand load is therefore variable: it depends on how far along the outstanding fill already is when the load hits the existing line fill buffer. When we see significant samples coming from the LFB, the technique described here helps identify the potential data sources using the actual latency values observed on the LFB samples.
In our example, ~35% of total samples come from fill buffers.
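To make the "significant share of samples" check concrete, here is a minimal sketch that tallies sampled loads by data source. The sample records, source labels, and latency numbers are made up for illustration; they are not the data from the example in this post, and the record layout is an assumption rather than the format any particular tool emits.

```python
from collections import Counter

# Hypothetical load latency samples: (data_source, latency_in_cycles).
# Labels and values are illustrative only, not measured data.
samples = [
    ("L1D", 4), ("LFB", 12), ("L2", 10), ("LFB", 38),
    ("LFB", 41), ("L1D", 4), ("LLC", 40), ("DRAM", 210),
    ("L1D", 5), ("LFB", 35),
]

def source_shares(samples):
    """Return the fraction of samples attributed to each data source."""
    counts = Counter(source for source, _ in samples)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}

shares = source_shares(samples)
# Print sources from largest to smallest share.
for source, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{source:5s} {share:6.1%}")
```

If the LFB share turns out to be large, the latency values on those LFB samples are worth examining more closely.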

Using the latency values recorded for the fill buffer data source, we can estimate which data source each LFB sample is approximately equivalent to. (Note that this is only an estimate based on the actual latency values observed.) In our example, 13% of the samples from the LFB had latency equivalent to the mid-level cache, and 77% had latency equivalent to the last-level cache.
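As a rough illustration of this estimate (not necessarily how the analysis tool does it), the sketch below buckets LFB sample latencies by comparing them against assumed per-level latency cutoffs. The cutoff values in `BOUNDS`, the bucket labels, and the sample latencies are all hypothetical; real thresholds depend on the processor and should be taken from its documentation or from measurement.

```python
import bisect

# Assumed upper latency bounds (in core cycles) for each equivalent source.
# These cutoffs are placeholders; real values depend on the processor.
BOUNDS = [15, 60]
LABELS = ["mid-level cache", "last-level cache", "memory or beyond"]

def equivalent_source(latency_cycles):
    """Map an LFB sample's latency to the data source with similar latency."""
    # bisect_left makes each bound inclusive: a latency of 15 is still mid-level.
    return LABELS[bisect.bisect_left(BOUNDS, latency_cycles)]

def lfb_breakdown(lfb_latencies):
    """Return the fraction of LFB samples falling in each latency bucket."""
    counts = {label: 0 for label in LABELS}
    for latency in lfb_latencies:
        counts[equivalent_source(latency)] += 1
    total = len(lfb_latencies)
    return {label: (n / total if total else 0.0) for label, n in counts.items()}

# Made-up latencies (cycles) for samples whose data source was the LFB.
lfb_latencies = [12, 38, 41, 35, 9, 44, 52, 180, 30, 47]
breakdown = lfb_breakdown(lfb_latencies)
for label in LABELS:
    print(f"{label:18s} {breakdown[label]:6.1%}")
```

Fixed cutoffs keep the sketch simple; because observed latencies overlap between levels, the resulting split should be read as an approximation.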

Comments (1)

April 21, 2011 10:33 AM PDT

Michael Chynoweth (Intel)
The toolset Rajshree used for the analysis above is now live:
http://software.intel.com/en-us/articles/intel-performance-bottleneck-analyzer/

Please download and tell us what you think.

Thanks,

Mike
