Why can LLC Miss be greater than 1?


zhe.huangge.com's picture


I ran a "General Exploration" analysis of my code using Amplifier XE. One of the reported metrics is LLC Miss. The LLC Miss for my whole program is 0.242, which I understand to mean that 24.2% of cycles are spent waiting to load/store data from/to memory.
But the LLC Miss of one function in my code is 1.133, and another is 2.199! I can't understand how the ratio can be greater than 1.
Is it because some event is not a precise event? But why can it be as high as 2.199? Can anyone tell me why?

My CPU is based on the Core microarchitecture. Can anyone tell me how LLC Miss is computed on the Core microarchitecture?

Many thanks!

Peter Wang (Intel)'s picture


First of all, let me assume that you are using the VTune Amplifier XE 2011 product.

Are you using the pre-defined analysis type nehalem_memory-access, which provides the LLC Miss count? (You can also create a new analysis type for the event MEM_LOAD_RETIRED.LLC_MISS...)

If so, the report provides the MEM_LOAD_RETIRED.LLC_MISS count. What did you mean by "The whole LLC Miss of my code is 0.242"?

How do you measure the performance impact of LLC misses in your code? See the formula below:

3rd level misses: ((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100

If the result (percentage) is significant (greater than 20%), consider improving the code; otherwise, ignore LLC Miss in your module / function.
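As a minimal sketch, the formula above can be evaluated like this. The function name and the sample counts are hypothetical and chosen only for illustration; only the formula and the 180-cycle penalty come from the post. Note that nothing in the arithmetic caps the result at 100%: a function with many misses relative to its own cycle count can come out well above it.

```python
def llc_miss_impact_percent(llc_misses, unhalted_cycles, penalty_cycles=180):
    """Estimated share of cycles lost to LLC misses, in percent.

    llc_misses      -- count of MEM_LOAD_RETIRED.LLC_MISS
    unhalted_cycles -- count of CPU_CLK_UNHALTED.THREAD
    penalty_cycles  -- assumed cost per miss (180 in the formula above)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return (llc_misses * penalty_cycles) / unhalted_cycles * 100


# Hypothetical event counts, not measurements from this thread.
whole_program = llc_miss_impact_percent(1_000_000, 1_000_000_000)
hot_function = llc_miss_impact_percent(1_000_000, 100_000_000)

print(f"whole program: {whole_program:.1f}%")  # below the 20% threshold
print(f"hot function:  {hot_function:.1f}%")   # above 100%: the estimate is not bounded
```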

Regards, Peter

Tim Prince's picture

The penalty estimates for performance effects are only estimates; they can easily be off by as much as you have seen. For one thing, they don't take into account many specific details of possible differences between your platform and application and those for which the estimate algorithms were derived.
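One plausible way a fixed-penalty estimate can exceed 1 is sketched below. This is an assumption for illustration, not something this thread confirms about VTune's internals: the estimate charges the full penalty for every miss, while hardware can keep several misses in flight at once, so the real stall time per miss may be much smaller. The `overlap` parameter and all numbers are hypothetical.

```python
def estimated_stall_ratio(misses, penalty_cycles, total_cycles):
    """Fixed-penalty estimate: every miss is charged the full penalty."""
    return misses * penalty_cycles / total_cycles


def overlapped_stall_ratio(misses, penalty_cycles, total_cycles, overlap):
    """Same estimate, but assuming `overlap` misses are in flight at once
    on average, so each miss costs only penalty_cycles / overlap."""
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    return misses * penalty_cycles / overlap / total_cycles


# Hypothetical counts for one function.
misses, penalty, cycles = 10_000, 180, 1_000_000

naive = estimated_stall_ratio(misses, penalty, cycles)
overlapped = overlapped_stall_ratio(misses, penalty, cycles, overlap=4)

print(f"fixed-penalty estimate: {naive:.2f}")       # greater than 1
print(f"with 4 misses overlapped: {overlapped:.2f}")  # back below 1
```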

zhe.huangge.com's picture

Hi, Peter Wang:

Thanks for your reply.

I'm using VTune Amplifier XE 2011.

My CPU is in the Core family, and I used the pre-defined analysis Core 2 family - General Exploration, so it doesn't have the event MEM_LOAD_RETIRED.LLC_MISS. General Exploration reports LLC Miss directly, so LLC Miss is computed by VTune, and I don't know how the computed LLC Miss can be greater than 1.

In the Core family, which event is equivalent to Nehalem's MEM_LOAD_RETIRED.LLC_MISS? Is it L2_LINES_IN? Or MEM_LOAD_RETIRED.L2_LINE_MISS? I'm confused by these events.

About the formula:
3rd level misses: ((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100
I wonder how the '180' was determined. Does '180' mean that the memory access latency is 180 cycles? Why is the latency 180? Is it an estimate? I think the latency depends not only on the CPU but also on the type of memory, such as its frequency and CAS latency (CL), so it should not be a constant across different systems.
And how do I estimate the latency in the Core family? It should be less than 180, right?

best regards, Huangzhe

zhe.huangge.com's picture

Hi TimP:

Thanks for your reply.
I think you mean that the penalty VTune estimates is not the precise penalty for my platform, so LLC Miss is also only an estimate. That is why it can be greater than 1, right?
So if I want to know the precise LLC Miss rate, a better approach is to write a piece of code to measure the actual penalty, right?

best regards, Huangzhe

Peter Wang (Intel)'s picture

For Core 2 Duo processors, consider these penalties:

Issue   | Performance Counter           | Approximate Penalty
L2 Miss | MEM_LOAD_RETIRED.L2_LINE_MISS | ~165 cycles desktop / ~300 cycles server
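To see how much the choice of penalty matters, here is a small sketch that applies the two approximate penalties from the table to the same event counts. It assumes the Core 2 metric has the same shape as the Nehalem formula quoted earlier (misses times penalty, divided by unhalted cycles); whether VTune computes it exactly this way is not confirmed in this thread, and the counts are hypothetical.

```python
# Approximate penalties in cycles, taken from the table above.
CORE2_L2_MISS_PENALTY = {"desktop": 165, "server": 300}


def l2_miss_ratio(l2_line_misses, unhalted_cycles, platform):
    """Estimated fraction of cycles lost to L2 (last-level) misses.

    l2_line_misses  -- count of MEM_LOAD_RETIRED.L2_LINE_MISS
    unhalted_cycles -- unhalted core cycle count over the same interval
    platform        -- "desktop" or "server" (selects the penalty)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return l2_line_misses * CORE2_L2_MISS_PENALTY[platform] / unhalted_cycles


# Hypothetical counts for one function, not measured data.
misses, cycles = 4_000, 1_000_000
for platform in ("desktop", "server"):
    print(f"{platform}: {l2_miss_ratio(misses, cycles, platform):.2f}")
```

With these made-up counts the same function lands below 1 with the desktop penalty and above 1 with the server penalty, which is one way to see that the reported ratio is an estimate rather than a measured fraction of time.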

See more from this article.