Performance Counters on SandyBridge

Hi all,

I've just started doing profiling work on Sandy Bridge, so the following questions might be stupid.

I've checked the Intel SDM and found that CYCLE_ACTIVITY should be very useful for my work. But when I actually tried to use that counter, it seemed that only Ivy Bridge has it. Is that right?

In other words, my goal is to find how many cycles are stalled on data for a certain application. How can I do that on a Sandy Bridge machine (or Ivy Bridge)?
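As a minimal sketch, assuming the raw counts have already been collected over the same interval (with VTune or another tool), the share of cycles stalled on data is the stall-event count divided by the unhalted-cycle count. The event names in the comments follow the Optimization Manual naming; whether a suitable stall event exists on Sandy Bridge is exactly the open question here, and the function name `stall_fraction` is hypothetical.

```python
def stall_fraction(stall_cycles: int, total_cycles: int) -> float:
    """Fraction of unhalted core cycles spent stalled on data.

    stall_cycles: raw count of a stall event such as
        CYCLE_ACTIVITY.STALLS_LDM_PENDING (only on parts that expose it).
    total_cycles: raw count of CPU_CLK_UNHALTED.THREAD collected over the
        same interval.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= stall_cycles <= total_cycles:
        # A stall count larger than the cycle count usually means the two
        # counts were not taken over the same interval.
        raise ValueError("stall_cycles must lie in [0, total_cycles]")
    return stall_cycles / total_cycles


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not measured data.
    print(f"stalled on data: {stall_fraction(250_000_000, 1_000_000_000):.1%}")
```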

Thanks a lot!

You can use VTune to gather CPU-related activity (counts of retired uops, etc.).

I was wondering if there is support for CYCLE_ACTIVITY on Sandy Bridge. The optimization manual says so, but I couldn't find it in the Software Developer's Manual. Thanks!

Quote:

Yunqi Z. wrote:

I was wondering if there is support for CYCLE_ACTIVITY on Sandy Bridge. The optimization manual says so, but I couldn't find it in the Software Developer's Manual. Thanks!

Where in the SDM did you look? You need to refer to Volume 3 (System Programming Guide), chapters 18 and 19.

In the Intel 64 and IA-32 Architectures Optimization Reference Manual, Appendix B.3.2.3 recommends using CYCLE_ACTIVITY.STALLS_LDM_PENDING and the other CYCLE_ACTIVITY counters to characterize the memory subsystem. But SDM Volume 3, section 19.4 (Sandy Bridge), doesn't list these counters.

I also found a thread, "Ivy Bridge performance monitoring events CYCLE_ACTIVITY.*?" (http://software.intel.com/en-us/forums/topic/277820), saying these counters should exist on Sandy Bridge.

Thanks a lot!

Are you referring to this link: http://software.intel.com/en-us/forums/topic/277820

I went through all the posts in that thread, and one of the Intel engineers clearly stated that future editions of the SDM will include information about these counters on Sandy Bridge.

What SDM revision do you use?

Btw, you have a nice avatar. IIRC, this is J. B. Fourier.

Aha, that's right! Thanks.

>>>And I also found a link (the title is "Ivy Bridge performance monitoring events CYCLE_ACTIVITY.*?") saying there should be these counters on Sandy Bridge>>>

Did you mean this link: http://software.intel.com/en-us/forums/topic/277820

There is a response from one of the Intel engineers in which he clearly states that a future revision of the SDM will include the counters you mentioned.

On Intel, or any architecture, I would propose looking at the front end's "uops per clock" (upc) while the front end is busy. So count the clocks during which the front end is actually doing something (that includes the DSB (decoded uop cache), the MS (microcode sequencer) and the ILD (instruction length decoder)), and then compare that with the execution core's upc. If the front end's upc while busy equals that of the execution core, then you might be front-end limited. I only mention this since you're focusing on activity and I thought you might suspect some limitation in the front of the machine. In my inspections of many applications, Intel's pipeline is rarely limited in the front end, and the DSB provides much greater throughput than the execution core can chew.

You also might want to generate a distribution of the throughput of the various front-end and execution resources to see how often nothing is done... it's a large % of the time.
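A minimal Python sketch of the two checks described above, assuming the raw counts are already collected: front-end uops and front-end busy cycles, core uops and core cycles, and a per-cycle uop event counted with the counter-mask (cmask) field set to 1, 2, 3, 4. With cmask = k, an Intel event counts the cycles in which it occurred at least k times, so subtracting adjacent counts gives the number of cycles with exactly k. The function names, the 5% tolerance, and measuring core upc over all counted core cycles are illustrative assumptions, not perfwise's exact method.

```python
from typing import Dict, List


def upc(uops: int, cycles: int) -> float:
    """Uops per clock; 0.0 if no cycles were counted."""
    return uops / cycles if cycles else 0.0


def maybe_front_end_limited(fe_uops: int, fe_busy_cycles: int,
                            core_uops: int, core_cycles: int,
                            rel_tol: float = 0.05) -> bool:
    """Heuristic from the post: if the front end's upc *while busy* is about
    equal to the execution core's upc, the front end may be the limiter.
    The 5% tolerance is an arbitrary, hypothetical choice."""
    fe = upc(fe_uops, fe_busy_cycles)
    core = upc(core_uops, core_cycles)
    return core > 0 and abs(fe - core) <= rel_tol * core


def per_cycle_distribution(total_cycles: int, at_least: List[int]) -> Dict[int, int]:
    """Turn counter-mask counts into a histogram of uops per cycle.

    at_least[k-1] is the event counted with cmask = k, i.e. the number of
    cycles with >= k uops. The last bucket means ">= len(at_least)".
    """
    bounds = [total_cycles] + list(at_least) + [0]
    if any(a < b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("cmask counts must be non-increasing")
    # Cycles with exactly k uops = cycles(>= k) - cycles(>= k + 1).
    return {k: bounds[k] - bounds[k + 1] for k in range(len(at_least) + 1)}


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not measured data.
    hist = per_cycle_distribution(1000, [600, 400, 200, 50])
    print(hist, f"idle: {hist[0] / 1000:.0%}")
    print(maybe_front_end_limited(3000, 1000, 2900, 1000))
```

Bucket 0 of the histogram is the "how often nothing is done" share mentioned above.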

perfwise

Thanks a lot perfwise. :)