Software Tuning, Performance Optimization & Platform Monitoring

Using Intel Performance Counter Monitor

I have been using the Intel Performance Counter Monitor (http://software.intel.com/en-us/articles/intel-performance-counter-monitor) software on Mac OS X. The class defined in MSRAccessor.h allows one to read the Model Specific Registers on a MacBook, including performance counters and other system information. However, there is no way to know which CPU the current thread is executing on, which is needed to read MSRs that are scoped to a single logical processor.

Strange behaviour of Intel Xeon E3-1220v2

Hi All

I have bought an Intel Xeon E3-1220 v2 processor.

I am testing Linux on this processor. I created a test to see the time it takes for the OS to handle the clock tick.

My test is as follows: I create a thread with the highest priority (it can be preempted only by interrupts), whose task is just to do a small calculation in a loop of 45000 iterations. Here is the code:

long lLoopCounter = 45000;

while ( lLoopCounter > 0 )
   {
   Value = (( Value * 3 ) + 5);
   lLoopCounter--;
   }
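To make the measurement itself concrete, here is a minimal sketch of how the loop could be timed on Linux. It is not the poster's actual harness: the function names (run_loop, time_loop_ns, worst_case_ns), the starting value of 1, and the choice of clock_gettime with CLOCK_MONOTONIC are all assumptions made for illustration. The arithmetic uses an unsigned type so that overflow over 45000 iterations wraps rather than being undefined.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* The loop from the post. Unsigned arithmetic is an assumption made here
 * so that overflow wraps instead of being undefined behaviour. */
static unsigned long run_loop(long iterations, unsigned long value)
{
    while (iterations > 0) {
        value = (value * 3) + 5;
        iterations--;
    }
    return value;
}

/* Returns the elapsed wall-clock time, in nanoseconds, of one run of the
 * loop. The starting value 1 is hypothetical; the post does not show it. */
static int64_t time_loop_ns(long iterations, unsigned long *result)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    *result = run_loop(iterations, 1);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Repeats the timed loop and returns the longest run seen. A run that was
 * interrupted (for example by the clock tick) should stand out here. */
static int64_t worst_case_ns(int runs, long iterations)
{
    volatile unsigned long sink = 0; /* keeps the result observable */
    int64_t worst = 0;

    for (int i = 0; i < runs; i++) {
        unsigned long result;
        int64_t ns = time_loop_ns(iterations, &result);
        sink = result;
        if (ns > worst)
            worst = ns;
    }
    (void)sink;
    return worst;
}
```

Calling worst_case_ns(1000, 45000) from the high-priority thread and comparing the result with a typical run would give a rough upper bound on how long a single interruption takes.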

no-fill mode in sandy bridge

Hi everyone,

I found that no-fill mode (CR0.CD=1, CR0.NW=0) does not work as expected on my processor (i7-2600). Specifically, when I access a memory region that is certainly in the L1 data cache after entering no-fill mode, the access becomes significantly slower (by a factor of 1000 or more). But according to Table 11-5 in the Intel SDM, Volume 3A, a read hit should be served from the cache. I am sure that I have the correct memory type (WB in this experiment, in both the MTRRs and the PAT), and the process is restricted to a single core with interrupts disabled.

resource stalls on Sandy Bridge

Hi all,

I'm trying to measure resource stalls on a Sandy Bridge machine. As described in the SDM, the stalls can be broken down by cause, as shown below.

RESOURCE_STALLS:ANY - 0x5301a2
RESOURCE_STALLS:LB - 0x5302a2
RESOURCE_STALLS:RS - 0x5304a2
RESOURCE_STALLS:SB - 0x5308a2
RESOURCE_STALLS:ROB - 0x5310a2
RESOURCE_STALLS:FCSW - 0x5320a2
RESOURCE_STALLS:MXCSR - 0x5340a2

To my understanding, the ANY count is supposed to equal the sum of the other six counts. But my experiments suggest otherwise.
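For readers unfamiliar with the raw codes listed above, the following sketch splits one of them into its fields, assuming the values follow the IA32_PERFEVTSELx register layout (event select in bits 0-7, unit mask in bits 8-15, control flags above that). The struct and function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Fields of an IA32_PERFEVTSELx value. The type name is hypothetical. */
typedef struct {
    uint8_t event_select; /* bits 0-7 */
    uint8_t umask;        /* bits 8-15 */
    int usr;              /* bit 16: count in user mode */
    int os;               /* bit 17: count in kernel mode */
    int edge;             /* bit 18: edge detect */
    int pin_control;      /* bit 19 */
    int interrupt;        /* bit 20: interrupt on overflow */
    int any_thread;       /* bit 21 */
    int enable;           /* bit 22: counter enabled */
    int invert;           /* bit 23: invert counter mask */
    uint8_t cmask;        /* bits 24-31 */
} perfevtsel_fields;

/* Splits a raw event code such as 0x5301a2 into its fields. */
static perfevtsel_fields decode_perfevtsel(uint32_t raw)
{
    perfevtsel_fields f;

    f.event_select = (uint8_t)(raw & 0xff);
    f.umask        = (uint8_t)((raw >> 8) & 0xff);
    f.usr          = (int)((raw >> 16) & 1);
    f.os           = (int)((raw >> 17) & 1);
    f.edge         = (int)((raw >> 18) & 1);
    f.pin_control  = (int)((raw >> 19) & 1);
    f.interrupt    = (int)((raw >> 20) & 1);
    f.any_thread   = (int)((raw >> 21) & 1);
    f.enable       = (int)((raw >> 22) & 1);
    f.invert       = (int)((raw >> 23) & 1);
    f.cmask        = (uint8_t)((raw >> 24) & 0xff);
    return f;
}
```

Under that assumption, all seven codes share event select 0xa2 and the same flag byte 0x53 (USR, OS, INT and EN set), and differ only in the unit mask: 0x01 for ANY through 0x40 for MXCSR.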

Measuring BTB misses VS Predictor Miss for Branches vTune 2013

Hello all,

I am an undergrad working on a performance profiling project. Specifically, I am measuring branch-miss impact on a bit of code using the Amplifier XE 2013 suite (vTune). I have found out where the highest branch miss rates occur.

My current goal is to confirm that this is indeed where the misses are happening. My section of code contains 27 branch-like statements (if, else if) that are condition-based. I have successfully found a way to change these conditional branches into indirect jumps.

How to find different Intel Xeon processor's P-states?

I am not sure whether this is the correct forum to post in; there are too many. Anyway, my question is as follows (I need this information to help me buy an Intel processor):

Different P-states let a processor core run at different frequency and voltage levels. This is enabled by Enhanced Intel SpeedStep Technology.
