Software Tuning, Performance Optimization & Platform Monitoring

Can't run the Intel Power Gadget

I'm trying to run the Intel Power Gadget on a Win 8.1 x64 system with a Core i7-720QM.  The program and the MS Visual C++ 2010 redistributable library appear to install OK, but the tool will not start.  I try launching it, and nothing happens.  I attached windbg to the executable, but did not learn anything.  I also investigated it with Process Monitor and Dependency Walker, but couldn't get to the bottom of it.

Intel® Memory Latency Checker v2 with buffer option


The documentation for the Intel Memory Latency Checker states that the -bXXX option sets the buffer size, for example to measure cache latency instead of DRAM latency. However, the option appears to be ignored at run time: both the printed "Using buffer size of" message and the measured values indicate that it has no effect. For example, the command mlc --idle_latency –b3000 –c0 –t3, taken from the documentation, does not work. Is there a workaround?


Kind regards,

MicroSequencer (MS) @ SNB


In 64-ia-32-architectures-optimization-manual, chapter B.3.7.2, "Understanding the Sources of the Micro-op Queue", it is said that UOPs come from the DSB, MITE and MS, and a 'typical distribution' is given. In the app I'm profiling, quite a lot more UOPs are dispatched from the MS than the manual suggests is desirable, while the execution is clearly front-end bound.

The problem is, I don't understand why that happens. The manual reads:

PMCx reset not working when collecting raw PEBS dump

Hello all,

I'm new to the PEBS facility. I am trying to use the "long latency loads" facility and want to dump "raw PEBS records" for further analysis.

For writing a simple example, I referenced SDM v3, especially on through (for Sandy Bridge). 

In testing, counting long latency loads works normally, but PEBS recording does not work correctly.

In the test, I found that the PMCx reset value used to adjust the sampling rate does not work correctly; specifically, the counter does not overflow at all when the count is too low.
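For reference, the reset value is usually the two's complement of the desired sampling period, truncated to the counter width. The following is a minimal sketch under stated assumptions: the 48-bit width is an assumption that should be confirmed with CPUID leaf 0AH on the target machine, and `pebs_reset_value` is a hypothetical helper name, not part of any Intel-provided API.

```c
#include <stdint.h>

/* Assumption: general-purpose counter width in bits.
 * On real hardware, read it from CPUID.0AH:EAX[23:16]. */
#define PMC_WIDTH_BITS 48

/* Hypothetical helper: value to preload into PMCx (and to store in the
 * DS-area counter reset field) so that the counter overflows after
 * roughly `period` events. */
static uint64_t pebs_reset_value(uint64_t period)
{
    const uint64_t mask = (UINT64_C(1) << PMC_WIDTH_BITS) - 1;
    return (0 - period) & mask;   /* -period, truncated to the counter width */
}
```

With these assumptions, a period of 1000 gives 0xFFFFFFFFFC18. A reset value that is not truncated to the counter width, or that is written as a positive count, is one possible cause of a counter that never seems to overflow.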

pause instruction doesn't seem to reduce cpu usage / electric consumption


I came across the pause assembly instruction, which is available with SSE2.
I own a Core 2 Duo from 2007 (Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz), and when I use pause in a spin-wait loop, I see no change in CPU usage or electric consumption.
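For context, a typical spin-wait loop with pause looks roughly like the sketch below. It is not the loop referenced in this post; `spin_until_set` is a hypothetical helper, and the code assumes an x86 compiler that provides the `_mm_pause` intrinsic through `<immintrin.h>`.

```c
#include <stdatomic.h>
#include <immintrin.h>   /* _mm_pause */

/* Hypothetical helper: busy-wait until *flag becomes nonzero. */
static void spin_until_set(atomic_int *flag)
{
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        _mm_pause();   /* emits PAUSE: a hint to the core, not a sleep */
    }
}
```

Note that the thread stays runnable the whole time, so OS-reported CPU usage for that core is expected to remain near 100% with or without pause. The instruction is a hint to the processor about the spin loop; it does not yield the CPU to the scheduler.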

I used the same loop as here:

request for a demo project of using AVX asm


I'm studying and trying to use AVX-256/512 instructions and intrinsics, but I could not find a good demo or example for beginners. A simple example project with C code and AVX-related asm code that I can run would help a lot. Could you send me one such example project?
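As a starting point, here is a minimal sketch that uses 256-bit AVX intrinsics rather than hand-written asm. It assumes GCC or Clang on x86: the `target("avx")` attribute and `__builtin_cpu_supports` are compiler extensions, and the function names are hypothetical. A scalar fallback keeps it runnable on CPUs without AVX.

```c
#include <immintrin.h>
#include <stddef.h>

/* AVX path: adds 8 floats per iteration, then handles the remainder. */
__attribute__((target("avx")))
static void add_f32_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned 256-bit load */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                        /* scalar tail */
        out[i] = a[i] + b[i];
}

/* Dispatcher: out[i] = a[i] + b[i], using AVX only if the CPU reports it. */
static void add_f32(const float *a, const float *b, float *out, size_t n)
{
    if (__builtin_cpu_supports("avx")) {
        add_f32_avx(a, b, out, n);
        return;
    }
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Compiling with `-S` shows the generated instructions (for example `vaddps` on `ymm` registers), which can be a gentle way to move from intrinsics to reading AVX asm.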


Thank you


Using PEBS facility


I've been going through the documentation for the PEBS facility as described in the Intel software-developer manual vol 3b section

It is mentioned that, in order to use PEBS, software needs to initialize the DS_BUFFER_MANAGEMENT_AREA data structure in memory (in non-paged pool) and then store the beginning linear address of this data structure in the IA32_DS_AREA register. 

Is there a sample piece of code that illustrates how this data structure initialization and setting of IA32_DS_AREA register needs to be done?
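The sketch below illustrates only the data-structure part of this question, under stated assumptions. The field order follows the 64-bit DS save area layout described in the SDM; `struct ds_area64`, `ds_area_init_pebs`, and `PEBS_COUNTERS` are hypothetical names, and the PEBS record size is passed in by the caller because it depends on the processor's PEBS record format. Allocating the memory from non-paged pool and executing WRMSR must happen in ring 0 (a driver) and are not shown.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define IA32_DS_AREA_MSR 0x600u   /* MSR address of IA32_DS_AREA */
#define PEBS_COUNTERS    4        /* assumption: number of reset fields used */

/* 64-bit DS buffer management area: every field is 8 bytes wide. */
struct ds_area64 {
    uint64_t bts_buffer_base;
    uint64_t bts_index;
    uint64_t bts_absolute_maximum;
    uint64_t bts_interrupt_threshold;
    uint64_t pebs_buffer_base;
    uint64_t pebs_index;
    uint64_t pebs_absolute_maximum;
    uint64_t pebs_interrupt_threshold;
    uint64_t pebs_counter_reset[PEBS_COUNTERS];
};

_Static_assert(offsetof(struct ds_area64, pebs_buffer_base) == 0x20,
               "PEBS fields start at offset 20H");

/* Hypothetical helper: fill in the PEBS fields for a buffer that holds
 * n_records records of record_size bytes; BTS fields are left zeroed. */
static void ds_area_init_pebs(struct ds_area64 *ds, void *buf,
                              size_t record_size, size_t n_records,
                              size_t threshold_records, uint64_t reset0)
{
    uint64_t base = (uint64_t)(uintptr_t)buf;

    memset(ds, 0, sizeof *ds);
    ds->pebs_buffer_base         = base;
    ds->pebs_index               = base;   /* next record goes here */
    ds->pebs_absolute_maximum    = base + n_records * record_size;
    ds->pebs_interrupt_threshold = base + threshold_records * record_size;
    ds->pebs_counter_reset[0]    = reset0; /* reloaded into PMC0 after a record */
}
```

In a driver, the linear address of the initialized structure would then be written to `IA32_DS_AREA_MSR` on each logical processor that uses PEBS, before PEBS is enabled for the counter.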
