Software Tuning, Performance Optimization & Platform Monitoring

PCM 2.5.1 - cleanup after execution?

When running pcm-tsx.x multiple times, I stumbled on something that looks like a cleanup problem.
This may already be known, but I am posting a brief log anyway.



Rolfs-MacBook-Air:IntelPerformanceCounterMonitorV2.5.1 ran$ ./pcm-tsx.x 2

Intel(r) Performance Counter Monitor: Intel(r) Transactional Synchronization Extensions Monitoring Utility

Copyright (c) 2013 Intel Corporation

PCM 2.5.1 - missing CPU model for MacBook Air with i7-4650U?

Hi, I got this error message when trying to run pcm-tsx.x on my MBA (CPU: i7-4650U):

Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere, Sandy Bridge and Ivy Bridge). CPU model number: 69 Brand: "Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz"
Access to Intel(r) Performance Counter Monitor has denied (no MSR or PCI CFG space access).


"HASWELL_4650 = 69" to the SupportedCPUModels enum in cpucounters.h and

"|| model_ == HASWELL_4650" to isCPUModelSupported in cpucounters.cpp

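The two quoted changes follow a simple pattern: add the model number to the enum, then add a clause to the support check. Below is a minimal standalone sketch of that pattern. It is not the actual PCM source: only `HASWELL_4650 = 69`, `SupportedCPUModels`, `isCPUModelSupported`, and `model_` come from the post, while the placeholder enum member and the free-function signature are assumptions made for illustration.

```cpp
// Standalone sketch; NOT the real cpucounters.h / cpucounters.cpp.
enum SupportedCPUModels {
    EXAMPLE_EXISTING_MODEL = 1, // hypothetical stand-in for the models already listed
    HASWELL_4650 = 69           // added: the model number printed in the error message
};

// Hypothetical free-function version of the check named in the post.
bool isCPUModelSupported(int model_) {
    return model_ == EXAMPLE_EXISTING_MODEL
        || model_ == HASWELL_4650; // added clause
}
```

With this change, a CPU reporting model number 69 passes the check, while any unlisted model is still rejected.
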
How can I scale only one core's frequency in a multi-core processor? (Monitor through PMU)


I am trying to scale only one core's frequency on a multi-core Intel i7-3610QM running a Linux 3.8 kernel.

I scaled each core's frequency through program A, and I got a result like this:

    cat  /proc/cpuinfo  | grep MHz
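The same per-core readings can also be collected from a program. The sketch below assumes the usual `cpu MHz : <value>` line format of `/proc/cpuinfo`; the function name `parseCpuMHz` is hypothetical, and it works on text passed in as a string so that it can be tried without touching the real file.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Extract every "cpu MHz" value from text in /proc/cpuinfo format.
// The result holds one entry per logical CPU, in file order.
std::vector<double> parseCpuMHz(const std::string& cpuinfo) {
    std::vector<double> mhz;
    std::istringstream in(cpuinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 7, "cpu MHz") != 0) continue; // keep frequency lines only
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        try {
            mhz.push_back(std::stod(line.substr(colon + 1)));
        } catch (const std::exception&) {
            // Skip a line whose value cannot be parsed as a number.
        }
    }
    return mhz;
}
```

To use it on a live system, read `/proc/cpuinfo` into a string (for example through `std::ifstream` and `rdbuf()`) and pass that string to the function.
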

Can QPI LL PMU monitor more than 1 kind of CTO event at the same time?


We are developing PCM to monitor QPI traffic. After reading Chapter 2, Section 8, "QPI Link Layer Performance Monitoring", of the Ivy Bridge uncore performance guide, I am confused about a few points. The guide says:

“In addition to generic event counting, each port of the Intel® QPI Link Layer provides two pairs of MATCH/MASK registers that allow a user to filter packet traffic serviced (crossing from an input port to an output port) by the Intel® QPI Link Layer.”
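As a rough illustration of what a MATCH/MASK pair does in general, the sketch below compares only the bits selected by the mask. It is a generic model, not the QPI Link Layer register layout: the function name `passesFilter` and the 64-bit width are assumptions, and the real field definitions have to be taken from the uncore guide.

```cpp
#include <cstdint>

// Generic match/mask test: a value passes when it agrees with `match`
// on every bit that is set in `mask`; bits cleared in `mask` are ignored.
bool passesFilter(std::uint64_t header, std::uint64_t match, std::uint64_t mask) {
    return (header & mask) == (match & mask);
}
```

For example, with a mask of `0x0F` only the low four bits are compared, and a mask of `0` lets every value through.
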

Where can I get docs for CPUID and CPUID-like assembly registers?

Hi everyone, I need to use some assembly calls to get information about the hardware on x86. Where can I get docs listing all the registers I can use, with some basic documentation on each? I'm also interested in any other set of registers that exposes hardware information.

Please note that I don't need docs on the intrinsics, I have the compiler docs for them, just plain old assembly.
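For orientation, here is a small sketch of what working with CPUID at the register level can look like. The `cpuid` wrapper uses GCC/Clang extended inline assembly and is only compiled on x86-64; `decodeSignature` applies the family/model rules that Intel documents for the EAX value returned by leaf 1. The names `CpuSignature`, `decodeSignature`, and `cpuid` are hypothetical.

```cpp
#include <cstdint>

struct CpuSignature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

// Decode the EAX value returned by CPUID leaf 1 into display family/model.
CpuSignature decodeSignature(std::uint32_t eax) {
    unsigned stepping   = eax & 0xF;
    unsigned baseModel  = (eax >> 4) & 0xF;
    unsigned baseFamily = (eax >> 8) & 0xF;
    unsigned extModel   = (eax >> 16) & 0xF;
    unsigned extFamily  = (eax >> 20) & 0xFF;

    unsigned family = baseFamily;
    if (baseFamily == 0xF) family += extFamily; // extended family applies only to family 15

    unsigned model = baseModel;
    if (baseFamily == 0x6 || baseFamily == 0xF) model += extModel << 4;

    return CpuSignature{family, model, stepping};
}

#if defined(__GNUC__) && defined(__x86_64__)
// Execute CPUID for the given leaf/subleaf; regs receives EAX, EBX, ECX, EDX.
inline void cpuid(std::uint32_t leaf, std::uint32_t subleaf, std::uint32_t regs[4]) {
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
                     : "a"(leaf), "c"(subleaf));
}
#endif
```

On an x86-64 build, calling `cpuid(1, 0, regs)` and passing `regs[0]` to `decodeSignature` yields the family and model numbers; an EAX value of `0x40651`, for instance, decodes to family 6, model 69.
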

Reading Events from Intel(R) Xeon(R) CPU E5-1650

Hi all,

I am using an Intel(R) Xeon(R) CPU E5-1650 for experiments, and I want to read the events for micro-ops retired and L2 cache misses.

After looking at the software development manual I saw that the events would be
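The event list is cut off in this excerpt, so no specific codes are assumed here. Independent of which events are chosen, a core event is normally programmed by packing an event select and a unit mask into an IA32_PERFEVTSELx value. The sketch below shows that packing with the architectural bit positions (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22); the function name `encodePerfEvtSel` is hypothetical.

```cpp
#include <cstdint>

// Build an IA32_PERFEVTSELx value from an event select and unit mask.
// countUser / countKernel set the USR and OS bits; EN is always set.
std::uint64_t encodePerfEvtSel(std::uint8_t eventSelect, std::uint8_t umask,
                               bool countUser = true, bool countKernel = true) {
    std::uint64_t value = 0;
    value |= static_cast<std::uint64_t>(eventSelect);      // bits 0-7
    value |= static_cast<std::uint64_t>(umask) << 8;       // bits 8-15
    if (countUser)   value |= std::uint64_t{1} << 16;      // USR
    if (countKernel) value |= std::uint64_t{1} << 17;      // OS
    value |= std::uint64_t{1} << 22;                       // EN
    return value;
}
```

The event select and unit mask for a given event must be taken from the event tables for the specific processor; writing the resulting value to the MSR requires ring-0 access, for example through a driver or the Linux msr module.
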


Loads blocked due to store forwarding

Hi all

I'm using VTune to spot bottlenecks in a piece of code that looks like the following:

for (int i = 0; i < X; i++) 
  for (int j = 0; j < Y; j++) 
    for (int k = 0; k < Z; k++) 
      A[j][k] += (FE0[i][j]*FE0[i][k]*I[0] + B1[i][k]*C1[i][j]*I[1] + B2[i][k]*C2[i][j]*I[2] + ...);

The code is automatically generated by a high-level tool, which is why it looks "weird". I'm using the most recent Intel suite (compiler and tools).

Some inconsistency in QPI traffic monitoring for TxL_FLITS_G0


I did a couple of experiments measuring QPI traffic with a customized version of PCM.

I measured DATA, NONDATA and IDLE traffic over the QPI link using two events, RxL_FLITS_G0 and TxL_FLITS_G0. My expectation was that the results of those events should be symmetric. That is, the number of data/non-data flits sent should equal the number of flits received, and the idle rate should be similar for both receiver and transmitter. Also, the sum of all three should give the maximum available QPI bandwidth, about 14 GB/sec.
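The expectations above can be written down as a small check. This is only a sketch under stated assumptions: the `FlitCounts` struct and all function names are hypothetical, the counts are whatever the customized PCM reports for each direction, and the number of bytes per flit is left as a caller-supplied parameter because it is not given in the post.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Flit counts for one direction of a link over one measurement interval.
struct FlitCounts {
    std::uint64_t data;
    std::uint64_t nonData;
    std::uint64_t idle;
};

// Fraction of all counted flits that were idle.
double idleShare(const FlitCounts& c) {
    std::uint64_t total = c.data + c.nonData + c.idle;
    return total == 0 ? 0.0 : static_cast<double>(c.idle) / static_cast<double>(total);
}

// True when a and b differ by at most relTol relative to the larger magnitude.
bool roughlyEqual(double a, double b, double relTol) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= relTol * scale;
}

// Expectation from the post: sent and received data/non-data counts match,
// and the idle share is similar on both sides.
bool looksSymmetric(const FlitCounts& tx, const FlitCounts& rx, double relTol) {
    return roughlyEqual(static_cast<double>(tx.data), static_cast<double>(rx.data), relTol)
        && roughlyEqual(static_cast<double>(tx.nonData), static_cast<double>(rx.nonData), relTol)
        && roughlyEqual(idleShare(tx), idleShare(rx), relTol);
}

// Bytes per second implied by all three flit classes together.
// bytesPerFlit is an assumption supplied by the caller.
double totalBytesPerSecond(const FlitCounts& c, double bytesPerFlit, double seconds) {
    double flits = static_cast<double>(c.data + c.nonData + c.idle);
    return flits * bytesPerFlit / seconds;
}
```

Comparing `totalBytesPerSecond` for each direction against the expected link bandwidth, and running `looksSymmetric` with a small tolerance, makes it easy to see which of the three expectations a measurement violates.
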

Asynchronous DRAM refresh for Xeon E5-2600

Hi All,

To retrieve crash logs, I am currently using a kexec-based solution. However, I am more interested in the Linux feature pramfs (

I want to put DRAM in self-refresh mode. If I'm not wrong, I can use the ADR feature for this purpose. I can see that the "General Power Management Configuration" register 2 bit shows the DRAM self-refresh status in read-only form. Meanwhile, I'm trying the "BIOS Implementation Test Suite" to check the values set by my BIOS (not sure if it will help me).

Non-snoop read and non-snoop write: meaning?


Can somebody explain the exact meaning of the term "non-snoop" for both read and write?

I am trying to get a better understanding of the PCIe I/O events in CBo which are named "non-snoop read" and "non-snoop write".


