Intel® Performance Counter Monitor - A better way to measure CPU utilization

Short URL for this page: www.intel.com/software/pcm

Contributors (in historical order)

Roman Dementiev, Thomas Willhalm, Otto Bruggeman, Patrick Fay, Patrick Ungerer, Austen Ott, Patrick Lu, James Harris, Phil Kerly, Patrick Konsor

Introduction to Intel® PCM (Performance Counter Monitor)

Intel® PCM version 2.0 is now available and adds support for the Intel® Xeon® E5 series processors based on Intel microarchitecture code-named Sandy Bridge EP/EN/E. See the Intel® PCM version 2.0 Features section below for a brief description of the new features.

The complexity of computing systems has tremendously increased over the last decades. Hierarchical cache subsystems, non-uniform memory, simultaneous multithreading and out-of-order execution have a huge impact on the performance and compute capacity of modern processors.

 

Figure 1: "CPU Utilization" measures only the time a thread is scheduled on a core

 

Software that understands and dynamically adjusts to the resource utilization of modern processors has performance and power advantages. The Intel® Performance Counter Monitor provides sample C++ routines and utilities to estimate the internal resource utilization of the latest Intel® Xeon® and Core™ processors and gain a significant performance boost.

When the CPU utilization does not tell you the utilization of the CPU

The CPU utilization number obtained from the operating system (OS) is a metric that has been used for many purposes, such as product sizing, compute capacity planning, and job scheduling. The current implementation of this metric (the number that the UNIX* "top" utility and the Windows* Task Manager report) shows the portion of time slots that the CPU scheduler in the OS could assign to the execution of running programs or the OS itself; the rest of the time is idle. For compute-bound workloads, the CPU utilization metric calculated this way predicted the remaining CPU capacity very well on the architectures of the 1980s, which had much more uniform and predictable performance than modern systems. Advances in computer architecture have made this metric unreliable: multi-core and multi-CPU systems, multi-level caches, non-uniform memory, simultaneous multithreading (SMT), pipelining, out-of-order execution, and so on.

 

Figure 2: The complexity of a modern multi-processor, multi-core system

 

A prominent example is the non-linear CPU utilization on processors with Intel® Hyper-Threading Technology (Intel® HT Technology). Intel® HT Technology is a great performance feature that can boost performance by up to 30%. However, end users who are unaware of HT are easily confused by the reported CPU utilization. Consider an application that runs a single thread on each physical core: the reported CPU utilization is 50%, even though the application can use up to 70%-100% of the execution units. Details are explained in [1].
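The arithmetic behind the 50% figure can be sketched as follows. This is an illustration only: the function name `reportedUtilization` and the core counts in the usage note are assumptions, not part of Intel PCM or of any OS interface.

```cpp
// Hypothetical helper (not part of Intel PCM): an OS-style utilization figure
// is the share of logical CPUs that had a thread scheduled on them.
double reportedUtilization(int busyLogicalCpus, int totalLogicalCpus) {
    if (totalLogicalCpus <= 0) return 0.0;  // avoid division by zero
    return 100.0 * busyLogicalCpus / totalLogicalCpus;
}
```

For example, a processor with 4 physical cores and HT enabled exposes 8 logical CPUs; with one busy thread per physical core, `reportedUtilization(4, 8)` yields 50, although most of the execution units may already be in use.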

A different example is the CPU utilization for "memory throughput"-intensive workloads on multi-core systems. The bandwidth test "stream" already saturates the capacity of the memory controller with fewer threads than there are cores available.

Abstraction Level for Performance Monitoring Units

The good news is that Intel processors already provide the capability to monitor performance events inside processors. In order to obtain a more precise picture of CPU resource utilization we rely on the dynamic data obtained from the so-called performance monitoring units (PMU) implemented in Intel's processors. We concentrate on the advanced feature set available in the current Intel® Xeon® 5500, 5600, 7500, E5, E7 and Core i7 processor series [2-4].

We have implemented a basic set of routines with a high-level interface that can be called from a user's C++ application and provide various CPU performance metrics in real time. In contrast to other existing frameworks like PAPI* and Linux* "perf", we support not only core but also uncore PMUs of Intel processors (including the recent Intel® Xeon® E7 processor series). The uncore is the part of the processor that contains the integrated memory controller and the Intel® QuickPath Interconnect to the other processors and the I/O hub. In total, the following metrics are supported:

  • Core: instructions retired, elapsed core clock ticks, core frequency including Intel® Turbo boost technology, L2 cache hits and misses, L3 cache misses and hits (including or excluding snoops).
  • Uncore: read bytes from memory controller(s), bytes written to memory controller(s), data traffic transferred by the Intel® QuickPath Interconnect links.

Intel® PCM version 1.5 (and later) also supports Intel® Atom™ processors, but counters such as memory bandwidth, Intel® QPI bandwidth, and L3 cache misses will always show 0 because the Intel® Atom™ processor has no L3 cache, no on-die memory controller, and no Intel® QPI links.

Intel® PCM version 1.6 supports on-core performance metrics (like instructions per clock cycle and L3 cache misses) of the 2nd generation Intel® Core™ processor family (Intel® microarchitecture code name Sandy Bridge) and adds experimental support for some earlier Intel® microarchitectures (e.g. Penryn), which can be enabled by defining PCM_TEST_FALLBACK_TO_ATOM in the cpucounter.cpp.

I want to see these counters!

As an additional bonus, the package includes easy-to-use command line and graphical utilities that are based on these routines. They can be used out of the box by users who cannot or do not want to integrate the routines into their code but want to monitor and understand the CPU capacity limits in real time.

Figure 3 shows the screen shot of the command line utility on the Windows* platform. Whereas the Linux* version can rely on the MSR kernel module that is provided with the Linux kernel, no such facility is available on Windows. For Windows, a sample implementation of a Windows driver provides a similar interface.

 

Figure 3: Intel® Performance Counter Monitor command line version

 

But there is more to come. For the Linux operating system, the package includes an adaptor that plugs into the KDE* utility ksysguard. Using this daemon, it is possible to graph the various metrics in real-time. Figure 4 shows a screen shot where some of the metrics are displayed during a workload run.

See figures 9 and 10 below for PCM version 2.0 versions of these screenshots.

 

Figure 4: The KDE utility ksysguard on Linux can graph performance counters using a plug-in (from PCM v1.7)

 

Since these utilities provide direct insight into the system, they can even be used to quickly find and understand fundamental performance bottlenecks in real time. (Unlike the Intel® VTune™ Performance Analyzer, however, they won't tell you which parts of the application are causing the performance issue.)

Since version 1.5, the Intel® Performance Counter Monitor package contains a Windows* service, based on Microsoft .Net* 2.0 or later, that creates performance counters that can be shown in the Perfmon program delivered with the Microsoft Windows* OS. Microsoft's perfmon is capable of showing many useful performance counters on the Windows* OS, like disk activity, memory usage, and CPU load. More information about perfmon for Windows* 7 and Windows* 2008/R2 can be found here (perfmon has been available for many releases of Windows). Please read the Windows_howto.rtf file on how to install and remove the service for Intel® PCM.

For all of the above mentioned hardware counters on the Nehalem and Westmere based platforms, a corresponding perfmon counter is created and therefore all features supported by perfmon are also available for these counters like logging over time in a file or database. For Intel® Atom processors the perfmon counters for memory and Intel® QPI bandwidth and L3 Cache Misses will always show 0 for reasons mentioned above. In a future update of Intel® Performance Counter Monitor the service will only show the available counters.

 

Figure 5: Windows* Perfmon showing data from Intel® Performance Counter Monitor v1.7

 

Intel® Performance Counter Monitor inside your programs

Thanks to the abstraction layer that the library provides, it has become very easy to monitor the processor metrics inside your application. Before their usage, the performance counters need to be initialized. Afterwards, the counter state can be captured before and after the code section of interest. Different routines capture the counters for cores, sockets, or the complete system, and store their state in corresponding data structures. Additional routines provide the possibility to compute the metric based on these states. The following code snippet shows an example for their usage:

PCM * m = PCM::getInstance();

// program the counters; on failure, just exit
if (m->program() != PCM::Success) return;

// capture the system-wide counter state before the code of interest
SystemCounterState before_sstate = getSystemCounterState();

     [run your code here]

// capture the state again and print metrics derived from both states
SystemCounterState after_sstate = getSystemCounterState();

cout << "Instructions per clock: " << getIPC(before_sstate, after_sstate)
     << " L3 cache hit ratio: " << getL3CacheHitRatio(before_sstate, after_sstate)
     << " Bytes read: " << getBytesReadFromMC(before_sstate, after_sstate)
     << [and so on]...
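To make the "before/after" idea concrete, here is a minimal sketch of how such metrics can be derived from two counter snapshots. It does not use the Intel PCM library: the `CounterSnapshot` struct and the functions `ipc` and `l3HitRatio` are hypothetical names chosen for illustration, and the real PCM state classes and routines may compute these values differently.

```cpp
#include <cstdint>

// Hypothetical snapshot of raw, monotonically increasing counters.
// Intel PCM's own state classes are different; this is only an illustration.
struct CounterSnapshot {
    std::uint64_t instructionsRetired;
    std::uint64_t coreCycles;
    std::uint64_t l3Hits;
    std::uint64_t l3Misses;
};

// Instructions per clock over the interval between the two snapshots.
double ipc(const CounterSnapshot& before, const CounterSnapshot& after) {
    const std::uint64_t cycles = after.coreCycles - before.coreCycles;
    if (cycles == 0) return 0.0;
    const std::uint64_t instructions =
        after.instructionsRetired - before.instructionsRetired;
    return static_cast<double>(instructions) / static_cast<double>(cycles);
}

// Fraction of L3 accesses in the interval that were hits.
double l3HitRatio(const CounterSnapshot& before, const CounterSnapshot& after) {
    const std::uint64_t hits = after.l3Hits - before.l3Hits;
    const std::uint64_t misses = after.l3Misses - before.l3Misses;
    const std::uint64_t total = hits + misses;
    if (total == 0) return 0.0;
    return static_cast<double>(hits) / static_cast<double>(total);
}
```

The point of the design is that every metric is a function of two states, so the same pair of snapshots can feed any number of derived metrics without reading the hardware again.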

"CPU resource"-aware scheduling

To assess the potential impact of having precise resource utilization, we have implemented a simple scheduler that executed 1000 compute intensive and 1000 memory-bandwidth intensive jobs in a single thread. The challenge was the existence of non-predictable background load on the system, a rather typical situation in modern multi component systems with many third party components. Figure 6 depicts a possible schedule for a scheduler that is unaware of the background activity.

Figure 6: Scheduler without Intel® Performance Counter Monitor

If the scheduler can detect (using the provided routines) that a lot of the memory bandwidth is currently used by a different process, it can adjust its schedule accordingly. Our simulations show that such a scheduler executes the 2000 jobs 16% faster than a generic unaware scheduler on the test system.

Figure 7: Scheduler using Intel® Performance Counter Monitor
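The scheduling decision described above might look roughly like the sketch below. It is not the scheduler used in our experiment: the function `pickNextJob`, its parameters, and the 50% threshold are all assumptions made for illustration, and the measured background bandwidth would in practice come from the memory controller counters.

```cpp
#include <cstddef>

enum class JobKind { Compute, Memory };

// Hypothetical policy: when other processes already consume a large share of
// the memory bandwidth, prefer a compute-intensive job; otherwise run a
// memory-bandwidth-intensive one. The caller guarantees that at least one
// job of some kind is left. The 0.5 threshold is an arbitrary example value.
JobKind pickNextJob(double backgroundBandwidthGBs, double saturationGBs,
                    std::size_t computeJobsLeft, std::size_t memoryJobsLeft) {
    if (computeJobsLeft == 0) return JobKind::Memory;
    if (memoryJobsLeft == 0) return JobKind::Compute;
    return backgroundBandwidthGBs > 0.5 * saturationGBs ? JobKind::Compute
                                                        : JobKind::Memory;
}
```

A scheduler that is unaware of the background load corresponds to ignoring the first two arguments and always taking jobs in a fixed order.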

Intel PCM version 2.0 Features

Intel PCM version 2.0 adds support for the Intel® Xeon E5 series processor based on Intel microarchitecture codenamed Sandy Bridge EP/EN/E. This processor has a new uncore with lots of monitoring options.

For general info on the Intel® Xeon® E5 processors see this page.

For Intel® Xeon® E5 technical info see this page.

Below is a block diagram of the new processor from the Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Guide.

Figure 8: Intel® Xeon® E5 series block diagram

The Xeon E5 series processor's uncore has multiple 'boxes' similar to the Xeon E7 processor (Intel microarchitecture codename Westmere-EX). Intel PCM v2.0 supports Intel® QPI and memory metrics for the new processor.

Comparing the output of 'pcm.exe 1' version 1.7 versus version 2.0 on a Xeon E7 (Westmere-EX) based system, the primary differences are:

  • Version 2.0 prints a 'TEMP' column for each core (and for each socket on the Xeon E5 processor series). 'TEMP' values are temperature readings, in steps of 1 degree Celsius, relative to the TjMax temperature (thermal headroom): 0 corresponds to the maximum temperature.
  • Version 2.0 also displays the C-state core and package residency. This is the percentage of time that the core (or the whole package) spends in a particular level of C-state. The higher the level, the greater the power savings.
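The two new columns can be interpreted with simple arithmetic, sketched below. The helper names are hypothetical and not part of Intel PCM; the sketch assumes that 'TEMP' is reported as a non-negative headroom below TjMax and that TjMax for the processor is known to the caller.

```cpp
#include <cstdint>

// Absolute temperature from thermal headroom: headroom 0 means the core
// (or socket) is at TjMax. The TjMax value must be supplied by the caller.
int absoluteTemperatureC(int tjMaxC, int headroomC) {
    return tjMaxC - headroomC;
}

// C-state residency: percentage of the interval spent in a given C-state.
double residencyPercent(std::uint64_t cyclesInState, std::uint64_t totalCycles) {
    if (totalCycles == 0) return 0.0;
    return 100.0 * static_cast<double>(cyclesInState) /
           static_cast<double>(totalCycles);
}
```

For instance, with an assumed TjMax of 100 degrees Celsius, a 'TEMP' value of 35 corresponds to 65 degrees Celsius.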

Intel® Xeon® E5 series specific features

The PCM version 2.0 information below applies to the Intel® Xeon® E5 series processor.

PCM version 2.0 adds more Intel® QPI info:

  • the QPI link(s) speed
  • the percentage of in-coming (received) QPI bandwidth used for data
  • the bytes of out-going (transmitted) data and non-data traffic for each link along with percentage utilization for the out-going link.

Please note that the availability of Intel® QPI information may depend on support for the Xeon E5 uncore performance monitoring units in your BIOS and on the BIOS settings.

PCM version 2.0 also adds energy usage info:

  • Energy usage by socket
  • DRAM energy usage. If the BIOS doesn't support this feature then the DRAM energy will be reported as zero.

PCM-power utility

For the Intel® Xeon® E5 series processor, PCM version 2.0 also provides the pcm-power utility. The MSVS Windows project file for this utility is in the PCM-Power_Win directory.

The pcm-power utility displays, for all cases:

  • For each socket and Intel® QPI port, the percentage of QPI clocks spent in the L0p and L1 lower power states. In the L0p power saving state, half of the QPI lanes are disabled. In the L1 state, all the lanes are in standby mode. The above-mentioned uncore performance monitoring guide has more information on these metrics (see table 2-102). Please note that the availability of Intel® QPI information may depend on support for the Xeon E5 uncore performance monitoring units in your BIOS and on the BIOS settings.
  • For each socket, display the energy used, the watts, and the thermal headroom.
  • For the DRAM, display the energy and watts used, if the platform supports this feature. The value displayed will be zero if the DRAM energy display is not supported.
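The relation between the displayed energy and watts is the usual one: average power is the energy consumed divided by the elapsed time. A minimal sketch follows; the function name `averageWatts` is an assumption and is not a pcm-power routine.

```cpp
// Average power over an interval, from two cumulative energy readings in
// joules and the elapsed time in seconds. Returns 0 for an empty interval.
double averageWatts(double joulesBefore, double joulesAfter, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return (joulesAfter - joulesBefore) / seconds;
}
```

For example, 150 joules consumed by a socket over a 5 second interval correspond to an average of 30 watts.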

The pcm-power '-m' option displays IMC (Integrated Memory Controller) PMU (Performance Monitoring Unit) power state info. The valid options are:

  • option '-m 0' displays DRAM rank 0 and rank 1 'CKE off' residencies
    • 'CKE off' is a DRAM power saving state, so the higher the percentage of time spent in 'CKE off' mode, the less power the DRAM uses.
    • Rank 0 and rank 1 are two of the ranks of the DRAM.
    • This option is the default IMC PMU display if no other '-m' option is entered.
  • option '-m 1' displays DRAM rank 2 and rank 3 'CKE off' residencies
  • option '-m 2' displays DRAM rank 4 and rank 5 'CKE off' residencies
  • option '-m 3' displays DRAM rank 6 and rank 7 'CKE off' residencies
  • option '-m 4' displays DRAM self-refresh residencies
    • 'self-refresh' mode is another DRAM power saving mode
  • option '-m -1' omits IMC PMU output
    • This is helpful to cut down on the output if you don't want DRAM info.

The pcm-power '-p' option displays PCU (power control unit) PMU power state info. The valid options are:

  • option '-p 0' displays frequency residencies
    • This option uses the 'frequency banding' feature of the PCU PMU to display the percentage of time the cores spend in 3 'bands' of frequency.
    • The default bands are 10, 20 and 40. You can override each band with '-a band0', '-b band1', and '-c band2'. Each band value is multiplied by 100 MHz. The default bands therefore represent the percentage of time the cores run at the following frequencies:
      • Band0: freq >= 1GHz
      • Band1: freq >= 2GHz
      • Band2: freq >= 4GHz
    • This is the default -p option.
    • On an idle system, running with './pcm-power.x "sleep 5" -p 0 -a 0 -b 12 -c 27' gave the output:
      S0; PCUClocks: 3994206932; Freq band 0/1/2 cycles: 98.52%; 92.61%; 0.02%
      Which means on Socket 0, for 3994206932 PCU clockticks, the processor spent:
      • 98.52% in band 0: freq >= 0 GHz,
      • 92.61% in band 1: freq >= 1.2 GHz
      • 0.02% in band 2: freq >= 2.7 GHz. The socket barely got into full nominal frequency (2.7 GHz) or turbo mode (2.8 GHz or higher)
  • option '-p 1' displays core C-state residency
    • The unit is the number of cores on the socket that were in C0, C3 or C6 during the measurement interval.
    • On a busy system one can get:
      S0; PCUClocks: 26512878934; core C0/C3/C6-state residency: 7.28; 0.00; 0.72
      Which means that, for socket 0, during the interval, on average, 7.28 cores were in C0 (the full-power mode), 0.0 cores were in C3 (a low power state) and 0.72 cores were in C6 state (an even lower power state).
  • option '-p 2' displays Prochot (throttled) residencies and thermal frequency limit cycles
    • For instance, on a busy system one can get:
      S0; PCUClocks: 50540355190; Internal prochot cycles: 0.00 %; External prochot cycles:0.00 %; Thermal freq limit cycles:0.00%
      So the processor didn't hit any thermal throttling
  • option '-p 3' displays {Thermal,Power,Clipped} frequency limit cycles
    • On a busy system one can get:
      S0; PCUClocks: 26724849741; Thermal freq limit cycles: 0.00 %; Power freq limit cycles:2.36 %; Clipped freq limit cycles:89.63 %
      So, for socket 0,
      • the freq was limited by thermal constraints 0.0% of the time. This is based on PCU event 0x4 FREQ_MAX_LIMIT_THERMAL_CYCLES.
      • the power usage limited the freq 2.36% of the time. This is based on PCU event 0x5 FREQ_MAX_POWER_CYCLES.
      • the current usage limited the freq 89.63% of the time. This is based on PCU event 0x7 FREQ_MAX_CURRENT_CYCLES.
  • option '-p 4' displays {OS,Power,Clipped} frequency limit cycles
    • On a busy system one can get:
      S0; PCUClocks: 26170529847; OS freq limit cycles: 6.09 %; Power freq limit cycles:2.39 %; Clipped freq limit cycles:91.51 %
      So, for socket 0,
      • the freq was limited by the OS 6.09% of the time. This is based on PCU event 0x6 FREQ_MAX_OS_CYCLES.
      • the power usage limited the freq 2.39% of the time. This is based on the same event as option '-p 3' second event.
      • the current usage limited the freq 91.51% of the time. This is based on the same event as option '-p 3' third event.
  • option '-p -1' omits PCU PMU output

Updates to plugins for Linux Ksysguard and Windows* Perfmon GUI

In addition to the command line tools the graphical plugins for Linux Ksysguard and Windows* Perfmon have been extended with essential energy related metrics (C-states, thermal headroom, processor and DRAM energy).

Figure 9: Intel PCM version 2.0 Ksysguard plugin showing energy metrics.

Figure 10: Intel PCM version 2.0 Windows* Perfmon plugin showing energy metrics.

Changelog

 

Version 1.0

  • Initial release

Version 1.5

  • Integration into Windows* perfmon
  • Intel® Atom™ support

Version 1.6

  • Intel® Xeon® E7 series support (Intel microarchitecture code name Westmere-EX)
  • On-core performance metrics of 2nd generation Intel® Core™ processor family (Intel® microarchitecture code name Sandy Bridge)
  • Highly experimental support of some earlier Intel® microarchitectures (e.g. Penryn). Enable by defining PCM_TEST_FALLBACK_TO_ATOM in the cpucounter.cpp
  • Enhanced Linux KDE ksysguard plugin
  • New options for the command line pcm utility
  • Support of >64 cores on Windows 7 and Windows Server 2008 R2
  • Support of Performance Monitoring Unit Sharing Guideline to prevent collisions with other processor performance monitoring agents (e.g. Intel® VTune™ Performance Analyzer)

Version 1.7

  • Intel PCM is now distributed under BSD license. See license.txt file in zip.
  • Support additional processor models with Intel® microarchitecture code name Nehalem
  • New metrics: timestamps via RDTSCP instruction, C0 active core residency and a few other derived metrics
  • Extended custom core configuration facility/mode
  • Bug fixes

Version 2.0

  • Support of Xeon E5 series (based on Intel microarchitecture code name Sandy Bridge EP/EN/E)
  • CSV format output for the pcm command line utility (-csv option)
  • Support of basic energy metrics (availability varies depending on processor architecture): core and package C-states, processor and DRAM energy, thermal headroom
  • A new command line utility (pcm-power) for extended power and energy monitoring on Xeon E5 series (Intel microarchitecture code name Sandy Bridge EP/EN/E)
    • Frequency residency (bands) statistics
    • Processor and DRAM energy
    • DRAM sleep CKE state statistics
    • DRAM self-refresh statistics
    • QPI power saving state statistics
    • Core C-states statistics
    • Frequency throttling cause statistics
  • Experimental OpenGL 3D visualization tool for 2 Socket Xeon E5 series (Intel microarchitecture code name Sandy Bridge EP)

Version 2.1

  • On-core performance metrics of 3rd generation Intel® Core™ processor family (Intel® microarchitecture code name Ivy Bridge)

Version 2.2

  • Support of SGI UV 2 (up to 256 sockets)
  • Support of uncore metrics for Intel microarchitecture code name Sandy Bridge E (single socket)
  • Added frequency transition statistics for pcm-power tool
  • Bug fixes

Version 2.3

  • Support of Apple Mac OS X 10.7 ("Lion") and OS X 10.8 ("Mountain Lion")
  • Support of FreeBSD
  • New tool for monitoring memory traffic per channel on Intel Xeon processor E5 product family

Version 2.3.5

  • Experimental Linux perf driver support (see Makefile and LINUX_HOWTO.txt)
  • Fixed cache metrics counting for Intel Xeon E5 based on Intel microarchitecture codenamed Sandy Bridge-EP and Sandy Bridge-E according to erratum
  • Added core C1 residency metric
  • Improved documentation and error messages

Version 2.4

  • Support of memory bandwidth metrics on the 2nd, 3rd and 4th generation Intel® Core™ processors using integrated memory controller counters (Linux).
  • Support of memory bandwidth metrics on additional server systems based on Intel® Xeon® E5 processors.

Version 2.5

  • Support 4th generation Intel® Core™ processors (previously codenamed Haswell)
  • New utility (pcm-tsx) for monitoring Intel® Transactional Synchronization Extensions (Intel® TSX) metrics (transactional success (total/transactional/aborted cycles) and custom TSX events)
  • New utility (pcm-pcie) for monitoring PCIe traffic on Intel® Xeon® E5 processors
  • Improved the speed of reading performance counters by a factor of up to 3x using the new PCM::getAllCounterStates call
  • Added Windows 2012 support

Version 2.5.1

  • Support of memory bandwidth metrics on the 2nd, 3rd and 4th generation Intel® Core™ processors using integrated memory controller counters  (Apple OS X).
  • Support on-core metrics for Intel® Atom™ Processor S1200 Series (previously codenamed Centerton)
  • Bug fixes

Version 2.6

  • Support for Intel® Xeon® E5 v2 processor series (microarchitecture previously codenamed Ivybridge-EP)
  • Support for Intel® Core™ i5-4350U (microarchitecture previously codenamed Haswell ULT)
  • Support for Intel® Atom™ processor C2000 series (microarchitecture previously codenamed Avoton)
  • Support for Intel® Atom™ processor Z3000 series (microarchitecture previously codenamed Baytrail)
  • Support API for programming “off-core response” PMU events. A usage example is in the new pcm-numa utility.
  • Bug fixes

References

For questions and comments about Intel PCM and its use-cases, we recommend the Software Tuning, Performance Optimization & Platform Monitoring forum.

[1] Drysdale, Gillespie, Valles "Performance Insights to Intel® Hyper-Threading Technology"

[2] Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2

[3] Intel® Xeon® Processor 7500 Series Uncore Programming Guide

[4] Peggy Irelan and Shihjong Kuo "Performance Monitoring Unit Sharing Guide"

[5] David Levinthal "Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors"

Intel, Xeon, Core, and VTune are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license. The software license text is included into the code sample.

Intel® Turbo Boost Technology requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost

Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

This software is subject to the U.S. Export Administration Regulations and other U.S. law, and may not be exported or re-exported to certain countries (Burma, Cuba, Iran, North Korea, Sudan, and Syria) or to persons or entities prohibited from receiving U.S. exports (including Denied Parties, Specially Designated Nationals, and entities on the Bureau of Export Administration Entity List or involved with missile technology or nuclear, chemical or biological weapons).

License and Download

Intel PCM is available for download under the BSD License (Open Source Initiative).

Comments

Hi,

I am working on profiling an application where I am interested in L1 and L2 cache misses. I am using Win 7 64bit and I managed to compile the pcm.exe. However, I was unsuccessful with the lib. I used WinDDK i64freebuildenvironment and received the following build output.

Thank you very much for your help!

best regards,
Grega

Output:
D:projLibraryINTELProfilerWinMSRDriverWin7>build
BUILD: Compile and Link for IA64
BUILD: Loading c:winddk7600.16385.1build.dat...
BUILD: Computing Include file dependencies:
BUILD: Start time: Fri Aug 26 13:45:34 2011
BUILD: Examining d:projlibraryintelprofilerwinmsrdriverwin7 directory for files to compile.
BUILD: Saving c:winddk7600.16385.1build.dat...
BUILD: Compiling and Linking d:projlibraryintelprofilerwinmsrdriverwin7 directory
_NT_TARGET_VERSION SET TO WS03
Compiling - msrmain.c
1>errors in directory d:projlibraryintelprofilerwinmsrdriverwin7
1>d:projlibraryintelprofilerwinmsrdriverwin7msrmain.c(186) : error C4013: '__writemsr' undefined; assuming extern returning int
1>d:projlibraryintelprofilerwinmsrdriverwin7msrmain.c(201) : error C4013: '__readmsr' undefined; assuming extern returning int
Linking Executable - objfre_wnet_ia64ia64msr.sys
1>link : error LNK1181: cannot open input file 'd:projlibraryintelprofilerwinmsrdriverwin7objfre_wnet_ia64ia64msrmain.obj'
BUILD: Finish time: Fri Aug 26 13:45:35 2011
BUILD: Done

3 files compiled - 2 Errors
1 executable built - 1 Error

Roman Dementiev (Intel)

Grega,

the "IA64" build environment is for Itanium processors. Intel PCM does not support Itanium. I assume you have a 64-bit x86 processor with a microarchitecture listed in the article ("Nehalem", "Westmere", "Sandy-Bridge"). In that case you should select the Windows 7 x64 build environment.

Roman

Hi, Roman

Do you have a plan to release the next version of Intel PCM? Could you tell us the date? And what will be enhanced in the next version?

Thanks.

Br,
Bright

Hi Roman,

In response to your query dated 25 Aug, I run 32 bit Windows XP SP3 on my laptop, and so does my colleague on his laptop. Every time I run pcm.exe on my machine, regardless of the command line switch, I get the following message:

Copyright (c) 2009-2011 Intel Corporation

Starting MSR service failed with error 3
Can not access CPU counters
You must have signed msr.sys driver in your current directory and have administrator rights to run this program

And I have admin rights on my machine.

Roman Dementiev (Intel)

Chai,

can you try the following:
1. make sure pcm.exe and msr.sys are in the same directory, like c:\pcm
2. chdir c:\pcm
3. pcm.exe --uninstallDriver (note that you must run it from that directory)
4. pcm.exe 1 (this installs the driver again and runs the tool)

Let me know if that works.

Best regards,
Roman

Roman Dementiev (Intel)

pcm.exe must be run from the same directory where the msr.sys driver is.

Hi Roman,

I was wondering if there will be support for some other processors soon.
Unfortunately, I tried to use the tool on my two machines unsuccessfully (my machine configurations are shown below).

1. Machine1's /proc/cpuinfo shows 2 cores:
Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
and when I run pcm.x it says:
Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere and Sandy Bridge). CPU model: 23

2. Machine2's /proc/cpuinfo shows 8 cores:
Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere and Sandy Bridge). CPU model: 15

I'm using v1.6 that I downloaded from this webpage, but couldn't find what the support page is for newer versions, if any.

Thanks!
Chris

Hi,

1. Is there any way to get the L1-L3 cache sizes and the cache line width in bytes?
2. I am using a 2nd generation Core i7 (Sandy Bridge) which is not recognized by the library. Here is the output: Unsupported processor, CPU Model: 45
I tried the solution provided to Saurabh but still have the same issue.

3. It would be great if you add capabilities such as detection of AVX, SSE2, MMX and ...

Thanks

Roman Dementiev (Intel)

Chris,

you might try the (very experimental, limited support) PCM_TEST_FALLBACK_TO_ATOM option described above in the article and mentioned in the comments already. Note that older architectures do not support any uncore metrics and have a smaller number of on-core counters and available metrics.

Roman

Roman Dementiev (Intel)

Pourya,

detection of cache topology is out of the scope of this sample code. Please use instead the source code package available at http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/ . It also has CPUID routines which you can use/customize to detect AVX, SSE2, etc. The CPUID instruction with the feature flags is described here: http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html

To run the tool you might try to replace: "SANDY_BRIDGE = 42" => "SANDY_BRIDGE = 45"

I could not find your CPU model 45 dec (= Ext model: 10, Model: 1101 in binary) or 0x2D in Table 5-3 of the latter document or in this summary: http://software.intel.com/en-us/articles/intel-processor-identification-with-cpuid-model-and-family-numbers/ . Could you share the exact product name of your processor, for example Intel® Core™ i7-2600?

Roman
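For readers following this exchange, the model numbers quoted above (42, 45) are the "display model" that combines two CPUID fields. The sketch below shows the combination for family 6 processors; the function names are made up for illustration and are not PCM code, and the example EAX values in the usage note are only sample inputs.

```cpp
#include <cstdint>

// Display model for family 6 (and family 15) processors:
// (extended model << 4) + model, e.g. ext model 0x2, model 0xD -> 0x2D = 45.
unsigned displayModel(unsigned extendedModel, unsigned model) {
    return ((extendedModel & 0xF) << 4) | (model & 0xF);
}

// Same value extracted from the EAX register returned by CPUID leaf 1:
// bits 7:4 hold the model and bits 19:16 the extended model.
unsigned displayModelFromEax(std::uint32_t eax) {
    const unsigned model = (eax >> 4) & 0xF;
    const unsigned extendedModel = (eax >> 16) & 0xF;
    return displayModel(extendedModel, model);
}
```

For example, an EAX value of 0x206A7 gives model 42 and 0x206D7 gives model 45.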
