| July 25, 2010 1:00 AM PDT | |
Download Code Sample
IntelPerformanceCounterMonitorV1.7.zip
The complexity of computing systems has tremendously increased over the last decades. Hierarchical cache subsystems, non-uniform memory, simultaneous multithreading and out-of-order execution have a huge impact on the performance and compute capacity of modern processors.
Figure 1: “CPU Utilization” measures only the time a thread is scheduled on a core
Software that understands and dynamically adjusts to the resource utilization of modern processors has performance and power advantages. The Intel® Performance Counter Monitor provides sample C++ routines and utilities to estimate the internal resource utilization of the latest Intel® Xeon® and Core™ processors and gain a significant performance boost.
When the CPU utilization does not tell you the utilization of the CPU
The CPU utilization number obtained from the operating system (OS) is a metric that has been used for many purposes, such as product sizing, compute capacity planning, and job scheduling. The current implementation of this metric (the number that the UNIX* “top” utility and the Windows* task manager report) shows the portion of time slots that the CPU scheduler in the OS could assign to the execution of running programs or the OS itself; the rest of the time is idle. For compute-bound workloads, the CPU utilization metric calculated this way predicted the remaining CPU capacity very well on architectures of the 1980s, which had much more uniform and predictable performance than modern systems. Advances in computer architecture have made this metric unreliable: multi-core and multi-CPU systems, multi-level caches, non-uniform memory, simultaneous multithreading (SMT), pipelining, out-of-order execution, and so on.
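As a rough illustration of the OS-level metric described above, the following sketch computes utilization as the share of non-idle time between two samples. The struct and function names are assumptions made for this example; they are not part of Intel PCM or of any OS interface.

```cpp
#include <cstdint>

// Hypothetical snapshot of OS scheduler accounting (illustrative names only):
// ticks spent executing programs or the OS itself, and ticks spent idle.
struct CpuTimes {
    std::uint64_t busy_ticks;
    std::uint64_t idle_ticks;
};

// Classic "CPU utilization": the fraction of time slots between two
// snapshots that were not idle. Returns 0 if no time has elapsed.
double cpu_utilization(const CpuTimes& before, const CpuTimes& after) {
    const std::uint64_t busy = after.busy_ticks - before.busy_ticks;
    const std::uint64_t idle = after.idle_ticks - before.idle_ticks;
    const std::uint64_t total = busy + idle;
    return total == 0 ? 0.0
                      : static_cast<double>(busy) / static_cast<double>(total);
}
```

Note that nothing in this calculation reflects what the core actually did during a busy time slot, for example whether it was stalled waiting for memory.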
Figure 2: The complexity of a modern multi-processor, multi-core system
A prominent example is the non-linear CPU utilization on processors with Intel® Hyper-Threading Technology (Intel® HT Technology). Intel® HT Technology is a great performance feature that can boost performance by up to 30%. However, HT-unaware end users are easily confused by the reported CPU utilization. Consider an application that runs a single thread on each physical core: the reported CPU utilization is 50%, even though the application may already use 70%-100% of the execution units. Details are explained in [1].
A different example is the CPU utilization for “memory throughput”-intensive workloads on multi-core systems. The bandwidth test “stream” already saturates the capacity of the memory controller with fewer threads than there are cores available, so the reported CPU utilization stays below 100% although no memory bandwidth is left.
Abstraction Level for Performance Monitoring Units
The good news is that Intel processors already provide the capability to monitor performance events inside processors. In order to obtain a more precise picture of CPU resource utilization we rely on the dynamic data obtained from the so-called performance monitoring units (PMU) implemented in Intel’s processors. We concentrate on the advanced feature set available in the current Intel Xeon 5500, 5600, 7500, E7 and Core i7 processor series [2-4].
We have implemented a basic set of routines with a high-level interface that are callable from user C++ applications and provide various CPU performance metrics in real time. In contrast to other existing frameworks like PAPI* and Linux* “perf”, we support not only core but also uncore PMUs of Intel processors (including the recent Intel Xeon E7 processor series). The uncore is the part of the processor that contains the integrated memory controller and the Intel® QuickPath Interconnect to the other processors and the I/O hub. In total, the following metrics are supported:
- Core: instructions retired, elapsed core clock ticks, core frequency including Intel® Turbo Boost Technology, L2 cache hits and misses, L3 cache misses and hits (including or excluding snoops).
- Uncore: read bytes from memory controller(s), bytes written to memory controller(s), data traffic transferred by the Intel® QuickPath Interconnect links.
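Raw counts like the ones listed above are usually turned into ratios and rates. The sketch below shows three such derived metrics. The struct and its field names are assumptions for illustration and do not correspond to the data structures used by Intel PCM.

```cpp
#include <cstdint>

// Hypothetical counter differences over a measured interval
// (illustrative field names, not the PCM data structures).
struct CounterDelta {
    std::uint64_t instructions_retired;
    std::uint64_t core_cycles;
    std::uint64_t l3_hits;
    std::uint64_t l3_misses;
    std::uint64_t bytes_read_from_mc;
};

// Instructions per clock cycle.
double ipc(const CounterDelta& d) {
    return d.core_cycles == 0
               ? 0.0
               : static_cast<double>(d.instructions_retired) / d.core_cycles;
}

// Fraction of L3 cache accesses that were hits.
double l3_hit_ratio(const CounterDelta& d) {
    const std::uint64_t accesses = d.l3_hits + d.l3_misses;
    return accesses == 0 ? 0.0 : static_cast<double>(d.l3_hits) / accesses;
}

// Memory read bandwidth in bytes per second over `seconds` of wall time.
double read_bandwidth(const CounterDelta& d, double seconds) {
    return seconds <= 0.0 ? 0.0 : d.bytes_read_from_mc / seconds;
}
```

Each function guards against an empty interval, because a division by zero would otherwise occur when the two snapshots are taken back to back.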
Intel® PCM version 1.6 supports on-core performance metrics (like instructions per clock cycle and L3 cache misses) of the 2nd generation Intel® Core™ processor family (Intel® microarchitecture code name Sandy Bridge) and adds experimental support for some earlier Intel® microarchitectures (e.g. Penryn), which can be enabled by defining PCM_TEST_FALLBACK_TO_ATOM in cpucounters.cpp.
I want to see these counters!
As an additional goody, the package includes easy-to-use command line and graphical utilities that are based on these routines. They can be used out of the box by users who cannot or do not want to integrate the routines into their code but want to monitor and understand the CPU capacity limits in real time.
Figure 3 shows a screenshot of the command line utility on the Windows* platform. Whereas the Linux* version can rely on the MSR kernel module that is provided with the Linux kernel, no such facility is available on Windows. For Windows, a sample implementation of a Windows driver provides a similar interface.
Figure 3: Intel Performance Counter Monitor command line version
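To give an idea of what the Linux MSR kernel module mentioned above offers, the sketch below reads one 64-bit model-specific register through a device file such as /dev/cpu/0/msr, using the register address as the file offset. It illustrates the general mechanism only and is not the code used inside Intel PCM; the function name is an assumption of this example.

```cpp
#include <cstdint>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Reads one 64-bit MSR via a Linux msr device file (e.g. "/dev/cpu/0/msr").
// Returns false if the device cannot be opened or read, for example when
// the msr module is not loaded or the caller lacks the required privileges.
bool read_msr(const std::string& device_path, std::uint32_t msr_address,
              std::uint64_t& value) {
    const int fd = ::open(device_path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    std::uint64_t raw = 0;
    // The MSR address selects the register: it is passed as the file offset.
    const ssize_t n = ::pread(fd, &raw, sizeof raw,
                              static_cast<off_t>(msr_address));
    ::close(fd);
    if (n != static_cast<ssize_t>(sizeof raw)) return false;
    value = raw;
    return true;
}
```

A call such as read_msr("/dev/cpu/0/msr", 0x10, tsc) would try to read the time stamp counter of logical CPU 0. It typically requires root privileges and a loaded msr module.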
But there is more to come. For the Linux operating system, the package includes an adaptor that plugs into the KDE* utility ksysguard. Using this daemon, it is possible to graph the various metrics in real-time. Figure 4 shows a screen shot where some of the metrics are displayed during a workload run.
Figure 4: The KDE utility ksysguard on Linux can graph performance counters using a plug-in.
Since these utilities provide direct insight into the system, they can even be used to quickly find and understand fundamental performance bottlenecks in real time. (In contrast to the Intel® VTune™ Performance Analyzer, however, they won’t tell you which parts of the application are causing the performance issue.)
Since version 1.5, the Intel® Performance Counter Monitor package contains a Windows* service, based on Microsoft .NET* 2.0 or later, that creates performance counters that can be shown in the Perfmon program delivered with the Microsoft Windows* OS. Perfmon can show many useful performance counters on the Windows* OS, such as disk activity, memory usage, and CPU load. More information about perfmon for Windows* 7 and Windows* 2008/R2 can be found here (but perfmon has been available for many releases of Windows now). Please read the Windows_howto.rtf file on how to install and remove the service for Intel® PCM.
For all of the above-mentioned hardware counters on Nehalem- and Westmere-based platforms, a corresponding perfmon counter is created, so all features supported by perfmon, such as logging over time to a file or database, are also available for these counters. For Intel® Atom processors, the perfmon counters for memory and Intel® QPI bandwidth and L3 cache misses will always show 0 because these counters are not available there. In a future update of Intel® Performance Counter Monitor, the service will show only the available counters.
Figure 5: Windows* Perfmon showing data from Intel® Performance Counter Monitor
Intel® Performance Counter Monitor inside your programs
Thanks to the abstraction layer that the library provides, it is very easy to monitor processor metrics inside your application. Before they are used, the performance counters need to be initialized. Afterwards, the counter state can be captured before and after the code section of interest. Different routines capture the counters for cores, sockets, or the complete system and store their state in corresponding data structures. Additional routines compute the metrics from these states. The following code snippet shows an example of their usage:
PCM * m = Monitor::getInstance();
if (m->program() != PCM::Success) // program the counters
    return -1; // an error occurred during programming
SystemCounterState before_sstate = getSystemCounterState();
// [run your code here]
SystemCounterState after_sstate = getSystemCounterState();
cout << "Instructions per clock: " << getIPC(before_sstate, after_sstate)
     << " L3 cache hit ratio: " << getL3CacheHitRatio(before_sstate, after_sstate)
     << " Bytes read: " << getBytesReadFromMC(before_sstate, after_sstate)
     // << [and so on] …
     << endl;
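The before/after pattern can also be wrapped in a small helper so that the two snapshots always bracket the measured code. The following is a minimal sketch that compiles without the PCM headers: take_snapshot is a stand-in for a routine such as getSystemCounterState, and the helper name measure is an assumption of this example.

```cpp
#include <utility>

// Runs `work` between two snapshots obtained from `take_snapshot` and
// returns both as a (before, after) pair. `take_snapshot` stands in for a
// state-capturing routine; nothing here depends on the PCM library.
template <typename Snapshot, typename Work>
auto measure(Snapshot take_snapshot, Work work)
    -> std::pair<decltype(take_snapshot()), decltype(take_snapshot())> {
    auto before = take_snapshot();
    work();
    auto after = take_snapshot();
    return {before, after};
}
```

The returned pair could then be handed to metric routines such as getIPC, which take a before and an after state.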
“CPU resource”-aware scheduling
To assess the potential impact of having precise resource utilization data, we implemented a simple scheduler that executed 1000 compute-intensive and 1000 memory-bandwidth-intensive jobs in a single thread. The challenge was the existence of unpredictable background load on the system, a rather typical situation in modern multi-component systems with many third-party components. Figure 6 depicts a possible schedule for a scheduler that is unaware of the background activity.
Figure 6: Scheduler without Intel® Performance Counter Monitor
If the scheduler can detect (using the provided routines) that a lot of the memory bandwidth is currently used by a different process, it can adjust its schedule accordingly. Our simulations show that such a scheduler executes the 2000 jobs 16% faster than a generic unaware scheduler on the test system.
Figure 7: Scheduler using Intel® Performance Counter Monitor
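The decision such a resource-aware scheduler has to make can be sketched as follows. This is not the scheduler used in the experiment above: the queue layout, the threshold parameter, and all names are assumptions made for illustration.

```cpp
#include <deque>

enum class JobKind { Compute, MemoryBandwidth };

// Hypothetical job queues holding job ids.
struct JobQueues {
    std::deque<int> compute_jobs;    // compute-intensive jobs
    std::deque<int> bandwidth_jobs;  // memory-bandwidth-intensive jobs
};

// Chooses which kind of job to run next. If the background memory traffic
// (e.g. derived from bytes read and written at the memory controller) is
// above `busy_threshold`, compute jobs are preferred; otherwise bandwidth
// jobs are preferred. Falls back to the queue that still has work.
// Precondition: at least one queue is non-empty.
JobKind pick_next(const JobQueues& q, double background_bw,
                  double busy_threshold) {
    const bool memory_busy = background_bw > busy_threshold;
    if (memory_busy) {
        return q.compute_jobs.empty() ? JobKind::MemoryBandwidth
                                      : JobKind::Compute;
    }
    return q.bandwidth_jobs.empty() ? JobKind::Compute
                                    : JobKind::MemoryBandwidth;
}
```

The rule keeps memory-bound jobs away from phases in which another process already occupies the memory controller and uses those phases for compute-bound work instead.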
Changelog
Version 1.0
- Initial release
Version 1.5
- Integration into Windows* perfmon
- Intel® Atom™ support
Version 1.6
- Intel Xeon E7 series support (Intel microarchitecture code name Westmere-EX)
- On-core performance metrics of 2nd generation Intel® Core™ processor family (Intel® microarchitecture code name Sandy Bridge)
- Highly experimental support of some earlier Intel® microarchitectures (e.g. Penryn). Enable by defining PCM_TEST_FALLBACK_TO_ATOM in cpucounters.cpp
- Enhanced Linux KDE ksysguard plugin
- New options for the command line pcm utility
- Support of >64 cores on Windows 7 and Windows Server 2008 R2
- Support of Performance Monitoring Unit Sharing Guideline to prevent collisions with other processor performance monitoring agents (e.g. Intel® VTune™ Performance Analyzer)
Version 1.7
- Intel PCM is distributed under the new BSD license
- Support for additional processor models with Intel® microarchitecture code name Nehalem
- New metrics: timestamps via RDTSCP instruction, C0 active core residency and a few other derived metrics
- Extended custom core configuration facility/mode
- Bug fixes
For questions and comments about Intel PCM and its use-cases, we recommend the Software Tuning, Performance Optimization & Platform Monitoring forum.
[1] Drysdale, Gillespie, Valles “Performance Insights to Intel® Hyper-Threading Technology”
[2] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2
[3] Intel® Xeon® Processor 7500 Series Uncore Programming Guide
[4] Peggy Irelan and Shihjong Kuo “Performance Monitoring Unit Sharing Guide”
[5] David Levinthal “Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors”
Intel, Xeon, Core, and VTune are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number
Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license. The software license text is included into the code sample.
Intel® Turbo Boost Technology requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
This software is subject to the U.S. Export Administration Regulations and other U.S. law, and may not be exported or re-exported to certain countries (Burma, Cuba, Iran, North Korea, Sudan, and Syria) or to persons or entities prohibited from receiving U.S. exports (including Denied Parties, Specially Designated Nationals, and entities on the Bureau of Export Administration Entity List or involved with missile technology or nuclear, chemical or biological weapons).
Comments (64) 
| October 26, 2010 2:32 AM PDT
Thomas Willhalm (Intel)
|
Matt, Thank you for your interest in our library. The focus of this sample code is to give you an example of how processor counters can be accessed. I am therefore very sorry that providing and maintaining binaries is currently beyond the scope of this effort. We will look into this and consider it for a future version. Kind regards Thomas |
| October 26, 2010 3:27 AM PDT
Roman Dementiev (Intel)
|
Matt, thank you for bringing the issue with the BAT files to our attention. These mymake.bat files have been removed by a virus scan. In fact, these batch files just deleted the output directories left over from previous build invocations and ran the "nmake" tool. You can simply run "nmake" from the Windows Driver Kit build environment to build the driver. We will address this issue in the next version of the sample code. Best regards, Roman |
| November 28, 2010 7:34 PM PST
Mugilan Chitambram |
Hi guys, I'm from Intel as well, working on Power and Performance, and I have a question about the software. I would like to know whether this tool is useful for measuring memory bandwidth, such as Memory Read Bytes/sec and Memory Write Bytes/sec. I couldn't find these counters/options in the Windows perfmon tool. Thank you. |
| November 30, 2010 2:45 AM PST
Roman Dementiev (Intel)
|
Hi Mugilan, sure, our command line PCM tool can already show you an estimate of memory controller traffic with the metrics you asked about. In addition, we plan to release a Windows perfmon plugin to visualize this information graphically. Best regards, Roman |
| December 17, 2010 5:54 PM PST
Raman Muthukrishnan |
Hi, This article is very useful. We have Ubuntu 2.6.31.9 (distribution 9.10). Do you have source code for Linux as well, or any suggestions on how to port it? In your slides, I saw a Linux plugin to view the results, but was not able to find it. Thank you, Raman |
| December 20, 2010 3:23 PM PST
Raman Muthukrishnan |
Sorry for asking question before trying. We tried it in linux, and it works wonderfully.. Thank you for this, Raman |
| December 21, 2010 4:53 AM PST
Roman Dementiev (Intel)
|
Hi Raman, thank you for the feedback. The Linux version is included in the package as well. Just type "make" in the main directory: the pcm.x (command line utility) and cpusensor.x (KDE ksysguard plugin) binaries should be compiled. We tested on SUSE SLES10 and 11 but not on Ubuntu (but I guess there should be no problems there either). Roman |
| December 29, 2010 4:35 PM PST
Raman Muthukrishnan |
Hi Roman, We are not able to see the TLB miss counts.. What can we do to get access to these counts? Thank you, Raman |
| December 30, 2010 12:09 PM PST
Roman Dementiev (Intel)
|
Hi Raman, the Performance Monitoring Unit (PMU) is a limited resource: it cannot count all available events simultaneously. The recent Intel processors have four configurable performance counters per logical core (the uncore counters are different). Therefore, by default, Intel PCM configures the on-core PMU to count only L2 and L3 cache statistics. However, you can change the default configuration by changing the source code (the parameters of the 'program' call). You can do it in your code if you have instrumented it with Intel PCM, or try to change/extend the source code of the pcm.x utility included in the package (if you prefer to use this stand-alone utility). Example for custom event configuration:
Monitor::CustomCoreEventDescription MyEvents[4]; // only 4 fully programmable counters are supported on microarchitecture codenamed Nehalem/Westmere
MyEvents[0].event_number = 0x0e; // UOPS_ISSUED.ANY event number (for example)
MyEvents[0].umask_value = 0x01; // UOPS_ISSUED.ANY umask
MyEvents[1].event_number = 0x08; // DTLB_LOAD_MISSES.ANY event number (counts all load misses that cause a page walk)
MyEvents[1].umask_value = 0x01; // DTLB_LOAD_MISSES.ANY umask
// add your own event ids here for on-core counter 2 and 3
MyEvents[2].event_number = ??;
MyEvents[2].umask_value = ??;
MyEvents[3].event_number = ??;
MyEvents[3].umask_value = ??;
if (m->good()) m->program(Monitor::CUSTOM_CORE_EVENTS,&MyEvents);
// ...
For system-wide PMU state monitoring (like in the pcm.x utility):
SystemCounterState sstate1 = getSystemCounterState();
// run your code that you want to measure
// …
// …
SystemCounterState sstate2 = getSystemCounterState();
uint64 UOPS_ISSUED_ANY_events = getNumberOfCustomEvents(0,sstate1, sstate2); // read number of occurred events from counter 0
uint64 DTLB_LOAD_MISSES_ANY_events = getNumberOfCustomEvents(1,sstate1, sstate2); // read number of occurred events from counter 1
uint64 eventmetric2 = getNumberOfCustomEvents(2,sstate1, sstate2); // read number of occurred events from counter 2
uint64 eventmetric3 = getNumberOfCustomEvents(3,sstate1, sstate2); // read number of occurred events from counter 3
Hope it helps, Roman |
| December 30, 2010 12:15 PM PST
Roman Dementiev (Intel)
|
the custom event IDs and umasks can be found in Appendix A of "Intel® 64 and IA-32 Architectures Software Developer’s Manual" (Volume 3B: System Programming Guide, Part 2) available at http://www.intel.com/products/processor/manuals/ Roman |
| December 31, 2010 5:47 PM PST
Raman Muthukrishnan |
Hi Roman, Thank you for this detailed answer. Based on this we were able to modify our code to get the TLB miss counts. Thank you, Raman |
| April 22, 2011 9:15 AM PDT
john-baron1
|
This is a very interesting tool. Is there an update that supports Westmere-EX? On a system with E7-4870 processors I get Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem or Westmere). Access to Intel(r) Performance Counter Monitor has denied. thanks, John |
| April 26, 2011 6:08 AM PDT
Roman Dementiev (Intel)
|
Hi John, thank you for your question. We are preparing a new version that includes the support of microarchitectures codename Westmere-EX and Sandy Bridge, new processor performance metrics, and also many user interface enhancements. The new version will appear soon. Stay tuned! Roman |
| May 10, 2011 2:41 PM PDT
Ken | I would like to see examples for reading these counters, especially cache hits/misses, from older architectures like E2140 or E8400. |
| May 12, 2011 12:59 AM PDT
Nagendra Gulur |
Hi, I am looking for access to the MMU contents. In particular, is it possible to monitor page faults and trace out virtual to physical address translations? This is an OS function of course, but I am not sure if access to this requires architecture-specific hooks. Thanks Nagendra |
| May 13, 2011 3:04 AM PDT
Roman Dementiev (Intel)
|
Ken, the next version will contain some experimental support of older architectures. Best regards, Roman |
| May 13, 2011 3:09 AM PDT
Roman Dementiev (Intel)
|
Nagendra, On Linux you can try SystemTap to monitor page faults. Maybe it is possible to create or customize SystemTap scripts in ("http://www.scribd.com/doc/54723138/51/Probing-page-faults") to trace out virtual to physical address translations. Best regards, Roman |
| May 17, 2011 12:15 AM PDT
Rafiq Ahamed
|
Good content. Assuming the same PMU metrics are supported on Itanium processors too, do you have any sample program or library for the same? Thanks! |
| May 25, 2011 5:27 AM PDT
Andy Liu |
Hi, This is a great utility for getting more insight into multi-core software! I have a question about cpucounters.cpp, line 790, which implements the UncoreCounterState::readAndAggregate() method. Regarding lines 804 and 805, WRITE and READ are aggregated, i.e., both MSR_UNCORE_PMC0 and MSR_UNCORE_PMC1 are counted twice in the case of a 2-core processor. The result seems to be double the expected value, because the 2 cores share the same UNCORE counter: UNC_QMC_WRITES_FULL_ANY_EVTNR or UNC_QMC_NORMAL_READS_ANY_EVTNR. I hope you can confirm whether my understanding is correct. Best regards, Andy |
| May 26, 2011 3:01 AM PDT
Andy Liu |
Sorry, today I notice that an average value is calculated after readAndAggregate() at line 836,837. Then it's the expected value. Brs, Andy |
| May 26, 2011 10:48 AM PDT
Roman Dementiev (Intel)
|
Andy, thank you for your feedback. The uncore event values are normalized in the getSocketCounterState function to prevent double counting. Roman |
| June 2, 2011 4:22 PM PDT
Sid |
Hi, I'm trying to run the Performance Counter Monitor on an Intel Core i7 CPU running Linux and it reports an unsupported CPU error, i.e., "Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem or Westmere)." Can not access CPU counters I thought the Core i7 processor was supported? This is the model name information from my procinfo file: model name : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz Any help would be appreciated! |
| June 4, 2011 2:50 AM PDT
Roman Dementiev (Intel)
|
Sid, the just-released version 1.6 of the package should support your processor. Best regards, Roman |
| June 7, 2011 1:58 PM PDT
arun | wow.. great..... |
| June 16, 2011 2:18 PM PDT
Peter Bartalos |
Hi, is it possible to use this tool on servers running Windows server 2008 R2? I've tried it, but running pcm.exe (run with -ns -nc parameters ... it was built using visual studio from PCM_Win) results in the following error: Opening service manager failed with error 5 Can not access CPU counters You must have signed msr.sys driver in your current directory and have administrator rights to run this program (Previously I used to get another error: "Starting MSR service failed with error 1275". Now I'm not.) In the WinMSRDriver7 directory, there is only a Win7 and WinXP subdir, so I used Win7 to create the driver (and used Win7 WDK build environment - the Win server 2008 env. didn't create the driver). My original problem is that I need a programmatic access to the last level cache miss counter and this tool seems to be able to do it. Thanks for any advice. The best, Peter |
| June 17, 2011 9:59 AM PDT
Peter Bartalos |
Hi, is it possible to use this tool on servers running Windows server 2008 R2? I've tried it, but running pcm.exe (run with -ns -nc parameters ... it was built using visual studio from PCM_Win) results in the following error: Opening service manager failed with error 5 Can not access CPU counters You must have signed msr.sys driver in your current directory and have administrator rights to run this program (Previously I used to get another error: "Starting MSR service failed with error 1275". Now I'm not.) In the WinMSRDriver7 directory, there is only a Win7 and WinXP subdir, so I used Win7 to create the driver (and used Win7 WDK build environment - the Win server 2008 env. didn't create the driver). My original problem is that I need a programmatic access to the last level cache miss counter on a Windows machine and this tool seems to be able to do it. Thanks for any advice. The best, Peter |
| June 18, 2011 11:32 PM PDT
Roman Dementiev (Intel)
|
Hi Peter, the Win7 driver sources should work for Windows Server 2008 R2. The Windows error codes you have mentioned might indicate an issue with your driver signature or signing procedure: please see http://msdn.microsoft.com/en-us/library/ms537361(VS.85).aspx and http://technet.microsoft.com/en-us/library/dd919200(WS.10).aspx and links therein. Best regards, Roman |
| June 27, 2011 9:28 AM PDT
Saurabh |
Hi, I'm trying to run the Performance Counter Monitor on an Intel Core i7 CPU and it reports an unsupported CPU error on both Windows 7 64-bit and Linux, i.e., Intel(r) Performance Counter Monitor Copyright (c) 2009-2011 Intel Corporation Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere and Sandy Bridge). CPU Model: 30 Access to Intel(r) Performance Counter Monitor has denied (no MSR access). Please help. This is the model name information from my procinfo file: model name : Intel(R) Core(TM) i7 CPU 720QM Thanks in advance! |
| June 28, 2011 5:32 AM PDT
Roman Dementiev (Intel)
|
Hi Saurabh, it seems we have overlooked that particular model of Intel Core i7. Support for it will be added in a future version of Intel PCM. For the moment please try the following workaround: replace the id 26 with your CPU model 30 in cpucounters.h : "NEHALEM_EP = 26" => "NEHALEM_EP = 30" and recompile. Best regards, Roman |
| July 12, 2011 1:40 PM PDT
Peter Bartalos |
I've struggled to run PCM on Windows Server 2008 R2, 64 bit, without having to use a Software Publishing Certificate that chains to an approved certification authority. I've tried the F8 boot option to disable driver signature enforcement. Didn't work. I've also tried the DDISABLE_INTEGRITY_CHECKS setting using bcdedit. Didn't work. Thus, I installed Ubuntu on the machine. Did work ;-) Since the server I use has two Intel(R) Xeon 5130, 2GHz processors (i.e. only experimental support from PCM) I had to 1) enable PCM_TEST_FALLBACK_TO_ATOM in cpucounters.cpp, 2) permit writes to /dev/cpu/*/msr, and 3) execute "sudo modprobe msr". Steps 2) and 3) must be repeated after each system reboot. The best, Peter |
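Once the msr module is loaded and the device files are accessible, a quick way to check MSR access independently of PCM is to read a register directly. The following is a minimal sketch rather than PCM code: readMsr is a hypothetical helper name, and it relies on the Linux msr driver convention that the file offset selects the register number and each read returns 8 bytes. It only reads, whereas PCM also needs write access to program the counters.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Reads one 64-bit register through a Linux msr device file such as
// /dev/cpu/0/msr. The msr driver interprets the file offset as the
// register number and returns 8 bytes per read.
// readMsr is a hypothetical helper name, not a PCM function.
bool readMsr(const std::string& devicePath, std::uint32_t msrNumber, std::uint64_t& value)
{
    const int fd = ::open(devicePath.c_str(), O_RDONLY);
    if (fd < 0)
        return false; // e.g. module not loaded or insufficient permissions

    const ssize_t n = ::pread(fd, &value, sizeof(value), static_cast<off_t>(msrNumber));
    ::close(fd);
    return n == static_cast<ssize_t>(sizeof(value));
}
```

For example, calling readMsr("/dev/cpu/0/msr", 0x10, value) attempts to read the time stamp counter MSR of core 0; a false return suggests that step 2) or 3) above has not taken effect.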
| August 9, 2011 7:23 AM PDT
Bright |
brigchen@ubuntu:/mnt/hgfs/vm-share/IntelPerformanceCounterMonitor V1.6$ make g++ -g -O3 -c msr.cpp g++ -g -O3 -c cpucounters.cpp g++ -g -O3 -c cpucounterstest.cpp cpucounterstest.cpp: In function ‘void MySystem(char*)’: cpucounterstest.cpp:146: warning: ignoring return value of ‘int system(const char*)’, declared with attribute warn_unused_result cpucounterstest.cpp: In function ‘std::string unit_format(IntType) [with IntType = long long unsigned int]’: cpucounterstest.cpp:414: instantiated from here cpucounterstest.cpp:71: warning: format ‘%4d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’ cpucounterstest.cpp:414: instantiated from here cpucounterstest.cpp:76: warning: format ‘%4d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’ cpucounterstest.cpp:414: instantiated from here cpucounterstest.cpp:81: warning: format ‘%4d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’ cpucounterstest.cpp:414: instantiated from here cpucounterstest.cpp:86: warning: format ‘%4d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’ cpucounterstest.cpp:414: instantiated from here cpucounterstest.cpp:90: warning: format ‘%4d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’ g++ -g -O3 -lrt msr.o cpucounters.o cpucounterstest.o -o pcm.x g++ -g -O3 -c cpusensor.cpp g++ -g -O3 -lpthread -lrt msr.o cpucounters.o cpusensor.o -o cpusensor.x g++ -g -O3 -c realtime.cpp g++ -g -O3 -lpthread -lrt msr.o cpucounters.o realtime.o -o realtime.x Some warnings in my linux virtual machine. 
Linux version: Linux ubuntu 2.6.32-33-generic #71-Ubuntu SMP Wed Jul 20 17:30:40 UTC 2011 i686 GNU/Linux CPU info: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 37 model name : Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz stepping : 2 cpu MHz : 2394.002 cache size : 3072 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss rdtscp lm constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips : 4788.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: |
| August 13, 2011 2:34 AM PDT
Bright Chen |
Could you help answer these questions? 1. Why are there no L1 cache related functions, such as getL1CacheHitRatio, getL1CacheHits and getL1CacheMisses? How can I get this information? 2. Could you explain in more detail: EXEC : instructions per nominal CPU cycle #from code : per time intervall IPC : instructions per CPU cycle #from code : per core cycle (IPC) Sorry, I don't understand the difference. 3. How can I count integer and floating point instructions? I haven't found an interface for this. |
| August 15, 2011 3:08 AM PDT
Roman Dementiev (Intel)
|
Bright, thanks for informing us (did not test on g++ Ubuntu). I have fixed warnings in the next version of the package. BTW: the API/tool is not supposed to run in a virtual machine, because HW performance counters are not virtualized by hypervisor vendors. Best regards, Roman |
| August 16, 2011 4:58 AM PDT
Roman Dementiev (Intel)
|
Bright, 1 and 3: The L1 cache and floating point operation metrics are not monitored by default because the PMU is a limited resource. Please see my answer to Raman above. You can add counting of these events using the custom programming interface in PCM, as Raman did for counting TLB misses. For event ids and metric calculations (formulas, etc.) you can consult and/or post questions on the "Software Tuning, Performance Optimization & Platform Monitoring" forum http://software.intel.com/en-us/forums/platform-monitoring/. It discusses these topics (events/metrics), which are applicable not only to Intel PCM but also to other PMU tools like Intel VTune Amplifier XE. For counting floating point operation events you can check this article: http://software.intel.com/en-us/articles/estimating-flops-us.....pling-ebs/ 2. The EXEC metric is similar to MIPS http://en.wikipedia.org/wiki/Million_instructions_per_second..... per_second . That means the number of instructions per wall-clock time interval. It is a good metric to estimate instruction throughput under full processor load. However, in some situations the processor or core can sleep most of the time in a power saving state, waking up very rarely to execute a few instructions. If you (mis-)use MIPS or EXEC to calculate the average number of core cycles per instruction, then the number will be unrealistic (too high) if the core was in a power saving state most of the time. For that purpose the IPC metric is there: it is the number of instructions divided by the number of active core cycles (when the core was not in a power saving C-state/sleeping). Roman |
| August 24, 2011 12:57 AM PDT
Akihiro |
I tried to build PCMService.exe for x64 but I got the following link errors: 1>MSVCMRT.lib(locale0_implib.obj) : error LNK2022: metadata operation failed (8013118D) : Inconsistent layout information in duplicated types (std.basic_string<char,std::char_traits<char>,std::allocator<char> >): (0x0200003d). 1>MSVCMRT.lib(locale0_implib.obj) : error LNK2022: metadata operation failed (8013118D) : Inconsistent layout information in duplicated types (std.basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >): (0x02000063). 1>MSVCMRT.lib(locale0_implib.obj) : error LNK2022: metadata operation failed (8013118D) : Inconsistent layout information in duplicated types (std._String_val<char,std::allocator<char> >): (0x02000081). 1>MSVCMRT.lib(locale0_implib.obj) : error LNK2022: metadata operation failed (8013118D) : Inconsistent layout information in duplicated types (std._String_val<wchar_t,std::allocator<wchar_t> >): (0x02000083). I cannot build it with either Visual Studio 2008 or 2010. Best regards, Akihiro |
| August 24, 2011 3:21 AM PDT
Chai |
Hi, I just compiled msr.sys and both the GUI and command-line utilities on my WinXP SP3 box but I'm facing the following problems: 1) I cannot start pcmservice because it says "error 526470 - service already started" even though it has not been started, even after a system restart 2) I cannot get "pcm.exe -ns -nc" to run, as it says that I need to sign msr.sys and run pcm.exe as an administrator. However, I am not on Win7, and according to your Windows Readme RTF msr.sys does not need to be signed on WinXP, does it? Any hints, please? |
| August 24, 2011 9:17 AM PDT
Chai |
Hi, On my Core i7 Windows XP laptop, I'm neither able to register the PCM Service (it reports a false "already started" status) nor am I able to use PCM.exe (it reports that I need to sign the msr.sys file). Any clues as to why this is happening? Oddly enough, both the command-line and GUI utilities that I built on my PC work fine on my colleague's WinXP PC, which has a Core i5 processor. Thanks in advance, Chai |
| August 25, 2011 2:49 PM PDT
Roman Dementiev (Intel)
|
Chai, the behavior you describe is not expected. For Windows XP the driver does not need to be signed. Could you please post the full output of pcm.exe here? What is the bitness of your OS and of your colleague's Windows XP: 32 bit or 64 bit (x64 Edition)? Best regards, Roman |
| August 26, 2011 5:52 AM PDT
Grega |
Hi, I am working on profiling an application where I am interested in L1 and L2 cache misses. I am using Win 7 64bit and I managed to compile pcm.exe. However, I was unsuccessful with the lib. I used the WinDDK i64freebuildenvironment and received the following build output. Thank you very much for your help! best regards, Grega Output: D:projLibraryINTELProfilerWinMSRDriverWin7>build BUILD: Compile and Link for IA64 BUILD: Loading c:winddk7600.16385.1build.dat... BUILD: Computing Include file dependencies: BUILD: Start time: Fri Aug 26 13:45:34 2011 BUILD: Examining d:projlibraryintelprofilerwinmsrdriverwin7 directory for files to compile. BUILD: Saving c:winddk7600.16385.1build.dat... BUILD: Compiling and Linking d:projlibraryintelprofilerwinmsrdriverwin7 directory _NT_TARGET_VERSION SET TO WS03 Compiling - msrmain.c 1>errors in directory d:projlibraryintelprofilerwinmsrdriverwin7 1>d:projlibraryintelprofilerwinmsrdriverwin7msrmain.c(186) : error C4013: '__writemsr' undefined; assuming extern returning int 1>d:projlibraryintelprofilerwinmsrdriverwin7msrmain.c(201) : error C4013: '__readmsr' undefined; assuming extern returning int Linking Executable - objfre_wnet_ia64ia64msr.sys 1>link : error LNK1181: cannot open input file 'd:projlibraryintelprofilerwinmsrdriverwin7objfre_wnet_ia64ia64ms rmain.obj' BUILD: Finish time: Fri Aug 26 13:45:35 2011 BUILD: Done 3 files compiled - 2 Errors 1 executable built - 1 Error |
| August 27, 2011 1:59 PM PDT
Roman Dementiev (Intel)
|
Grega, the "IA64" build environment is for Itanium processors. Intel PCM does not support Itanium. I assume you have a 64-bit x86 processor with a microarchitecture listed in the article ("Nehalem", "Westmere", "Sandy-Bridge"). In that case you should select Windows 7 x64 build environment. Roman |
| September 27, 2011 12:23 AM PDT
Bright Chen |
Hi Roman, do you plan to release a next version of Intel PCM? Could you tell us the date, and what will be enhanced in the next version? Thanks. Br, Bright |
| September 27, 2011 12:39 AM PDT
Chai |
Hi Roman, In response to your query dated 25 Aug, I run 32 bit Windows XP SP3 on my laptop, and so does my colleague on his laptop. Every time I run pcm.exe on my machine, regardless of the command-line switch, I get the following message: Copyright (c) 2009-2011 Intel Corporation Starting MSR service failed with error 3 Can not access CPU counters You must have signed msr.sys driver in your current directory and have administrator rights to run this program And I have admin rights on my machine. |
| September 29, 2011 3:29 PM PDT
Roman Dementiev (Intel)
|
Chai, can you try the following: 1. make sure pcm.exe and msr.sys are in the same directory, like c:\pcm 2. chdir c:\pcm 3. pcm.exe --uninstallDriver (note that you must run it from that directory) 4. pcm.exe 1 (this installs the driver again and runs the tool) Let me know if that works. Best regards, Roman |
| September 29, 2011 3:31 PM PDT
Roman Dementiev (Intel)
| pcm.exe must be run from the same directory where the msr.sys driver is. |
| October 4, 2011 11:35 AM PDT
Chris |
Hi Roman, I was wondering if there will be support for some other processors soon. Unfortunately, I tried to use the tool on my two machines unsuccessfully (my machine configurations are shown below). 1. Machine1's /proc/cpuinfo shows 2 cores: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz and when I run pcm.x it says: Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere and Sandy Bridge). CPU model: 23 2. Machine2's /proc/cpuinfo shows 8 cores: Intel(R) Xeon(R) CPU X5355 @ 2.66GHz Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere and Sandy Bridge). CPU model: 15 I'm using v1.6 that I downloaded from this webpage, but couldn't find what the support page is for newer versions, if any. Thanks! Chris |
| October 17, 2011 1:20 PM PDT
Pourya |
Hi, 1. Is there any way to get the L1-L3 cache sizes and the cache line width in bytes? 2. I am using a 2nd generation Core i7 (Sandy Bridge), which is not recognized by the library. Here is the output: Unsupported processor, CPU Model: 45 I tried the solution provided to Saurabh but still have the same issue. 3. It would be great if you added capabilities such as detection of AVX, SSE2, MMX and so on. Thanks |
| October 18, 2011 2:50 AM PDT
Roman Dementiev (Intel)
|
Chris, you might try the (very experimental, limited support) PCM_TEST_FALLBACK_TO_ATOM option described above in the article and mentioned in the comments already. Note that older architectures do not support any uncore metrics and have a smaller number of on-core counters and available metrics. Roman |
| October 18, 2011 4:05 AM PDT
Roman Dementiev (Intel)
|
Pourya, detection of cache topology is out of the scope of this sample code. Please use instead the source code package available at http://software.intel.com/en-us/articles/intel-64-architectu..... umeration/ . It also has CPUID routines which you can use/customize to detect AVX, SSE2, etc. The CPUID instruction with the feature flags is described here: http://www.intel.com/content/www/us/en/processors/processor-.....-note.html To run the tool you might try to replace: "SANDY_BRIDGE = 42" => "SANDY_BRIDGE = 45" I could not find your CPU model 45 dec (= Ext. model: 0010, Model: 1101 in binary) or 0x2D in Table 5-3 of the latter document or in this summary: http://software.intel.com/en-us/articles/intel-processor-ide..... y-numbers/ . Could you share the exact product name of your processor, like Intel® Core™ i7-2600? Roman |
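The relation between the decimal "CPU model" numbers quoted in this thread and the CPUID fields can be sketched as follows. This is an illustration, not PCM code: decodeSignature and CpuSignature are hypothetical names, and the input is assumed to be the EAX value returned by CPUID leaf 1. For family 6 and family 15 processors the 4-bit extended model field is prepended to the 4-bit model field.

```cpp
#include <cstdint>

// Family and model as they are usually displayed (e.g. "CPU model: 45").
struct CpuSignature
{
    unsigned family;
    unsigned model;
};

// Decodes the processor signature word (EAX of CPUID leaf 1).
// decodeSignature is a hypothetical helper name for this sketch.
CpuSignature decodeSignature(std::uint32_t eax)
{
    const unsigned baseModel  = (eax >> 4) & 0xF;   // bits 7:4
    const unsigned baseFamily = (eax >> 8) & 0xF;   // bits 11:8
    const unsigned extModel   = (eax >> 16) & 0xF;  // bits 19:16
    const unsigned extFamily  = (eax >> 20) & 0xFF; // bits 27:20

    CpuSignature s;
    // The extended family is only added when the base family is 15.
    s.family = (baseFamily == 0xF) ? baseFamily + extFamily : baseFamily;
    // The extended model is only used for families 6 and 15.
    s.model = (baseFamily == 0x6 || baseFamily == 0xF)
                  ? (extModel << 4) | baseModel
                  : baseModel;
    return s;
}
```

For instance, a signature word with family 6, extended model 0x2 and model 0xD decodes to model 0x2D, i.e. 45 decimal, and one with extended model 0x1 and model 0xE decodes to 30, the two values reported as unsupported in the comments above.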
| October 18, 2011 7:32 PM PDT
Stephen |
Hi Roman - I installed the Windows service on Windows Server 2008 R2 x64 (2x Intel Xeon X5650) and also on Windows 7 x64 (Intel Core i7 2600K) and neither one worked. The service installed and I can open the counters, but the counters are all 0. The event viewer gives no errors or warnings on either machine. I also had the msr.sys file placed in c:\windows\system32 on both machines and it was digitally signed. I also tried switching "SANDY_BRIDGE = 42" => "SANDY_BRIDGE = 45" and that doesn't work either. I've tried restarting the machines and also restarting the services and I get nothing at all on the counters. The only output I get in the event viewer is the following when the service starts: --Trying to start the driver... --Trying to create the measure thread... --PCM: Number of cores detected: 8 --PMU Programmed. --Old categories deleted. --New categories added. --All instances of the performance counter categories have been created. --Service started successfully. Any ideas? Thanks Stephen |
| October 24, 2011 3:18 AM PDT
Thomas Willhalm (Intel)
|
Stephen, have you given the command line tool a try? If you see something there, you know that the PMU access is working. Do you see any output at all or just for some counters? The instructions retired counter should work on all platforms. Kind regards Thomas |
| October 25, 2011 6:02 PM PDT
Stephen |
Hi Thomas - I ran the command line tool and it gives the following error: "Access to Intel(r) Performance Counter Monitor has denied (Unknown error)." I signed the driver using test signing instructions from microsoft. I'm also running Win7 x64 in test mode in order to run test signed drivers. I'm also running the command line as an administrator so permissions should not be an issue. Any ideas what this unknown error refers to? Here is the full output of the command line: ----------------------------------------------------------------- ---------------------------------- Intel(r) Performance Counter Monitor Copyright (c) 2009-2011 Intel Corporation Num cores: 8 Num sockets: 1 Threads per core: 2 Core PMU (perfmon) version: 3 Number of core PMU generic (programmable) counters: 4 Width of generic (programmable) counters: 48 bits Number of core PMU fixed counters: 3 Width of fixed counters: 48 bits Nominal core frequency: 3400000000 Hz Access to Intel(r) Performance Counter Monitor has denied (Unknown error). ----------------------------------------------------------------- --------------------------------- Thanks Stephen |
| October 27, 2011 3:02 AM PDT
Roman Dementiev (Intel)
|
Hi Stephen, it seems to be a permission error with pcm.exe command line tool and the Windows service running in parallel (we need to put some more debug output for this case in a future version), it could be that they run under different users? Could you please try to stop the PCM Windows Service ('net stop pcmservice', '"PCM Service.exe" -Uninstall' ), (then probably reboot), and then try to run the pcm.exe again? Best regards, Roman |
| November 14, 2011 12:30 AM PST
GHui | Does it support Sandy Bridge? |
| November 16, 2011 3:07 AM PST
Roman Dementiev (Intel)
|
GHui, yes, version 1.6 of Intel PCM supports on-core performance metrics of the 2nd generation Intel® Core™ processor family (Intel® microarchitecture code name Sandy Bridge). Roman |
| November 17, 2011 11:47 PM PST
GHui |
Roman, It outputs the following message. ------------------------- Intel(r) Performance Counter Monitor Copyright (c) 2009-2011 Intel Corporation Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere and Sandy Bridge). CPU model: 45 Access to Intel(r) Performance Counter Monitor has denied (no MSR access). ------------------------- Does that mean it doesn't support Sandy Bridge? |
| November 27, 2011 10:00 AM PST
aromr |
I tried to run "make" command from the main directory on Ubuntu 11.10, processor Intel Centrino Duo I got the following log g++ -g -O3 -c msr.cpp g++ -g -O3 -c cpucounters.cpp g++ -g -O3 -c cpucounterstest.cpp cpucounterstest.cpp: In function ‘std::string unit_format(IntType) [with IntType = long long unsigned int, std::string = std::basic_string<char>]’: cpucounterstest.cpp:414:85: instantiated from here cpucounterstest.cpp:71:9: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long unsigned int’ [-Wformat] cpucounterstest.cpp:414:85: instantiated from here cpucounterstest.cpp:76:9: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long unsigned int’ [-Wformat] cpucounterstest.cpp:81:9: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long unsigned int’ [-Wformat] cpucounterstest.cpp:86:9: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long unsigned int’ [-Wformat] cpucounterstest.cpp:90:5: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long unsigned int’ [-Wformat] cpucounterstest.cpp: In function ‘void MySystem(char*)’: cpucounterstest.cpp:146:19: warning: ignoring return value of ‘int system(const char*)’, declared with attribute warn_unused_result [-Wunused-result] g++ -g -O3 -lrt msr.o cpucounters.o cpucounterstest.o -o pcm.x cpucounters.o: In function `SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:86: undefined reference to `sem_open' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:100: undefined reference to `sem_wait' cpucounters.o: In function `~SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' cpucounters.o: In function `SystemWideLock': 
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:86: undefined reference to `sem_open' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:100: undefined reference to `sem_wait' cpucounters.o: In function `PCM::decrementInstanceSemaphore()': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:1194: undefined reference to `sem_wait' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:1196: undefined reference to `sem_getvalue' cpucounters.o: In function `~SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' cpucounters.o: In function `PCM::decrementInstanceSemaphore()': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:1194: undefined reference to `sem_wait' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:1196: undefined reference to `sem_getvalue' cpucounters.o: In function `SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:86: undefined reference to `sem_open' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:100: undefined reference to `sem_wait' cpucounters.o: In function `~SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' cpucounters.o: In function `TemporalThreadAffinity': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:637: undefined reference to `pthread_getaffinity_np' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:642: undefined reference to `pthread_setaffinity_np' cpucounters.o: In function `~TemporalThreadAffinity': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:646: 
undefined reference to `pthread_setaffinity_np' cpucounters.o: In function `SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:86: undefined reference to `sem_open' cpucounters.o: In function `PCM::getInstance()': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:100: undefined reference to `sem_wait' cpucounters.o: In function `~SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' cpucounters.o: In function `TemporalThreadAffinity': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:637: undefined reference to `pthread_getaffinity_np' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:642: undefined reference to `pthread_setaffinity_np' cpucounters.o: In function `~TemporalThreadAffinity': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:646: undefined reference to `pthread_setaffinity_np' cpucounters.o: In function `SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:86: undefined reference to `sem_open' cpucounters.o: In function `PCM::getInstance()': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:100: undefined reference to `sem_wait' cpucounters.o: In function `~SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:104: undefined reference to `sem_post' cpucounters.o: In function `SystemWideLock': /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:86: undefined reference to `sem_open' /media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.c pp:100: undefined reference to `sem_wait' cpucounters.o: In function `PCM::program(PCM::ProgramMode, void*)': 
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:676: undefined reference to `sem_open'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:683: undefined reference to `sem_post'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:685: undefined reference to `sem_getvalue'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:637: undefined reference to `pthread_getaffinity_np'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:642: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `~TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:646: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `PCM::decrementInstanceSemaphore()':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:1194: undefined reference to `sem_wait'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:1196: undefined reference to `sem_getvalue'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:86: undefined reference to `sem_open'
cpucounters.o: In function `PCM::getInstance()':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:100: undefined reference to `sem_wait'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:86: undefined reference to `sem_open'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:100: undefined reference to `sem_wait'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:86: undefined reference to `sem_open'
cpucounters.o: In function `TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:637: undefined reference to `pthread_getaffinity_np'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:642: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `~TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:646: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `PCM::getInstance()':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:100: undefined reference to `sem_wait'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:637: undefined reference to `pthread_getaffinity_np'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:642: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `~TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:646: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:637: undefined reference to `pthread_getaffinity_np'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:642: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `~TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:646: undefined reference to `pthread_setaffinity_np'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:86: undefined reference to `sem_open'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:86: undefined reference to `sem_open'
cpucounters.o: In function `PCM::getInstance()':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:100: undefined reference to `sem_wait'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `PCM::getInstance()':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:100: undefined reference to `sem_wait'
cpucounters.o: In function `~SystemWideLock':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:104: undefined reference to `sem_post'
cpucounters.o: In function `~TemporalThreadAffinity':
/media/user1/PCM/IntelPerformanceCounterMonitorV1.6/cpucounters.cpp:646: undefined reference to `pthread_setaffinity_np'
collect2: ld returned 1 exit status
make: *** [pcm.x] Error 1
=====================================
Any ideas? |
| November 29, 2011 7:40 AM PST
Roman Dementiev (Intel)
|
aromr,
Can you try adding "-lpthread" to the Makefile:
pcm.x: msr.o cpucounters.o cpucounterstest.o pci.o
    $(CC) $(OPT) -lpthread -lrt msr.o pci.o cpucounters.o cpucounterstest.o -o pcm.x
"-lpthread" is not required on the Linux distribution that we used for testing. Maybe it helps here.
Roman |
| November 29, 2011 8:23 AM PST
Roman Dementiev (Intel)
|
GHui, the current Intel PCM version has not been tested with your processor model. Could you provide more detail about your system? How many sockets and cores (per socket) do you have? Is it a desktop or a server system? You might try the trick mentioned above: in cpucounters.h, replace "SANDY_BRIDGE = 42" with "SANDY_BRIDGE = 45". Roman |
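Editorial note: the numbers 42 and 45 in the trick above are CPU model numbers of the kind obtained from the CPUID instruction. The sketch below shows how such a model number is decoded from the EAX value of CPUID leaf 1, following the family/model encoding in the Intel Software Developer's Manual. The helper name `displayModel` is hypothetical and not part of Intel PCM, and the idea that PCM compares a value decoded this way against the `SANDY_BRIDGE` constant is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Intel PCM: decode the "display model"
// from the EAX value returned by CPUID leaf 1.
// For family 6 or 15, the extended model bits [19:16] form the high nibble
// and the base model bits [7:4] form the low nibble of the model number.
inline std::uint32_t displayModel(std::uint32_t cpuidEax)
{
    const std::uint32_t family    = (cpuidEax >> 8)  & 0xF;
    const std::uint32_t baseModel = (cpuidEax >> 4)  & 0xF;
    const std::uint32_t extModel  = (cpuidEax >> 16) & 0xF;
    if (family == 6 || family == 15)
        return (extModel << 4) | baseModel;
    return baseModel;  // other families use the base model field only
}
```

For example, a signature value of 0x000206A7 decodes to model 42 and 0x000206D7 decodes to model 45. Changing the constant in the header therefore only changes which model number the tool accepts as Sandy Bridge; it does not by itself confirm that every counter behaves the same on the other model.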
| November 30, 2011 9:22 AM PST
Roman Dementiev (Intel)
|
Dear users, we have just released a new version 1.7 (see change log above). From now on, for questions and comments about Intel PCM and its use-cases, we recommend the Software Tuning, Performance Optimization & Platform Monitoring forum: http://software.intel.com/en-us/forums/platform-monitoring/ If you post a question/comment on this forum regarding Intel PCM, please mention "Intel PCM" in the title of your forum topic to catch our attention. Thanks, Intel PCM team |
| December 2, 2011 11:25 AM PST
aromr |
Hi Roman,
Thanks for your reply. I tried adding -lpthread to the Makefile as you suggested, but it did not work. The solution I have just found is to use -pthread instead of -lpthread. I am not sure how correct that is; it simply lets make complete successfully. I also replaced every -lpthread with -pthread to get it working. The Makefile is now updated as follows:
pcm.x: msr.o cpucounters.o cpucounterstest.o
    $(CC) $(OPT) -pthread -lrt msr.o cpucounters.o cpucounterstest.o -o pcm.x
realtime.x: msr.o cpucounters.o realtime.o
    $(CC) $(OPT) -pthread -lrt msr.o cpucounters.o realtime.o -o realtime.x
cpusensor.o: cpusensor.cpp cpucounters.h cpuasynchcounter.h msr.h types.h
    $(CC) $(OPT) -c cpusensor.cpp
cpusensor.x: msr.o cpucounters.o cpusensor.o
    $(CC) $(OPT) -pthread -lrt msr.o cpucounters.o cpusensor.o -o cpusensor.x
Can you please advise how correct this solution is? |
| December 4, 2011 8:10 AM PST
aromr |
Hi Roman, I am using JNI to invoke the PCM code from Java. My Java code calls PCM to get a CoreCounterState and then proceeds with normal execution. Afterwards I call PCM again to get a second CoreCounterState and compare the CPU counters before and after. The problem is that every call to PCM returns the same CoreCounterState, and the comparison result is -1. |
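Editorial note: the before/after comparison described in this comment can be sketched as follows. `CounterSnapshot` and `ipcBetween` are hypothetical stand-ins, not the real PCM `CoreCounterState` or its helper functions, and the use of -1 as a sentinel for "no cycles elapsed" is an assumption made to mirror the reported result. Under that assumption, a result of -1 would be consistent with both calls returning the same snapshot.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-core counter snapshot; NOT the PCM class.
struct CounterSnapshot {
    std::uint64_t instructionsRetired;
    std::uint64_t unhaltedCycles;
};

// Instructions per cycle between two snapshots.
// Returns -1 (assumed sentinel) when the cycle counter did not advance,
// which is the case when "before" and "after" hold identical values.
inline double ipcBetween(const CounterSnapshot& before,
                         const CounterSnapshot& after)
{
    const std::uint64_t cycles = after.unhaltedCycles - before.unhaltedCycles;
    if (cycles == 0)
        return -1.0;
    const std::uint64_t instructions =
        after.instructionsRetired - before.instructionsRetired;
    return static_cast<double>(instructions) / static_cast<double>(cycles);
}
```

With snapshots {1000, 2000} and {4000, 4000} the function returns 1.5, while comparing a snapshot with itself returns -1. If the native side of a JNI wrapper caches the state or hands back the same object on each call, the second case is what the Java code would observe, so checking that each native call captures a fresh state is a reasonable first step.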
| December 29, 2011 4:00 AM PST
Roman Dementiev (Intel)
|
Dear readers and users, for questions and comments about Intel PCM and its use-cases, we recommend the Software Tuning, Performance Optimization & Platform Monitoring forum: http://software.intel.com/en-us/forums/platform-monitoring/ If you post a question/comment on this forum regarding Intel PCM, please mention "Intel PCM" in the title of your forum topic to catch our attention. Thanks, Intel PCM team |
Authors
Otto Bruggeman (Intel)
Thomas Willhalm (Intel)
Roman Dementiev (Intel)
Patrick Fay (Intel)
Patrick Ungerer (Intel)



Matt
(BTW -- two BAT files from IntelPerformanceCounterMonitorV1.0/WinMSRDriver/ have been excluded from the package, was this intentional?)