Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring, software tuning, and platform monitoring. The performance monitoring material ranges from an introduction to monitoring and software tuning methodologies to software optimization techniques and best known methods (BKMs) for both novice and more advanced users.

For developers, programming reference manuals are available with the latest information describing the hardware interface of the Performance Monitoring Unit (PMU) of Intel microprocessors, including core and uncore monitoring resources, as well as the definitive source of information on the performance events that may be monitored.

Platform monitoring covers machine monitoring topics such as monitoring CPU cores, graphics processors, and other system coprocessors, as well as metering and quality of service.

Merrifield Uncore Performance Monitoring Events
By SAI SINDURI G. (Intel), posted 05/20/2014, 0 comments
Using the Merrifield SoC Performance Monitoring Events. This article focuses directly on the uncore performance monitoring events for the Merrifield SoC. For the introduction to SoC uncore performance monitoring, please see this article: Silvermont SoC Uncore Performance Monitoring Guide Introd...
Rangeley Uncore Performance Monitoring Events
By Perry Taylor (Intel), posted 05/09/2014, 0 comments
Using the Rangeley SoC Performance Monitoring Events. This article focuses directly on the uncore performance monitoring events for the Rangeley SoC. For the introduction to SoC uncore performance monitoring, please see this article: Silvermont SoC Uncore Performance Monitoring Guide Introducti...
Baytrail Uncore Performance Monitoring Events
By Perry Taylor (Intel), posted 04/16/2014, 0 comments
Using the Baytrail SoC Performance Monitoring Events. This article focuses directly on the uncore performance monitoring events for the Baytrail SoC. For the introduction to SoC uncore performance monitoring, please see this article: Silvermont SoC Uncore Performance Monitoring Guide Introduct...
Silvermont SoC Uncore Performance Monitoring Guide
By Perry Taylor (Intel), posted 04/10/2014, 0 comments
Welcome to the System on a Chip (SoC) uncore performance monitoring guide.  This article will introduce you to the SoC uncore performance monitoring event set and provide details on the events and how to interpret results. The Silvermont generation of SoCs features a new set of uncore performa...
Documentation for uncore performance monitoring units
By Roman Dementiev (Intel), posted on 07/11/14, 0 comments
Hello everyone, The uncore performance monitoring units (uncore PMUs) provide a lot of useful information, such as memory controller traffic, traffic between sockets/processor packages, and energy-related metrics in the uncore (sleep states for Intel® Quick Path Interconnect links or DRAM sleep states for e...
Dissecting STREAM benchmark with Intel® Performance Counter Monitor
By Roman Dementiev (Intel), posted on 11/23/10, 8 comments
Intel® Performance Counter Monitor (Intel® PCM) is an API and a set of tools that should help developers to understand how their applications utilize the underlying compute platform. In this blog I will explain how to instrument the well-known STREAM benchmark with library functions of Intel® PCM...
Performance seems not stable after using AES-NI for data encryption/decryption
By Xuehan X., 0 replies
Hi, everyone. I need to encrypt data written from a virtual machine on XenServer. I added a pure software AES-CBC encryption method to the Xen virtual disk read/write operation and tested the write throughput by running the following command in the VM: dd if=/dev/zero of=/mnt/test_file bs=512 count=1048576 and the measured throughput was about 55 MB/s. I then modified the encryption method to use Intel AES-NI for encryption/decryption and ran the same test several times, with the following results: Test 1: 85.1 MB/s Test 2: 72.0 MB/s Test 3: 56.0 MB/s Test 4: 95.9 MB/s Test 5: 43.5 MB/s Test 6: 61.5 MB/s Test 7: 74.5 MB/s Test 8: 43.3 MB/s Test 9: 63.8 MB/s Test 10: 94.8 MB/s Test 11: 110 MB/s. Although the average throughput is about 40% higher than with the pure software method, the throughput seems to be very unstable. Why? Is there any way to stabilize it? Thank you :-)
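The run-to-run variation described in the post above can be quantified directly. The following is a minimal Python sketch that uses only the eleven AES-NI results and the 55 MB/s software baseline quoted in the post; the function and variable names are illustrative. For these figures the mean works out to roughly 72.8 MB/s, which is closer to 32% above the baseline, with a sample standard deviation of roughly 22 MB/s.

```python
from statistics import mean, stdev

# Figures copied from the forum post (MB/s).
baseline_mb_s = 55.0
aesni_runs_mb_s = [85.1, 72.0, 56.0, 95.9, 43.5, 61.5,
                   74.5, 43.3, 63.8, 94.8, 110.0]


def summarize(runs, baseline):
    """Return mean, spread, and relative gain over a baseline throughput."""
    avg = mean(runs)
    spread = stdev(runs)  # sample standard deviation
    return {
        "mean": avg,
        "stdev": spread,
        "cv": spread / avg,                 # coefficient of variation
        "gain_vs_baseline": avg / baseline - 1.0,
        "min": min(runs),
        "max": max(runs),
    }


summary = summarize(aesni_runs_mb_s, baseline_mb_s)
for key, value in summary.items():
    print(f"{key:>18}: {value:.3f}")
```

A coefficient of variation near 0.3 across only eleven runs suggests that more repetitions, or a longer test, would be needed before comparing the two implementations with confidence.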
How hardware prefetcher change load and store buffer behavior in processor pipeline
By Zhu G., 3 replies
Hi, Community! I am experimenting with a dual-socket Xeon E5620 server. I run perf with the events RESOURCE_STALLS.LOAD and RESOURCE_STALLS.STORE, listed on page 2699 of the SDM, chapter 19.7. I first turned off the hardware prefetchers following the instructions at: https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-co... The command I used was: wrmsr -a 0x1a4 0xf and I then ran perf as: perf stat -e ra202,ra208 ./fft-m26 with the result: 201 ra202 10,215,615 ra208 29.485852761 seconds time elapsed. Then I enabled the hardware prefetchers using: wrmsr -a 0x1a4 0x0 and ran perf again with: perf stat -e ra202,ra208 ./fft-m26 As I hoped, performance improved; the result was: 2,206 ra202 18,970,999 ra208 24.963877684 seconds time elapsed. However, I observed that the pipeline seems to have stalled more on the load buffer and store buffer. Why is this?
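To make the two wrmsr values in the post above easier to read, here is a small Python sketch that decodes a value written to MSR 0x1A4 into the prefetchers it disables. The bit assignments are an assumption taken from Intel's prefetcher-control disclosure article that the post links to (one disable bit per prefetcher in bits 0 to 3); whether they apply to a given processor should be checked against that article. The names in the code are illustrative.

```python
from typing import List

# Assumed bit layout of MSR 0x1A4 (a set bit disables that prefetcher),
# per Intel's hardware prefetcher control disclosure article.
PREFETCHER_DISABLE_BITS = {
    0: "L2 hardware prefetcher",
    1: "L2 adjacent cache line prefetcher",
    2: "DCU (L1 data cache) hardware prefetcher",
    3: "DCU IP prefetcher",
}


def disabled_prefetchers(msr_value: int) -> List[str]:
    """List the prefetchers whose disable bit is set in msr_value."""
    return [name for bit, name in PREFETCHER_DISABLE_BITS.items()
            if msr_value & (1 << bit)]


for value in (0xF, 0x0):
    names = disabled_prefetchers(value)
    print(f"{value:#x}: disables {names if names else 'nothing'}")
```

Under that assumption, 0xf turns all four prefetchers off and 0x0 turns them all back on, so intermediate values could be used to test one prefetcher at a time.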
Counting native events
By Vincent B., 2 replies
Hi, I am trying to count some performance events in part of an application written in C. So far, I have used PAPI to count events. It works fine for preset events. However, when I profile native events, all of them turn out to be translated into the same event code: 0x40000022 (an output of papi_avail is below). It makes no sense, but no error occurs when I profile them. What could be wrong? How could I debug this? Also, I have a list (from the perfmon directory) of events that can be counted on the architecture I work with (Ivy Bridge), along with the addresses and values of the associated registers. Is there an alternative to PAPI I could use to count a particular event from that list, over a certain part of the program, knowing the name of the event and the register information? Would it work to simply write and read the corresponding registers manually (for example using pcm-msr)? Thanks in advance for your help, Vincent   $ papi_avail -e LLC_MISSES Available events and hardware i...
How vtune compute bandwith?
By HUIZHAN Y., 1 reply
Hi, I am analyzing a simulated cannealling program from parsec. The program often accesses elem data randomly, so it has poor performance. I added a prefetching instruction for elem, and I am glad to see that the time of the parallel region with multiple threads has been reduced from 31 seconds to 15 seconds. That is a good result. I only prefetch the data one iteration in advance, and I hoped to get more performance improvement. But after adjusting the prefetching parameter, I cannot get a much better result. So I suspect that prefetching one iteration in advance has already used up all the bandwidth. I checked the bandwidth with the vtune bandwidth analysis after adding the prefetching, and I found that the bandwidth had increased only slightly, from 3.004GB/s to 3.268 (for a single package). I feel the result is not right. Since adding prefetching does not add to the loaded data size, and the time is reduced by half from 31s to 15s, the bandwidth should be equal to  DATA_SIZE/TIME, so the bandwidth should be...
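The DATA_SIZE/TIME reasoning in the post above can be checked with the numbers it quotes. This is a minimal Python sketch under two assumptions that the post does not confirm: that both bandwidth figures are averages over the same parallel region whose time was measured, and that the amount of data moved is unchanged by prefetching. The function and variable names are illustrative.

```python
def expected_bandwidth_gb_s(old_bw_gb_s, old_time_s, new_time_s):
    """Bandwidth predicted if the same bytes move in new_time_s
    instead of old_time_s (bandwidth = DATA_SIZE / TIME)."""
    return old_bw_gb_s * old_time_s / new_time_s


# Figures quoted in the post.
before_bw, after_bw = 3.004, 3.268   # GB/s, single package
before_t, after_t = 31.0, 15.0       # seconds, parallel region

predicted = expected_bandwidth_gb_s(before_bw, before_t, after_t)

# Ratio of bytes implied by the measured figures (bandwidth * time),
# after prefetching versus before.
implied_traffic_ratio = (after_bw * after_t) / (before_bw * before_t)

print(f"predicted bandwidth:   {predicted:.3f} GB/s")
print(f"measured bandwidth:    {after_bw:.3f} GB/s")
print(f"implied traffic ratio: {implied_traffic_ratio:.3f}")
```

The gap between the predicted and measured values suggests that at least one of the two assumptions does not hold for this measurement, for example if the bandwidth figure is averaged over a longer interval than the parallel region.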
Package C-State PC6/PC7 on Linux
By Michael M., 0 replies
Hi, I have an i5 4590T processor, I'm running Ubuntu Linux. The package c-state never goes above pc3. I've searched all over for a solution, and tried all of the suggestions I've found, but none of them work. I've set all of the Tunables in PowerTOP to good. I've enabled ASPM. It's enabled on all of the PCI devices: lspci -vvvv | grep ASPM LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk- LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+ LnkCap: Port #5, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+ LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 unlimited LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+ LnkCap: Port #0, Speed 2....
Error reading llc_misses event in Xeon D-1540
By Roberto R., 4 replies
Hello everyone,  I am working on a tool that provides access to the different hardware events through performance counters (PMCs). The tool works well; I have tested it on several Intel processors: Sandy Bridge, Haswell, and Haswell-EP. Now I am working with a Broadwell processor that has some new cache monitoring features I need to work with.  Trying my tool on this processor, I found that the events described in 64-ia-32-architectures-software-developer-manual-325462.pdf Table 19.1, LLC Reference (2EH, Umask 4FH) and LLC Misses (2EH, Umask 41H), report the same number.  I thought this could be an error in my tool, so I tried perf and got the same result. Also, I can use only 4 programmable PMCs, although there are supposed to be 8; if I try to use a 5th PMC it returns zero, and the same happens with perf. My processor is: Intel(R) Xeon(R) CPU D-1540 @ 2.00GHz Vendor    : GenuineIntel Family    : 6 Model    : 6 Stepping: 2 Type    : OEM The perf output is: $ perf stat -I 1000 -e ins...
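As an illustration of how the two events named in the post above are encoded, the following Python sketch packs an event select and unit mask into the raw code that perf accepts (the `rNNNN` form) and into a value for an IA32_PERFEVTSELx register with the user-mode, kernel-mode, and enable bits set. The helper names are hypothetical; the bit positions follow the architectural performance monitoring layout in the Intel SDM.

```python
USR_BIT = 1 << 16      # count while running in user mode
OS_BIT = 1 << 17       # count while running in kernel mode
ENABLE_BIT = 1 << 22   # enable the counter


def raw_event_config(event_select, umask):
    """Pack event select (bits 0-7) and unit mask (bits 8-15)."""
    for name, value in (("event_select", event_select), ("umask", umask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    return (umask << 8) | event_select


def perf_raw_name(event_select, umask):
    """Raw event string in the form perf accepts, e.g. 'r412e'."""
    return f"r{raw_event_config(event_select, umask):04x}"


def perfevtsel_value(event_select, umask):
    """Register value counting the event in user and kernel mode."""
    return raw_event_config(event_select, umask) | USR_BIT | OS_BIT | ENABLE_BIT


# Event codes as given in the post.
LLC_REFERENCE = (0x2E, 0x4F)
LLC_MISSES = (0x2E, 0x41)

for label, (event, umask) in (("LLC Reference", LLC_REFERENCE),
                              ("LLC Misses", LLC_MISSES)):
    print(label, perf_raw_name(event, umask),
          hex(perfevtsel_value(event, umask)))
```

The resulting names could be passed to perf in the usual way, for example `perf stat -e r4f2e,r412e` followed by the command to measure, which makes it easy to confirm that both tools program the same encodings.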
isCoreOnline will always return false if /proc/cpuinfo doesn't print "physical id" and / or "core id"
By Valentin B., 10 replies
Hi Guys, just FYI: # ./pcm.x 1 -i=1  Intel(r) Performance Counter Monitor V2.8 (2014-12-18 12:52:39 +0100 ID=ba39a89)  Copyright (c) 2009-2014 Intel Corporation Number of physical cores: 1 Number of logical cores: 2 Number of online logical cores: 2 Threads (logical cores) per physical core: 2 Num sockets: 1 Physical cores per socket: 1 Core PMU (perfmon) version: 1 Number of core PMU generic (programmable) counters: 4 Width of generic (programmable) counters: 48 bits Nominal core frequency: 2800000000 Hz Delay: 1 Detected Intel(R) Xeon(R) CPU X5660 @ 2.80GHz "Intel(r) microarchitecture codename Westmere/Clarkdale" terminate called after throwing an instance of 'std::exception'   what():  std::exception DEBUG: caught signal to interrupt (Aborted). Cleaning up  Zeroed PMU registers  Freeing up all RMIDs It turns out that isCoreOnline returned false for every processor because /proc/cpuinfo on this machine looked like: processor       : 0 vendor_id  ...
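The condition described in the post above, a /proc/cpuinfo that lacks "physical id" or "core id" entries, can be detected with a short standalone check. The sketch below is not Intel PCM code; it is a hypothetical Python helper that parses cpuinfo-style text and reports which logical processors are missing the two fields. The sample text is invented for illustration.

```python
REQUIRED_FIELDS = ("physical id", "core id")


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    processors, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a processor block
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


def missing_topology_fields(processors):
    """Map processor number to the required fields absent from its block."""
    report = {}
    for cpu in processors:
        missing = [field for field in REQUIRED_FIELDS if field not in cpu]
        if missing:
            report[cpu.get("processor", "?")] = missing
    return report


# Invented sample: processor 1 lacks both topology fields.
SAMPLE = """\
processor\t: 0
vendor_id\t: GenuineIntel
physical id\t: 0
core id\t: 0

processor\t: 1
vendor_id\t: GenuineIntel
"""

print(missing_topology_fields(parse_cpuinfo(SAMPLE)))
```

On a real system the same functions could be applied to the contents of /proc/cpuinfo read from disk, which would show up front whether a tool relying on these fields is likely to misdetect the core topology.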
Disable C6/C7 C-state for core
By futureishere, 5 replies
Is there a way to prevent the core from going to C6/C7 C-state? I looked into my BIOS and it only provides support for disabling package C-states but not core c-states.
Using Intel® GPA to Check Power Usage
06/03/2013, 0 comments

Brad Hill of Intel talks about using Intel GPA to check application power usage. Learn how to use the GPA tool to analyze the power consumption of graphics- and CPU-intensive applications.


Intel® Graphics Performance Analyzers 2012 R5 Overview
02/20/2013, 0 comments

Paul Lindberg talks about the Intel® Graphics Performance Analyzers 2012 R5 releases, and gives a preview of what will be coming in 2013 for GPA.


Software Performance Monitoring
08/24/2011, 0 comments



Videos


Software Performance Monitoring

Highlights from the Community Manager

On Jan 5, 2011, Intel launched the 2nd Generation Intel® Core™ processor family (formerly code-named Sandy Bridge) for laptops and PCs. The new processors have a revolutionary new architecture that combines the computing “brain,” or microprocessor, with a graphics engine on the same die for the very first time. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the company's award-winning Intel® Wireless Display (WiDi), which now adds 1080p HD and content protection for those wishing to beam premium HD content from their laptop screen to their TV.

Stay connected. Visit often. We will be posting the PMU programming guides and updated tools to give you the latest information on the new Intel microarchitecture innovations.