Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring, software tuning, and platform monitoring. Performance monitoring covers an introduction to monitoring and software tuning methodologies, as well as software optimization techniques and best known methods (BKMs) for novice and more advanced users.

For developers, programming reference manuals are available with the latest information describing the hardware interface of the Performance Monitoring Unit (PMU) of Intel microprocessors, including core and uncore monitoring resources, as well as the definitive source of information on the performance events that may be monitored.

Platform monitoring includes machine monitoring topics such as monitoring CPU cores, graphics processors, and other system coprocessors, as well as metering and quality of service.

Memory and Cache Profiling Erratum on Intel® Xeon® processor E5 family
By Angela Schmid (Intel) Posted 05/06/2013 0
Audience: Anyone collecting event-based performance data on a platform based on the Intel® Xeon® processor E5 family. There is a performance monitoring unit (PMU) erratum on the Intel® Xeon® processor E5 family that affects the events used for memory and cache profiling. To collect data on the events...
Intel Architecture and Processor Identification With CPUID Model and Family Numbers
By Hussam Mousa (Intel) Posted 06/15/2012 11
Summary of recent Intel processors' CPUID values, with model and family numbers linked to the architecture codename and processor codename as well as their brand names and models. The summary covers mainline IA x86 and x64 90nm, 65nm, 45nm, and 32nm processors.
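As a rough illustration of how family and model numbers are obtained from a CPUID value, the sketch below decodes a raw processor signature (the EAX value returned by CPUID leaf 1) into its display family, display model, and stepping. The names `CpuSignature` and `decode_signature` are hypothetical and chosen only for this example. The sample input 0x000206A7 is an illustrative signature that decodes to family 6, model 0x2A, a value associated with 32nm Sandy Bridge client parts.

```cpp
#include <cstdint>

// Hypothetical holder for the decoded fields of a CPUID.1:EAX signature.
struct CpuSignature {
    std::uint32_t family;    // display family
    std::uint32_t model;     // display model
    std::uint32_t stepping;  // stepping ID
};

// Decode a raw CPUID.1:EAX value into display family, model, and stepping.
constexpr CpuSignature decode_signature(std::uint32_t eax) {
    const std::uint32_t stepping   = eax & 0xF;          // bits 3:0
    const std::uint32_t base_model = (eax >> 4) & 0xF;   // bits 7:4
    const std::uint32_t base_fam   = (eax >> 8) & 0xF;   // bits 11:8
    const std::uint32_t ext_model  = (eax >> 16) & 0xF;  // bits 19:16
    const std::uint32_t ext_fam    = (eax >> 20) & 0xFF; // bits 27:20

    // The extended family is added only when the base family is 0xF.
    const std::uint32_t family =
        (base_fam == 0xF) ? base_fam + ext_fam : base_fam;

    // The extended model supplies the high nibble for families 6 and 0xF.
    const std::uint32_t model =
        (base_fam == 0x6 || base_fam == 0xF) ? ((ext_model << 4) | base_model)
                                             : base_model;

    return CpuSignature{family, model, stepping};
}

// Compile-time check on the illustrative signature 0x000206A7.
static_assert(decode_signature(0x000206A7).family == 0x6, "family 6");
static_assert(decode_signature(0x000206A7).model == 0x2A, "model 0x2A");
```

On real hardware the input would come from executing the CPUID instruction, for example through a compiler intrinsic; that step is left out here so the decoding logic stays portable.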
Platform Monitoring Basics
Posted 09/15/2010 0
Introduction: This is an organic document, meaning that it will expand as need and request dictate. The purpose is to help establish a baseline understanding of terms used in Platform Monitoring, concepts described, and utilizations or capabilities comprehended. Performance Monitoring Terminolog...
Performance Optimization and Platform Monitoring
Posted 09/15/2010 2
Introduction: This discussion covers some of the needs and implications that drive one to optimize and manage one’s platform. In this, we disclose opportunities to influence the ultimate performance of a computer system at the architectural, platform and software levels, and provide a rationale fo...
Dissecting STREAM benchmark with Intel® Performance Counter Monitor
By Roman Dementiev (Intel) Posted on 11/23/10 8
Intel® Performance Counter Monitor (Intel® PCM) is an API and a set of tools that should help developers to understand how their applications utilize the underlying compute platform. In this blog I will explain how to instrument the well-known STREAM benchmark with library functions of Intel® PCM...
Intel hardware tss (context switch)
By Jog L. 1
Hello, I see the Linux kernel didn't go the hardware route for process switching in ring 0 (kernel). Could one gain a lot by using it? It is said to be slower than a software context switch, which seems strange to me. Any performance pointers? Thanks
MicroSequencer (MS) @ SNB
By Mikhail 2
Hello, In 64-ia-32-architectures-optimization-manual, chapter B.3.7.2 Understanding the Sources of the Micro-op Queue it is said that UOPs come from DSB, MITE and MS, and a 'typical distribution' is given. It happens so that in the app I'm profiling quite a lot more UOPs are dispatched from MS than suggested as desirable by Intel in the manual while the execution is clearly front-end bound. The problem is, I don't understand why that happens. The manual reads: A large portion of micro-ops coming from the microcode sequencer may be benign, such as complex instructions, or string operations, but can also be due to code assists handling undesired situations like Intel SSE to Intel AVX code transitions. But I am pretty sure there aren't any SSE/AVX instructions employed at all, nor could 'denormals' or string operations occur often enough to produce any notable amount of stirring (the code mainly works with integer values). Is there a complete list of instructions that actually cause M...
Not PMCx reset working when collecting raw PEBS dump
By jaeyoung j. 0
Hello all, I'm a novice at using the PEBS facility. I am trying to use the "long latency loads" facility and want to dump "raw PEBS records" for further analysis. To write a simple example, I referenced SDM v3, especially 18.8.4.1 through 18.8.4.3 (for Sandy Bridge). When testing, counting with the long latency loads counter works normally, but PEBS recording does not work correctly. In the test, I found that the PMCx reset value for adjusting the sampling rate does not work correctly; specifically, it does not overflow at all for too low counts. According to SDM v3, it needs to trigger overflow to "arm PEBS facility" and to set "PEBS Counter X Reset" for triggering PEBS counters repeatedly. However, even with "PEBS Counter X Reset" set to a high value (0xFFFFFFFFFF00 in my case), PMCx does not appear to be set to the preset value in the Debug Store Area (0x40H ~ 0x58H), even after the first overflow of PMCx. I was trying to find simple examples of this, but it is hard to find the example that in...
Intel PCM, error with very basic code.
By Jeremie Lagraviere 1
Hi everyone, I have a very basic code using IntelPCM:

    int main(void) {
        int N = 99999999;
        PCM * m = PCM::getInstance();
        if (m->program() != PCM::Success) {
            cout << "Intel PCM does not Work";
            exit(0);
        }
        SystemCounterState before_sstate = getSystemCounterState();
        eratosthenesBlockwise(N, 100, 1);
        SystemCounterState after_sstate = getSystemCounterState();
        cout << "Instructions per clock:" << getIPC(before_sstate,after_sstate) << endl;
        cout << "L3 cache hit ratio:" << getL3CacheHitRatio(before_sstate,after_sstate) << endl;
        cout << "Bytes read:" << getBytesReadFromMC(before_sstate,after_sstate) << endl;
        //printf("%d", eratosthenesBlockwise(N, 100, true));
        printf("done !");
        exit(0);
    }

I compile the code with this command line: g++ -I/home/xxxx/pgashpc/EnergyManagement/IntelPCM/ sie...
pause instruction doesn't seem to reduce cpu usage / elect consumption
By Jog L. 4
Hello, I came across the pause assembly instruction, which is available with SSE2. I own a Core 2 Duo from 2007 (Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz), and when pause is used in a spin-wait loop, I see no change in CPU usage or electric consumption. I used the same loop as here: https://software.intel.com/sites/products/documentation/doclib/iss/2013/... So are there some options to check to be sure it is working? Even repeated x3000 (yes, a lot) it still shows no improvement. Did I miss something? jog
request for a demo project of using AVX asm
By WEI Z. (Intel) 3
Hi, I'm studying and trying to use AVX-256/512 instructions/intrinsics, but I could not find a good demo/example for new starters. If there is a simple example project with C code and AVX-related asm code to run, it may help a lot. Could you send me one such example project? Thank you, John
Using PEBS facility
By Jithin Parayil T. 1
Hi, I've been going through the documentation for the PEBS facility as described in the Intel software-developer manual vol 3b section 18.7.1.1. It is mentioned that, in order to use PEBS, software needs to initialize the DS_BUFFER_MANAGEMENT_AREA data structure in memory (in non-paged pool) and then store the beginning linear address of this data structure in the IA32_DS_AREA register. Is there a sample piece of code that illustrates how this data structure initialization and setting of the IA32_DS_AREA register needs to be done? I'm a bit confused about using PEBS and haven't been able to find any useful examples of how PEBS is utilized either. It would be quite helpful if I could refer to a piece of sample code that configures and utilizes PEBS. I'm working on an IvyBridge machine. Thanks in advance, Jithin
Run to Run variability on an Intel(R) Xeon(R) CPU E3-1240 v3
By Animesh J. 0
Hi, I have been struggling to get reproducible results in a very simple Matrix multiplication code. I see variability of even more than 10% from run to run. I have run the perf stat command to monitor the runs.

1024x1024

 Performance counter stats for 'taskset 0x1 MM_binaries/MM_tiled_1024':

   198,505,412,302 cycles                    #    0.000 GHz                     [57.14%]
   283,932,630,578 instructions              #    1.43  insns per cycle         [71.42%]
   111,811,914,939 L1-dcache-loads                                              [71.43%]
     1,091,387,355 L1-dcache-load-misses     #    0.98% of all L1-dcache hits   [71.43%]
     1,088,112,128 r504f2e                                                      [71.44%]
       543,287,591 r50412e                                                      [71.43%]
                 0 LLC-prefetches                                               [57.14%]

      71.139608212 seconds time elapsed

 Performance counter stats for ...
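The derived figures that perf stat prints next to the raw counts in the post above (1.43 insns per cycle and 0.98%) can be re-derived from the counts themselves. The following is a minimal sketch of that arithmetic, not part of the original post; the helper `ratio` and the constant names are hypothetical. The bracketed percentages in perf output indicate that the events were multiplexed onto the available counters and the counts scaled, so these ratios are estimates rather than exact values.

```cpp
#include <cstdint>

// Hypothetical helper: floating-point ratio of two raw event counts.
constexpr double ratio(std::uint64_t numerator, std::uint64_t denominator) {
    return denominator == 0
               ? 0.0
               : static_cast<double>(numerator) / static_cast<double>(denominator);
}

// Raw counts copied from the perf stat output quoted in the post.
constexpr std::uint64_t kCycles       = 198505412302ULL;
constexpr std::uint64_t kInstructions = 283932630578ULL;
constexpr std::uint64_t kL1Loads      = 111811914939ULL;
constexpr std::uint64_t kL1LoadMisses = 1091387355ULL;

// Instructions per cycle, the "insns per cycle" figure.
constexpr double kIpc = ratio(kInstructions, kCycles);

// L1 data cache load misses as a percentage of L1 data cache loads.
constexpr double kL1MissPercent = 100.0 * ratio(kL1LoadMisses, kL1Loads);

// Both values agree with the figures perf printed once rounded to two decimals.
static_assert(kIpc > 1.425 && kIpc < 1.435, "IPC rounds to 1.43");
static_assert(kL1MissPercent > 0.975 && kL1MissPercent < 0.985,
              "miss rate rounds to 0.98%");
```

Computing the same ratios for each run makes it easier to see whether run-to-run variability shows up in the instruction count, the cycle count, or the cache behavior.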
Using Intel® GPA to Check Power Usage
06/03/2013 0

Brad Hill of Intel talks about using Intel GPA to check application power usage. Learn how to use the GPA tool to analyze the power consumption of graphics- and CPU-intensive applications.


Intel® Graphics Performance Analyzers 2012 R5
02/20/2013 0

Paul Lindberg talks about the Intel® Graphics Performance Analyzers 2012 R5 releases, and gives a preview of what will be coming in 2013 for GPA.


Software Performance Monitoring
08/24/2011 0

Highlights from the Community Manager

On Jan 5, 2011, Intel launched the 2nd Generation Intel® Core™ processor family (formerly code-named Sandy Bridge) for laptops and PCs. The new processors have a revolutionary new architecture that combines the computing “brain,” or microprocessor, with a graphics engine on the same die for the very first time. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the company's award-winning Intel® Wireless Display (WiDi), which now adds 1080p HD and content protection for those wishing to beam premium HD content from their laptop screen to their TV.

Stay connected. Visit often. We will be posting the PMU programming guides and updated tools to give you the latest information on the new Intel microarchitecture innovations.