Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring, software tuning, and platform monitoring. Performance monitoring covers a variety of topics, including an introduction to monitoring and software tuning methodologies, as well as software optimization techniques and best known methods (BKMs) for novice and more advanced users.

For developers, programming reference manuals are available with the latest information describing the hardware interface of the Performance Monitoring Unit (PMU) of Intel microprocessors, including core and uncore monitoring resources, as well as the definitive source of information on performance events that may be monitored.

Platform monitoring includes machine monitoring topics such as monitoring CPU cores, graphics processors, and other system coprocessors, as well as metering and quality of service.

Merrifield Uncore Performance Monitoring Events
By SAI SINDURI G. (Intel) Posted 05/20/2014 0
Using the Merrifield SoC Performance Monitoring Events. This article focuses directly on the uncore performance monitoring events for the Merrifield SoC. For the introduction to SoC uncore performance monitoring, please see this article: Silvermont SoC Uncore Performance Monitoring Guide Introd...
Rangeley Uncore Performance Monitoring Events
By Perry Taylor (Intel) Posted 05/09/2014 0
Using the Rangeley SoC Performance Monitoring Events. This article focuses directly on the uncore performance monitoring events for the Rangeley SoC. For the introduction to SoC uncore performance monitoring, please see this article: Silvermont SoC Uncore Performance Monitoring Guide Introducti...
Baytrail Uncore Performance Monitoring Events
By Perry Taylor (Intel) Posted 04/16/2014 0
Using the Baytrail SoC Performance Monitoring Events. This article focuses directly on the uncore performance monitoring events for the Baytrail SoC. For the introduction to SoC uncore performance monitoring, please see this article: Silvermont SoC Uncore Performance Monitoring Guide Introduct...
Silvermont SoC Uncore Performance Monitoring Guide
By Perry Taylor (Intel) Posted 04/10/2014 0
Welcome to the System on a Chip (SoC) uncore performance monitoring guide. This article will introduce you to the SoC uncore performance monitoring event set and provide details on the events and how to interpret results. The Silvermont generation of SoCs features a new set of uncore performa...
Subscribe to Intel Developer Zone Articles
Documentation for uncore performance monitoring units
By Roman Dementiev (Intel) Posted on 07/11/14 0
Hello everyone, The uncore performance monitoring units (uncore PMUs) provide a lot of useful information, such as memory controller traffic, traffic between sockets/processor packages, energy related metrics in the uncore (sleep states for Intel® Quick Path Interconnect links or DRAM sleep states for e...
Dissecting STREAM benchmark with Intel® Performance Counter Monitor
By Roman Dementiev (Intel) Posted on 11/23/10 8
Intel® Performance Counter Monitor (Intel® PCM) is an API and a set of tools that should help developers to understand how their applications utilize the underlying compute platform. In this blog I will explain how to instrument the well-known STREAM benchmark with library functions of Intel® PCM...
Subscribe to Intel Developer Zone Blogs
Xeon-D Uncore ECC Error Injection
By Ryan S. 0
I am trying to test a driver for monitoring ECC errors, and would like to check functionality by injecting errors. In the Intel® Xeon® Processor D-1500 Product Family External Design Specification (EDS), Volume Two: Core and Uncore Registers, I have found the registers below relating to error injection:
4.3.80 rsp_func_addr_match_lo
4.3.81 rsp_func_addr_match_hi
4.3.82 rsp_func_addr_mask_lo
4.3.83 rsp_func_addr_mask_hi
4.3.84 rsp_func_rank_bank_match
4.5.55 rsp_func_crc_err_inj_dev0_xor_mask
4.5.56 rsp_func_crc_err_inj_dev1_xor_mask
4.5.57 rsp_func_crc_err_inj_extra
I believe I have set all of these registers as necessary in the driver to inject an error, but I am unable to enable address matching by setting addr_match_en in the rsp_func_addr_match_hi register. I notice that this bit is locked. How does one go about setting up error injection? Am I going about this correctly, and if so, how do I unlock this bit to enable address matching?
QPI Link Layer Packet Matching Reference "Gen By"
By Paul C. 2
Hi, Regarding the Intel Xeon Processor E5 and E7 v3 Family Uncore Performance Monitoring Reference Manual (June 2015): The last table in this document, "Table 2-265. Opcodes", lists various coherence transaction messages and their opcodes and message classes. It also has a column labeled "Gen By" in which there are entries such as, for example, Co, Ci, Ho, Hi, Uo. Can someone clarify the meaning of the entries in this column? Regards, Paul.
By GHui 1
I'm reading xeon-e5-v3-uncore-performance-monitoring.pdf, and I want to read RING_THRU_DN_BYTES and RING_THRU_UP_BYTES. I don't know how to find the bus, device, and function numbers or the event code, or which device I should open.
Mistake in Intel Developer Manual Volume 3?
By T C 2
Intel Developer Manual, Volume 3 contains this hardware event counter description: BACLEAR_FORCE_IQ Counts number of times a BACLEAR was forced by the Instruction Queue. The IQ is also responsible for providing conditional branch prediction direction based on a static scheme and dynamic data provided by the L2 Branch Prediction Unit. If the conditional branch target is not found in the Target Array and the IQ predicts that the branch is taken, then the IQ will force the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble in the instruction fetch pipeline. I have read several of the original Intel patents, and they had detailed schematic diagrams showing that the Branch Address Calculator (BAC) contains the actual static prediction logic. Could somebody please confirm/explain this? If it's not the BAC, why would an instruction queue be doing static prediction? (I know why the static prediction is done, I mean it seems o...
How branches in loop body affect the performance when unrolling?
By Peng Z. 3
Hi, I want to know how branches in a loop body affect performance when unrolling, so I ran some tests. The code is in the attachment. Compiler: icc 15.0.3; options: -O3; platforms: Ivy Bridge i5-3337U and Sandy Bridge E5-2670. I use #pragma unroll(n) to unroll the innermost loop with different unroll factors, such as 2, 4, and 8. On both platforms, the execution time with unroll(8) is nearly 100% higher than with unroll(2)! I also analyzed the assembly code of the innermost loop. With unroll(8), the innermost loop body is 288 B, smaller than the L1 I-cache (32 KB), and has 58 non-branch instructions + 17 conditional branches. With unroll(2), the body is 71 B and has 16 non-branch instructions + 5 conditional branches. So unroll(8) removes more loop overhead, the unrolled loop body is still smaller than the L1 I-cache, and the number of conditional branches is below the capacity of the BTB (branch target buffer), which has 4K entries. So why is its execution time higher than with unroll(2)? And I...
Intel Xeon hardware cache events not supported
By Jacob K. 1
I am trying to use the perf tool to measure the performance of a program. For some reason, perf stat doesn't support hardware cache events. I'm using an Intel Xeon E5-2620 (Haswell) processor. I read in a thread in this forum that the event codes might have changed for this CPU and that this is why perf doesn't support these events. I tried using perfmon2 to find the raw events, but with no luck. Does anybody know how to find the correct raw events for hardware cache events for this CPU? I'm specifically interested in L1-dcache-loads and L1-dcache-stores, but a generic solution would be better. I am using Linux version 3.0.101-0.47.52-default. Thanks
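For readers with the same question, it may help to see how a raw core event is written for perf on x86: the letter r followed by hex digits, where the low byte is the event select and the next byte is the unit mask. The Python sketch below only builds that string. The helper name perf_raw_event is hypothetical, and the 0xD0/0x81 pair in the example is an assumption (believed to count retired load uops on this processor generation) that should be checked against the event tables for the exact CPU model.

```python
def perf_raw_event(event_select: int, umask: int) -> str:
    """Build a perf raw-event string (r<umask><event>) for an x86 core PMU event.

    Hypothetical helper for illustration; it only formats the string and does
    not check that the event exists on any particular CPU.
    """
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event select must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in 8 bits")
    # Low byte: event select. Next byte: unit mask.
    config = (umask << 8) | event_select
    return f"r{config:04x}"


# Assumed example values; verify them in the event tables for your CPU.
loads = perf_raw_event(0xD0, 0x81)
print(f"perf stat -e {loads} ./my_program")
```

The printed command line can then be tried directly; ./my_program is a placeholder for the program being measured.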
RDPMC Fast Mode
By Georgi G. 1
Hi all, I am currently writing a C++ class which measures performance using the RDPMC instruction. Everything works as expected, but I noticed in the manual that some processors support a "fast" mode of the RDPMC instruction (reading only the lower 32 bits of the counter). When I try to use it on mine (i.e. setting ECX[31]), the code produces a segfault. This mode is supported on processors with 40-bit counters, and the counters on my machine are 48 bits wide. The model name of my processor is "Intel(R) Xeon(R) CPU W3580". I was wondering if there is some equivalent of this "fast" mode for other processors and, if not, whether it's possible to reduce the number of cycles this instruction takes (currently ~30 cycles). Thanks, Georgi
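Whatever width is read, two counter samples are usually compared modulo the counter width so that a wraparound between reads does not produce a negative or oversized delta. The Python sketch below shows only that arithmetic, using the 48-bit width mentioned in the post as the default. The function name counter_delta is hypothetical, and the sketch does not execute RDPMC or make the instruction any faster.

```python
def counter_delta(start: int, end: int, width_bits: int = 48) -> int:
    """Return the number of events between two raw counter samples.

    The subtraction is done modulo 2**width_bits, so a single wraparound
    between the two samples is handled. width_bits=48 is taken from the
    post above; other processors may report a different counter width.
    """
    mask = (1 << width_bits) - 1
    return (end - start) & mask


# Normal case: the counter did not wrap between the two samples.
print(counter_delta(10, 25))
# Wrapped case: the counter passed 2**48 - 1 and restarted from zero.
print(counter_delta((1 << 48) - 5, 3))
```

The same function with width_bits=32 applies when only the low 32 bits of each sample are kept, provided fewer than 2**32 events occur between samples.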
DRAM Memory reads and writes
By Pramodkumar P. 1
Hello All, I am using an Intel Core i7-2600 CPU with DDR3-1333 SDRAM. I want to count the number of reads from and writes to DRAM. I found model-specific registers such as UNC_DRAM_READ_CAS.CHx and UNC_DRAM_WRITE_CAS.CHx, but they are not supported on my system (64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf). I also wish to count the number of activates and idle cycles after executing an application. I used DRAMPower to get these values; however, it gives results only for the trace files provided with its source code. PAPI does not give these values either. Please let me know if anyone is aware of a way to get these values. Thanks in advance.
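On platforms where CAS counts like the ones named in the post can be read, turning them into traffic figures is simple arithmetic. The Python sketch below assumes each CAS command transfers 64 bytes (a 64-bit DDR3 channel with burst length 8); the function name dram_traffic and the sample counts are hypothetical, and the sketch does not show how to obtain the counts on the i7-2600.

```python
BYTES_PER_CAS = 64  # assumption: 64-bit channel, burst length 8


def dram_traffic(read_cas: int, write_cas: int, seconds: float) -> dict:
    """Convert read/write CAS counts into bytes moved and average bandwidth."""
    if seconds <= 0:
        raise ValueError("measurement interval must be positive")
    read_bytes = read_cas * BYTES_PER_CAS
    write_bytes = write_cas * BYTES_PER_CAS
    return {
        "read_bytes": read_bytes,
        "write_bytes": write_bytes,
        "read_gb_per_s": read_bytes / seconds / 1e9,
        "write_gb_per_s": write_bytes / seconds / 1e9,
    }


# Hypothetical counts collected over a one-second interval.
print(dram_traffic(read_cas=1_000_000, write_cas=500_000, seconds=1.0))
```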
Subscribe to Forums
Using Intel® GPA to Check Power Usage

Brad Hill of Intel talks about using Intel GPA to check application power usage. Learn how to use the GPA tool to analyze the power consumption of graphics- and CPU-intensive applications.

Intel® Graphics Performance Analyzers 2012 R5 Overview

Paul Lindberg talks about the Intel® Graphics Performance Analyzers 2012 R5 releases, and gives a preview of what will be coming in 2013 for GPA.

Software Performance Monitoring

Highlights from the Community Manager

On Jan 5, 2011, Intel launched the 2nd Generation Intel® Core™ processor family (formerly code-named Sandy Bridge) for laptops and PCs. The new processors have a revolutionary new architecture that combines the computing “brain,” or microprocessor, with a graphics engine on the same die for the very first time. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the company's award-winning Intel® Wireless Display (WiDi), which now adds 1080p HD and content protection for those wishing to beam premium HD content from their laptop screen to their TV.

Stay connected. Visit often. We will be posting the PMU programming guides and updated tools to give you the latest information on the new Intel microarchitecture innovations.