Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring and software tuning, as well as platform monitoring topics. Performance monitoring covers a range of topics, including an introduction to monitoring and software tuning methodologies, software optimization strategies, and best practices for beginners and advanced users.

Programming reference manuals are available for developers. They describe everything new in the hardware interface of the Performance Monitoring Unit (PMU) in Intel microprocessors, including core and uncore monitoring resources, and they are the definitive source of information on the performance events that can be monitored.

Platform monitoring includes device monitoring topics such as monitoring CPU cores, graphics processors, and other system coprocessors, as well as measuring quality of service.

Some questions about performance counters and how to read them
By Hamid Reza K.3
Hello all, I am working on CMP scheduling and need to obtain some information about the behaviour of multi-threaded applications executed on Intel CPUs. One CPU considered in my study is the Intel Core 2 Duo. I have some questions and would be thankful if somebody could answer them.
1- Could you tell me how many performance counter registers exist in the Core 2 Duo?
2- I want to know the L1 cache misses of each thread of a multi-threaded application for a given period of its runtime. Take a four-threaded application as an example: t0, t1, ... t3. My goal is to obtain the number of L1 cache misses for threads 0, 1, ... 3 which happen in the first second of the application's runtime. Could you help me with how this task should be done? By the way, I use the Linux perf tool to read performance counters.
3- As you know, the number of registers available for reading performance counters is limited. Consider N programs (N is larger than the number of performance counter registers) which should be monitored, and suppose…
Question about the Core Specificity Encoding option for reading CPU performance counters
By Hamid Reza K.1
Hi list, I want to obtain the core cycles during which the data bus is busy for a multi-threaded application executed on a Core 2 Duo. I found that the performance event "Dbus_Busy" meets my purpose. But, as you know, to use the event you have to specify the core-specificity encoding. There are two options for Core Specificity Encoding: All cores and This core. I wonder if you could tell me what the "This core" option means for a multi-threaded application? Best regards, H. R. Khaleghzadeh
MLC to support CoD for Haswell-EP or -EX
By drMikeT1
I was wondering if there is a version of the MLC (Memory Latency Checker) utility that understands the CoD (Cluster on Die) snooping mode for the Haswell platform. On a 2-socket Haswell-EP system with CoD enabled, numactl -H shows 4 memory domains and correctly accounts for the cores associated with each. MLC, however, treats the same host as just a 2-memory-domain system, and the tests DO NOT DIFFERENTIATE between the two memory controllers (and domains) that are available on each socket. Thanks, Michael
An issue with performance optimization by the Intel compiler
By WEI Z. (Intel)14
Hi, I am learning to use Intel C++ Compiler XE 15.0 integrated with VS 2013. I wrote the simple example below to look into its performance.

    void dataCopy(float *codeWord0Ptr, float *codeWord1Ptr, int numDataCopy, float *outputPtr)
    {
        /* Second half of the output buffer receives the second code word. */
        float *outputPtr1 = &outputPtr[numDataCopy];

        /* Promise the compiler that all four pointers are 64-byte aligned. */
        __assume_aligned(codeWord0Ptr, 64);
        __assume_aligned(codeWord1Ptr, 64);
        __assume_aligned(outputPtr, 64);
        __assume_aligned(outputPtr1, 64);

        /* Ignore assumed dependencies and use aligned vector accesses. */
        #pragma ivdep
        #pragma vector aligned
        for (int idxData = 0; idxData < numDataCopy; idxData++)
        {
            outputPtr[idxData] = codeWord0Ptr[idxData];
            outputPtr1[idxData] = codeWord1Ptr[idxData];
        }
    }

I enabled release and x64 mode, and enabled the related optimization settings (AVX etc.) in the project properties. I also enabled the optimization report in the project properties, and it reports that the loop was vectorized. When I run it on my host PC (the CPU is an i5-3320M) and do some profiling on function dataC…
Change in Turbo policy for Xeon E5 v3?
By John D. McCalpin7
I ran across a change in the behavior of Xeon E5 v3 processors relative to Xeon E5 v1 processors and am confused about several aspects....
On Xeon E5 v1 (Sandy Bridge EP) and Xeon E5 v3 (Haswell EP) processors, the maximum non-turbo clock multiplier ratio is contained in bits 15:8 of MSR_PLATFORM_INFO (MSR 0xCE). This defines the rate at which the TSC increments, and matches the "nominal" frequency of the processor.
The Linux "cpufreq" controls can be used to set specific target ratios in MSR 0x199 (IA32_PERF_CTL).
For frequencies at or below the "nominal" multiplier in MSR_PLATFORM_INFO, this results in fixed-frequency operation at the specified frequency.
For frequencies above the nominal multiplier in MSR_PLATFORM_INFO, the OS programs the highest allowable value into these fields -- i.e., the maximum single-core Turbo ratio from bits 7:0 of MSR_TURBO_RATIO_LIMIT (MSR 0x1AD).
The hardware then provides the highest frequency that it is able to provide, subject to the number…
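The bit fields named in this post can be decoded with a few shifts and masks. The Python sketch below is illustrative only: the helper names, the sample register values, and the 100 MHz reference clock are assumptions and do not come from the original post. The optional read_msr helper assumes Linux with the msr kernel module loaded and root privileges.

```python
import os
import struct

MSR_PLATFORM_INFO = 0xCE       # address quoted in the post
MSR_TURBO_RATIO_LIMIT = 0x1AD  # address quoted in the post
BCLK_MHZ = 100.0               # assumption: 100 MHz reference clock


def extract_bits(value, high, low):
    """Return bits high:low (inclusive) of an integer register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def max_non_turbo_ratio(platform_info):
    """Maximum non-turbo multiplier: bits 15:8 of MSR_PLATFORM_INFO."""
    return extract_bits(platform_info, 15, 8)


def max_single_core_turbo_ratio(turbo_ratio_limit):
    """Maximum single-core turbo multiplier: bits 7:0 of MSR_TURBO_RATIO_LIMIT."""
    return extract_bits(turbo_ratio_limit, 7, 0)


def ratio_to_mhz(ratio, bclk_mhz=BCLK_MHZ):
    """Convert a multiplier to MHz under the assumed reference clock."""
    return ratio * bclk_mhz


def read_msr(cpu, address):
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (Linux, msr module, root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Made-up register contents for illustration, not measured values.
    sample_platform_info = 0x1900
    sample_turbo_limit = 0x1F1F2021
    nominal = max_non_turbo_ratio(sample_platform_info)
    turbo = max_single_core_turbo_ratio(sample_turbo_limit)
    print("nominal ratio:", nominal, "->", ratio_to_mhz(nominal), "MHz")
    print("1-core turbo ratio:", turbo, "->", ratio_to_mhz(turbo), "MHz")
```

On a real system, the sample values would be replaced by read_msr(0, MSR_PLATFORM_INFO) and read_msr(0, MSR_TURBO_RATIO_LIMIT).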
Information about PCM PCIe counters
By Anuj K.3
Hi everyone. I have been working on measuring the PCIe activity of network cards and I wanted to understand PCM counters better. I'm running the pcm-pcie.x executable on a Haswell server, which displays the following counters (full event description here: http://pastebin.com/pnuj1eKu):
PCIeRdCur (PCIe read current transfer, full cache line)
RFO (Demand Data RFO)
CRd (Demand Code Read)
DRd (Demand Data Read)
ItoM (PCIe write full cache line)
PRd (MMIO Read)
WiL (MMIO Write)
I had these questions:
What is the difference between PCIeRdCur and DRd? PCIeRdCur measures the number of partial and full cache line reads. Does it miss any PCIe reads that are captured by DRd, or does PCIeRdCur include DRd? I'm seeing non-zero values for both these counters.
The description printed by pcm-pcie.x says that WiL measures traffic for "PCI devices writing to memory - application reads from disk/network/PCIe device", but it also describes it as "MMIO Writes (Full/Partial)". Aren't these two descriptio…
RAPL analysis/tests on my laptop
By Carlos P.9
Hi, I am testing the RAPL feature on my laptop to try to read some CPU consumption values, and I need someone to help me get some answers. I am running a RAPL sample code simultaneously with a total battery power consumption code, and I am getting the following values:
##RAPL:
Package energy before: 5743.516907J
PowerPlane0 (core) for core 0 energy before: 2655.314636J
PowerPlane0 (core) for core 0 policy: 0
PowerPlane1 (on-core GPU if avail) before: 1199.609863J
PowerPlane1 (on-core GPU if avail) 0 policy: 16
DRAM energy before: 33766.502869J
Sleeping 1 second
Package energy after: 5747.005737  (3.488831J consumed)
PowerPlane0 (core) for core 0 energy after: 2655.366577  (0.051941J consumed)
PowerPlane1 (on-core GPU if avail) after: 1199.689941  (0.080078J consumed)
DRAM energy after: 33767.247131  (0.744263J consumed)
##Total battery power consumption: 8 watts average
I suppose that Package energy is the CPU total energy without DRAM energy right? (PKG=uncor…
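The "consumed" figures in this output are differences between two energy readings, and dividing such a difference by the sampling interval gives average power in watts. The Python sketch below works through that arithmetic with the package readings quoted in the post. The function names are hypothetical. The 32-bit counter width and the 2^-ESU joule scaling describe how raw RAPL energy counters are commonly handled; the post does not state them, so they are labeled as assumptions.

```python
def average_power_watts(energy_before_j, energy_after_j, interval_s):
    """Average power over the interval: energy delta (J) / time (s)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (energy_after_j - energy_before_j) / interval_s


def counter_delta(raw_before, raw_after, counter_bits=32):
    """Delta between two raw counter reads, tolerating one wraparound.

    Assumption: the raw energy counter is counter_bits wide (32 here).
    """
    return (raw_after - raw_before) % (1 << counter_bits)


def raw_to_joules(raw_delta, energy_status_unit):
    """Scale a raw delta to joules, assuming one count = 2**-ESU joules."""
    return raw_delta * 2.0 ** -energy_status_unit


if __name__ == "__main__":
    # Package energy readings quoted in the post, taken 1 second apart.
    pkg_watts = average_power_watts(5743.516907, 5747.005737, 1.0)
    print(f"average package power: {pkg_watts:.3f} W")

    # Hypothetical raw counter values showing a wraparound between reads.
    delta = counter_delta(0xFFFFFFF0, 0x00000010)
    print("raw delta across wraparound:", delta)
```

The package average computed this way can then be compared directly with the 8 watt battery average reported in the post, since both are in watts.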
Sample code for PCIe Burst Transfer white paper by Intel?
By Sonny G.4
Hi, I bumped into a white paper by Intel: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers... Is there sample code for Linux on a Xeon (E5-2600) processor that I can take a look at, instead of the general idea outlined in the paper? For example, the basic steps are:
1. Mark memory region as WC - Any sample code for this?
2. Burst transfer - Sample code that uses the _mm256_store_si256() function
Any help is appreciated. Thanks!

Videos


Software Performance Monitoring

Highlights from the Community Manager

On January 5, 2011, Intel introduced the 2nd Generation Intel® Core™ processors (formerly codenamed Sandy Bridge) for notebooks and PCs. The new processors have a revolutionary new architecture that, for the first time, combines the "computing brain" (the microprocessor) with a graphics engine on the same chip. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the award-winning Intel® Wireless Display (WiDi), which now supports full HD and a content protection mechanism for users who want to stream premium HD content from their notebook to their TV screen.

Stay connected and visit us often. We will publish the PMU programming guides and updated tools to keep you up to date on the latest innovations in Intel microarchitecture.