Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring community!

Here you will find information on performance monitoring, software tuning, and platform monitoring. Performance monitoring covers a range of topics, including an introduction to software monitoring and tuning methodologies, as well as software optimization techniques and best known methods (BKMs) for both novice and experienced users.

For developers, programming reference manuals are available with the latest information on the hardware interface of the Performance Monitoring Unit (PMU) in Intel microprocessors, including core and uncore monitoring resources, along with the authoritative source of information on the performance events that can be monitored.

Platform monitoring covers monitoring of the computer as a whole: the CPU cores, the graphics processors and other coprocessors in the system, as well as metrics and quality of service.

Intel® Microarchitecture Codename Nehalem Performance Monitoring Unit Programming Guide - Uncore
By THOMAS J., published 03/15/2011
Preface: This document contains advance information. While every effort has been made to ensure the accuracy of the information contained herein, some errors may occur. Please contact thomas.m.johnson@intel.com if you have questions or comments. This document describes the program...
Cache misses for sequential vs. random access patterns
By Danilcha D.
Hello, could you please help me? I'm measuring L3 cache misses and accesses when scanning a huge array vs. traversing a huge linked list whose nodes were randomized in memory beforehand. The element size is 32 bytes. The performance difference is huge, 1–2 orders of magnitude, which is understandable. However, the cache miss measurements are not that different. For the linked list it's 1.00 access and 1.00 miss per element, which is understandable. For the array it is 0.45 accesses per element, 33% of which are misses. And that is strange. Why 0.45 and not 0.50? And why a 33% miss rate? It is as if the prefetcher only loads 3 lines upon an L3 miss. But in that case the array scan wouldn't be 100 times faster. It is also clear that these counters do not explain the performance implications of L3 misses, because a miss during random memory access is apparently much more expensive than an L3 miss when sequentially scanning memory. Do you know if there is a way to count 'real' L3 misses, then a full random RAM acces...
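To make the two access patterns in this question concrete, here is a minimal C++ sketch of such a benchmark setup. The `Node` layout and the names `scan_array`, `link_randomly` and `chase_list` are hypothetical illustrations, not the poster's code. With 64-byte cache lines, a sequential scan of 32-byte elements brings in one new line every two elements, which is where the expected 0.50 line accesses per element comes from. The randomized list, by contrast, lands on a different line at almost every step, and the address of each load comes from the previous load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical 32-byte element, matching the size in the question:
// the index of the next node plus some payload.
struct Node {
    std::uint64_t next;        // index of the next node in list order
    std::uint64_t payload[3];  // data read on every visit
};
static_assert(sizeof(Node) == 32, "element must be 32 bytes");

// Sequential scan: two consecutive elements share each 64-byte line,
// and the next line is trivially predictable for a hardware prefetcher.
std::uint64_t scan_array(const std::vector<Node>& nodes) {
    std::uint64_t sum = 0;
    for (const Node& n : nodes) sum += n.payload[0];
    return sum;
}

// Link all nodes into a single cycle that visits them in random order.
void link_randomly(std::vector<Node>& nodes, unsigned seed) {
    assert(!nodes.empty());
    std::vector<std::size_t> order(nodes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), std::mt19937(seed));
    for (std::size_t i = 0; i < order.size(); ++i)
        nodes[order[i]].next = order[(i + 1) % order.size()];
}

// Pointer chasing: each load address depends on the previous load, so
// consecutive misses are serialized and there is no stride to prefetch.
std::uint64_t chase_list(const std::vector<Node>& nodes, std::size_t start) {
    std::uint64_t sum = 0;
    std::size_t cur = start;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        sum += nodes[cur].payload[0];
        cur = static_cast<std::size_t>(nodes[cur].next);
    }
    return sum;
}
```

Both traversals read every element exactly once, so they compute the same sum; only the memory access order differs. In a real measurement the node count would be chosen so the array is much larger than the L3 cache, and the counters would be sampled around each traversal separately.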
Theoretical peak integer performance
By gilles c.
Hi, in order to play with roofline charts for 32b integer-based code, I am struggling to find the theoretical peak integer performance of the Ivy Bridge and Haswell processors and the Knights Corner coprocessor. For floating point, that's "easy": vector length / type length * 2 (for FMA) * #cores * freq. Now, for integers, that's another story: Ivy Bridge's 256b AVX doesn't support integer operations, but 128b SSE does support some... But which ones exactly? I saw an integer FMA for 16b integers, a 32b add with a 0.5-cycle throughput, and a 32b multiply with a 1-cycle throughput. Does that mean that I can expect on average a 1.5 multiply/add throughput (for a typical matrix multiplication)? For Haswell, 256b AVX2 does support some integer operations. But again, I didn't find any FMA for 32b data, only the 0.5-cycle add and the 1-cycle multiply. So basically, same question here... For Xeon Phi Knights Corner, apparently we do have a 512b vector FMA for 32b integers. However, the throug...
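The floating-point formula quoted in this post can be written as a small C++ helper, which makes it easy to plug in different vector widths, element sizes, core counts and frequencies. This is only a sketch of the poster's formula. The `vector_units_per_core` parameter is an added assumption (some cores can start more than one vector FMA per cycle); its default of 1 reproduces the formula exactly as quoted.

```cpp
#include <cassert>

// Peak operations per second following the formula in the post:
// (vector bits / element bits) lanes * 2 ops per FMA * cores * frequency.
// vector_units_per_core is an assumption added here: the number of vector
// FMA instructions a core can start per cycle (1 reproduces the post).
double peak_ops_per_second(int vector_bits, int element_bits, int cores,
                           double freq_hz, int vector_units_per_core = 1) {
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    const int lanes = vector_bits / element_bits;
    const double ops_per_cycle_per_core = lanes * 2.0 * vector_units_per_core;
    return ops_per_cycle_per_core * cores * freq_hz;
}
```

For example, 256-bit vectors of 32-bit elements on 4 cores at 3.0 GHz give 8 lanes * 2 * 4 * 3.0e9 = 1.92e11 operations per second.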
[PCM] Kernel panic on OS X 10.9
By Danilcha D.
Hello, my computer restarts every time I try to launch a simple program:

    #include <cstdlib>
    #include <iostream>
    #include "cpucounters.h"  // Intel PCM header

    int main() {
        PCM *m = PCM::getInstance();
        if (m->program(PCM::DEFAULT_EVENTS, NULL) != PCM::Success) {
            std::cerr << "Failed to start PCM" << std::endl;
            exit(1);
        }
        SystemCounterState before = getSystemCounterState();
        // The code to be measured would run here; as posted, the two
        // samples are taken back to back, so they cover almost no work.
        SystemCounterState after = getSystemCounterState();
        std::cout << "Instructions per Clock: " << getIPC(before, after)
                  << "\nL3 cache hit ratio: " << getL3CacheHitRatio(before, after)
                  << "\nL2 cache hit ratio: " << getL2CacheHitRatio(before, after)
                  << "\nWasted cycles caused by L3 misses: " << getCyclesLostDueL3CacheMisses(before, after)
                  << "\nBytes read from DRAM: " << getBytesReadFromMC(before, after) << std::endl;
        m->cleanup();
        return 0;
    }

I get a kernel panic. The same happened after running pcm.x for a minute. OS X 10.9.5. This is the report:

    Thu Jun 11 13:53:17 2015 panic(cpu 0 caller 0xffffff80010dcc1d): Kernel trap at 0x...
[PCM] building OS X driver fails
By Thomas Willhalm (Intel)
Danilcha D commented on the Intel PCM article: Hello! Could you please help me? I was building the OS X driver and it failed:

    Check dependencies
    [BEROR]error: There is no SDK with the name or path '.../IntelPerformanceCounterMonitorV2.8/MacMSRDriver/macosx10.8'

I'm on OS X Mavericks (10.9.5), Xcode 6.2. I'm posting this to the forum, hoping that some of the Apple users can answer it. Kind regards, Thomas
[PCM] pcm-power.x tool does not report the frequency bins.
By Thomas Willhalm (Intel)
hilgeman commented on the Intel PCM article: The pcm-power.x tool does not report the frequency bins. No matter which bins I use with the -p 0 option and the -a/-b/-c frequency values, the percentages always remain zero:

    [root@PowerEdgeR630-BM4GS42 IntelPerformanceCounterMonitorV2.8]# ./pcm-power.x -p 0 -m -1 -- /bin/sleep 5
    Intel(r) Performance Counter Monitor V2.8 (2014-12-18 12:52:39 +0100 ID=ba39a89)
    Power Monitoring Utility
    Copyright (c) 2011-2014 Intel Corporation
    Number of physical cores: 24
    Number of logical cores: 24
    Number of online logical cores: 24
    Threads (logical cores) per physical core: 1
    Num sockets: 2
    Physical cores per socket: 12
    Core PMU (perfmon) version: 3
    Number of core PMU generic (programmable) counters: 8
    Width of generic (programmable) counters: 48 bits
    Number of core PMU fixed counters: 3
    Width of fixed counters: 48 bits
    Nominal core frequency: 2500000000 Hz
    Package thermal spec power: 120 Watt; Package minimum power: 61 Watt; Package maximum power: 240 W...
[PCM] Compilation fails when using VS 13
By Thomas Willhalm (Intel)
"Compilation fails when using Visual Studio 13." I usually build PCM with VS 2013. All I do is open the project file, let VS convert it, and build the project. What error are you getting?
New Xeon, poorer performance
By Jack G.
From reading several scientific computing benchmarks comparing the E5-2697 v3 with the E5-2697 v2, I got the impression that the v3 should perform better, even though it is clocked 0.1 GHz lower. I'm getting strange results on a heterogeneous cluster I'm running on (CentOS, kernel 2.6.32-504.el6.x86_64). Basically, the E5-2697 v2 nodes clearly outperform their v3 counterparts (~15% faster). I'm running a finite element code on them, compiled with Intel compiler products 15.0.2 (ifort, icc, icpc, etc.). Whether I run in parallel within a node or serially on each node, the timings on the v3 nodes are much slower than I expected. I ran a calculation on each of the 4 different types of nodes I have on the cluster, all named "tebowXXX":

    Tebow135: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
    Tebow123: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
    Tebow117: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
    Tebow101: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz, overclocked to 4.2 GHz

The res...
Jobs in LSF fail with more than 127 nodes
By Jose Gordillo
Hi, I have an awkward issue. I'm using LSF 9.1 as the job manager, and Intel Parallel Studio 2015 Update 1. When I submit a simple program (hello world) using 2032 cores (127 nodes), it works well, but when I use more cores, all the processes are created on all nodes but they hang and the program never finishes (it doesn't even start). I've tried launching the processes outside LSF (mpirun -hostfile ...) and it works fine with 2048 cores. Any suggestions?

Videos


Software Performance Monitoring

Highlights from the Community Manager

On January 5, 2011, Intel launched the 2nd generation Intel® Core™ processors, formerly codenamed Sandy Bridge, for laptops and desktop PCs. The new processors feature a revolutionary new architecture that, for the first time, combines the computing "brain", or microprocessor, and the graphics engine on the same die. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the award-winning Intel® Wireless Display (WiDi) technology, which now adds 1080p HD resolution and content protection for those who want to stream high-quality HD content from their laptop screen to their TV.

Stay tuned and check back often. We will publish PMU programming guides and updated tools to give you the latest information on the innovations in the new Intel microarchitecture.