Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring and software tuning, as well as on platform monitoring topics. Performance monitoring spans a variety of topics, such as an introduction to monitoring and software tuning methodologies, software optimization techniques, and best known methods (BKMs) for both novice and advanced users.

For developers, programming reference manuals are available with the latest information describing the hardware interface of the Performance Monitoring Unit (PMU) in Intel microprocessors, including the core and uncore monitoring resources. These manuals are also the definitive source of information on the performance events that can be monitored.

Platform monitoring covers hardware monitoring topics such as monitoring of the CPU cores, graphics processors, and other system coprocessors, as well as metering and quality of service.

Forums
Information about PCM PCIe counters
By Anuj K.
Hi everyone. I have been working on measuring the PCIe activity of network cards and I wanted to understand PCM counters better. I'm running the pcm-pcie.x executable on a Haswell server, which displays the following counters (full event description here: http://pastebin.com/pnuj1eKu): PCIeRdCur (PCIe read current transfer, full cache line), RFO (Demand Data RFO), CRd (Demand Code Read), DRd (Demand Data Read), ItoM (PCIe write, full cache line), PRd (MMIO Read), WiL (MMIO Write).

I had these questions: What is the difference between PCIeRdCur and DRd? PCIeRdCur measures the number of partial and full cache line reads. Does it miss any PCIe reads that are captured by DRd, or does PCIeRdCur include DRd? I'm seeing non-zero values for both of these counters.

The description printed by pcm-pcie.x says that WiL measures traffic for "PCI devices writing to memory - application reads from disk/network/PCIe device", but it also describes it as "MMIO Writes (Full/Partial)". Aren't these two descript...
RAPL analysis/tests on my laptop
By Carlos P.
Hi, I am testing the RAPL feature on my laptop as a way to read some CPU consumption values, and I need some help getting answers. I am running a RAPL sample code simultaneously with a total battery power consumption code, and I am getting the following values:

##RAPL:
Package energy before: 5743.516907J
PowerPlane0 (core) for core 0 energy before: 2655.314636J
PowerPlane0 (core) for core 0 policy: 0
PowerPlane1 (on-core GPU if avail) before: 1199.609863J
PowerPlane1 (on-core GPU if avail) 0 policy: 16
DRAM energy before: 33766.502869J
Sleeping 1 second
Package energy after: 5747.005737  (3.488831J consumed)
PowerPlane0 (core) for core 0 energy after: 2655.366577  (0.051941J consumed)
PowerPlane1 (on-core GPU if avail) after: 1199.689941  (0.080078J consumed)
DRAM energy after: 33767.247131  (0.744263J consumed)
##Total battery power consumption: 8 watts average

I suppose that Package energy is the total CPU energy without DRAM energy, right? (PKG=unc...
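The "consumed" figures in the output above are differences between two readings of cumulative energy counters, and dividing by the sleep interval gives average power. A minimal C sketch of that arithmetic follows, assuming the raw values come from the 32-bit RAPL energy status counters (such as MSR_PKG_ENERGY_STATUS) and that the energy unit is taken from bits 12:8 of MSR_RAPL_POWER_UNIT (0x606), as described in the Intel SDM. Reading the MSRs themselves is left out, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Energy Status Units (ESU): bits 12:8 of MSR_RAPL_POWER_UNIT (0x606).
   One counter tick corresponds to 1 / 2^ESU joules. */
static double rapl_energy_unit(uint64_t power_unit_msr) {
    unsigned esu = (unsigned)((power_unit_msr >> 8) & 0x1F);
    return 1.0 / (double)(1ULL << esu);
}

/* Energy consumed between two raw readings of a 32-bit energy status
   counter. Unsigned 32-bit subtraction stays correct across one wrap. */
static double rapl_delta_joules(uint32_t before, uint32_t after, double unit) {
    uint32_t ticks = after - before;
    return (double)ticks * unit;
}

/* Average power in watts over the sampling interval. */
static double average_watts(double joules, double seconds) {
    assert(seconds > 0.0);
    return joules / seconds;
}
```

For example, the package delta in the post, 5747.005737 J minus 5743.516907 J over about 1 second, works out to roughly 3.49 W of average package power. If the unit is 1/16384 J (ESU = 14), the 32-bit counter wraps after 2^32 / 16384 = 262,144 J, which is roughly 20 hours at the 3.5 W seen here, so a single wrap between two closely spaced readings is the only case the subtraction needs to handle.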
Sample code for PCIe Burst Transfer white paper by Intel?
By Sonny G.
Hi, I bumped into a white paper by Intel: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers... Is there sample code for Linux on a Xeon (E5-2600) processor that I can take a look at, instead of the general idea outlined in the paper? For example, the basic steps are:
1. Mark the memory region as WC. Any sample code for this?
2. Burst transfer. Any sample code using the _mm256_store_si256() function?
Any help is appreciated. Thanks!
Block Matrix Multiplication With Cilk?
By Patrick P.
I'm trying to tackle the same problem every HPC student gets: multiply matrices faster with as few memory accesses as possible. I've started with the dumb 6-deep nested for-loop block algorithm, but I feel like you can (or should be able to) eliminate the 3 innermost loops with Cilk notation and take advantage of SSE/AVX. This is what I came up with, but I get a compile error on icpc that the array bases are invalid. I've seen tons of Cilk examples with variables for array bounds, so I'm REALLY confused as to why this is invalid. Our Matrix is generally implemented using std::vector<std::vector<Val>>, where in our case Val is an int but can change.

Matrix Matrix::operator*(const Matrix& src) const throw(std::exception) {
    if (data.size() != src.data[0].size()) {
        throw std::runtime_error("Incompatible Matrix Dimensions!");
    }
    unsigned int BS = blockSize, m1x = data[0].size(), m1y = data.size(),
        m2x = src.data[0].size();
    Matrix toRet ...
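For comparison with the Cilk attempt (the poster's code is truncated, so this neither reproduces nor fixes it), here is a minimal plain-C sketch of the same 6-deep tiled loop nest. It assumes square n × n int matrices stored in one contiguous row-major array rather than std::vector<std::vector<Val>>, and it uses no Cilk array notation: the innermost loop is simply written with unit stride so that a vectorizing compiler can map it to SSE/AVX on its own. `matmul_blocked` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* C += A * B for n x n row-major matrices, processed in bs x bs tiles so
   that each tile of A, B and C is reused while it is still in cache. */
static void matmul_blocked(size_t n, size_t bs,
                           const int *a, const int *b, int *c) {
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile edges when n is not a multiple of bs. */
                size_t i_end = ii + bs < n ? ii + bs : n;
                size_t k_end = kk + bs < n ? kk + bs : n;
                size_t j_end = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        int aik = a[i * n + k];
                        /* Unit-stride over j: a candidate for SIMD code. */
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

A flat array keeps every row at a fixed stride from the previous one, which is easier for the compiler to analyze than separately allocated inner vectors. The block size bs is a tuning parameter; a tile small enough that three bs × bs tiles fit in cache is a reasonable starting point, but the best value has to be measured on the target machine.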
Uncore frequency on Haswell
By futureishere2
Can someone explain how the processor controls the uncore frequency? I understand that on the Haswell microarchitecture, the uncore is in a separate clock domain from the processor cores. While the core frequency can be controlled dynamically by the OS (when SpeedStep is enabled), I am not sure how the uncore frequency is controlled. Is it possible for the OS to control the uncore frequency? I am guessing not. And if not, is it possible to at least get information about the current frequency at which the uncore is operating?
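One way to approach the last question, assuming a free-running counter that increments once per uncore clock is available and readable (whether such a counter exists on a given Haswell part, and where, has to be checked in that part's uncore performance monitoring documentation), is to sample the counter twice over a known wall-clock interval and divide. A minimal C sketch of that arithmetic follows; `avg_clock_ghz` and its parameters are hypothetical, and the result is an average over the interval, not an instantaneous frequency.

```c
#include <assert.h>
#include <stdint.h>

/* Average frequency (GHz) of the clock driving a free-running counter,
   from two samples taken wall_seconds apart. width_bits is the counter
   width; masking the difference handles a single wraparound. */
static double avg_clock_ghz(uint64_t before, uint64_t after,
                            unsigned width_bits, double wall_seconds) {
    assert(width_bits > 0 && width_bits <= 64 && wall_seconds > 0.0);
    uint64_t mask = (width_bits == 64) ? UINT64_MAX
                                       : ((1ULL << width_bits) - 1);
    uint64_t ticks = (after - before) & mask;
    return (double)ticks / wall_seconds / 1e9;
}
```

For example, 2,000,000,000 ticks over 2.0 seconds correspond to an average of 1.0 GHz, even if the counter wrapped in between.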
uncore event counter reading
By yang s.
Hi everybody, I want to read some uncore event counter values on an Intel Xeon E5620. I recently read about PCM and found it really hard to figure out the program logic. Also, the uncore event I want, UNC_GQ_ALLOC.READ_TRACKER, is not available there. I looked at pcm-tsx.cpp in PCM and replaced its events with mine, just to see whether this would work. I get some values, but I am not sure this is the right thing to do. I would also like to know whether there are simpler tools to read these events. Thanks. Yang
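On Westmere-EP parts such as the E5620, the uncore counters are accessed through model-specific registers (MSRs), and on Linux PCM reaches MSRs through the msr driver's /dev/cpu/N/msr devices. A minimal C sketch of reading one MSR that way follows, where the MSR address is used as the file offset. It assumes the msr kernel module is loaded and the process has the required privileges (typically root). The event-select and counter MSR addresses for UNC_GQ_ALLOC.READ_TRACKER are not given here and have to come from the Intel SDM; `read_msr` is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read one 64-bit MSR on logical CPU `cpu` via /dev/cpu/<cpu>/msr.
   The msr driver maps the file offset to the MSR address.
   Returns 0 on success, -1 on any failure (no driver, no permission,
   invalid CPU, or unsupported MSR). */
static int read_msr(int cpu, uint32_t msr, uint64_t *value) {
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

Programming a counter is the mirror image: open the device with O_RDWR and pwrite the 64-bit event-select value at the control MSR's offset. Uncore MSRs are per package, so reading them from any one logical CPU on the socket is typically enough.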
How to enforce pointer incrementation while forbidding the compiler to play smart
By Georgi M.
For the first time I am facing an unpleasant behavior of the Intel C optimizer: it inserts 2 unnecessary LEAs (lines #22 and #40). Here is the ugly snippet:

; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140";
; mark_description "726";
; mark_description "-O3 -QxSSE2 -D_N_YMM -D_N_prefetch_4096 -FAcs";
.B8.3::
00030 44 8b 22          mov r12d, DWORD PTR [rdx]
00033 44 89 e1          mov ecx, r12d
00036 83 f1 03          xor ecx, 3
00039 41 be ff ff ff ff mov r14d, -1
0003f c1 e1 03          shl ecx, 3
00042 41 bf 01 00 00 00 mov r15d, 1
00048 41 d3 ee          shr r14d, cl
0004b 45 33 db          xor r11d, r11d
0004e 45 23 e6 ...
[PCM] ERROR: QPI LL monitoring device (0:127:9:2) is missing
By Jeongseob A.
Hi all, I am currently using the Intel PCM tool v2.8 on my machine (Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz), which has two sockets. Basically, I would like to measure QPI traffic; specifically, the traffic caused by L3 cache misses. As I understand it, if an L3 miss occurs, the request is sent to the other socket because that is faster than memory. When an application that generates many cache misses runs on one NUMA node, I expect the other NUMA node to receive a lot of QPI traffic, and I am wondering how much of this cache traffic occurs. Unfortunately, I get a QPI error like the one below when I run pcm.x:

ERROR: QPI LL monitoring device (0:127:9:2) is missing. The QPI statistics will be incomplete or missing.
Socket 0: 2 memory controllers detected with total number of 5 channels. 1 QPI ports detected.
ERROR: Q...

Videos


Software Performance Monitor

Community Manager Highlights

On January 5, 2011, Intel launched the 2nd generation Intel® Core™ processor family (formerly codenamed Sandy Bridge) for laptops and PCs. For the first time, the new processors use a revolutionary architecture that combines the computational “brain”, or microprocessor, with a graphics engine on the same chip. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of Intel's award-winning Intel® Wireless Display (WiDi), which adds 1080p HD and content protection for those who want to project premium HD content from a laptop screen to their TV.

Stay connected and visit often. We will be publishing PMU programming guides and updated tools to bring you the latest information on new Intel microarchitecture innovations.