Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring and software tuning, as well as platform monitoring topics. Performance monitoring spans a variety of topics, such as an introduction to monitoring and software tuning methodologies, software optimization techniques, and best-known methods (BKMs) for both novice and more advanced users.

For developers, programming reference manuals are available with the latest information describing the hardware interface of the Performance Monitoring Unit (PMU) in Intel microprocessors, including the core and uncore monitoring resources. These manuals are also the definitive source of information on the performance events that can be monitored.
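To make the PMU hardware interface a little more concrete, here is a minimal sketch, based on my reading of the architectural performance monitoring layout in the Intel SDM, of how an event select code and unit mask are packed into an IA32_PERFEVTSELx register value. The helper name `perfevtsel_encode` is hypothetical, and actually writing the value into the MSR (which requires ring 0, e.g. `wrmsr` in a driver) is not shown.

```c
#include <stdint.h>

/* Control bits of IA32_PERFEVTSELx (architectural performance monitoring). */
#define PERFEVTSEL_USR (1ULL << 16) /* count while CPL > 0 (user mode) */
#define PERFEVTSEL_OS  (1ULL << 17) /* count while CPL == 0 (kernel mode) */
#define PERFEVTSEL_EN  (1ULL << 22) /* enable the counter */

/* Hypothetical helper: build the IA32_PERFEVTSELx value for an event
   select code (bits 7:0) and unit mask (bits 15:8), counting in both
   user and kernel mode with the counter enabled. */
static uint64_t perfevtsel_encode(uint8_t event, uint8_t umask)
{
    return (uint64_t)event
         | ((uint64_t)umask << 8)
         | PERFEVTSEL_USR
         | PERFEVTSEL_OS
         | PERFEVTSEL_EN;
}
```

For example, the architectural last-level-cache-miss event (event 0x2E, umask 0x41) encodes as 0x43412E with the user, kernel, and enable bits set.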

Platform monitoring covers hardware monitoring topics such as monitoring of the CPU cores, graphics processors, and other system coprocessors, as well as measurement and quality of service.

Calculation of DRAM Power using MSR
By Pramodkumar P.
Hello all, I am currently working with performance counters. I counted different cache events using model-specific registers (MSRs). I referred to the following papers to use these counts to evaluate the power consumption of DRAM. However, after getting the event counts I have no idea what to do next: I don't understand the relationship between counts and power. Please let me know how to relate the counts to DRAM power. Is it possible to use performance counters to estimate DRAM power directly? The referenced papers are: "Complete System Power Estimation Using Processor Performance Events" by W. L. Bircher and L. K. John, and "A Study on the Use of Performance Counters to Estimate Power in Microprocessors" by R. Rodrigues, A. Annamalai, I. Koren, and S. Kundu.
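Event counts carry no unit of power by themselves; the general idea in this line of work is to fit a model, often linear, that maps event rates to power measured with a real instrument. The sketch below assumes a linear model P = P_idle + Σ w_i · (count_i / t). The struct, the function name, and all coefficients are hypothetical placeholders that would have to be fitted by regression against an actual DRAM power measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linear DRAM power model:
   P = idle_watts + sum_i weights[i] * (counts[i] / interval_s).
   The weights and idle term are placeholders; in practice they are
   fitted by regression against a measured DRAM power trace. */
typedef struct {
    double idle_watts;     /* background/static DRAM power (W) */
    const double *weights; /* energy per event (J/event), one per counter */
    size_t n_events;       /* number of counters in the model */
} dram_power_model;

/* Estimate average DRAM power over an interval from raw event counts. */
static double estimate_dram_power(const dram_power_model *m,
                                  const uint64_t *counts,
                                  double interval_s)
{
    double watts = m->idle_watts;
    for (size_t i = 0; i < m->n_events; i++) {
        double rate = (double)counts[i] / interval_s; /* events per second */
        watts += m->weights[i] * rate;               /* J/event * events/s = W */
    }
    return watts;
}
```

With toy weights of 0.5 and 0.25 J/event, counts of 4 and 8 over 2 s, and a 1 W idle term, the estimate is 1 + 0.5·2 + 0.25·4 = 3 W. Separately, on processors that support it, the RAPL interface exposes a DRAM energy-status MSR, which is a direct energy reading rather than an event-based estimate.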
How to find individual core L1 and L2 cache hits/misses in a multicore environment
By Hemanth K.
Scenario: two processes are executing on two different cores of a processor, one per core. How can I measure the L1 and L2 cache hits and misses of each core individually, assuming Hyper-Threading is disabled? I believe the performance counter monitors I have tried do not give me a per-core breakdown. Is there any way I can measure the L1 and L2 cache hits and misses of each individual core?
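One way to get per-core numbers on Linux, sketched below, assumes Hyper-Threading is off (so one logical CPU per core) and each process is pinned to its own core: open one counter per CPU with perf_event_open(2) using pid = -1, which counts every task running on that CPU. This usually needs CAP_PERFMON/root or a permissive perf_event_paranoid setting. The helper names are hypothetical. Linux's generic cache events include L1D but have no separate L2 entry, so L2 hits and misses would need a model-specific raw event (PERF_TYPE_RAW) taken from the SDM event tables for your processor.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Generic L1D read-miss event, encoded the way perf_event_open(2)
   expects for PERF_TYPE_HW_CACHE: cache id | (op << 8) | (result << 16). */
static uint64_t l1d_read_miss_config(void)
{
    return (uint64_t)PERF_COUNT_HW_CACHE_L1D
         | ((uint64_t)PERF_COUNT_HW_CACHE_OP_READ << 8)
         | ((uint64_t)PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
}

/* Open a counter that counts the event for every task on `cpu`
   (pid = -1). Returns a file descriptor, or -1 on error. */
static int open_counter_on_cpu(uint32_t type, uint64_t config, int cpu)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = type;
    attr.config = config;
    attr.disabled = 1; /* created stopped; started by start_counter() */
    return (int)syscall(SYS_perf_event_open, &attr, (pid_t)-1, cpu, -1, 0UL);
}

/* Reset the counter to zero and start counting. */
static int start_counter(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1 ? -1 : 0;
}

/* Stop counting and read the 64-bit total (default read_format). */
static int stop_and_read(int fd, uint64_t *out)
{
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    return read(fd, out, sizeof *out) == (ssize_t)sizeof *out ? 0 : -1;
}
```

Open one descriptor per core with `open_counter_on_cpu(PERF_TYPE_HW_CACHE, l1d_read_miss_config(), cpu)`, start them all, let the two processes run, then stop and read each descriptor to get each core's L1D read misses.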
Non-temporal stores and fences
By c0d1f1ed
Hi all, I'm trying to write a fast function for filling a large buffer with a 128-bit vector value. I'm using movntps and I was wondering if a fence instruction is necessary for correctness and/or performance. Things appear to work fine without it, but I wonder if that's just dumb luck or if the processor detects the lack of it and ensures correctness through some kind of costly interrupt and/or microcode? If a fence is highly recommended, should I use sfence or mfence? I couldn't find any documents with straight answers. Thanks! Nicolas
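A minimal sketch of such a fill, assuming a 16-byte-aligned destination and the SSE intrinsics from `<xmmintrin.h>` (`_mm_stream_ps` compiles to movntps). As I understand the SDM's memory-ordering rules, streaming stores are weakly ordered: no fence is needed for the same thread to read its own data back, but sfence is the instruction that orders them before later stores, which matters when another thread or device must see the filled buffer (for example, before setting a "ready" flag). mfence additionally orders loads, which a store-only fill does not need. The function name `fill_stream_ps` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: _mm_stream_ps, _mm_sfence */

/* Fill n_vectors consecutive 16-byte slots starting at dst with v,
   using non-temporal (streaming) stores that bypass the caches.
   dst must be 16-byte aligned. */
static void fill_stream_ps(float *dst, size_t n_vectors, __m128 v)
{
    assert(((uintptr_t)dst & 15u) == 0);
    for (size_t i = 0; i < n_vectors; i++)
        _mm_stream_ps(dst + 4 * i, v);

    /* Streaming stores are weakly ordered with respect to other stores;
       sfence makes them globally visible before any later store, such
       as a flag that tells another thread the buffer is ready. */
    _mm_sfence();
}
```

The fence is issued once per call rather than inside the loop, so its cost is amortized over the whole buffer.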
Can I force the compiler to pay attention to my intrinsics?
By John D. McCalpin
I have been working on optimizing a DGEMM kernel in the hope of understanding power limiting on Xeon E5 v3 processors. Using the DGEMM from Intel's MKL library, I can typically cause a Xeon E5 v3 to enter power-limited frequency throttling when using more than about 1/2 of the cores on the chip. The details depend a bit on the base frequency, the number of cores, and the TDP, but it is usually around 1/2 of the cores. Using Intel's MKL DGEMM is, of course, the obvious thing to do, and I would happily use it, except that I want my own source version so that I can fiddle with register blocking and cache blocking to see how these impact power consumption. After spending about two weeks staring at this, I think I know how to register-block the code. Starting with a naive DGEMM code in inner product order:

    for (i=0; i<SIZE; i++) {
        for (j=0; j<SIZE; j++) {
            for (k=0; k<SIZE; k++) {
                c[i][j] += a[i][k]*b[k][j];
            }
        }
    }

After much paper and pencil, i...
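Register blocking itself can be illustrated in plain C. The sketch below is a generic 2x2 register-blocked variant of the naive loop, not the poster's code: it assumes square row-major arrays of a hypothetical fixed SIZE that is even, and keeps four accumulators in registers across the k loop so that each loaded element of a and b is used twice. The function names are hypothetical.

```c
#define SIZE 64 /* hypothetical problem size; must be even for the 2x2 block */

/* Naive DGEMM in inner-product (i, j, k) order, as in the post. */
static void dgemm_naive(double c[SIZE][SIZE], double a[SIZE][SIZE],
                        double b[SIZE][SIZE])
{
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++)
            for (int k = 0; k < SIZE; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Same computation with a 2x2 register block: four accumulators stay in
   registers across the k loop, and each a/b element loaded is used twice. */
static void dgemm_regblock_2x2(double c[SIZE][SIZE], double a[SIZE][SIZE],
                               double b[SIZE][SIZE])
{
    for (int i = 0; i < SIZE; i += 2)
        for (int j = 0; j < SIZE; j += 2) {
            double c00 = c[i][j],     c01 = c[i][j + 1];
            double c10 = c[i + 1][j], c11 = c[i + 1][j + 1];
            for (int k = 0; k < SIZE; k++) {
                double a0 = a[i][k], a1 = a[i + 1][k];
                double b0 = b[k][j], b1 = b[k][j + 1];
                c00 += a0 * b0; c01 += a0 * b1;
                c10 += a1 * b0; c11 += a1 * b1;
            }
            c[i][j] = c00;     c[i][j + 1] = c01;
            c[i + 1][j] = c10; c[i + 1][j + 1] = c11;
        }
}
```

Both versions add the same products in the same k order, so on small integer-valued test data they produce identical results, which makes the naive loop a convenient reference when experimenting with larger register blocks.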
Some questions about performance counters and how to read them
By Hamid Reza K.
Hello all, I am working on CMP scheduling and need to obtain some information about the behavior of multi-threaded applications executed on Intel CPUs. One CPU considered in my study is the Intel Core 2 Duo. I have some questions and would be thankful if somebody could answer them.
1- How many performance counter registers exist in the Core 2 Duo?
2- I want to know the L1 cache misses of each thread of a multi-threaded application for a given period of its runtime. Take a four-threaded application with threads t0, t1, ... t3 as an example. My goal is to obtain the number of L1 cache misses for threads 0, 1, ... 3 that happen in the first second of the application's runtime. Could you tell me how this should be done? By the way, I use the Linux perf tool to read performance counters.
3- As you know, the number of registers dedicated to reading performance counters is limited. Consider N programs (N is larger than the number of performance counter registers) which should be monitored, and suppo...
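On question 3: when more events are requested than there are hardware counters, Linux perf time-multiplexes them and reports how long each event was enabled and how long it was actually counting. Below is a minimal sketch of the usual extrapolation, assuming read_format is set to PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING on a single (non-group) event, so that read(2) returns three 64-bit values in that order. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Layout of read(2) output for one event with
   PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. */
struct mux_reading {
    uint64_t value;        /* raw count while the event was on a counter */
    uint64_t time_enabled; /* ns the event was enabled */
    uint64_t time_running; /* ns it was actually scheduled on a counter */
};

/* Extrapolate the count to the whole enabled interval. */
static double scaled_count(const struct mux_reading *r)
{
    if (r->time_running == 0)
        return 0.0; /* never scheduled on a counter: no information */
    return (double)r->value * (double)r->time_enabled
         / (double)r->time_running;
}
```

For example, a raw value of 1000 with time_enabled = 300 ns and time_running = 100 ns extrapolates to 3000 events. The estimate assumes the event rate was steady while the event was off the counter.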
Question about the Core Specificity Encoding option for reading CPU performance counters
By Hamid Reza K.
Hi list, I want to count the core cycles during which the data bus is busy for a multi-threaded application executed on a Core 2 Duo. I found that the performance event "Dbus_Busy" meets my purpose. But, as you know, to use this event you are supposed to specify the core-specificity encoding. There are two options for Core Specificity Encoding: All cores and This core. Could you tell me what the This core option means for a multi-threaded application? Best regards, H. R. Khaleghzadeh
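As I read the Intel SDM's Core Specificity encoding for Core-microarchitecture events, the choice lives in bits 15:14 of IA32_PERFEVTSELx (the top two bits of the unit mask): 11B counts events caused by all cores, 01B only those caused by the core that owns the counter. For a multi-threaded application spread over both cores, "This core" programmed on each core's counter gives a per-core figure, while "All cores" gives the combined figure for the shared bus. A minimal sketch of the encoding follows; the helper name is hypothetical, and the event code and remaining unit-mask bits should be taken from the SDM entry for the event.

```c
#include <stdint.h>

/* Core specificity sub-field, bits 15:14 of IA32_PERFEVTSELx. */
#define CORE_SPEC_ALL_CORES (3ULL << 14) /* 11B: events from all cores */
#define CORE_SPEC_THIS_CORE (1ULL << 14) /* 01B: events from this core only */

/* Control bits: count in user and kernel mode, counter enabled. */
#define PERFEVTSEL_USR (1ULL << 16)
#define PERFEVTSEL_OS  (1ULL << 17)
#define PERFEVTSEL_EN  (1ULL << 22)

/* Hypothetical helper: encode an event with the requested core
   specificity. umask_low supplies unit-mask bits 13:8 only. */
static uint64_t encode_core_specific(uint8_t event, uint8_t umask_low,
                                     uint64_t core_spec)
{
    return (uint64_t)event
         | ((uint64_t)(umask_low & 0x3F) << 8)
         | core_spec
         | PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_EN;
}
```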
MLC to support CoD for Haswell-EP or -EX
By drMikeT
I was wondering if there is a version of the MLC (Memory Latency Checker) utility that understands the CoD (Cluster on Die) snooping mode on the Haswell platform. On a 2-socket Haswell-EP system with CoD enabled, numactl -H shows 4 memory domains and correctly accounts for the cores associated with each. MLC, however, treats the same host as a system with just 2 memory domains, and the tests DO NOT DIFFERENTIATE between the two memory controllers (and domains) available on each socket. Thanks, Michael
An issue with performance optimization by the Intel compiler
By WEI Z. (Intel)
Hi, I am learning to use Intel C++ Compiler XE 15.0 integrated with VS 2013. I wrote the simple example below to look into its performance:

    void dataCopy(float *codeWord0Ptr, float *codeWord1Ptr, int numDataCopy, float *outputPtr)
    {
        float *outputPtr1 = &outputPtr[numDataCopy];
        int idxData;
        __assume_aligned(codeWord0Ptr, 64);
        __assume_aligned(codeWord1Ptr, 64);
        __assume_aligned(outputPtr, 64);
        __assume_aligned(outputPtr1, 64);
        #pragma ivdep
        #pragma vector aligned
        for (idxData = 0; idxData < numDataCopy; idxData++)
        {
            outputPtr[idxData] = codeWord0Ptr[idxData];
            outputPtr1[idxData] = codeWord1Ptr[idxData];
        }
    }

I built it in Release x64 mode and enabled the related optimization settings (AVX, etc.) in the project properties. I also enabled the optimization report in the project properties, and it reports that the loop was vectorized. When I run it on my host PC (Core i5-3320M) and do some profiling on function dat...


Software Performance Monitoring

Community Manager Highlights

On January 5, 2011, Intel launched the 2nd generation Intel® Core™ processor family (formerly codenamed Sandy Bridge) for laptops and desktop PCs. The new processors feature a revolutionary new architecture that, for the first time, combines the computational “brain” (the microprocessor) with a graphics engine on the same chip. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the company's award-winning Intel® Wireless Display (WiDi), which adds 1080p HD and content protection for users who want to project premium HD content from a laptop screen to their TV.

Stay connected and check back often. We will be publishing PMU programming guides and updated tools to bring you the latest information on new Intel microarchitecture innovations.