Intel® Developer Zone:
Platform Monitoring

Welcome to the Intel Platform Monitoring Community!

Here you will find information on performance monitoring, software tuning, and platform monitoring. Performance monitoring covers a range of topics, from an introduction to monitoring and software tuning methodologies to software optimization techniques and best known methods (BKMs) for both novice and more advanced users.

For developers, programming reference manuals are available with the latest information describing the hardware interface of the Performance Monitoring Unit (PMU) of Intel microprocessors, including core and uncore monitoring resources. They are also the definitive source of information on the performance events that can be monitored.

Platform monitoring includes hardware monitoring topics such as monitoring of CPU cores, graphics processors, and other coprocessors in the system, as well as measurement and quality of service.

Variance in time when using VTune and OpenMP
By high end c. (1)
Dear friends, I've noticed that when profiling an OpenMP loop with DYNAMIC scheduling in VTune, I'm getting a large variance (the longest run is up to 3x the shortest) in the reported time. The code is being run with OMP_NUM_THREADS=4 in batch on a node comprising 2x 6-core Westmere L5640 chips. I'll do some work to explore causes but was wondering if others had any insight? Previous experience of running OpenMP codes on such nodes has indicated a 10% or so variance, but this is significantly higher (and hides some optimisations when the reported time is in the higher region of the variance). Yours, Michael http://highendcompute.co.uk
Throwing all optimizations at 4-level nested FORs
By Georgi M. (3)
Hi. One question that I couldn't answer interests me: how can I speed up the fragment below? Given that it is an etude with 4 nested FORs, what pragmas does Intel provide? Also, what compiler options are there to speed up FORs? Currently I use 12.1, but if there are some new or improved ones I would immediately go for 15. My desire is to throw every available optimization at it in order to make it faster.
Multi-threaded L3 cache performance
By Marcin K. (7)
Hello, I have found a lot of interesting reads about cache bandwidth performance modeling and benchmarking (e.g., https://software.intel.com/en-us/forums/topic/480004) and of course a lot to read about multi-threaded stream benchmark. So here I am trying to understand the multi-threaded, or multi-core performance of the L3 cache. (too many posts about performance analysis start this way ;) Let's say I want to check the speed of SSE2 vector transfers to the registers from various cache levels:

    __m128d mread_sse2(double *addr, long size)
    {
        long i=0;
        __m128d v1, v2, v3, v4;
        [ zero v1...]
        while(i<size){
            v1 = _mm_add_pd(v1, _mm_load_pd(addr+i+0));
            v2 = _mm_add_pd(v2, _mm_load_pd(addr+i+2));
            v3 = _mm_add_pd(v3, _mm_load_pd(addr+i+4));
            v4 = _mm_add_pd(v4, _mm_load_pd(addr+i+6));
            i+=8;
        }
        [sum v1... - avoid compiler opt ]
        return v1;
    }

On all recent architectures, this code runs in 0.5 cycle per double when data is in L1 cache (size=1024). A step to L3 c...
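To put the "0.5 cycle per double" figure from the post above into bandwidth terms, here is a minimal sketch of the arithmetic. The function names and the clock frequency passed in are hypothetical and do not come from the original post; only the 0.5 cycles-per-double figure does.

```c
/* Bytes moved per core clock cycle, given a measured cost in cycles per
   8-byte double. Returns 0 for a non-positive (invalid) measurement. */
double bytes_per_cycle(double cycles_per_double)
{
    if (cycles_per_double <= 0.0) {
        return 0.0;
    }
    return (double)sizeof(double) / cycles_per_double;
}

/* Per-core bandwidth in GB/s (10^9 bytes per second).
   clock_ghz is an assumed core clock; it is not taken from the post. */
double bandwidth_gb_per_s(double cycles_per_double, double clock_ghz)
{
    return bytes_per_cycle(cycles_per_double) * clock_ghz;
}
```

With the L1 figure quoted in the post, bytes_per_cycle(0.5) works out to 16 bytes per cycle, which corresponds to one 16-byte _mm_load_pd per cycle. The same helper can be applied to figures measured at other cache levels to compare them on a common scale.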
Performance Monitoring Counters
By Pramodkumar P. (1)
Hello, I am currently using an Intel(R) Xeon(R) X7350 @ 2.93GHz. As this architecture has only five PMCs, I can't read more than five performance events at a time. Even worse, when I installed PAPI 5.4.0, it reads only two events simultaneously. So I have a question in mind: "Is there any way that we can use other (general purpose or special purpose) registers as PMUs?" I want to capture six to seven events and their counts while an application is running. I wish to measure the power consumed by this application using the values of these counters. So, please help. Thanks in advance.
Uncore PMUs on Xeon E5000 processors
By Anikaushik (2)
Hello, I have a Xeon E5645 system and I am curious whether PMUs exist to monitor memory controller events. There seems to be very good and extensive documentation for the E5-2600 family. In the datasheet files for the E5000 family, there does not seem to be any mention of uncore PMUs. Looking forward to your feedback. Thanks,
Is there any way to detect whether Intel RST is running or an mSATA drive is present?
By Iverson T. (3)
Hi, a customer of our product could not detect the mSATA drive in Windows 7, which prevents our product from working. As far as I know, Intel provides its own driver (iastora.sys), which enables some features not found natively. These features are mostly found in the Rapid Storage Technology UI, which allows RAID volumes to be managed and monitored from within the OS itself. As the title says, my question is: is there any way (interface/SDK) to detect whether the mSATA drive is present or whether Intel RST is running? Detecting the mSATA drive would be great; detecting that iRST is running would also be a good workaround for our problem. I would very much appreciate your support. Thanks.
Why is UOPS_RETIRED.ALL greater than UOPS_RETIRED.RETIRE_SLOTS?
By Alexander Alexeev (0)
Hello. Could you explain the difference between the two events UOPS_RETIRED.ALL_PS and UOPS_RETIRED.RETIRE_SLOTS_PS on Sandy Bridge? I would expect those events to give approximately the same numbers, since the number of used slots should agree with the number of retired uops over a period of time. The data below shows that the number of used retirement slots is ~20% lower than the number of uops retired. Is it possible for uops to retire without using a slot?

UOPS_RETIRED.ALL_PS - This event counts the number of micro-ops retired.
UOPS_RETIRED.RETIRE_SLOTS_PS - This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 micro-ops or 4 instructions could retire each cycle. This event is used in determining the 'Retiring' category of the Top-Down pipeline slots characterization.

Hardware Events
Hardware Event Type    Hardware Event Count    Hardware Event Sample Count    Events Per Sample
CPU_CLK_UNHALTED.THREAD_P    49,45...
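As a rough illustration of how the retirement-slot count feeds the Top-Down 'Retiring' category described in the post above, the sketch below divides the used slots by the total slots available (4 per unhalted cycle, as stated in the event description). The function names are hypothetical, and any counter values passed in are the caller's own measurements; none are taken from the truncated table in the post.

```c
#include <stdint.h>

/* Retirement slots available per cycle, per the event description above. */
#define RETIRE_SLOTS_PER_CYCLE 4

/* Fraction of pipeline slots that retired uops:
   UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD).
   Returns 0 when no cycles were counted. */
double retiring_fraction(uint64_t retire_slots, uint64_t unhalted_cycles)
{
    if (unhalted_cycles == 0) {
        return 0.0;
    }
    return (double)retire_slots /
           (RETIRE_SLOTS_PER_CYCLE * (double)unhalted_cycles);
}

/* Ratio of used retirement slots to retired uops, i.e. the quantity the
   post reports as being about 20% below 1. Returns 0 when uops_all is 0. */
double slots_to_uops_ratio(uint64_t retire_slots, uint64_t uops_all)
{
    if (uops_all == 0) {
        return 0.0;
    }
    return (double)retire_slots / (double)uops_all;
}
```

A slots_to_uops_ratio() result noticeably below 1.0 reproduces the discrepancy the question describes. The sketch only quantifies that gap and does not explain its cause.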
Unable to generate 'GPA' data with my Intel HD Graphics 4000
By Max B. (0)
I'm trying to profile the execution of an OpenCL kernel on Intel HD Graphics 4000. If I right-click on my Project in VS2013 > Intel VTune Amplifier XE 2015 > New Analysis, I see this message in the window that opens: Some GPU metrics are currently disabled by the BIOS. See the product Release Notes for details. I've attached two images: one showing the tick-box for GPAs in my motherboard BIOS (it's not clear in the image, but the box seems to be greyed-out, and I'm unable to tick it), and another showing the results of a VTune analysis run of my program, showing zero L3 cache misses, which I presume means the GPAs are not working. Are Graphics Performance Analyzers not available? My hardware: Motherboard: Intel Desktop Board DQ77MK (latest firmware). Processor: i7-3770 (featuring Intel HD Graphics 4000). Software: Windows 7, 64-bit. VS 2013. I've installed the 30-day trial of both VTune and INDE.

Videos


Software Performance Monitor

Community Manager Highlights

On January 5, 2011, Intel launched the 2nd Generation Intel® Core™ processor family (formerly codenamed Sandy Bridge) for laptops and PCs. The new processors feature a revolutionary new architecture that, for the first time, combines the computing "brain", or microprocessor, with a graphics engine on the same chip. New features include Intel® Insider™, Intel® Quick Sync Video, and a new version of the company's award-winning Intel® Wireless Display (WiDi), which adds 1080p HD and content protection for those who want to beam premium HD content from a laptop screen to their TV.

Stay connected and visit often. We will be posting PMU programming guides and updated tools to bring you the latest information on new Intel microarchitecture innovations.