Intel PCM - How can I see per-core monitoring results on AWS CC1?

Hi,

I am running performance tests of an application on an Amazon HPC CC1 instance, which has Intel Xeon X5570 (quad-core Nehalem) processors, using pcm.x.

I could compile and run pcm.x, but the report is not useful, especially the per-core part. The line for every core (0 to 15) shows only values such as 0.00, -1.00, 0.00 and N/A, which don't make sense (see below). pcm.x supports the Nehalem architecture, so I expected accurate per-core results.

Do I need to configure something before compiling or running pcm.x? Or is this processor unsupported?

Regards,
Ryu

===========================================================================================================
Sample of the results
===========================================================================================================
Intel Performance Counter Monitor
Copyright (c) 2009-2011 Intel Corporation

Num cores: 16
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 0
Number of core PMU generic (programmable) counters: 0
Width of generic (programmable) counters: 0 bits
Nominal core frequency: 2933333326 Hz
Number of PCM instances: 5

EXEC  : instructions per nominal CPU cycle
IPC   : instructions per CPU cycle
FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ  : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)

 Core (SKT) | EXEC | IPC   | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE
   0    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   1    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   2    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   3    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   4    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   5    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   6    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   7    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   8    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
   9    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
  10    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
  11    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
  12    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
  13    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
  14    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
  15    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    N/A    N/A
------------------------------------------------------------------------------------------------------------
 SKT    0     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    0.00   0.00
 SKT    1     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    0.00   0.00
------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.00   -1.00   0.00   -1.00      0        0      1.00    1.00   -1.00   -1.00    0.00   0.00

Instructions retired: 0 ; Active cycles: 0 ; Time (TSC): 2830 Mticks ; C0 (active,non-halted) core residency: 0.00 %

PHYSICAL CORE IPC : -1.00 => corresponds to -25.00 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.00 => corresponds to 0.00 % core utilization over time interval

Intel QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):

               QPI0     QPI1
----------------------------------------------------------------------------------------------
 SKT    0        0        0
 SKT    1        0        0
----------------------------------------------------------------------------------------------
Total QPI incoming data traffic:    0     QPI data traffic/Memory controller traffic: nan
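
Why the report shows -1.00 and -25.00 % rather than plain zeros: "Instructions retired: 0 ; Active cycles: 0" means every ratio has a zero denominator. The minimal sketch below reproduces the printed numbers under two assumptions inferred from the output itself, not taken from PCM's source: that -1 is used as a "no cycles counted" sentinel, and that utilization is IPC divided by a peak of 4 instructions per cycle (since -1.00 is printed as -25.00 %). The function names are hypothetical. The final `nan` is likewise what 0.0/0.0 yields in C floating point.

```python
def ipc(instructions_retired, unhalted_cycles):
    """Instructions per cycle; -1.0 flags 'no cycles were counted'.

    The -1.0 sentinel is an assumption inferred from the report above,
    where every IPC column reads -1.00 while Active cycles is 0.
    """
    if unhalted_cycles == 0:
        return -1.0
    return instructions_retired / unhalted_cycles


def utilization_pct(ipc_value, max_ipc=4):
    """Express IPC as a percentage of an assumed peak of 4 instructions/cycle.

    max_ipc=4 is inferred from the report line '-1.00 => -25.00 %'.
    """
    return 100.0 * ipc_value / max_ipc


# Counter readings inside the guest: both counters stuck at zero.
print(f"{ipc(0, 0):.2f}")                   # -1.00
print(f"{utilization_pct(ipc(0, 0)):.2f}")  # -25.00
```

In other words, the output is not a PCM bug on Nehalem: the counters it read never moved.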

What's the OS?

--GHui

Ryu,

As far as I know, Amazon instances are virtualized. Hypervisors usually forbid guests direct low-level access to the hardware performance counters through model-specific registers (MSRs). That is why you see invalid values in the Intel PCM output.
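
For reference, this is roughly what MSR access looks like from a Linux guest. The sketch is an illustration, not PCM's code: `read_msr` is a hypothetical helper that reads one 64-bit MSR through the Linux msr driver, which exposes the registers of CPU N in /dev/cpu/N/msr at file offset = MSR address (root or CAP_SYS_RAWIO is required). 0x309 is IA32_FIXED_CTR0, the architectural fixed counter for instructions retired. When a hypervisor traps the access, the read typically fails with an OSError (the driver reports a faulting RDMSR as EIO) or returns whatever value the hypervisor chooses to emulate.

```python
import os
import struct

IA32_FIXED_CTR0 = 0x309  # fixed counter 0: instructions retired (architectural MSR)


def read_msr(cpu, register, dev_root="/dev/cpu"):
    """Read one 64-bit MSR of `cpu` through the Linux msr driver (hypothetical helper).

    The driver maps the MSR address to the file offset, so an 8-byte read
    at offset `register` returns that register's little-endian value.
    """
    path = os.path.join(dev_root, str(cpu), "msr")
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)  # may raise OSError inside a VM
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read of MSR {register:#x} on CPU {cpu}")
    return struct.unpack("<Q", raw)[0]
```

Two reads of `read_msr(0, IA32_FIXED_CTR0)` a second apart would give CPU 0's instruction count for that second, provided the counter has been enabled by a monitoring tool; in the guest above it stays at 0.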

Thanks,
Roman

Ah, I have that problem too: MSRs aren't supported in a VM. Is there any way to resolve this, e.g. by changing some VM settings?

--GHui

Hi Roman,

Thank you for your very quick reply.

OK, I understand your explanation about Amazon EC2 and virtualization. So we cannot use pcm to measure performance on a virtualized machine...

Are there any other ideas, as the other poster asked?

Linux, not Windows

You too? I DO hope there is some other way to solve this...

Er, I didn't describe it clearly. Some Linux distributions, such as SLES, don't load the msr module by default; you must load it by hand with modprobe.

--GHui
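
A quick way to check whether the msr module is loaded is to look for the per-CPU device nodes. The sketch below is a hypothetical helper (`msr_device_status`) that maps each CPU number to whether /dev/cpu/<n>/msr exists and is readable by the current user. An empty result usually means the module is not loaded (`modprobe msr` as root). Note that nodes being present only shows the driver is loaded; it says nothing about whether the hypervisor lets the reads through.

```python
import glob
import os


def msr_device_status(dev_root="/dev/cpu"):
    """Return {cpu_number: readable?} for every <dev_root>/<cpu>/msr node.

    An empty dict usually means the msr module is not loaded.
    """
    status = {}
    for path in glob.glob(os.path.join(dev_root, "*", "msr")):
        cpu_dir = os.path.basename(os.path.dirname(path))
        if cpu_dir.isdigit():  # skip anything that is not a per-CPU directory
            status[int(cpu_dir)] = os.access(path, os.R_OK)
    return dict(sorted(status.items()))  # numeric order: 0, 1, ..., 10, ...
```

On a non-root shell the nodes exist but are typically not readable (mode 0600, owned by root), so the values come back False until the check is run with sufficient privileges.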

I found /dev/cpu/*/msr but couldn't modprobe them; maybe the virtual machine is the reason.

You found msr in /dev/cpu/*/, so you don't need to modprobe it.

--GHui

I thought my situation was the one you described.
OK, there is no need to modprobe, since the module was already loaded from the beginning.
Unfortunately, that is not the solution...

Most virtualized OSes don't allow access to most MSRs.
If I understand correctly, the hypervisor intercepts RDMSR/WRMSR instructions and only allows access to certain registers. The virtualized OS may also spoof the CPUID information, the number of CPUs, the PCI devices, etc., and CPU-specific tools can then fail.
The virtualized OS may do these things for both security and virtualization reasons.
Tools like PCM and VTune have a hard time with virtual machines: the tools have code paths for specific architectures, while the virtualized OS may want to abstract the architecture away.
This is an area we are trying to improve.
Pat
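
To make the CPUID point concrete: the PCM header above reports "Core PMU (perfmon) version: 0", 0 generic counters and 0-bit width. Those fields presumably come from CPUID leaf 0xA, whose EAX the Intel SDM defines as bits 7:0 = perfmon version, bits 15:8 = general-purpose counters per logical processor, bits 23:16 = counter bit width, so a guest that sees EAX = 0 there gets exactly that header. Python's standard library cannot execute CPUID, so this sketch only decodes a value obtained elsewhere (e.g. from the `cpuid` utility) and, as a guest-side check, looks for the `arch_perfmon` flag that the Linux kernel lists in /proc/cpuinfo when it detects architectural perfmon support. Both function names are hypothetical.

```python
def decode_perfmon_leaf(eax):
    """Split CPUID leaf 0xA EAX into its architectural-perfmon fields."""
    return {
        "version": eax & 0xFF,                   # bits 7:0
        "gp_counters": (eax >> 8) & 0xFF,        # bits 15:8, per logical processor
        "gp_counter_width": (eax >> 16) & 0xFF,  # bits 23:16, in bits
    }


def cpuinfo_has_flag(cpuinfo_text, flag="arch_perfmon"):
    """True if the first 'flags' line of /proc/cpuinfo text lists `flag`."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()
    return False
```

For example, EAX = 0x07300403 decodes to version 3 with 4 counters of 48 bits, while EAX = 0 decodes to all zeros, matching the header in the question. On Linux, `cpuinfo_has_flag(open("/proc/cpuinfo").read())` gives a quick yes/no inside the guest.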

Thank you, everyone. I have given up measuring with pcm.x this time. However, if anyone finds another solution, please share it here!

As Roman and Pat already explained, any performance tool that requires direct hardware access to the performance monitoring units (PMUs) is blocked by the virtualization layer. This won't be solved unless Amazon introduces a virtualization layer for the PMU, as they do for memory or I/O.

Depending on what you are trying to achieve, you might want to try a hotspot analysis tool that does not require access to kernel space. The "hotspot" analysis in Intel Amplifier is an example of this (in contrast to the "light-weight hotspot" analysis). "quantify" should also work.
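
To illustrate why this kind of analysis can still work inside a guest: a timer-driven sampler needs only an OS interval timer, not PMU counters. The minimal Python sketch below (Unix only, main thread only) arms `signal.setitimer` with `ITIMER_PROF`, which fires SIGPROF after each slice of process CPU time, and counts which function was executing at each tick. It is a generic technique, not how Amplifier or Quantify are implemented, and `sample_hotspots` is a hypothetical name.

```python
import collections
import signal


def sample_hotspots(func, *args, interval=0.001, **kwargs):
    """Run func(*args, **kwargs) while sampling the running function on each SIGPROF.

    Returns (func's result, Counter of function names seen at the samples).
    """
    counts = collections.Counter()

    def on_sigprof(signum, frame):
        # `frame` is the Python frame that was executing when the tick arrived
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    old_handler = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time (user + system) consumed by this process
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        result = func(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)   # disarm the timer
        signal.signal(signal.SIGPROF, old_handler)
    return result, counts
```

`counts.most_common(5)` then lists the hottest functions. The trade-off is that timer sampling tells you where time goes, but not why (no cache-miss or IPC data), which is exactly the information the blocked PMU would have provided.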
