Power Measurement of Xeon Phi

I wonder if there is any method to measure the power consumption of the Xeon Phi coprocessor. If so, how? I checked a guide document (Xeon Phi Coprocessor System Software Developer Guide), but I could not find the answer. The coprocessor does not seem to have a register that counts power the way Xeon processors do. If anybody knows the answer, or can even point me toward it, I would appreciate it very much.

I asked the same question a few months ago, and was told that there are no power PMU event counters.

The data sheet, http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/xeon-phi-datasheet.pdf, has some information on the power usage at various C-states and (package) PC-states. Just a note that as of June 2013, PC6 was not enabled. This may be different if you have a newer stepping.

Other ways of measuring power involve using a known steady-state workload and either something like a NetDAQ on a properly instrumented motherboard or a from-the-wall power meter.


Instantaneous power consumption on Xeon Phi is available through the "micsmc" executable on the host.

Although it is not terribly well documented, the same information is available on the Xeon Phi using the pseudo-file /sys/class/micras/power
(e.g., "cat /sys/class/micras/power" on the Xeon Phi prints out much of the same info as "micsmc -f" on the host.)
See the Xeon Phi SW Developer's Guide in the section on "Sysfs Nodes" (section 2.2.8 in document 488596 or document 328207).

These are instantaneous power measurements, rather than the accumulated energy estimates from the RAPL feature on the newer Xeon processors.  It is fairly easy to write a script to grab the power measurements periodically while your job is running so you can compute averages.
I typically pin this script to logical processor 0 to keep it away from my user threads, but I have not tried to measure whether reading this at a granularity of a few seconds causes enough overhead to be worth worrying about.
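A minimal sketch of such a sampling script, in Python. The parsing in read_power_watts is an assumption (first token of the first line, taken as microwatts), so check it against what "cat /sys/class/micras/power" prints on your card. The function names sample_power and average_power are illustrative only.

```python
import time
from statistics import mean

def read_power_watts(path="/sys/class/micras/power"):
    """Return one power reading in Watts.

    ASSUMPTION: the first whitespace-separated token on the first line is
    the total power in microwatts. Adjust if your card reports otherwise.
    """
    with open(path) as f:
        first_token = f.readline().split()[0]
    return int(first_token) / 1e6

def sample_power(reader, interval_s=1.0, count=10, sleep=time.sleep):
    """Call reader() `count` times, `interval_s` apart, and return the readings."""
    samples = []
    for i in range(count):
        samples.append(reader())
        if i < count - 1:
            sleep(interval_s)
    return samples

def average_power(samples):
    """Plain mean of equally spaced instantaneous readings."""
    return mean(samples)
```

Typical use would be average_power(sample_power(read_power_watts, interval_s=2.0, count=30)) while the job runs. On Linux, os.sched_setaffinity(0, {0}) is one way to pin the sampler to logical processor 0.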

"Dr. Bandwidth"


Here are the measurement results using the method from this post, reading /sys/class/micras/power. The GFLOPS/Watt result does not match the figure reported in Intel documents for the DGEMM call.

Intel MIC 5110P : 59.00 Joules  in 459.843 milliseconds with 156.57 GFLOPS. So the GFLOPS/Watt=156.57/(59.00/(459.843/1000))=1.2202GFLOPS/Watt

Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz:  60.89 Joules in 798.345 milliseconds with 271.831 GFLOPS. So the GFLOPS/Watt = 271.831/(60.89/(798.345/1000))=3.564 GFLOPS/Watt. Intel(R) Xeon(R) CPU power is measured by PAPI RAPL PACKAGE_ENERGY:PACKAGE0 event.
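The arithmetic in the two lines above can be reproduced with a short Python sketch: average power is energy divided by elapsed time, and efficiency is GFLOPS divided by that power. The function name gflops_per_watt is illustrative only.

```python
def gflops_per_watt(gflops, joules, milliseconds):
    """Efficiency from a measured energy (J) over a measured interval (ms)."""
    watts = joules / (milliseconds / 1000.0)  # average power over the run
    return gflops / watts

mic = gflops_per_watt(156.57, 59.00, 459.843)    # Intel MIC 5110P
cpu = gflops_per_watt(271.831, 60.89, 798.345)   # Xeon E5-2650 v2 (RAPL package 0)

print(f"MIC 5110P:  {mic:.3f} GFLOPS/Watt")
print(f"E5-2650 v2: {cpu:.3f} GFLOPS/Watt")
```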

http://www.intel.com/content/dam/www/public/us/en/documents/performance-... The performance per watt reported in the above document for the DGEMM call is above 4 for the MIC, while for the CPU it is less than 1.5 GFLOPS/Watt.

Your Xeon Phi GFLOPS value of 156.57 is less than 20% of the 837 GFLOPS that Intel reported for DGEMM on the Xeon Phi 5110P on the web page you reference.    This pretty clearly indicates that you don't have the DGEMM benchmark set up correctly.

Your Xeon Phi power number (59 Joules in 0.459843 seconds) is 128 Watts, which is also much lower than the 225 Watt value for the Xeon Phi 5110P from the referenced presentation.  It is not surprising that the ratios are not optimal when you are only getting 1/5 of the expected performance.

Note that the values larger than 4 on slide 5 of that presentation are not "GFLOPS/Watt" -- they are "GFLOPS/Watt on the Xeon Phi relative to GFLOPS/Watt on the Xeon E5-2697 v2".   The slide is poorly labeled and so very easy to misinterpret.

Your Xeon E5-2650 v2 has a peak rating of 332.8 GFLOPS at 2.60 GHz for two sockets, so your result of 271.831 GFLOPS is 81.68% of peak -- this is unusually low for a Xeon processor running MKL -- we easily exceed 90%.    Your power value of 60.89 Joules in 0.798345 seconds is only 76 Watts, which might be a reasonable power consumption value for one 95 Watt processor (with Turbo Boost disabled), but is not reasonable at all for two sockets with all 16 cores running DGEMM.
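The sanity checks above can be sketched in Python. The peak calculation assumes 8 cores per socket and 8 double-precision FLOPs per cycle per core (AVX without FMA); neither figure is stated in this thread, so treat them as assumptions. The helper names are illustrative only.

```python
def average_watts(joules, seconds):
    """Average power over an interval."""
    return joules / seconds

def peak_gflops(sockets, cores_per_socket, ghz, flops_per_cycle):
    """Nominal peak: every core retiring flops_per_cycle at the base clock."""
    return sockets * cores_per_socket * ghz * flops_per_cycle

# Xeon Phi 5110P: measured power, and achieved fraction of the 837 GFLOPS reference
phi_watts = average_watts(59.00, 0.459843)
phi_fraction = 156.57 / 837.0

# Two Xeon E5-2650 v2 sockets (ASSUMED: 8 cores each, 8 DP FLOPs/cycle/core)
xeon_watts = average_watts(60.89, 0.798345)
xeon_peak = peak_gflops(2, 8, 2.60, 8)
xeon_fraction = 271.831 / xeon_peak

print(f"Xeon Phi: {phi_watts:.1f} W, {phi_fraction:.1%} of 837 GFLOPS")
print(f"Xeon:     {xeon_watts:.1f} W, {xeon_fraction:.2%} of {xeon_peak:.1f} GFLOPS peak")
```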


"Dr. Bandwidth"

Hi John,

How is "perf" able to calculate this at runtime? I get the following values, which look reasonable to me and which I also validated by running intensive benchmarks. If I divide the Joules below by the time difference between samples (consistently about 1 second), I get acceptable power (W) values.

$ sudo ./perf stat -a -I 1000 -e power/energy-pkg/
#           time             counts   unit events
     1.000327769              45.90 Joules power/energy-pkg/
     2.001644906              58.55 Joules power/energy-pkg/
     3.002460994              72.99 Joules power/energy-pkg/
     4.003547462              63.73 Joules power/energy-pkg/
     5.004554410              55.48 Joules power/energy-pkg/
     6.005421470              53.54 Joules power/energy-pkg/
     7.006237491              73.03 Joules power/energy-pkg/
     8.007120101              66.31 Joules power/energy-pkg/
     9.008347035              54.42 Joules power/energy-pkg/
    10.009302326              52.87 Joules power/energy-pkg/
    10.286808690              20.19 Joules power/energy-pkg/
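Power is energy divided by elapsed time, so each row can be turned into Watts by dividing its Joules by the gap since the previous timestamp. This matters for the last row, which covers only about 0.28 seconds. A minimal Python sketch, assuming the two leading columns are the timestamp in seconds and the energy in Joules, as in the output above (the function name interval_watts is illustrative only):

```python
def interval_watts(perf_output):
    """Convert `perf stat -I` energy rows to average Watts per interval."""
    watts = []
    prev_t = 0.0
    for line in perf_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        t, joules = float(fields[0]), float(fields[1])
        watts.append(joules / (t - prev_t))  # energy / elapsed time
        prev_t = t
    return watts

sample = """
#           time             counts   unit events
     1.000327769              45.90 Joules power/energy-pkg/
    10.009302326              52.87 Joules power/energy-pkg/
    10.286808690              20.19 Joules power/energy-pkg/
"""
# The middle row of this shortened sample spans about 9 seconds, so only the
# first and last values are comparable to the full output above.
w = interval_watts(sample)
print([round(x, 2) for x in w])
```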

The perf documentation says:

All the counters measure in the same unit (exposed via sysfs).
The perf_events API exposes all RAPL counters as 64-bit integers
counting in unit of 1/2^32 Joules (about 0.23 nJ). User level tools
must convert the counts by multiplying them by 2^-32 to obtain
Joules. The reason for this is that the kernel avoids
doing floating point math whenever possible because it is
expensive (user floating-point state must be saved). The method
used avoids kernel floating-point usage. There is no loss of
precision. Thanks to PeterZ for suggesting this approach.

To convert the raw count in Watt:
   W = C * 2.3 / (1e10 * time)
or ldexp(C, -32).
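Read literally, the quoted text takes a raw counter value C in units of 2^-32 Joules, so ldexp(C, -32) yields Joules and dividing by the elapsed time yields Watts. A small Python sketch of that conversion, using the standard library's math.ldexp (the function names are illustrative only):

```python
import math

def raw_to_joules(count):
    """Scale a raw perf RAPL count (units of 2**-32 J) to Joules."""
    return math.ldexp(count, -32)  # count * 2**-32

def raw_to_watts(count, seconds):
    """Average power over the interval in which `count` accumulated."""
    return raw_to_joules(count) / seconds

# One raw unit is about 0.23 nJ, matching the quoted documentation.
print(raw_to_joules(1))
# A count of 2**32 accumulated over 1 second is exactly 1 Watt.
print(raw_to_watts(2**32, 1.0))
```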

I am not sure what "C" means here. Does it come from the following sysfs?



Chetan Arvind Patil

It is not clear what you are asking....

Perf can calculate during runtime because it has internal timers that enable it to wake up and read the counters at any desired interval.   The absence of "ordinary" floating-point support in the kernel does not mean that the kernel cannot do floating-point arithmetic -- it simply does it in software when it is required.  Most of the time it uses scaled integer arithmetic of various kinds, but doing a small number of FP operations along with an output routine is not a big overhead.

Section 14.9 of Volume 3 of the Intel Architectures SW Developer's Manual describes the RAPL system, including the units used by RAPL.  The RAPL energy unit can vary by processor -- on my Xeon Platinum 8160 processors the RAPL Energy Unit is 1/16384 Joules. 
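As a rough illustration of how the energy unit is applied, the sketch below decodes the Energy Status Units field (bits 12:8 of MSR_RAPL_POWER_UNIT, as described in that section of the manual) and scales a difference between two 32-bit energy counter readings. Reading the MSRs themselves is left out, and the register value used is an example chosen to give 1/16384 Joules, not a measurement.

```python
def energy_unit_joules(rapl_power_unit_msr):
    """Energy unit in Joules: 1 / 2**ESU, where ESU is bits 12:8."""
    esu = (rapl_power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)

def energy_delta_joules(before, after, unit_joules):
    """Joules between two readings of a 32-bit energy status counter.

    The mask handles a single wraparound of the counter between readings.
    """
    ticks = (after - before) & 0xFFFFFFFF
    return ticks * unit_joules

unit = energy_unit_joules(0x000A0E03)  # example value with ESU = 14
print(unit)                            # 1/16384 J
print(energy_delta_joules(0xFFFFFFF0, 0x00000010, unit))
```

Dividing the resulting Joules by the wall-clock time between the two readings gives average power.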

"Dr. Bandwidth"
