Intel® Power Governor

Notice: 

Active support for this tool is currently discontinued. The source code is provided below for the brave souls who would like to hack away at it. 

Authors: 

Martin Dimitrov, Carl Strickland, Seung-Woo Kim, Karthik Kumar, Kshitij Doshi

Introduction:

Intel® Power Governor (power_gov) is a software utility and library that allows developers to (a) monitor power and (b) regulate power at very fine time granularities (a few tens of milliseconds). Power monitoring and control are available for the package, core, graphics, uncore and DRAM domains, as illustrated in Figure 1 below. The tool is self-contained, easy to use, and available on Intel® Xeon® E5 series processors based on Intel® microarchitecture code-named Sandy Bridge EP/EN/E, on 2nd Generation Intel® Core™ processors, and on newer processors. As a library, power_gov lets developers incorporate power monitoring and control into their own custom, dynamic solutions tailored to the needs of their application.
 


Figure 1. Power domains for which power monitoring/control is available. To get uncore (last-level caches and memory controller) power, subtract core and graphics power from package power. Note: graphics power monitoring/control is only available on client parts, while DRAM power monitoring/control is only available on server parts.
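
As a minimal sketch of the subtraction described in the caption of Figure 1, the snippet below derives uncore power from the other readings. The struct and function names are illustrative only and are not part of power_gov.

```c
#include <assert.h>
#include <stddef.h>

/* Per-socket power readings in watts. Field names are hypothetical. */
struct domain_power {
    double package;
    double core;
    double graphics;  /* use 0.0 on parts that do not report graphics power */
};

/* Uncore power = package power - core power - graphics power. */
double uncore_power(const struct domain_power *p)
{
    assert(p != NULL);
    return p->package - p->core - p->graphics;
}
```

For a socket reporting 50 W package, 30 W core and 5 W graphics power, this yields 15 W for the uncore.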
 
Usages:

Software power meter: power_gov can be used to report power consumption on the different power planes. The example in Figure 2 shows power_gov executing on a two-socket Intel® Xeon® Processor E5-based machine and reporting the average power consumption of the package, core, uncore and DRAM for each socket at 1-second intervals.


Figure 2. power_gov reporting power consumption at 1-second intervals for package, core, uncore, DRAM domains on a two-socket system.
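
A software power meter of this kind can be thought of as sampling an energy counter twice and dividing the difference by the elapsed time. The sketch below shows only that arithmetic, assuming a 32-bit counter and a known size of one count in joules. It is not taken from the power_gov source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Energy consumed between two raw counter samples, in joules.
 * Unsigned 32-bit subtraction is modulo 2^32, so a single wraparound
 * of the counter between the two samples is handled automatically. */
double energy_delta_joules(uint32_t before, uint32_t after,
                           double joules_per_count)
{
    uint32_t delta = after - before;
    return (double)delta * joules_per_count;
}

/* Average power over the sampling interval, in watts. */
double average_power_watts(uint32_t before, uint32_t after,
                           double joules_per_count, double interval_seconds)
{
    assert(interval_seconds > 0.0);
    return energy_delta_joules(before, after, joules_per_count)
           / interval_seconds;
}
```

The interval must be short enough that the counter wraps at most once between samples; otherwise the computed energy is too low.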

 
Optimize for power/performance target: power_gov can be used to enforce power limits on the different power domains. In the example from Figure 3, we executed an OLTP workload while varying the power limit enforced on the package power domain (from 130 W down to 30 W, on the x-axis). On the y-axis, we observed how the performance of the workload (the average response time of transactions) varied with the enforced power limit. Assuming a response time target of 0.015 ms, we could limit the power consumption of the CPU socket to 50 W while still satisfying the performance requirement.
 
Figure 3. Using power_gov to optimize for a power/performance target.
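
The selection step in this experiment can be sketched as a search over the measured sweep: among all power limits whose response time meets the target, pick the lowest. The types and function below are hypothetical, and the measurements themselves would have to come from runs like the one in Figure 3.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement from a power-limit sweep. Names are illustrative. */
struct sweep_point {
    double power_limit_watts;
    double response_time_ms;
};

/* Returns the lowest power limit whose measured response time is at or
 * below target_ms, or -1.0 if no point in the sweep meets the target. */
double lowest_limit_meeting_target(const struct sweep_point *pts, size_t n,
                                   double target_ms)
{
    double best = -1.0;

    assert(pts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pts[i].response_time_ms <= target_ms &&
            (best < 0.0 || pts[i].power_limit_watts < best)) {
            best = pts[i].power_limit_watts;
        }
    }
    return best;
}
```

The points do not need to be sorted, and a return value of -1.0 signals that even the highest limit in the sweep missed the target.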
 
Dynamic power/performance optimization: The power_gov library allows developers to create dynamic power monitoring/control solutions that respond to changes in workload demand, time of day, etc. The example in Figure 4 depicts a scenario in which the customer software monitors energy consumption in addition to its own quality of service (QoS) metrics, and responds to changes in workload demand by dynamically adjusting the power limits (see [1]). The control algorithms used to dynamically adjust the power limits depend on the usage scenario and can be arbitrarily complex.
 
Figure 4. Using the power_gov library to design dynamic power/performance optimization solutions.
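
To make the idea of a control algorithm concrete, here is a deliberately simple step controller. It is a sketch under stated assumptions, not the algorithm from [1] and not part of the power_gov library: the QoS metric is assumed to be "lower is better" (for example a response time), and all names and parameters are hypothetical.

```c
#include <assert.h>

/* Tuning parameters for the controller. All names are illustrative. */
struct limit_policy {
    double min_watts;   /* never set a limit below this */
    double max_watts;   /* never set a limit above this */
    double step_watts;  /* adjustment applied per control period */
    double qos_target;  /* e.g. response time target; lower is better */
    double headroom;    /* fraction of the target that counts as slack */
};

/* Computes the power limit for the next control period from the current
 * limit and the QoS value measured during the last period. */
double next_power_limit(const struct limit_policy *pol,
                        double current_limit_watts, double measured_qos)
{
    double next = current_limit_watts;

    assert(pol->min_watts <= pol->max_watts && pol->step_watts > 0.0);
    if (measured_qos > pol->qos_target)
        next += pol->step_watts;   /* target missed: allow more power */
    else if (measured_qos < pol->headroom * pol->qos_target)
        next -= pol->step_watts;   /* ample slack: save power */

    if (next < pol->min_watts) next = pol->min_watts;
    if (next > pol->max_watts) next = pol->max_watts;
    return next;
}
```

The dead band between headroom * qos_target and qos_target keeps the limit from oscillating when the workload sits close to its target. A real deployment would call this once per monitoring interval and then apply the returned limit.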
 

References:

[1] V. Anagnostopoulou, M. Dimitrov, K. Doshi, “SLA-Guided Energy Savings for Enterprise Servers,” ISPASS 2012 poster; full version to appear in ITJ 2012.
[2] Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2.

Attachment: power_gov.rev72.tgz (application/x-gtar, 40.17 KB)

Comments



Hi
I am using it for monitoring, and I need short sampling delays, i.e. less than 10 ms. Is that possible?
Every time I try it, the program tells me: "Delay must be greater than 50 ms." Is this a hardware restriction?

Thank you,
Leonardo



Hello Team,

I use ./power_gov to set the CPU power limit and the DRAM power limit.

I then read CPU power with the code below, but I also want to read the memory power consumption while the application is running.

Can you please direct me to a document that would be helpful?

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>
#define core_msr   0x639
#define cpu_msr    0x611
#define IA32_PERF_CTL 0x199
#define IA32_PERF_STATUS 0x198
#define MSR_IA32_MPERF          0x000000e7
#define MSR_IA32_APERF          0x000000e8
#define IA32_TIME_STAMP_COUNTER 0x00000010
#define IA32_MISC_ENABLE 0x1A0
#define mem_msr         0x619
#define uncore_msr 0x642
#define energy_unit_msr 0x606
#define ENERGY_UNIT_OFFSET      0x08
#define ENERGY_UNIT_MASK        0x1F00

static int numProcs = 6;

/* Returns the current wall-clock time in seconds. */
void mytime_(double *val)
{
        struct timeval now;

        gettimeofday(&now, NULL);
        /* Convert in floating point so the sum cannot overflow a 32-bit long. */
        *val = (double)now.tv_sec + (double)now.tv_usec / 1000000.0;
}

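/* Reads one 64-bit MSR on the given CPU through the Linux msr driver.
 * Returns 0 if the device cannot be opened or the read fails. */
uint64_t rdmsr(uint32_t msr_id, int cpu)
{
        char path[100];
        uint64_t msr_value = 0;
        ssize_t retval = 0;

        snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
                retval = pread(fd, &msr_value, sizeof(msr_value), msr_id);
                close(fd);      /* only close a descriptor that was opened */
        } else {
                printf("++ 00 Can not read MSR id=0x%x\n", (unsigned)msr_id);
        }

        return retval == (ssize_t)sizeof(msr_value) ? msr_value : 0;
}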

/* Returns the energy-unit divisor 2^ESU, where ESU is bits 12:8 of MSR 0x606.
 * One raw energy count equals 1/divisor joules. */
void getpowerunit_(double *val)
{
        uint32_t value;
        uint64_t msr_output = rdmsr(energy_unit_msr, 0);

        value = (msr_output & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET;
        *val = (double)(1u << value);
}

/* Reads one MSR on CPU 0 and returns it as a double.
 * Yields (double)UINT64_MAX if the MSR device cannot be opened or read. */
static double read_msr_cpu0(uint32_t msr_id)
{
        char path[100];
        uint64_t msr_value = UINT64_MAX;

        snprintf(path, sizeof(path), "/dev/cpu/%d/msr", 0);

        int fd = open(path, O_RDONLY);  /* reading needs no write access */
        if (fd >= 0) {
                if (pread(fd, &msr_value, sizeof(msr_value), msr_id)
                    != (ssize_t)sizeof(msr_value))
                        msr_value = UINT64_MAX;
                close(fd);
        } else {
                /* Report the MSR that was actually requested. */
                printf("++ Can not read MSR id=0x%x\n", (unsigned)msr_id);
        }

        return (double)msr_value;
}

/* IA32_MPERF: counts at a fixed reference frequency while the core is active. */
void mperf_(double *msr_value1)
{
        *msr_value1 = read_msr_cpu0(MSR_IA32_MPERF);
}

/* IA32_APERF: counts at the actual core frequency while the core is active. */
void aperf_(double *msr_value1)
{
        *msr_value1 = read_msr_cpu0(MSR_IA32_APERF);
}

/* Package energy counter (MSR 0x611), in raw energy units. */
void cpupower_(double *msr_value1)
{
        *msr_value1 = read_msr_cpu0(cpu_msr);
}

/* Averages temp1_input (millidegrees Celsius) over numProcs coretemp devices. */
void gettemp(double *temp)
{
        int i;
        double tot_temp = 0;

        for (i = 0; i < numProcs; i++) {
                char path[100];
                char val[16];

                snprintf(path, sizeof(path),
                         "/sys/devices/platform/coretemp.%d/temp1_input", i);
                FILE *f = fopen(path, "r");
                if (!f) {
                        fprintf(stderr, "Error when opening temp file %s.\n", path);
                        exit(1);
                }
                if (fgets(val, sizeof(val), f))
                        tot_temp += atof(val) / 1000;
                fclose(f);
        }
        *temp = tot_temp / numProcs;
}

int main(int argc, char *argv[])
{
        if (argc < 2) {
                fprintf(stderr, "Usage: ./msr hostname\n");
                return 1;
        }
        double stcpu, oldcpu, divisor;
        double stmperf, oldmperf;
        double staperf, oldaperf;
        double temp;
        double oldtime, STARTT, sttime;

        getpowerunit_(&divisor);

        cpupower_(&stcpu);
        mytime_(&sttime);
        mperf_(&stmperf);
        aperf_(&staperf);
        STARTT = sttime;

        FILE *f;
        char filename[256];
        snprintf(filename, sizeof(filename),
                 "/home/sarood1/zyang/msr/perf_%s_%ld.log", argv[1], (long)(STARTT / 100));
        f = fopen(filename, "w");
        if (!f) {
                fprintf(stderr, "Cannot open %s for writing.\n", filename);
                return 1;
        }
        long ints = 0;
        while (ints < 2400) {
                ints++;
                oldcpu = stcpu;
                oldtime = sttime;
                oldmperf = stmperf;
                oldaperf = staperf;

                cpupower_(&stcpu);
                mperf_(&stmperf);
                aperf_(&staperf);
                mytime_(&sttime);

                gettemp(&temp);
                double telap = sttime - oldtime;
                double energy = stcpu - oldcpu;
                if (energy < 0)         /* the 32-bit energy counter wrapped around */
                        energy += 4294967296.0;
                double power = energy / (telap * divisor);
                /* 2.0 is kept from the original; presumably the nominal frequency in GHz. */
                double freq = 2.0 * (staperf - oldaperf) / (stmperf - oldmperf);

                printf("Time:%f Temp:%f CPU Power: %f Freq:%f \n",
                       sttime - STARTT, temp, power, freq);
                fprintf(f, "%f %f %f %f\n",
                        sttime - STARTT, temp, power, freq);
                fflush(f);
                usleep(500000);
        }
        fclose(f);
        return 0;
}
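
The listing above defines mem_msr (0x619) but never reads it. As a hedged sketch only: on server parts that expose the DRAM domain, that counter can be read the same way as the package counter and scaled by the energy unit from MSR 0x606. The function names below are hypothetical, the code assumes the Linux msr driver is loaded and readable, and it has not been checked against the power_gov source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MSR_RAPL_POWER_UNIT    0x606
#define MSR_DRAM_ENERGY_STATUS 0x619

/* Reads one MSR through /dev/cpu/N/msr. Returns 0 on success, -1 on failure. */
int read_msr(int cpu, uint32_t msr, uint64_t *out)
{
    char path[64];

    assert(out != NULL);
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, out, sizeof *out, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}

/* Joules per energy count, from bits 12:8 of MSR_RAPL_POWER_UNIT. */
double joules_per_count(uint64_t power_unit_msr)
{
    unsigned esu = (unsigned)((power_unit_msr >> 8) & 0x1F);
    return 1.0 / (double)(1ULL << esu);
}

/* Current DRAM energy counter in joules, or -1.0 if it cannot be read. */
double dram_energy_joules(int cpu)
{
    uint64_t unit, raw;

    if (read_msr(cpu, MSR_RAPL_POWER_UNIT, &unit) != 0 ||
        read_msr(cpu, MSR_DRAM_ENERGY_STATUS, &raw) != 0)
        return -1.0;
    /* Only the low 32 bits hold the energy count. */
    return (double)(uint32_t)raw * joules_per_count(unit);
}
```

Sampling dram_energy_joules() twice and dividing the difference by the elapsed time gives average DRAM power, in the same way the main loop above handles package power. On a multi-socket machine the cpu argument should name a core on the socket of interest.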

 


Somehow, I think I figured it out.

I have an issue with the result above.

Basically, the relationship between the energy unit and the wraparound value is: ENERGY_UNIT = 1/WRAPAROUND_VALUE.

Using pow(2, 32) - 1 is a slightly strange way to write it. It works, but it is not obvious to me. In my opinion, if he wants to get the wraparound value from the energy unit, the best and most direct way is WRAPAROUND_VALUE = 1 / ENERGY_UNIT. Otherwise, how would he know that the hard-coded 32 is exactly double the energy unit's bit value (16)?


Hi, I have read the code in http://software.intel.com/en-us/articles/intel-power-gadget-20 as you recommended.

The author said the default wraparound value for total energy consumption is 65536 joules, which I checked with my code and it turned out to be correct (but I haven't found any document that mentions this value, even in the Intel manuals...)

However, the way the author gets the wraparound value is:

MAX_ENERGY_STATUS_JOULES = (double)(RAPL_ENERGY_UNIT * (pow(2, 32) - 1));

instead of using 65536 directly. What I got on my machine for MAX_ENERGY_STATUS_JOULES is 68719476720.000000, which is unreasonable...

Do you have any insight about that?

Thank you in advance
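
The arithmetic behind this exchange can be worked through in a few lines. The sketch below assumes the energy counter is 32 bits wide and that the energy unit is 2^-ESU joules, where ESU is the exponent field read from the unit register (16 in this thread). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one energy count in joules for a given exponent field. */
double energy_unit_joules(unsigned esu)
{
    assert(esu < 32);
    return 1.0 / (double)(1ULL << esu);
}

/* Largest energy value a 32-bit counter can hold before wrapping. */
double max_energy_status_joules(unsigned esu)
{
    return energy_unit_joules(esu) * 4294967295.0;  /* 2^32 - 1 counts */
}
```

With ESU = 16 this gives just under 65536 J, matching the wraparound value quoted above. Multiplying the raw exponent instead, 16 * (2^32 - 1), gives exactly 68719476720, so one possible explanation for the unreasonable result is that RAPL_ENERGY_UNIT held the exponent 16 rather than 2^-16. That is a guess from the numbers, not something confirmed in this thread. Note also that ENERGY_UNIT = 1/WRAPAROUND_VALUE only holds when ESU is exactly half the counter width, as it is here (2^32 * 2^-16 = 2^16).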

 


Hello

I tried setting the MSR directly and also tried the power_gov utility to increase the time window during which the CPU stays in full Turbo Boost (where consumption is above nominal TDP). The cooling system can dissipate the heat, but after 10 seconds the CPU reduces power to nominal TDP.

How can I increase the time window? As I understand it, this mode is power_limit_2?

I know other people have done this, but I can't understand how. Which MSR registers can help increase this time window? Can power_gov do this?
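
For readers following this question, the sketch below decodes a raw 64-bit value of the package power limit register (MSR_PKG_POWER_LIMIT, 0x610), assuming the field layout documented in [2]: each 32-bit half holds a limit in bits 14:0, an enable bit, a clamp bit, and a time window encoded as 2^Y * (1 + Z/4) time units, with a lock bit at bit 63. The struct and function names are hypothetical, and the power and time units must be taken from MSR 0x606. Whether power_gov exposes these fields is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* One of the two limits held in the register. Names are illustrative. */
struct pkg_limit {
    double watts;
    int enabled;
    int clamped;
    double window_seconds;
};

/* Decodes one 32-bit half: the low half holds limit #1, the high half
 * holds limit #2 at the same bit positions. */
struct pkg_limit decode_limit(uint32_t half, double watts_per_unit,
                              double seconds_per_unit)
{
    struct pkg_limit l;
    unsigned y = (half >> 17) & 0x1F;  /* time window exponent */
    unsigned z = (half >> 22) & 0x3;   /* time window fraction, in quarters */

    assert(watts_per_unit > 0.0 && seconds_per_unit > 0.0);
    l.watts = (double)(half & 0x7FFF) * watts_per_unit;
    l.enabled = (int)((half >> 15) & 1);
    l.clamped = (int)((half >> 16) & 1);
    l.window_seconds = (double)(1ULL << y) * (1.0 + z / 4.0) * seconds_per_unit;
    return l;
}

struct pkg_limit decode_limit_1(uint64_t raw, double w, double s)
{
    return decode_limit((uint32_t)raw, w, s);
}

struct pkg_limit decode_limit_2(uint64_t raw, double w, double s)
{
    return decode_limit((uint32_t)(raw >> 32), w, s);
}

/* Nonzero if the lock bit is set, in which case writes are ignored until reset. */
int limits_locked(uint64_t raw)
{
    return (int)(raw >> 63);
}
```

Decoding the register before and after a write shows whether the new limit and time window were accepted, whether the limit is enabled, and whether the register was locked beforehand.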


Hi Sankarp, 

Can you please provide the command that you are using to set the power limit? After you set the power limit, can you use the "-i" option to verify that your change actually took effect? Are you also using power_gov to measure your power consumption? If so, could you please provide the command you are using and a short sample of the output demonstrating that the power limit is not working?

thank you

martin


Hello, I am trying to set a power cap for the system through the power limit registers. Currently, I am setting the limit through the RAPL setting of the PKG power plane. But when I stress the system by running some workloads, the PKG power is greater than the power limit that I set. My system contains an i5-2500K processor (Sandy Bridge), and Turbo Boost is enabled. Any idea why the power limit registers are not working?

