| Last Modified On : | October 1, 2008 4:53 PM PDT |
By Venkatesh Pallipadi
The Pentium® M processor supports Enhanced Intel SpeedStep® Technology as an advanced means of enabling very high performance while also meeting the power-conservation needs of mobile systems. Conventional Intel SpeedStep Technology switches both voltage and frequency in tandem between high and low levels in response to processor load. Enhanced Intel SpeedStep Technology builds upon that architecture using design strategies that include the following:
Because Enhanced Intel SpeedStep Technology reduces the latency associated with changing the voltage/frequency pair (referred to as P-state), those transitions can be practically undertaken more often, which enables more-granular demand-based switching and the optimization of the power/performance balance based on demand. This article gives developers an overview of the support for Enhanced Intel SpeedStep Technology and demand-based switching under Linux. It is also a ready reference for developers interested in new user-level or in-kernel policy based on Enhanced Intel SpeedStep Technology.
Cpufreq is the subsystem of the Linux kernel that allows clock speed to be explicitly set on mobile processors. A lot of recent work has advanced the modularization of generic kernel cpufreq architecture. The following figure depicts the 2.6.8 kernel cpufreq infrastructure at a high level:

Figure 1. High-level overview of the cpufreq infrastructure
The primary components of this infrastructure are as follows:
Cpufreq module: The cpufreq module provides a common interface to the various low-level, CPU-specific frequency-control technologies and the high-level CPU frequency-controlling policies. It decouples the frequency-controlling mechanisms from the policies, so the two can be developed independently. It also provides standard interfaces through which the user can choose the policy governor and set parameters for that governor.
CPU-specific drivers: Various low-level, CPU-specific drivers implement the CPU frequency-changing technologies, such as Intel SpeedStep Technology, Enhanced Intel SpeedStep Technology, and Pentium® 4 processor clock modulation. A given platform can support one or more frequency-modulation technologies, and the proper driver must be loaded for the platform to perform efficient frequency changes. The cpufreq infrastructure allows the user to use one CPU-specific driver per platform. The low-level CPU-specific drivers acpi-cpufreq and speedstep-centrino handle Enhanced Intel SpeedStep Technology-enabled CPUs.
In-kernel governors: The cpufreq infrastructure allows for frequency-changing policy governors, which can change the CPU frequency based on different criteria such as CPU usage. The cpufreq infrastructure can show various governors that are available for use on the system and allows the user to select one governor to manage CPU frequency.
Three governors can be default boot-time governors:
A new in-kernel governor called the ondemand governor, which is discussed in more detail below, is added to make the most of low-latency CPU frequency/voltage-pair transitions supported in Enhanced Intel SpeedStep Technology.
Cpufreq exposes a number of standard interfaces in the /sys filesystem, which the user can use to change characteristics such as the CPU frequency and the policy governor. Each interface provided by cpufreq is explained below. Note that /sys should be mounted automatically; if it is not, it can be mounted manually:
mkdir /sys
mount -t sysfs sys /sys
A per-CPU directory is available in sysfs under /sys/devices/system/cpu. Note that Hyper-Threading Technology siblings will each appear separately in sysfs.
linux: # pwd
/sys/devices/system/cpu
linux: # ls -l
total 0
drwxr-xr-x 6 root root 0 Jul 31 22:50 .
drwxr-xr-x 8 root root 0 Jul 31 22:50 ..
drwxr-xr-x 3 root root 0 Jul 31 22:50 cpu0
drwxr-xr-x 3 root root 0 Jul 31 22:50 cpu1
drwxr-xr-x 3 root root 0 Jul 31 22:50 cpu2
drwxr-xr-x 3 root root 0 Jul 31 22:50 cpu3
All the information exported by the cpufreq interface can be found under the cpuX/cpufreq directory.
linux: # pwd
/sys/devices/system/cpu/cpu1/cpufreq
linux: # ls -l
total 0
drwxr-xr-x 2 root root 0 Jul 31 22:51 .
drwxr-xr-x 3 root root 0 Jul 31 22:50 ..
-r--r--r-- 1 root root 4096 Jul 31 22:51 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_available_governors
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_cur_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_driver
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_governor
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_max_freq
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_min_freq
All the files in the above listing are readable (cat {filename}) and will contain the value for that particular variable. Some of the files are writable (echo {value} > {filename}), and they set a particular value to the variable.
Descriptions of the key entries here are as follows:
The entry scaling_driver contains the name of the low-level CPU-specific driver that is being used on this system:
linux: # cat scaling_driver
acpi-cpufreq
The entry scaling_available_frequencies contains a list of all the frequencies supported on the processor (all frequency values are expressed in kHz):
linux: # cat scaling_available_frequencies
3400000 2800000
The entry scaling_cur_freq provides an interface to get the current frequency:
linux: # cat scaling_cur_freq
3400000
The entry scaling_available_governors lists all the governors that can be used in this system:
linux: # cat scaling_available_governors
ondemand userspace performance
The entry scaling_governor provides a read-write interface that shows the current policy governor being used. A new value, from among the list obtained from scaling_available_governors, can be written into this file, and the system will start using this new governor on this CPU:
linux: # cat scaling_governor
performance
linux: # echo "userspace" > scaling_governor
linux: # cat scaling_governor
userspace
The entries scaling_max_freq and scaling_min_freq are the parameters to the policy governor. They are read-write and control the behavior of the governor:
linux: # cat scaling_max_freq
3400000
linux: # cat scaling_min_freq
2800000
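Because scaling_governor only accepts names listed in scaling_available_governors, a script may want to check the list before writing. The following is a minimal sketch of that check, not part of cpufreq itself; the function names governor_is_available and set_governor_checked are hypothetical, and the default directory assumes cpu0.

```shell
#!/bin/bash
# Hypothetical helper (not part of cpufreq): succeed only if governor $1 is
# listed in scaling_available_governors under the cpufreq directory $2.
governor_is_available() {
    local governor="$1"
    local dir="${2:-/sys/devices/system/cpu/cpu0/cpufreq}"
    local g
    # Return 2 if the list cannot be read at all.
    [ -r "$dir/scaling_available_governors" ] || return 2
    for g in $(cat "$dir/scaling_available_governors"); do
        [ "$g" = "$governor" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: write the governor only after the check passes.
set_governor_checked() {
    local governor="$1"
    local dir="${2:-/sys/devices/system/cpu/cpu0/cpufreq}"
    governor_is_available "$governor" "$dir" || return 1
    echo "$governor" > "$dir/scaling_governor"
}
```

For example, set_governor_checked userspace would switch cpu0 to the userspace governor only if the kernel lists it; writing to the real sysfs file requires root privileges.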
One of the major advantages that Enhanced Intel SpeedStep Technology brings is lower latency associated with P-state changes – on the order of 10mS. In order to reap maximum benefit, the OS must perform more-frequent P-state transitions to match the current processor utilization. Doing frequent transitions with a user-level daemon will involve more kernel-to-user transitions, as well as a substantial amount of kernel-to-user data transfer. An in-kernel P-state governor, which dynamically monitors the CPU usage and makes P-state decisions based on that information, is therefore better-suited to taking full advantage of Enhanced Intel SpeedStep Technology. The ondemand policy governor is one such in-kernel P-state governor. The basic algorithm employed with the ondemand governor is as follows:
Every X milliseconds
    Get the current CPU utilization
    If the utilization is more than UP_THRESHOLD %
        Increase the P-state to the maximum frequency
Every Y milliseconds
    Get the current CPU utilization
    If the utilization is less than DOWN_THRESHOLD %
        Decrease P-state to next available lower frequency
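The decision logic above can be sketched as a small shell function. This is an illustration only, not the kernel's implementation: it folds the two timers (X and Y) into a single decision step, and the function name and argument order are assumptions made for the example.

```shell
#!/bin/bash
# Sketch of one decision step of the algorithm above.
# Arguments: utilization (%), current frequency, up threshold, down threshold,
# followed by the available frequencies in descending order.
# The function name and argument order are illustrative only.
ondemand_next_freq() {
    local util="$1" cur="$2" up="$3" down="$4"
    shift 4
    local max="$1" f take_next=0
    if [ "$util" -gt "$up" ]; then
        echo "$max"          # jump straight to the maximum frequency
        return 0
    fi
    if [ "$util" -lt "$down" ]; then
        for f in "$@"; do
            if [ "$take_next" -eq 1 ]; then
                echo "$f"    # next available lower frequency
                return 0
            fi
            [ "$f" -eq "$cur" ] && take_next=1
        done
    fi
    echo "$cur"              # otherwise keep the current frequency
}
```

Note the asymmetry: a busy CPU goes directly to the top frequency, while an idle one steps down one level at a time.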
The ondemand governor, when supported by the kernel, will be listed in the /sys interface under scaling_available_governors. Users can start using the ondemand governor as the P-state policy governor by writing onto scaling_governor:
linux: # cat scaling_available_governors
ondemand userspace performance
linux: # echo "ondemand" > scaling_governor
linux: # cat scaling_governor
ondemand
Note that this sequence must be repeated on all the CPUs present in the system. Once this is done, the ondemand governor will take care of adjusting the CPU frequency automatically, based on the current CPU usage. CPU usage is based on the idle_ticks statistics.
Because a single policy governor cannot satisfy all of the needs of applications in various usage scenarios, the ondemand governor supports a number of tuning parameters. The following explanation reveals what each tuning parameter means, as well as the default value of each:
linux: # echo "ondemand" > scaling_governor
linux: # ls -l
total 0
drwxr-xr-x 3 root root 0 Aug 5 04:23 .
drwxr-xr-x 3 root root 0 Jul 31 22:50 ..
-r--r--r-- 1 root root 4096 Jul 31 22:51 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 cpuinfo_min_freq
drwxr-xr-x 2 root root 0 Aug 5 04:23 ondemand
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_available_governors
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_cur_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_driver
-rw-r--r-- 1 root root 0 Aug 5 04:23 scaling_governor
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_max_freq
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_min_freq
Note the new ondemand sub-directory.
linux: # cd ondemand/
linux: # ls -l
total 0
drwxr-xr-x 2 root root 0 Aug 5 04:23 .
drwxr-xr-x 3 root root 0 Aug 5 04:23 ..
-rw-r--r-- 1 root root 4096 Aug 5 04:23 down_threshold
-rw-r--r-- 1 root root 4096 Aug 5 04:23 sampling_down_factor
-rw-r--r-- 1 root root 4096 Aug 5 04:23 sampling_rate
-r--r--r-- 1 root root 4096 Aug 5 04:23 sampling_rate_max
-r--r--r-- 1 root root 4096 Aug 5 04:23 sampling_rate_min
-rw-r--r-- 1 root root 4096 Aug 5 04:23 up_threshold
This directory contains all the ondemand algorithm-tuning variables.
linux: # cat sampling_rate_max
55000000
linux: # cat sampling_rate_min
55000
These times are measured in microseconds, denoting the minimum and maximum sampling rate. These values are read-only, and predetermined by the kernel as a factor of P-state transition latency.
linux: # cat sampling_rate
110000
This read-write field denotes a value in microseconds. The ondemand governor checks CPU utilization and tries to increase the CPU frequency at this rate.
linux: # cat sampling_down_factor
10
This is a read-write integer value that is multiplied by sampling_rate to set the rate at which the ondemand governor checks CPU utilization and tries to lower the CPU frequency.
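To make the arithmetic concrete, the sketch below multiplies the two values to get the interval between checks that may lower the frequency. The helper name down_check_interval_us is hypothetical and used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: interval, in microseconds, between the checks that may
# lower the frequency, computed as sampling_rate * sampling_down_factor.
down_check_interval_us() {
    local sampling_rate_us="$1" down_factor="$2"
    echo $((sampling_rate_us * down_factor))
}

# With the example values shown above: 110000 us * 10 = 1100000 us (1.1 s).
down_check_interval_us 110000 10
```

On a live system the same function could be fed from sysfs, for instance down_check_interval_us "$(cat sampling_rate)" "$(cat sampling_down_factor)" when run from inside the ondemand directory.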
linux: # cat up_threshold
80
linux: # cat down_threshold
20
These are read-write fields that stand for CPU-utilization thresholds. Whenever the current utilization is more than up_threshold, the ondemand governor will increase the frequency to the maximum. Whenever the CPU utilization is less than the down_threshold, the ondemand governor tries to reduce the frequency to the next available lower P-state.
The Userspace Governor
The userspace governor allows any user-space utility to change the CPU frequency by writing a value to a file under /sys.
linux: # echo "userspace" > scaling_governor
linux: # cat scaling_governor
userspace
linux: # ls -l
total 0
drwxr-xr-x 2 root root 0 Jul 31 22:51 .
drwxr-xr-x 3 root root 0 Jul 31 22:50 ..
-r--r--r-- 1 root root 4096 Jul 31 22:51 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_available_governors
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_cur_freq
-r--r--r-- 1 root root 4096 Jul 31 22:51 scaling_driver
-rw-r--r-- 1 root root 0 Jul 31 22:51 scaling_governor
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_max_freq
-rw-r--r-- 1 root root 4096 Jul 31 22:51 scaling_min_freq
-rw-r--r-- 1 root root 4096 Aug 4 18:46 scaling_setspeed
Note a new file here, scaling_setspeed:
linux: # cat scaling_setspeed
3400000
linux: # echo 2800000 > scaling_setspeed
linux: # cat scaling_setspeed
2800000
The file scaling_setspeed is read-write. When read, it denotes the current CPU frequency. The user can write a value to this field (using the list in scaling_available_frequencies), and the CPU will change the frequency to the one specified by the user. There are different CPU frequency governors, implemented as user-space daemons. They control the frequency by reading and writing values to and from scaling_setspeed. Two such user-level daemons that are widely used are powersaved and cpuspeed.
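Since only values from scaling_available_frequencies are meaningful for scaling_setspeed, a user-space policy typically maps a desired speed onto a supported one first. The following is a rough sketch of one such mapping; the function pick_frequency and its "highest frequency not above the target" rule are assumptions for illustration, not the behavior of powersaved or cpuspeed.

```shell
#!/bin/bash
# Hypothetical helper: print the highest frequency from the list ($2...) that
# does not exceed the target ($1); fall back to the lowest listed frequency
# when every listed frequency is above the target.
pick_frequency() {
    local target="$1" best="" lowest="" f
    shift
    for f in "$@"; do
        if [ -z "$lowest" ] || [ "$f" -lt "$lowest" ]; then lowest="$f"; fi
        if [ "$f" -le "$target" ]; then
            if [ -z "$best" ] || [ "$f" -gt "$best" ]; then best="$f"; fi
        fi
    done
    echo "${best:-$lowest}"
}
```

With the userspace governor active, a script could then run pick_frequency 3000000 $(cat scaling_available_frequencies) and write the result to scaling_setspeed.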
The relative advantages of in-kernel governors and user-space governors are listed here. This will help users (or administrators) in determining what kind of governor one should use under various system-usage scenarios. The advantages of in-kernel governors include the following:
The advantages of user-space governors include the following:
Cpufreq and Enhanced Intel SpeedStep Technology-related CONFIG options look like the following:
General config menu ->
  Power management options ->
    ACPI Support
      <*> ACPI Support
      [*] Processor
General config menu ->
  Power management options ->
    CPU Frequency scaling
      [*] CPU Frequency scaling
      <*> /proc/cpufreq interface (deprecated)
      [*] /proc/sys/cpu/ interface (2.4. / OLD)
      Default CPUFreq governor (userspace) --->
      <*> 'performance' governor
      <*> 'powersave' governor
      --- 'userspace' governor for userspace frequency scaling
      <*> 'ondemand' cpufreq policy governor
      <*> CPU frequency table helpers
      --- CPUFreq processor drivers
      <*> ACPI Processor P-States driver
      [ ] /proc/acpi/processor/../performance interface (deprecated)
      <*> Intel Enhanced SpeedStep
      [*] Use ACPI tables to decode valid frequency/voltage pairs (EXPERIMENTAL)
      < > Intel Speedstep on ICH-M chipsets (ioport interface)
      < > Intel SpeedStep on 440BX/ZX/MX chipsets (SMI interface)
      < > Intel Pentium 4 clock modulation
      : : :
      : : :
      : : :
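An already-built kernel can be checked against a menu like this one by searching its configuration file. The sketch below is a rough illustration: the helper name check_cpufreq_config is hypothetical, and the mapping from menu entries to the option names CONFIG_CPU_FREQ, CONFIG_CPU_FREQ_GOV_ONDEMAND, CONFIG_X86_ACPI_CPUFREQ, and CONFIG_X86_SPEEDSTEP_CENTRINO is an assumption that should be confirmed against the kernel source in use.

```shell
#!/bin/bash
# Hypothetical helper: report which of the listed options are built in (=y)
# or modular (=m) in the kernel configuration file given as $1.
# Returns non-zero if any of them is missing.
check_cpufreq_config() {
    local config="$1" opt missing=0
    for opt in CONFIG_CPU_FREQ CONFIG_CPU_FREQ_GOV_ONDEMAND \
               CONFIG_X86_ACPI_CPUFREQ CONFIG_X86_SPEEDSTEP_CENTRINO; do
        if grep -Eq "^${opt}=(y|m)$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
            missing=1
        fi
    done
    return "$missing"
}
```

Where the configuration file lives depends on the distribution; many install it as /boot/config-$(uname -r), but that path is also an assumption.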
The options colored in blue are the required options to support Enhanced Intel SpeedStep Technology, cpufreq interfaces, and demand-based switching. If they are configured as modules and added after boot, care should be taken to maintain the following ordering:
| Feature | Kernel version |
| UP EST (i386) | 2.6.8 and above |
| SMP EST (i386) | 2.6.9 and above |
| UP EST (EM64T) | 2.6.9 and above |
| SMP EST (EM64T) | 2.6.9 and above |
| Ondemand governor | 2.6.9 and above |
Developers who target Linux operating systems for mobile applications should use the cpufreq module to implement Enhanced Intel SpeedStep Technology. The very low latency associated with changes to the voltage/frequency pair under this technology enables high-performance demand-based switching, which can save substantially on battery life in systems based on Intel® Centrino® mobile technology.
The programming interfaces associated with the cpufreq module provide a robust environment for the implementation of this technology. Assuming that Linux continues to gain ground as a mobile operating system, power savings in Linux-based mobile applications will become more valuable as a means of gaining competitive advantage for ISVs. Implementing good support for Enhanced Intel SpeedStep Technology today is therefore a sound strategy toward positioning mobile applications for success.
Note: Powersaved is a user-level CPU frequency manager found in SuSE Linux installations.
Get the possible governors on this system:
#!/bin/bash
# Print the governors available on one CPU.
get_governors_on_cpu() {
    if [ "$#" -ne 1 ]
    then
        echo "Usage: get_governors_on_cpu {cpu}"
        return 1
    fi
    cpu=$1
    dir=/sys/devices/system/cpu/cpu${cpu}/cpufreq
    if [ -d "$dir" ]
    then
        cat "$dir/scaling_available_governors"
    else
        echo "CPU ${cpu} available policy get FAILED"
        return 1
    fi
    return 0
}
get_governors() {
    if [ "$#" -ne 0 ]
    then
        echo "Usage: get_governors"
        return 1
    fi
    # Mount sysfs if it is not already available.
    if [ ! -d /sys ]
    then
        mkdir /sys
    fi
    if [ ! -d /sys/devices ]
    then
        mount -t sysfs sysfs /sys
    fi
    # The governor list is read from CPU 0.
    get_governors_on_cpu 0
}
get_governors "$@"
exit $?
Set a policy governor on all CPUs:
#!/bin/bash
# Set the governor on one CPU and report the resulting frequency limits.
set_governor_on_cpu() {
    if [ "$#" -ne 2 ]
    then
        echo "Usage: set_governor_on_cpu {cpu} {governor}"
        return 1
    fi
    cpu=$1
    governor=$2
    dir=/sys/devices/system/cpu/cpu${cpu}/cpufreq
    if [ -d "$dir" ]
    then
        echo "$governor" > "$dir/scaling_governor"
        # Read the value back to confirm that the kernel accepted it.
        if [ "$(cat "$dir/scaling_governor")" = "$governor" ]
        then
            echo "CPU ${cpu} policy set to ${governor} governor"
            echo -n "maximum frequency: "
            cat "$dir/scaling_max_freq"
            echo -n "minimum frequency: "
            cat "$dir/scaling_min_freq"
        else
            echo "CPU ${cpu} policy setting to ${governor} governor FAILED"
            return 1
        fi
    else
        echo "CPU ${cpu} policy setting to ${governor} governor FAILED"
        return 1
    fi
    return 0
}
set_governor() {
    if [ "$#" -ne 1 ]
    then
        echo "Usage: set_governor {governor}"
        echo "Example: set_governor ondemand"
        return 1
    fi
    governor=$1
    # Mount sysfs if it is not already available.
    if [ ! -d /sys ]
    then
        mkdir /sys
    fi
    if [ ! -d /sys/devices ]
    then
        mount -t sysfs sysfs /sys
    fi
    # Count the logical CPUs and apply the governor to each in turn.
    num_cpus=$(grep -c "^processor" /proc/cpuinfo)
    i=0
    while [ "$i" -lt "$num_cpus" ]
    do
        set_governor_on_cpu "$i" "$governor" || return $?
        i=$((i + 1))
    done
    return 0
}
set_governor "$@"
exit $?
Print (periodically) the speed of all CPUs:
#!/bin/bash
# Print the current frequency of one CPU.
get_speed_on_cpu() {
    if [ "$#" -ne 1 ]
    then
        echo "Usage: get_speed_on_cpu {cpu}"
        return 1
    fi
    cpu=$1
    dir=/sys/devices/system/cpu/cpu${cpu}/cpufreq
    if [ -d "$dir" ]
    then
        SPEED=$(cat "$dir/scaling_cur_freq")
        echo "CPU${cpu} : ${SPEED} kHz"
    else
        echo "CPU ${cpu} current frequency get FAILED"
        return 1
    fi
    return 0
}
get_speed() {
    if [ "$#" -gt 1 ]
    then
        # The interval is multiplied by 1000 before being passed to usleep,
        # so it is given in milliseconds.
        echo "Usage: get_speed [interval in msec (default: print once and exit)]"
        return 1
    fi
    ONESHOT=0
    if [ "$#" -eq 1 ]
    then
        TIME=$1
    else
        ONESHOT=1
    fi
    # Mount sysfs if it is not already available.
    if [ ! -d /sys ]
    then
        mkdir /sys
    fi
    if [ ! -d /sys/devices ]
    then
        mount -t sysfs sysfs /sys
    fi
    num_cpus=$(grep -c "^processor" /proc/cpuinfo)
    while :
    do
        i=0
        while [ "$i" -lt "$num_cpus" ]
        do
            get_speed_on_cpu "$i" || return $?
            i=$((i + 1))
        done
        if [ "$ONESHOT" -eq 1 ]
        then
            return 0
        fi
        # usleep takes microseconds; it is not installed on every distribution.
        UTIME=$((TIME * 1000))
        usleep "$UTIME"
        echo
    done
}
get_speed "$@"
exit $?
| November 2, 2007 5:25 PM PDT
Intel(R) Software Network Support |
Michael, our engineering contacts responded that they are unaware of any Intel(R) multi-core processor that has this behavior; it is actually with non-Intel processors that the TSC does not count at the highest frequency when demand-based switching is enabled. Those who are not using an Intel processor can disable demand-based switching in Windows* XP SP2 such that the processor always runs at the highest frequency by changing the Power Scheme, in the Power Options accessed via the Control Panel, to Always On. If in fact you are seeing this with an Intel multi-core processor, please send us an email with additional details. |
| February 29, 2008 4:44 PM PST
ethana2 |
"Windows routines don't properly handle.." SURPRISE! Those coders in Redmond assume all kinds of bizarre stuff- and it's not intel's job to fix it, because, well, they can't. ...which isn't the case with the linux kernel, and I applaud them for taking advantage of that. ...I still would love to see intel make SPARC T2 chips. I mean, that design and ISA on intel's state of the art chip processing technology.. *drools* It's openly licensed and such; one of the best things about FOSS is that it liberates us from any given ISA. Intel SPARC, in my opinion, would be the best of the best of the best. Or even PPC.. or both. </rant> |
| December 5, 2008 11:40 PM PST
Colin Williams | how about some dumbed down directions for us end users of ubuntu 8.10 who do not wish to compile their kernel? |
| April 5, 2009 11:49 PM PDT
Mark UK |
Hi Intel .. i have a celeron m 530 in my acer 5220 laptop here.. and i also run LINUX - i am using Mepis 8 at the moment because the cpu fan does occasionally switch off on this distro. i recompiled my new vanilla kernel from kernel.org and enabled all kinds of speed stepping but sadly it went wrong somewhere, i like compiling the odd app, but the kernel is a bit beyond me at the moment.. i cant find a definitive document that tells me if this celeron m 530 actually has any speedstep capabilities, the various linux forums i seem to find seem to be rather quiet on the issue of cpu scaling in the kernel.. though i know of no distro;s that do it be default. annd i have tried from slackware to jaunty. i have been told i could replace my cpu with a dual core option which is lower power, faster and fully supports cpu scaling. i shall keep trying a kernel recompile .. Colin <above post> good luck mate :) cheers Intel Mark UK |
| September 10, 2009 12:21 AM PDT
งงงงง | I'm confused. Is there a Thai translation of this anywhere? |
| October 8, 2009 3:58 AM PDT
Saurabh Sinha |
Thanks for the awesome tutorial. Needed to know about DVS/DVFS schemes currently in use and this one helped. Regards Saurabh |
| November 13, 2009 7:01 PM PST
Karan Khanna |
Hi Intel, This was a very helpful overview of Linux Kernel's cpufreq infrastructure. Thanks! |

Michael
How about showing us how to get the CPU to ALWAYS run at FULL SPEED in a Windows machine? We need this because Intel foolishly did not continue the TSC at full speed when the CPU clock is throttled. This SCREWS up timing and the Windows routines don't properly handle this. This is especially a problem in Multi-Core machines. So, I need to keep the CPU at FULL speed.