CPU Power Utilization on Intel® Architectures

by Karthik Krishnan


Introduction

How Intel® dual-core mobile platforms are affected by high clock-interrupt rates and sleep states.

The Win32* interface provides various APIs for executing application code periodically at a desired frequency. These are based on the periodic timer ticks of the system clock built into the Hardware Abstraction Layer (HAL) of the Microsoft Windows* operating system. Applications such as multimedia players have threads for disk I/O, decoding, audio-video output, and UI. Typically they use timer-based interrupts to execute a code section periodically, at or before a particular deadline (for example, play audio every 90 milliseconds). The following are the most commonly used APIs for this scenario:

  • timeSetEvent(UINT uDelay, UINT uResolution, LPTIMECALLBACK lpTimeProc, DWORD_PTR dwUser, UINT fuEvent)

    This is a Win32 multimedia timer, available from winmm.lib, that runs in its own thread. The callback function is executed either periodically or once, depending on the fuEvent parameter.
  • SetTimer(HWND hWnd, UINT_PTR nIDEvent, UINT uElapse, TIMERPROC lpTimerFunc), available through user32.dll

    The most common usage of this function is to post a WM_TIMER message periodically to an application window and invoke the callback while handling the timer message.

 

There are additional APIs such as waitable timers (CreateWaitableTimer()) and timer-queue timers (CreateTimerQueueTimer()); however, all of them use the system clock to trigger timer-based code execution.


Interrupt Rate Granularity

Typically the operating system receives periodic timer-based interrupts every 10-15.6 milliseconds. Consider an example where an application uses the timeSetEvent() API for audio playback every 32 milliseconds, and the operating system receives an interrupt every 15 milliseconds. The operating system checks for deadlines at every timer tick. At the first two timer ticks (15 and 30 milliseconds of elapsed time), the playback deadline has not yet been reached. At the third timer tick (after 45 milliseconds), the operating system finds that the deadline has already expired and fires the playback. The problem is that the playback is delayed by 13 milliseconds beyond what the application programmed. This shows that the granularity of the timer interrupts can affect the timely firing of periodic calls.
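The arithmetic in this example can be sketched in a few lines of C. This is only an illustrative model, assuming that ticks arrive at exact multiples of the tick period starting from time zero; the function names are hypothetical and are not part of any Windows API.

```c
#include <assert.h>

/* Time (ms) of the first timer tick at or after the requested deadline,
   assuming ticks arrive at exact multiples of tick_ms starting from 0. */
unsigned first_tick_at_or_after(unsigned deadline_ms, unsigned tick_ms)
{
    unsigned ticks;

    assert(tick_ms > 0);
    ticks = (deadline_ms + tick_ms - 1) / tick_ms;   /* ceiling division */
    return ticks * tick_ms;
}

/* Extra delay (ms) that the tick granularity adds to the deadline. */
unsigned tick_delay(unsigned deadline_ms, unsigned tick_ms)
{
    return first_tick_at_or_after(deadline_ms, tick_ms) - deadline_ms;
}
```

Under this model, first_tick_at_or_after(32, 15) is 45 and tick_delay(32, 15) is 13, matching the example above, while a 1-millisecond tick gives tick_delay(32, 1) of 0.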

The Microsoft Windows Multimedia SDK provides the timeBeginPeriod() API to shorten the default interrupt period to as little as 1 millisecond on Windows XP. The following code shows an example:

Modifying Interrupt Rate Resolution

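#include <windows.h>
#include <mmsystem.h>   // timeGetDevCaps, timeBeginPeriod; link with winmm.lib

// Request a timer resolution of delta milliseconds, clamped to the
// range that the timer device supports.
void SetResolution(int delta)
{
	TIMECAPS tc;
	UINT     wTimerRes;

	if (timeGetDevCaps(&tc, sizeof(TIMECAPS)) == TIMERR_NOERROR)
	{
		wTimerRes = min(max(tc.wPeriodMin, (UINT)delta), tc.wPeriodMax);
		// Each timeBeginPeriod() call should later be matched by a
		// timeEndPeriod() call with the same value.
		timeBeginPeriod(wTimerRes);
	}
}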

 

A finer interrupt granularity improves the timeliness of the periodic calls. In the previous example, if interrupts are triggered every millisecond, the playback occurs at the right time. But a finer interrupt granularity comes with a severe power cost, and not all multimedia content requires such a high interrupt frequency. The following sections analyze the impact of such a high interrupt frequency on the sleep states and CPU power consumption of current and future Intel architectures. The analysis includes experiments with a few multimedia applications, which demonstrate that user experience and overall performance can still be maintained at the default interrupt rate for certain media content. Readers are advised to refer to http://www.microsoft.com/whdc/system/sysinternals/mm-timer.mspx* for a detailed discussion on interrupt rates.

Interrupt Rate Impact on Intel Architectures

Intel mobile architectures such as the Intel® Pentium® M processor include Enhanced Intel SpeedStep® Technology to optimize power and performance according to the demand on the system. The technology provides multiple operating points on the CPU (referred to as P-states, with P0 being the highest CPU frequency) and raises or lowers the processor frequency depending on the demand. When there is negligible demand on the system and the CPU is idling, the processor provides multiple sleep states (referred to as C-states; higher-numbered C-states such as C4 are deeper sleep states) that reduce the overall power consumption significantly.

Every interrupt pulls the CPU back from a deeper sleep state to C0 so that the interrupt handler can service it. This reduces the sleep state residencies that are critical to optimizing the power consumed. There is also an energy cost associated with transitioning between C-states. If the interrupt rate is high, the power savings from deeper sleep states suffer both from the decrease in sleep state residency and from the cost of the C-state transitions. While Intel® platforms offer platform-specific C3-like states, these states require long residency (multiple milliseconds) to fully amortize their transition costs. An aggressive interrupt rate can potentially negate the benefits of the deep sleep states offered by the platform.
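The amortization argument can be made concrete with a simple break-even model: entering a deeper state pays off only if the idle interval T satisfies (P_shallow - P_deep) x T > E_transition. The sketch below is a simplified model rather than a description of how any Intel platform actually decides; the function name, its parameters, and any values passed to it are hypothetical and do not come from the measurements in this paper.

```c
#include <assert.h>

/* Minimum idle interval (ms) for which a deeper sleep state saves energy.
   p_shallow_w   : power drawn in the shallower state, in watts (assumed input)
   p_deep_w      : power drawn in the deeper state, in watts (assumed input)
   transition_uj : energy spent entering and leaving the deeper state,
                   in microjoules (assumed input) */
double break_even_ms(double p_shallow_w, double p_deep_w, double transition_uj)
{
    double saved_w;

    assert(p_shallow_w > p_deep_w);
    saved_w = p_shallow_w - p_deep_w;
    /* microjoules / watts = microseconds; divide by 1000 for milliseconds */
    return transition_uj / saved_w / 1000.0;
}
```

With purely illustrative inputs of 1.0 W, 0.5 W, and 1000 microjoules, the break-even interval is 2 milliseconds. In this model, a 1-millisecond interrupt period would never leave the CPU idle long enough to recover the transition cost.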

C-State residency and CPU power impact on Intel® Core™ Duo (formerly Yonah) processor

The author used simple ticker code to analyze the impact of a high interrupt rate on C-state residencies and average CPU power on an Intel® Core™ Duo B0 Processor (2.0GHz), 1GB 533 running Microsoft WinXP*-SP2. All the power data included in this whitepaper was estimated using NetDAQ* instrumentation of the CPU. The following code (referred to as MyTicker) takes a command line argument, sets the interrupt rate accordingly, and idles forever:

MyTicker code.

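#include <windows.h>
#include <mmsystem.h>   // timeGetDevCaps, timeBeginPeriod; link with winmm.lib
#include <stdlib.h>
#include <tchar.h>      // _tmain, _TCHAR, _ttoi

int _tmain(int argc, _TCHAR* argv[])
{
	if (argc != 2)
		return 1;

	// Requested interrupt period in milliseconds, parsed from the command line.
	UINT tick = (UINT)_ttoi(argv[1]);
	TIMECAPS tc;
	UINT     wTimerRes;

	if (timeGetDevCaps(&tc, sizeof(TIMECAPS)) != TIMERR_NOERROR)
	{
		return -1; //error
	}

	// Clamp the request to the range that the timer device supports.
	wTimerRes = min(max(tc.wPeriodMin, tick), tc.wPeriodMax);
	timeBeginPeriod(wTimerRes);

	Sleep(INFINITE);   // idle forever with the requested resolution in effect
	return 0; //not reached!
}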

 

The following charts compare CPU utilization, interrupts per second, C3/C4 residency, and average CPU power between "MyTicker 1 millisecond", which triggers interrupts at 1-millisecond intervals, and an idle system that uses the default interrupt interval. The purple line represents the idle system with the default interrupt interval (~15.6 milliseconds), and the blue line represents the idle system with a 1-millisecond interrupt interval.

Figure 1: Interrupts Per Second (MyTicker 1ms vs IDLE)

As expected, the interrupts per second increased from ~100 to ~1000.

Figure 2: Sleep State Residency (MyTicker 1ms vs IDLE)

The %C3 residency here reflects the combined sleep state residency (C3/C4) in the two scenarios. As the interrupts per second increased from ~100 to ~1000, the C3 residency decreased from ~97% to ~81%. A decrease in sleep state residency implies an increase in average CPU power.

Figure 3: Average CPU Power (MyTicker 1ms vs IDLE)

As Figure 3 shows, the average CPU power consumption increases significantly (from ~0.5W to ~1.05W) when the interrupt rate is increased. The power penalties on future platforms are expected to be significantly higher, so applications need to be sensitive to the interrupt rate.


Analysis with Multimedia Applications

The following section repeats the investigation above with some real-world multimedia applications. The analysis focuses on the following:

  • Ascertain the built-in tick rate set by multimedia applications
  • Develop a methodology to modify the built-in tick rates and investigate any change in overall user experience (dropped frames, audio-video sync issues, etc.)
  • Investigate the power impact between high and low interrupt rates

 

Since we had to study different applications and did not have their source code, we implemented a methodology to intercept and modify the timer API calls. The methodology works in a similar fashion to the Microsoft Detours* library. At a high level, it modifies the binary image of the application to create a gating function for the timer APIs. The following shows the methodology using the timeSetEvent() API:

Figure 4: API Interception Methodology

As a first step, the code containing the interception functionality (the MyIntercept function and the Trampoline function) is injected into the address space of the calling process (Iexplore.exe). The first few instructions of the function to be intercepted are overwritten with an unconditional jump to the MyIntercept function. The Trampoline function contains those overwritten instructions followed by an unconditional jump back into timeSetEvent(). Since every timer call now passes through MyIntercept(), we can modify the built-in interrupt rate by modifying the stack values in MyIntercept().
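The gating idea can be illustrated without any binary patching. The sketch below is a deliberately simplified stand-in that replaces the overwritten instructions and jumps with an ordinary function pointer, so it shows only the control flow (caller, then gate, then trampoline, then original function) and not the author's actual implementation. All names, the two-argument signature, and the 10-millisecond floor are assumptions made for illustration.

```c
#include <assert.h>

/* Simplified, hypothetical signature for a timer-setting call. */
typedef unsigned (*set_timer_fn)(unsigned delay_ms, unsigned resolution_ms);

/* Resolution that the stand-in "original" function was last asked for. */
unsigned g_last_resolution_ms = 0;

/* Stand-in for the original API; it only records its argument. */
unsigned original_set_timer(unsigned delay_ms, unsigned resolution_ms)
{
    (void)delay_ms;
    g_last_resolution_ms = resolution_ms;
    return 1;   /* pretend timer identifier */
}

/* Plays the role of the trampoline: the path back to the original code. */
set_timer_fn trampoline = original_set_timer;

/* Coarsest resolution that the gate will let through (assumed value). */
unsigned g_min_resolution_ms = 10;

/* Gate: adjust the requested resolution, then continue to the original. */
unsigned my_intercept(unsigned delay_ms, unsigned resolution_ms)
{
    assert(trampoline != 0);
    if (resolution_ms < g_min_resolution_ms)
        resolution_ms = g_min_resolution_ms;
    return trampoline(delay_ms, resolution_ms);
}
```

A caller that requests my_intercept(32, 1) reaches the original function with a 10-millisecond resolution, while a request that is already coarser, such as my_intercept(32, 16), passes through unchanged.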

Interrupt Values in Multimedia Applications

We chose a list of commonly used multimedia applications to investigate the interrupt frequencies used. The table below shows that most of the multimedia applications use a high interrupt rate (1 millisecond).

Application           Timer API        Interrupt Frequency   Interrupt Value
                                       (milliseconds)        (milliseconds)
Multimedia Plugin1*   SetTimer()       1                     10-30
Multimedia Plugin2*   SetTimer()       1                     15-30
Multimedia Plugin3*   SetTimer()       5                     10
Multimedia Plugin4*   SetTimer()       1                     10
Multimedia Plugin5*   timeSetEvent()   1                     16

 


Final Results

The following table shows the final results for the multimedia applications at a high and a low interrupt rate (1 millisecond vs. 10 milliseconds). None of the content consumed much CPU power on the Intel Core Duo processor tested. This leaves a lot of headroom for other activities and reflects usage where playback does not consume significant resources. The table includes C0/C3 residency along with the average power consumed by the CPU. The %C3Res column includes the residency of C3 plus all deeper sleep states.

Frames per second and audio/video sync were used to measure user experience; any degradation in either would indicate a poor user experience. With the content we tested, no noticeable difference in user experience was seen when the interrupt rate was lowered to one interrupt every 10 milliseconds.

Application           Interrupt Period   %C0    %C3Res   Interrupts/Sec   Average Power   Content
Multimedia Plugin1*   1 millisecond      ~25%   ~75%     ~1087            1.24W           Online Ad
Multimedia Plugin1*   10 milliseconds    ~10%   ~90%     ~71              1.10W
Multimedia Plugin2*   1 millisecond      ~36%   ~64%     ~1110            1.84W           640x248, Movie Trailer
Multimedia Plugin2*   10 milliseconds    ~20%   ~80%     ~80              1.32W
Multimedia Plugin3*   1 millisecond      ~57%   ~43%     ~1035            2.788W          1280x720, HD Movie
Multimedia Plugin3*   10 milliseconds    ~47%   ~53%     ~65              2.45W

 

As the interrupt period was increased from 1 millisecond to 10 milliseconds, there is a noticeable increase in C3 residency and decrease in C0 residency. This results in significant CPU power savings (up to about 0.5W), depending on the content and application. The power savings are expected to grow significantly in future architectures.
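The savings can be derived directly from the average power figures in the results table. The following sketch uses only those reported values; the structure and function names are hypothetical.

```c
#include <assert.h>

/* Average CPU power (watts) reported in the results table. */
struct measurement {
    const char *application;
    double watts_1ms;    /* 1-millisecond interrupt period  */
    double watts_10ms;   /* 10-millisecond interrupt period */
};

const struct measurement results[] = {
    { "Multimedia Plugin1", 1.24,  1.10 },
    { "Multimedia Plugin2", 1.84,  1.32 },
    { "Multimedia Plugin3", 2.788, 2.45 },
};

/* Absolute saving in watts when moving from 1 ms to 10 ms. */
double saving_watts(const struct measurement *m)
{
    return m->watts_1ms - m->watts_10ms;
}

/* Saving as a percentage of the 1 ms figure. */
double saving_percent(const struct measurement *m)
{
    assert(m->watts_1ms > 0.0);
    return 100.0 * saving_watts(m) / m->watts_1ms;
}
```

For the three applications this gives savings of roughly 0.14W, 0.52W, and 0.34W, or about 11%, 28%, and 12% of the average CPU power measured at the 1-millisecond setting.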

Persistent Interrupts

Another issue noticed with some of the multimedia applications is sticky interrupts: the high interrupt rate is not shut down even after the playback is complete. The following shows an example scenario.

Figure 5. An example of persistent interrupts

Leaving the high interrupt rate in place keeps pulling the CPU back into the active state and decreases deep sleep residency even while the system is idling.
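One way to avoid sticky interrupts is to pair every timeBeginPeriod() call with a timeEndPeriod() call that passes the same value, so that the higher resolution is held only while playback is running. The sketch below is a minimal illustration of that pairing and not code from any of the applications studied. The names play_with_resolution, demo_playback, and the two counters are hypothetical, and the non-Windows branch contains stand-in functions that exist only so the sketch compiles on other systems.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>   /* timeBeginPeriod, timeEndPeriod; link with winmm.lib */
#else
/* Stand-ins (assumptions) so the sketch also compiles off Windows. */
typedef unsigned int UINT;
#define TIMERR_NOERROR 0
static UINT timeBeginPeriod(UINT period_ms) { (void)period_ms; return TIMERR_NOERROR; }
static UINT timeEndPeriod(UINT period_ms)   { (void)period_ms; return TIMERR_NOERROR; }
#endif

/* Resolution requests made by this sketch and not yet released. */
int g_open_requests = 0;

/* Value of g_open_requests observed while the demo playback ran. */
int g_open_during_playback = -1;

/* Hypothetical playback routine; a real player would decode and render here. */
void demo_playback(void)
{
    g_open_during_playback = g_open_requests;
}

/* Raise the timer resolution only for the duration of play(), then release it. */
void play_with_resolution(UINT period_ms, void (*play)(void))
{
    int raised;

    assert(play != 0);
    raised = (timeBeginPeriod(period_ms) == TIMERR_NOERROR);
    if (raised)
        ++g_open_requests;

    play();

    if (raised) {
        timeEndPeriod(period_ms);   /* must pass the same value as the begin call */
        --g_open_requests;
    }
}
```

Calling play_with_resolution(1, demo_playback) holds the 1-millisecond request only while demo_playback() runs and leaves no request outstanding afterwards, so the system can return to its default interrupt rate once playback ends.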


Conclusion

  • The interrupt rate impacts deep sleep residency and power consumption by CPU significantly. The power penalty is expected to grow significantly in future platforms.
  • Many media playback applications use a very high interrupt rate (1 millisecond).
  • Depending on the content, it is possible to maintain the same user experience while reducing the interrupt rate, thereby conserving CPU power.
  • It is recommended that high interrupt rate (1 millisecond) be used only when the media content absolutely requires it.
  • Many applications create sticky interrupts that stay even after the content playback is complete. Such sticky interrupts decrease sleep state residency even while idling. It is recommended that the high interrupt rate be shut down once the playback is finished.

 



About the Author

Karthik Krishnan is an applications engineer working for Intel's Software and Solutions group. He joined Intel in 2001 and has been working with various software vendors to optimize their products on Intel® mobile and desktop platforms. Prior to joining Intel, he worked for Fluent Inc. as a software developer dealing with parallel programming.

 

