This is part 2 of an article. You can read part 1 here.
This article explores the power characteristics of the Intel® Atom™ platform geared for the MID (Mobile Internet Device) category of devices and provides recommendations for SW developers on how to best optimize applications for power efficiency on this new line of Intel processors/platforms. Some topics also apply to Intel® Atom™ processor based NetBooks.
Workloads
The following MID workload categories were analyzed:
- Idle behaviour in typical system idle states
- Video playback using Moblin Media Player & Helix framework
- Multi-threaded Video decode, audio transcode and Flash workloads used for HT impact analysis
- Browsing using Moblin Browser
All workload measurements are performed in steady state unless otherwise noted. Linux* kernel version 2.6.22 was used.
5.1. Idle modes
Power data was captured for different processor configurations (modified via BIOS):
The following idle modes were analyzed:
- Home Screen - HTML
- Home Screen - HTML - Screen off
- Home Screen - Clutter (OpenGL)
- Home Screen - Flash (UI created in Flash, embedded in HTML)
- XTerm
- Moblin Browser with default home page loaded
Note: As the Flash Home Screen feature was broken, no Flash application icons were visible, just a Flash shell. This was considered an acceptable approximation. After launch, a regular terminal shell was invoked (ctrl-alt-F1), the "on-demand" governor was activated and measurements were made.
Observe that this is by no means an exhaustive list. Additional measurements are needed to provide a more complete overview of possible application/launcher user interfaces.
5.1.1. Processor sleep state behaviour
The following NetDAQ data was captured with HT turned on. C6 or C4 (C6 off) was configured as the lowest possible processor sleep state.
The data below was captured while idle on the HTML Home Screen.
Note the ~80mW difference in average power for C6 on vs. off. The C6 state clearly has an important positive impact on average power when the processor is experiencing low load or is mostly idle. In essence, the new C6 deep sleep state significantly reduces average Intel Atom™ processor power in idle.
Refer to chapter 6: "Appendix A" for details on Intel Atom™ sleep states and their characteristics.
5.1.2. Processor & Chipset power usage in Idle modes
The following NetDAQ data was captured with C6 and HT turned on. Idle power behaviour for the different Home Screens (application launchers) was compared to the Browser and Xterm in idle.
The Home Screens tested represent various UI technologies used for displaying application icons and launching applications. The UI technologies tested were HTML, OpenGL (Clutter) and Flash.

The above graph illustrates the normalized power behaviour for the 6 targeted idle workloads.
Contrary to what might be expected, the Clutter home screen (OpenGL) does not lead to increased chipset average idle power. The HTML and OpenGL UI home screen solutions used are quite power friendly.
Automatically turning the screen off after an interval of no user input lowers chipset average power significantly (~13.5% below HTML idle). More importantly, powering off the screen also saves LCD power.
Moblin browser in idle mode on the default home page consumes slightly higher average power for processor/chipset (~3.5% above HTML idle). Observe that the default page did not have any advanced content such as Flash* or Ajax*.
When idle on the Flash home screen, a completely different pattern is revealed. Due to the high number of interrupts, 250 wakeups/s (captured by PowerTOP, see below), the benefit of the C6 sleep state is not fully realized (even though the processor sometimes moves into C6). This is apparent from the much higher average power usage for processor and chipset (33% above HTML idle).
An alternative view of the power behaviour is available from the data captured by PowerTOP. Wakeups/s and deeper C state residency are captured in the graphs below.

From the above data it is easy to see the impact of a high number of wakeups/s on processor deep sleep state residency. For instance, the Flash home screen wakes up the processor ~250 times/s, leaving only ~60% of the time spent in C4-C6, while the HTML home screen wakes up the processor 35 times/s, with ~98% of the time spent in C4-C6 and significant power improvements as a result.
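To make the link between wakeup rate and deep sleep residency more concrete, the short Python sketch below converts the two wakeup rates quoted above into the average time available between wakeups. It assumes evenly spaced wakeups, which real workloads will not have, and the function name is purely illustrative. The result is an upper bound on the uninterrupted idle window, since the work done at each wakeup and the sleep state entry/exit latency both eat into it.

```python
def mean_idle_interval_ms(wakeups_per_second: float) -> float:
    """Average time between wakeups in milliseconds.

    Assumption: wakeups are evenly spaced. Real wakeups are bursty,
    so this is only a rough upper bound on the idle window.
    """
    if wakeups_per_second <= 0:
        raise ValueError("wakeups_per_second must be positive")
    return 1000.0 / wakeups_per_second


# Wakeup rates taken from the PowerTOP figures quoted in the text.
for name, rate in (("Flash home screen", 250), ("HTML home screen", 35)):
    interval = mean_idle_interval_ms(rate)
    print(f"{name}: {rate} wakeups/s -> ~{interval:.1f} ms between wakeups")
```

At 250 wakeups/s the processor gets at most about 4 ms of idle time per wakeup, versus roughly 29 ms at 35 wakeups/s, which is consistent with the much lower C4-C6 residency seen for the Flash home screen.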
Recent measurements using the latest versions of Flash 9 and 10 reveal an improved Flash wakeup pattern resulting in improved power characteristics. Still, even during playback of the simplest Flash content with a frame rate of 1 fps, the wakeup rate in idle does not drop below ~100 wakeups/s.
For all workloads, the processor spent > 95% of the time in the lowest frequency mode (LFM, 800MHz) execution state (P-state).
5.2. Video playback
Power measurements were captured while Moblin Media player (utilizing the Helix framework) played back video of the following formats:
The media was encoded with H.264 for the video stream and AAC for the audio stream. Furthermore, media playback with and without Helix HW acceleration was measured.
Observe that 1080p is only supported for some of the Intel Atom™ SKUs.
Note that other media frameworks such as GStreamer* also feature HW acceleration of video for the Intel® Atom™ platform.
NetDAQ data was captured with C6 turned on and HT was toggled on/off depending on the target measurement.
5.2.1. SW codecs vs. HW accelerated codecs
The normalized graphs below compare average normalized power and C0 state residency during SW codec media playback.

Decoding 480p (~3.5x more data than CIF) results in almost 2x the average processor power compared to CIF. Note that when using SW codecs, the processor is heavily utilized even for low resolution playback, as can be seen from the C0 residency graph.
There is clearly a need for HW acceleration to allow playback of higher resolution video workloads. The following graph compares average normalized power and C0 state residency during playback with SW codecs vs. playback using HW accelerated codecs.

From the above graphs it is clear that HW accelerated codecs have great benefits with regards to average power during high definition video playback. The required processor power scales gracefully for increased video resolutions. For instance, using HW acceleration, the platform is able to process 20x more data playing back 1080p vs. CIF with only a minor increase (~25%) in processor power. The average chipset power using HW acceleration also scales well, as will be illustrated below.
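The "~3.5x" and "20x" data ratios quoted in this section can be sanity-checked by comparing pixels per frame. The sketch below is only an approximation: the frame sizes are assumptions (CIF as 352x288, 480p as 720x480, 1080p as 1920x1080), since the exact test formats are not listed here, and it treats pixel count at equal frame rate as a proxy for the amount of data handled.

```python
# Assumed frame sizes (width, height); the article does not list the
# exact dimensions of the test media, so these are illustrative.
RESOLUTIONS = {
    "CIF": (352, 288),
    "480p": (720, 480),
    "1080p": (1920, 1080),
}


def pixel_ratio(fmt: str, baseline: str = "CIF") -> float:
    """Pixels per frame of `fmt` relative to `baseline`."""
    width, height = RESOLUTIONS[fmt]
    base_width, base_height = RESOLUTIONS[baseline]
    return (width * height) / (base_width * base_height)


for fmt in ("480p", "1080p"):
    print(f"{fmt} vs CIF: ~{pixel_ratio(fmt):.1f}x pixels per frame")
```

Under these assumptions 480p carries about 3.4x and 1080p about 20.5x the pixels of CIF, in line with the figures quoted in the text.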
With regard to C0 state residency, the processor shows moderate utilization during high resolution playback using HW acceleration.
Note: The release of the Helix framework used was not optimized for the MID platform and some kernel bottlenecks were identified. These issues have been addressed in more recent Moblin releases, leading to far fewer wakeups (reaping the benefits of the C6 sleep state) and much improved C0 state residency, all leading to a lower average power footprint for HW accelerated video playback.
Note that using SW codecs, media with resolution greater than 480p cannot be played back due to performance limitations.
For all HW accelerated workloads the processor spent > 90% of the time in the lowest frequency mode (LFM, 800MHz) P-state. Using SW codecs, the P-state residency indicates very high processor load. For instance, during 480p playback the processor spent just 3% of the time in LFM.
The normalized graph below compares average chipset power for HW accelerated codec media playback vs. SW codec media playback.

From the graph it is clear that not only does HW acceleration enable playback of 1080p, it also improves the average chipset power. As can be seen in the graph, the average chipset power scales gracefully for higher resolution content. Also note that playing back 1080p using HW acceleration requires about the same average chipset power as playing back CIF content using SW codecs.
Data collected with PowerTOP indicates that the processor wakes up 300-600 times/s during video content playback, depending on the media format. The processor therefore benefits very little from the C6 sleep state.
5.2.2. Memory load impact on power
Another important aspect of media playback is how frequently data is read from and written to memory. Large volumes of data transferred to/from memory result in increased average system power. The normalized graph below illustrates the increased playback power use for various resolutions of video content.

The platform memory subsystem uses on average 2.5x more power for 1080p playback vs. CIF playback.
The RAM is exercised approximately to the same degree for both HW accelerated playback and playback using SW codecs.
Note that several memory access improvements have been introduced to the framework, and recent Linux kernels such as 2.6.28 have led to improved average RAM power.
5.3. Benefits of HT on threaded workloads
The processor data below was captured with HT turned on/off while running various multi-threaded SW video decode and audio transcode workloads. Observe that HW acceleration was not used for the following workloads.
The following three graphs show typical relative processor performance, power and energy for the workloads with HT turned on/off. Analysis included a range of video workloads with various resolutions, one audio workload (transcoding "wav" to "mp3") and four Flash animation (no Flash video) workloads. The workloads tested were all multithreaded to take advantage of multithreaded processor architectures.
Decoding was performed at the highest possible rate (disregarding the specified media rate), completing the workload as fast as possible. When HT was turned on this generally resulted in higher power during workload execution, faster completion and thereby overall energy savings.

The graph clearly shows the performance benefit of the HT feature. Over the measured workloads we see a 32% geomean performance gain. The gain in performance naturally comes at an expense of average power. The graph below details the relative power overhead when HT is turned on/off.

From the measured data we see a geomean power overhead of 15%. Due to the increased performance, the workloads generally completed faster, which has an impact on the energy used by the processor. The graph below details the calculated processor energy benefits of the HT feature.

From the above graph we see 14% energy savings for the measured workloads.
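The relationship between these three numbers follows from energy being average power multiplied by completion time. The sketch below is a back-of-the-envelope check rather than the method used to produce the graph: it assumes that the completion time scales inversely with the performance gain, and it uses only the rounded geomean figures quoted in the text.

```python
def relative_energy(perf_gain: float, power_overhead: float) -> float:
    """Energy with HT on, relative to HT off (1.0 means no change).

    Assumptions: energy = average power * completion time, and the
    completion time shrinks in proportion to the performance gain.
    """
    relative_time = 1.0 / (1.0 + perf_gain)   # e.g. 32% faster -> ~0.76x time
    relative_power = 1.0 + power_overhead     # e.g. 15% overhead -> 1.15x power
    return relative_power * relative_time


# Rounded geomean figures quoted in the text.
saving = 1.0 - relative_energy(perf_gain=0.32, power_overhead=0.15)
print(f"Estimated energy saving with HT on: {saving:.1%}")
```

With the rounded inputs this estimate comes out at about 13%, close to the 14% reported. The small gap is plausibly due to rounding of the published geomeans, which is an assumption and not something the measured data confirms.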
The benefits of increased performance can also be seen for workloads such as Flash video playback. Unlike the previous workloads, the following workloads do not complete faster with HT turned on. Instead, the performance gained with HT enabled translates into an increased frame rate (note that Flash will play back the media at the highest possible frame rate, up to the specified media frame rate).
The graph below shows the measured frame rate for three different Flash video workloads with HT turned on/off.

From the above graph we measured a geomean frame rate gain of 14%. Besides the frame rate gains, the use of HT also improves average power, as the graph below illustrates.

From the measured data we found a geomean power saving of 19%. The reason for the decreased power for Flash video with HT enabled is the increased ability to move to lower P-states, resulting in more time spent there. Note that this behaviour is dependent on power management policy and workload.
In summary, for the workloads tested, the overall performance gain with HT activated compared to HT disabled was ~32% (ranging from 4-76%) while the overall power overhead with HT activated compared to HT disabled was ~15% (ranging from 5-26%). Due to the increased performance with HT activated, the workload completion time was shortened, leading to a net energy saving of ~14% (ranging from 0-28%). Flash video workloads showed a frame rate gain of 14% and power savings of 19%.
5.4. Browsing
The following NetDAQ data was captured with both C6 and HT enabled.
Moblin browser, based on Firefox 3, was used. A very simple Flash content (swf) media file displaying small flashing text was opened in the browser and measured in "idle". The processor and chipset power was measured and compared to Browser idle power behaviour.

From the above normalized graph it is clear that even for the simplest Flash content the processor is very active. Additional data captured with PowerTOP reveals that the processor wakes up ~340 times/s compared to ~75 times/s when the Browser is in idle on default home page.
If Flash content is heavily used during browsing, frequent processor activity will cause a significant decrease in the amount of time available on one battery charge.
As the Flash engine evolves, make sure to use the latest Flash release, since newer releases feature improved power efficiency.
Appendix A - Atom™ processor specifications
Details on available Intel® Atom™ processor SKUs including data on clock speed, TDP, idle power, FSB, sleep state details and more can be found on the Intel® Atom™ Processor Technology resource site.
http://www.intel.com/products/atom/index.htm
Appendix B - HW setup and NetDAQ power measurement setup
System setup overview
Platform specification and configuration
The SDP processor used is considered equivalent to the Intel® Atom™ Z530 SKU.
NetDAQ configuration
The instrumented SDP was connected to the NetDAQ for measurements via the 4 modules connected to the on-board sense resistors. The NetDAQ was in turn connected via an Ethernet loopback cable to the host PC where the measurements were collected. The SDP was additionally connected to an external LCD kit via USB and to a keyboard via PS/2.
The Fluke NetDAQ collects measured current and voltage from the board sense resistors and transfers the data to the NetDAQ SW tool on the host machine, which calculates power, adjusts for board specific offsets and accumulates power data according to the power measurement objects listed above.
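As an illustration of the kind of calculation involved, the sketch below derives power for one rail from a sense resistor reading. It is not the NetDAQ SW tool's actual algorithm: the function names, the use of Ohm's law on the voltage drop across the resistor, the single constant offset and all sample values are assumptions made for the example.

```python
from statistics import mean


def rail_power_watts(rail_voltage_v, sense_drop_v, sense_resistance_ohm,
                     offset_w=0.0):
    """Power on one rail from the voltage drop across its sense resistor.

    Assumption: current is derived with Ohm's law (I = V_drop / R) and a
    constant board specific offset (in watts) is subtracted afterwards.
    """
    if sense_resistance_ohm <= 0:
        raise ValueError("sense resistance must be positive")
    current_a = sense_drop_v / sense_resistance_ohm
    return rail_voltage_v * current_a - offset_w


def average_power_watts(samples, sense_resistance_ohm, offset_w=0.0):
    """Mean power over a sequence of (rail_voltage_v, sense_drop_v) samples."""
    return mean(
        rail_power_watts(voltage, drop, sense_resistance_ohm, offset_w)
        for voltage, drop in samples
    )


# Hypothetical samples for illustration only, not measured data.
samples = [(1.05, 0.0020), (1.05, 0.0030), (1.05, 0.0025)]
print(f"Average rail power: {average_power_watts(samples, 0.01):.4f} W")
```

Averaging per-sample power over the steady state window, rather than multiplying average voltage by average current, keeps the result correct when both values vary between samples.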
Acronyms
Maximillian Domeika (Intel)