Creating Energy-Efficient Software (Part 2)

Data Efficiency

As stated in the introduction, data efficiency reduces energy costs by minimizing data movement. As summarized in [Ref2], data efficiency can be achieved by designing:

  • software algorithms that minimize data movement
  • memory hierarchies that keep data close to processing elements
  • application software that efficiently uses cache memories

Just as we demonstrated with computational efficiency, data efficiency delivers performance benefits and saves energy. The following sections examine several areas where data efficiency methods can be applied to save energy: DVD playback, native command queuing, and use of cache.

DVD Playback[5]

This section analyzes the power consumption of three DVD playback software applications and provides recommendations for reducing it. The three applications (DVD App #1, #2, and #3) were run in each of their available configurations, primarily maximum power-saving mode vs. no power-saving mode, while power measurements were taken. The workload was the standard-definition DVD content shipped with the MobileMark* 2005[6] benchmarking tool. We provide our recommendation on an optimal strategy for reducing power consumption while playing content from a DVD-ROM. See Appendix A for a description of the test environment used for these experiments.

Test Observations

As a baseline, the measured data in Figure 9 was used to target the energy-savings analysis. Figure 9 indicates that DVD drive spin-up is the most power-hungry state (requiring over 4W for a brief period) and that reducing spin-up can lead to energy savings. Continuous DVD read consumes about 2.5W of power and is another area for potential savings.

Figure 9: Baseline Power Consumption during Typical DVD Playback States

Figure 10 shows the actual ~20 minute sample from one of the DVD playback applications and is representative of the power demand for all applications operating in maximum performance mode (no power savings). Note that the average power consumed by the DVD player for continuous read is about 2.5 watts.

Figure 10: DVD Power from continuous Read

The chart in Figure 11 illustrates the power signature when the DVD mechanism is spun down and no reads are occurring. This signature was evident in Application #1 when it used max power saving mode.

Figure 11: DVD Power consumed in Max Power Saving Mode

Figure 12: Application #2 CPU Power Profiles

 

The data in Figure 12 was captured from Application #2, which seemed to exhibit a dramatic difference in CPU power consumption between the no power saving mode and the maximum power saving mode. The reason for this is that in the no power saving mode, the application changed the system power scheme to run with the maximum frequency available and then restored the original power scheme after the run was completed. It’s clear from this observation that potential energy savings are possible by considering CPU energy consumption in the software design.

Test Results

Table 1 shows the measured energy usage for the three DVD playback applications using the MobileMark 2005 DVD playback load. Figure 13 shows the plot of this data. Note that the difference between the worst-case energy consumption (App 2, 10143 mWHrs) and the best-case energy consumption (App 3, 6023 mWHrs) is over 4000 mWHrs and appears to be entirely due to the design choices made by the application developers. This is about a 40% energy savings. Even a 10% energy savings on a 4-hour battery would provide 24 more minutes of battery time.
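The arithmetic behind the figures quoted in the paragraph above can be checked with a short sketch. The function names are illustrative only, and the battery estimate uses the same simple approximation as the text (extra run time proportional to the energy saved).

```python
# Platform energy (mWHrs) quoted in the text for the MobileMark 2005 DVD workload.
WORST_CASE_MWH = 10143.56  # DVD App 2, no power saving
BEST_CASE_MWH = 6023.57    # DVD App 3, maximum power saving


def fractional_saving(worst: float, best: float) -> float:
    """Fraction of the worst-case energy avoided by the best-case design."""
    return (worst - best) / worst


def extra_minutes(battery_hours: float, saving: float) -> float:
    """Extra run time under the text's approximation: proportional to the saving."""
    return battery_hours * 60 * saving


saving = fractional_saving(WORST_CASE_MWH, BEST_CASE_MWH)
print(f"difference: {WORST_CASE_MWH - BEST_CASE_MWH:.2f} mWHrs")
print(f"saving: {saving:.1%}")
print(f"10% saving on a 4-hour battery: {extra_minutes(4, 0.10):.0f} minutes")
```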

Application | Mode     | DVD Energy (mWHrs) | CPU Energy (mWHrs) | Platform Energy (mWHrs)
DVD App 1   | No Save  | 869.84             | 663.92             | 6618.76
            | Max Save | 263.41             | 762.99             | 6039.41
DVD App 2   | No Save  | 897.82             | 3329.53            | 10143.56
            | Max Save | 895.57             | 1064.02            | 7509.18
DVD App 3   | No Save  | 780.93             | 703.25             | 6202.04
            | Max Save | 781.04             | 554.87             | 6023.57

Table 1: Energy Consumed during DVD Playback

Figure 13: Energy Consumed during DVD Playback

The analysis in Table 2 indicates that the combined energy savings measured for the CPU and the DVD are largely responsible for the overall platform energy savings during playback (from 83 to 87%).

Application | Mode     | DVD + CPU Energy (mWHrs) | Platform Energy (mWHrs) | DVD/CPU Save %
DVD App 1   | No Save  | 1533.76                  | 6618.76                 | 87.57%
            | Max Save | 1026.4                   | 6039.41                 |
DVD App 2   | No Save  | 4227.35                  | 10143.56                | 86.08%
            | Max Save | 1959.59                  | 7509.18                 |
DVD App 3   | No Save  | 1484.18                  | 6202.04                 | 83.08%
            | Max Save | 1335.91                  | 6023.57                 |

Table 2: % Energy Savings Attributed to DVD & CPU
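The percentages in Table 2 can be reproduced from the Table 1 measurements: for each application, the drop in DVD plus CPU energy between the two modes is divided by the drop in platform energy. The sketch below is one way to do that calculation; the dictionary layout and function name are our own choices.

```python
# (DVD energy, CPU energy, platform energy) in mWHrs, copied from Table 1.
TABLE_1 = {
    "DVD App 1": {"No Save": (869.84, 663.92, 6618.76),
                  "Max Save": (263.41, 762.99, 6039.41)},
    "DVD App 2": {"No Save": (897.82, 3329.53, 10143.56),
                  "Max Save": (895.57, 1064.02, 7509.18)},
    "DVD App 3": {"No Save": (780.93, 703.25, 6202.04),
                  "Max Save": (781.04, 554.87, 6023.57)},
}


def dvd_cpu_share(no_save, max_save) -> float:
    """Percent of the platform energy saving that comes from the DVD drive plus CPU."""
    component_saving = (no_save[0] + no_save[1]) - (max_save[0] + max_save[1])
    platform_saving = no_save[2] - max_save[2]
    return 100 * component_saving / platform_saving


shares = {
    app: round(dvd_cpu_share(modes["No Save"], modes["Max Save"]), 2)
    for app, modes in TABLE_1.items()
}
for app, share in shares.items():
    print(f"{app}: {share}%")
```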

Recommendations

From the results of the studies performed, three guidelines emerge that can help save energy during DVD playback:

Buffering: The studies shown above indicate that the technique of buffering implemented by DVD Playback Application #1 reduces DVD power consumption by 70% and overall platform power consumption by about 10%, as compared to other techniques.

Minimize DVD drive use: It is always recommended to reduce DVD spin-ups, spin-downs, and read accesses in order to save power.

Let the OS manage the CPU frequency: We do not recommend changing the CPU power scheme to run the processor at the highest available frequency. The operating system will apply Intel SpeedStep Technology and automatically raise the operating frequency as processing demand increases.

Disk I/O[7]

This section summarizes the analysis of the power characteristics of the disk during sequential/random reads and native command queuing, and provides an analysis of file fragmentation and disk thrashing. It also provides guidelines on optimizing power during disk I/O in various usage models, along with the power impact. For additional details as well as sample code, see [Ref3].

Background

The analyses are based on the typical performance characteristics of hard disk drives (HDDs), which are affected by RPM, seek time, rotational latency, and the sustainable transfer rate. Furthermore, the actual throughput of the system also depends on the physical location of the data on the drive. Since the angular velocity of the disk is constant, more data can be read in a single rotation from the outermost tracks of the disk than from the innermost tracks.

When an application places a read request, the disk may first have to be spun up. The read/write head must then be positioned at the appropriate sector, after which the data is read, optionally placed in the OS file system cache, and copied to the application buffer.

Table 3 shows the time (in milliseconds) involved in these operations, based on the theoretical specification of the SATA drive used. While the actual numbers will vary between drives, this gives an idea of the relative times involved. (Platform specs: Intel® Core™ Duo 2.0GHz, Jamison Canyon* CRB, 2x512MB DDR2, 40GB SATA 5400RPM-2.5’’ mobile, Windows* XP-SP2). Figure 14 shows the average power consumed during idle, read/write, and spin-up.

Avg Seek Time (ms) | Avg Latency Time (ms) | Spindle Start Time (ms)
12                 | 5.56                  | 4000

Table 3: HDD Performance Data

Figure 14: HDD Average Power Consumed
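The average latency in Table 3 is consistent with the 5400RPM drive in the platform specs: on average, the head waits half a revolution for the target sector to arrive. The sketch below works through that arithmetic and compares a spin-up with a typical seek-plus-latency access; the function name is illustrative.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


AVG_SEEK_MS = 12          # from Table 3
SPINDLE_START_MS = 4000   # from Table 3

latency_ms = average_rotational_latency_ms(5400)
access_ms = AVG_SEEK_MS + latency_ms

print(f"average rotational latency at 5400 RPM: {latency_ms:.2f} ms")
print(f"spin-up costs roughly {SPINDLE_START_MS / access_ms:.0f}x a seek plus latency")
```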

It’s clear from the data that disk spin-up takes the most time and consumes the most power. In fact, the power profile is similar to that of the DVD player described in the previous section. Applications performing disk I/O should take this power profile into consideration and optimize for the power and performance of the hard disk.

Test Results

Five separate experiments were conducted to better understand the energy usage of hard disk drives under various I/O methodologies:

  • Impact of block size on sequential reads
  • Effect of buffering during multimedia playback
  • Impact of file fragmentation
  • Impact of native command queuing on random reads
  • Disk I/O in multi-threaded code

The setup, results, and recommendations are described below. For complete details, see [reference].

Impact of Block Size on Sequential Reads

Hypothesis

When reading a large volume of sequential data, reading the data in larger chunks requires lower processor utilization and less energy.

Setup

We created a large file (around 1GB) and read the entire file in blocks of various sizes. As a general rule of thumb for any disk I/O, we rebooted the system between runs to avoid any file-system cache interference.

Observations

We measured CPU utilization and energy as we varied the block size from 1 bit up to 64KB. As expected, CPU utilization and the energy required dropped as the block size increased, and both leveled off at the larger block sizes.

Recommendations

  • Use block sizes of 8KB or greater for improved performance
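As an illustration of the block-size recommendation, the following minimal sketch reads a file sequentially with a configurable block size and counts the read calls issued. It is not the code used in the study: the 64KB default, the helper names, and the 1MB throwaway file (standing in for the ~1GB test file) are assumptions made for the example.

```python
import os
import tempfile

BLOCK_SIZE = 64 * 1024  # assumption: any size at or above the 8KB guideline


def read_in_blocks(path: str, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Read a file sequentially; return (bytes read, number of read calls)."""
    total = calls = 0
    # buffering=0 disables Python's own buffer so each read() is one OS request.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
            calls += 1
    return total, calls


def make_sample_file(size: int) -> str:
    """Create a throwaway file of `size` bytes for the demonstration."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


sample = make_sample_file(1024 * 1024)  # 1MB stand-in for the study's ~1GB file
try:
    small = read_in_blocks(sample, 512)
    large = read_in_blocks(sample, BLOCK_SIZE)
finally:
    os.remove(sample)

print(f"512B blocks: {small[1]} reads; 64KB blocks: {large[1]} reads")
```

Fewer, larger requests mean fewer system calls for the same data, which is the source of the lower CPU utilization described above.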

 

Buffering during Multimedia Playback

Hypothesis

For multimedia playback, reading ahead and caching media content will save energy.

Setup

We compared the energy usage of reading/playing a 4MB MP3 file in two ways: reading it in ~2KB chunks, and reading and buffering the entire file.

Observations

Similar to the DVD playback experiment, when the data is read in small chunks the hard disk remains active and consumes more power than when we read the entire file, access it from the buffer, and let the hard disk go idle.

Recommendations

  • Utilize a buffering strategy in multimedia playback to minimize disk reads and save energy
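The two playback strategies compared in this experiment can be sketched as follows. This is a simplified illustration rather than the code used in the study: the function names are hypothetical, a 64KB throwaway file stands in for the 4MB MP3, and a real player would decode each chunk instead of just collecting it.

```python
import io
import os
import tempfile

CHUNK = 2 * 1024  # the ~2KB chunk size used in the experiment


def stream_in_chunks(path: str):
    """Small-chunk strategy: the disk is touched repeatedly throughout playback."""
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(CHUNK):
            yield chunk


def stream_from_memory(path: str):
    """Buffered strategy: one bulk read, then chunks are served from RAM."""
    with open(path, "rb") as f:
        buffer = io.BytesIO(f.read())
    while chunk := buffer.read(CHUNK):
        yield chunk


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(os.urandom(64 * 1024))  # small stand-in for the 4MB MP3 file
try:
    chunked = b"".join(stream_in_chunks(demo_path))
    buffered = b"".join(stream_from_memory(demo_path))
finally:
    os.remove(demo_path)

print(chunked == buffered)
```

Both generators deliver identical data to the consumer; the difference is that the buffered version finishes all disk activity up front, which gives the drive the opportunity to go idle.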

 

Impact of File Fragmentation

Hypothesis

The performance and energy costs to read a fragmented file are greater than those of a contiguous file.

Setup

We stored a 256MB file in fragmented and unfragmented states, read both files, and compared the results.

Observations

As expected, the fragmented file took more than twice as long to read: ~26 seconds, compared with ~11 seconds for the contiguous file. The energy savings were proportional.

Recommendations

  • Avoid fragmentation by pre-allocating large sequential files when they are created, e.g. SetLength in .Net* framework
  • Use NtFsControlFile() to aid in defragmenting files
  • End users can defragment their volumes periodically
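The pre-allocation recommendation can be illustrated with a rough Python analogue of setting the file length at creation time. This is a sketch under assumptions: `create_preallocated` is a hypothetical helper, `os.posix_fallocate` is only available on some POSIX platforms, and whether the reserved space ends up contiguous is decided by the file system, not by the application.

```python
import os
import tempfile


def create_preallocated(path: str, size: int) -> None:
    """Create `path` and reserve `size` bytes up front, before any data is written."""
    with open(path, "wb") as f:
        try:
            # POSIX: ask the file system to allocate the blocks now.
            os.posix_fallocate(f.fileno(), 0, size)
        except (AttributeError, OSError):
            # Fallback: set the final length only; whether blocks are actually
            # reserved at this point depends on the file system.
            f.truncate(size)


demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "capture.bin")
create_preallocated(demo_file, 1024 * 1024)
reserved_size = os.path.getsize(demo_file)
os.remove(demo_file)
os.rmdir(demo_dir)

print(reserved_size)
```

Declaring the final size before writing gives the file system a chance to place the file in one run of free space instead of growing it piecemeal.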

 

Impact of Native Command Queuing on Random Reads

Hypothesis

Effective use of asynchronous I/O with NCQ improves performance and saves energy.

Setup

We chose a random set of files and read 64KB starting at a random offset within each file. We compared synchronous I/O with asynchronous I/O using NCQ.

Observations

When NCQ was utilized, the total time was reduced by ~15% and there was a similar reduction in total energy for the task.

Recommendations

  • Applications that deal with random I/O or I/O operations with multiple files should use asynchronous I/O to take advantage of NCQ.
  • Queue up all the read requests and use events or callbacks to determine if the read requests are complete.
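The "queue everything, then wait for completions" pattern can be approximated in portable Python with a thread pool. Note that this is a substitute technique, not the OS-level asynchronous I/O used in the study: the pool merely keeps several read requests outstanding at once so that the operating system and drive have the chance to reorder them. The helper names, the limit of eight outstanding requests, and the throwaway test files are all assumptions of this sketch.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed

READ_SIZE = 64 * 1024  # 64KB per request, as in the experiment


def read_at(path: str, offset: int, size: int = READ_SIZE) -> bytes:
    """One independent read request: open, seek, read."""
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        return f.read(size)


def read_all_queued(read_requests, max_outstanding: int = 8):
    """Submit every request up front, then collect results as they complete."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        futures = {
            pool.submit(read_at, path, offset): (path, offset)
            for path, offset in read_requests
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


# Demonstration on throwaway files; offsets leave room for a full 64KB read.
demo_dir = tempfile.mkdtemp()
read_requests = []
for i in range(4):
    path = os.path.join(demo_dir, f"file{i}.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(256 * 1024))
    read_requests.append((path, random.randrange(0, 192 * 1024)))

queued = read_all_queued(read_requests)
synchronous = {req: read_at(*req) for req in read_requests}

for path, _ in read_requests:
    os.remove(path)
os.rmdir(demo_dir)

print(queued == synchronous)
```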

 

Disk I/O in Multi-threaded Code

Hypothesis

Uncoordinated disk I/O from multiple threads causes contention and thrashing; coordinating or queuing the I/O improves performance and saves energy.

Setup

For this analysis, we developed a bitmap-to-JPEG algorithm based on the IJG library that converts a large set of BMPs to JPEGs. We created a serial version of the application and several multi-threaded versions:

  1. Two threads that split the files and work independently, competing for disk access
  2. Add a thread to coordinate the buffer read/writes and handle requests sequentially
  3. Use queued I/O in the coordinating thread to optimize read/writes

Observations

Thread solution #1 provided almost no performance improvement over the serial version due to I/O contention and thrashing. Solutions 2 and 3 (buffered I/O and queued I/O) provided ~1.52x-1.56x scaling, a significant performance gain. Solutions 2 and 3 also yielded a ~30% reduction in total energy cost over the serial version.

Recommendations

  • For multiple threads competing simultaneously for disk I/O, queue the I/O calls and utilize NCQ. Reordering may help optimize the requests, improve performance, and save energy.
  • When multiple threads competing for the disk cause significant disk thrashing, consolidate all the read/write operations in a single thread to reduce read/write head thrashing and frequent disk spin-ups.
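A minimal sketch of the "single coordinating I/O thread" arrangement (solution 2 above) is shown below. It is not the study's implementation: `process` is a hypothetical stand-in for the BMP-to-JPEG conversion, only reads are shown, and the queue depth of four is an arbitrary choice. The point is the structure, in which only one thread ever touches the disk.

```python
import os
import queue
import tempfile
import threading


def process(data: bytes) -> int:
    """Hypothetical stand-in for the CPU-heavy BMP-to-JPEG conversion."""
    return sum(data) % 256


def io_thread(paths, work_queue, n_workers):
    """The only thread that touches the disk; it reads files one after another."""
    for path in paths:
        with open(path, "rb") as f:
            work_queue.put((path, f.read()))
    for _ in range(n_workers):
        work_queue.put(None)  # one stop marker per worker


def worker(work_queue, results, lock):
    """Processes buffers handed over by the I/O thread; never reads the disk."""
    while (item := work_queue.get()) is not None:
        path, data = item
        value = process(data)
        with lock:
            results[path] = value


def convert_all(paths, n_workers=2):
    """Run one I/O thread plus n_workers processing threads; return path -> result."""
    work_queue = queue.Queue(maxsize=4)  # bounded, so reads stay only slightly ahead
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=io_thread, args=(paths, work_queue, n_workers))]
    threads += [
        threading.Thread(target=worker, args=(work_queue, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Demonstration on throwaway files standing in for the BMP inputs.
demo_dir = tempfile.mkdtemp()
demo_paths = []
for i in range(6):
    path = os.path.join(demo_dir, f"image{i}.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(32 * 1024))
    demo_paths.append(path)

threaded = convert_all(demo_paths)
serial = {}
for path in demo_paths:
    with open(path, "rb") as f:
        serial[path] = process(f.read())

for path in demo_paths:
    os.remove(path)
os.rmdir(demo_dir)

print(threaded == serial)
```

This sketch demonstrates the I/O structure only; it makes no claim about reproducing the scaling figures reported above, which depend on the workload and the runtime.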

 

File Transfer over Wireless

Another opportunity to improve data efficiency lies in wireless file transfer. We investigated the power consumption of a laptop while transmitting compressed and uncompressed data over a wireless network to determine whether there are more power-efficient methods. In these experiments we focused on how the compression ratio or size of the file affects power consumption. We did not compare the performance of various compression algorithms.

The goal of the research was to obtain data that would help answer questions such as:

  • For upload, is it better to compress the data before transmission or leave it uncompressed?
  • Is it better to compress a file before downloading?
  • How will the wireless adapter, CPU utilization, data compression ratio, and transmission time affect the laptop power consumption?

The general methodology was to transmit compressed (using GZip 1.2.4) and uncompressed data with varying file sizes and to measure platform power with a Fluke NetDAQ[3]. For complete details of the methodology and setup, please see the white paper on this topic.[8] Note that this power study examined only the client side (laptop) and did not include the server side.

Test Parameters

To achieve reproducible and consistent results, certain parameters were adopted as follows:

  1. To minimize noise and interference, a controlled network was used:
    • A wireless network was set-up in an isolated environment to reduce noise and interference
    • A dedicated, private network (via access point), with only a single client transmitting data
  2. Data sets: Different “txt” and “tif” files with varying compression ratios
  3. Compression algorithm:
    • Gzip 1.2.4* selected since it is open source and easy to customize.
    • Different compression algorithms can affect the compression ratio, but the study focused only on the compression ratio rather than the algorithm. The differences between Gzip 1.2.4* and other compression algorithms are beyond the scope of this study.
  4. Test runs were scaled to maintain workloads long enough (in duration) to minimize errors in platform power measurements. Power consumption per run is the average value of 100 iterations.

Data Sets and Test Procedure

The compression ratio of a given data set plays a significant role in determining whether to send/receive uncompressed data or to use compression before transmitting the data. Five different data sets were used, with the corresponding data size and compression ratios shown in Table 4. The compression ratios range from low (1.2x) to high (14.04x).

Data Set    | Original Size (KB) | Compression Ratio (Gzip 1.2.4) | Description
Tulips.tif  | 1179               | 1.2x                           | Med size file, very low compression ratio
Book1       | 751                | 2.45x                          | Med size file, low compression ratio
World95.txt | 2935               | 5.06x                          | Large size file, high compression ratio
Pic         | 502                | 8.96x                          | Small size file, high compression ratio
Frymire.tif | 3708               | 14.04x                         | Large size file, very high compression ratio

Table 4: Data Sets, taken from Jeff Gilchrist's Archive Compression Test (ACT)*, a set of benchmarks for data compression. http://www.compression.ca/act/act-files.html*

To perform the tests, the test system (Intel® Core Duo/Customer Reference Board) was network mapped (via access point) to the server’s file system, and the Windows™ XP internal “COPY” command was used to transfer the data from the client to the server and vice versa over the wireless network. To reduce the effect of anomalous events, each experiment was repeated 100 times while measuring the power. The final power number was divided by 100 to determine the average energy used for that data set.

Each of the data sets was transmitted as follows:

  • Upload the uncompressed file to the server
  • Compress the file and upload the compressed file to the server
  • Download the uncompressed file from the server
  • Download the compressed file from the server and then uncompress on the client

Wireless Adapter Power Profile

Before looking into the various case studies, let us look at the wireless adapter power profile. Table 5 lists the test platform’s power profile when the wireless adapter is disabled, on with no connection, on and connected to an access point, and on but searching for a signal.

Scenario                            | Average Platform Power (W) | Average WLAN Power (W) | Total (W)
WLAN Radio Off                      | 13.2                       | 0                      | 13.2
WLAN Radio On (no connection to AP) | 14.1                       | 0.35                   | 14.45
WLAN Radio On (connected to AP)     | 14.2                       | 0.45                   | 14.65
WLAN Radio On (searching for AP)    | 15.7                       | 1.6                    | 17.3

Table 5: WLAN Adapter Average Power Consumption

The wireless adapter uses the most power when actively seeking an access point (AP), although this is typically just a brief period of time. When the radio is on and the system is connected to the network but not transmitting any data, the average power consumption is ~450mW; when searching for an AP, it is ~1600mW.

Observations

CPU Profile

We observed that CPU utilization is high when compressing and uncompressing the data (99-100% when compressing and 84-100% when uncompressing). It drops to 4-7% when transmitting the data, regardless of whether the data is compressed. As expected, the processor frequency goes to its maximum (the highest Performance Frequency State) when compressing and uncompressing. When transmitting the data over the network, the processor remains at a lower Performance Frequency State since the CPU utilization is low (4-7%).

Upload Power Consumption

In various runs, the total power consumption of uploading uncompressed data is compared with compressing and uploading the data. Figure 15 shows power consumption comparison for uploading uncompressed data vs. compressing and uploading the data. The secondary Y axis in Figure 15 plots corresponding compression ratio for the given data set. Note that the data sets with higher compression ratios (higher than 1.2x) show benefit for power consumption when compressing first and then uploading the data set. For the data set with the lowest compression ratio (1.2x in this case), uploading uncompressed data is more power efficient by a small amount.

Figure 15: Upload over WLAN Total Power Consumption (Energy)

Download Power Consumption

Similarly, the same data set is used for investigating the download power consumption. Figure 16 indicates the power consumption for downloading uncompressed data vs. downloading compressed data and then uncompressing it.

The graph in Figure 16 indicates average power consumption for each data set as well as the compression ratio on secondary Y axis. As indicated, for data sets with higher compression ratios, downloading compressed data and then uncompressing is more power efficient than downloading uncompressed data. For the data set with the lowest compression ratio (1.2x in this case), downloading uncompressed data is more power efficient. For the ‘Book1’ data set (compression ratio 2.45x) the power consumption of downloading uncompressed data vs. downloading compressed data and then uncompressing demonstrates minimal difference.

Figure 16: Download over WLAN Total Power Consumption (Energy)

Conclusion

The size of a data file being transferred over a wireless network directly affects the elapsed transfer time and therefore the energy consumed, not only by the wireless adapter but by the entire platform. For improved data efficiency and energy efficiency, we recommend the following:

  • For data sets with higher compression ratios (more than 3.0x), uploading/downloading compressed data provides better power savings as compared to transmitting uncompressed data. It is beneficial for applications to transmit compressed data for data sets having higher compression ratios.
  • For data sets with lower compression ratios (~1.2x in this case, which is hardly compressed), compressing the data before uploading or decompressing it after download adds extra overhead. We recommend uploading/downloading uncompressed data in these cases.
  • For data sets with compression ratio around 2.5-3.0x, there is a minimal difference in the power saving when uploading/downloading compressed data vs. uncompressed data.
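These conclusions suggest a simple decision rule, sketched below: estimate the compression ratio and compress before transfer only when it clears the ~3.0x threshold. The threshold comes from the measurements above and is specific to the tested platform and network; the function names and sample inputs are assumptions of this example, and Python's `gzip` module stands in for the Gzip 1.2.4 tool used in the study.

```python
import gzip
import os

# Threshold taken from the conclusions above; specific to the tested setup.
COMPRESS_ABOVE = 3.0


def compression_ratio(data: bytes) -> float:
    """Original size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


def should_compress(data: bytes, threshold: float = COMPRESS_ABOVE) -> bool:
    """Compress before transfer only when the ratio clears the threshold."""
    return compression_ratio(data) > threshold


text_like = b"energy efficient software " * 4000  # highly repetitive input
random_like = os.urandom(100_000)                  # essentially incompressible

for name, payload in (("text_like", text_like), ("random_like", random_like)):
    print(f"{name}: ratio {compression_ratio(payload):.2f}x, "
          f"compress = {should_compress(payload)}")
```

Computing the ratio this way compresses the whole payload, which itself costs CPU energy. An application that knows its data types in advance could skip the measurement, or estimate the ratio from a small sample instead.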

 


[1] It is interesting to note that this is not always true. Due to the quadratic relationship between processor states and voltage, it can be demonstrated that a process running for a longer time at a lower P-state may actually use less total energy than running the same process at a high P-state for less time. This is an area of future research.

[2] More detailed coverage of this topic can be found at: EPA & Intel-Advancing Sustainability

[3] NetDAQ* Networked Data Acquisition Unit

 

[4] GV3 is a Microsoft hotfix (KB896256) that changes the kernel power manager to track CPU utilization across the entire package instead of individual cores. It resolves an issue in which the power manager incorrectly calculated the optimal target performance state for the processor when one core was much less busy than the others. The performance state was set too low and performance suffered in adaptive mode.

[5] More detail on this study can be obtained from: DVD Playback Power Consumption Analysis

[6] For details on MobileMark 2005, see: http://www.bapco.com/

[7] More details from this analysis are available at: Power Analysis of Disk I/O Methodologies

[8] More detailed coverage of this topic can be found at: Data Transfer over Wireless LAN Power Consumption Analysis

[9] For complete details of this study, please see: Enabling Games for Power

[10] For details on Extech Power Analyzers, see: http://www.extech.com/instrument/products/310_399/380803Power.html

[11] See: http://LinuxPowerTOP.org

 
