<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sun, 12 Feb 2012 03:51:24 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/mobility/type/technical-article/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles Feed</title>
    <link>http://software.intel.com/en-us/articles/mobility/type/technical-article/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Ultrabook™ and the Intel® Energy Checker SDK</title>
      <description><![CDATA[ <h2 class="sectionHeading">Abstract</h2>
With the advent of the Ultrabook™<sup>1</sup>, the demand for applications that are power misers continues to rise. The Intel® Energy Checker SDK can be used to instrument an application and collect data that helps a developer pinpoint power-hungry features that can be optimized. This article gives an overview of the Intel Energy Checker SDK and discusses how it can be used to improve energy usage on an Ultrabook.<br /><br />
<h2 class="sectionHeading">More Work, Less Power</h2>
An Ultrabook™ needs to budget its power consumption very carefully to extend usefulness while running on battery. Therefore, applications that use less energy are preferred. Often, application developers create their program on a desktop system where power/energy consumption is less important than raw performance. Applications should be developed not only to conserve power when active but also to minimize energy usage during idle periods; idle behavior is often overlooked, and improving it can greatly extend battery life. If power issues are ignored, running a program on an Ultrabook can result in unpleasant surprises for the user. Developers who test their application on an Ultrabook system during development gain insight into how well the program runs in a power-limited environment. An analysis tool such as the <a href="http://software.intel.com/en-us/articles/intel-energy-checker-sdk/">Intel® Energy Checker SDK</a> can be a powerful companion during the optimization phase for software designed for an Ultrabook.<br /><br />
<h2 class="sectionHeading">Energy Efficiency</h2>
Before explaining what Intel Energy Checker SDK contains, a discussion on Energy Efficiency (EE) is in order. This is a term that is used extensively in the Intel Energy Checker SDK. There is no universally accepted definition of EE, so for the purposes of this tool it is defined as:<br />
<p ><em>EE=Work/Energy</em></p>
<em>Work</em> is defined as the amount of “<em>useful work</em>” done by a software application. There is no concise, easy definition of the term <em>useful work</em> either, as what is considered <em>useful work</em> in one program may be quite different in another application. The developer is required to make that determination. For example, one might consider the areas of a movie player program where it provides the customer value (such as decoding the movie) as useful work whereas areas of the program that are accessing resources, waiting on input, or performing synchronization would not.<br /><br />
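As a rough illustration of the ratio above, the following sketch computes EE for one run. The helper name <em>energy_efficiency</em> and the sample numbers (frames decoded, joules consumed) are made up for this example and are not part of the Intel Energy Checker SDK.

```shell
#!/bin/sh
# Hypothetical helper: EE = Work / Energy.
# $1 = units of useful work (for example, frames decoded)
# $2 = energy consumed over the same period, in joules
energy_efficiency() {
    awk -v work="$1" -v energy="$2" 'BEGIN {
        if (energy <= 0) { print "energy must be positive"; exit 1 }
        printf "%.2f\n", work / energy
    }'
}

# Example: 3000 frames decoded while the platform consumed 1200 joules.
energy_efficiency 3000 1200
```

Comparing this figure before and after an optimization only makes sense if the definition of useful work stays the same between runs.<br /><br />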
<h2 class="sectionHeading">Code Instrumentation</h2>
The first step in using Intel Energy Checker SDK to help determine an application’s EE is to create and use “counters” in the software to determine quantities of “useful work”. A counter is defined as a 64-bit (8 byte) variable that keeps a running total of how many times a particular event occurs. In C, this corresponds to the unsigned long long data type. A developer can create one or more counters during the initialization portion of the software. Next, a container for the counters can be created, called a “Productivity Link” (PL)<sup>2</sup>. Each PL holds up to 512 counters, and up to 10 different PLs can be open at one time, but most software will require far fewer counters and PLs.<br /><br />During the application runtime, values can be written to any counter in the PL, based on the developer’s requirements. Intel Energy Checker SDK can collect the information from the PLs in order to determine how much work was done.<br /><br />
<h2 class="sectionHeading">Energy Consumed</h2>
The second part of finding the EE of a software application is to measure how much energy was consumed while the program was running. To do this, Intel Energy Checker SDK uses two tools which are included in the SDK download: Energy Server (ESRV) and Temperature Server (TSRV). ESRV is used to monitor energy and power consumption as reported by external power tools while TSRV monitors temperature related information as reported by environmental probes. ESRV and TSRV counters can be accessed by any program using the Intel Energy Checker API. In addition to the counters created by the developer to determine quantities of work, the developer will want to add counters to collect information from ESRV and possibly TSRV. There are three different ways to set up ESRV:<br /><br /><ol>
<li>Use a power meter to collect actual “platform energy and power” information.<br /><br />There are several different power meters that work with the Intel Energy Checker SDK. Please consult the <em>Intel® Energy Checker SDK User Guide</em> included in the download or found on the <a href="http://software.intel.com/en-us/articles/intel-energy-checker-sdk/">Intel® Energy Checker SDK page</a> to determine which power meters will work and how they should be attached to the test system.<br /></li>
<li>Use <a href="http://software.intel.com/en-us/articles/intel-power-gadget/">Intel® Power Gadget</a> to collect “processor energy and power” usage information on the 2nd Generation Intel Core™ processor family. External power meters that report platform power can also be used together with Intel Power Gadget, which provides processor power. The blog Accessing Intel® Power Gadget From Intel® Energy Checker SDK by Intel engineer Jun De Vega discusses how to enable Intel® Power Gadget with Intel® Energy Checker.<br /></li>
<li>Use the simulation method, which relies on the CPU utilization percentage returned by the OS. This method does not require a hardware probe. The Intel Energy Checker SDK offers this method as an option for all processors (rather than just the 2nd Generation Intel Core processor family as with the Intel Power Gadget) to accommodate users who do not have a power meter. Included in the SDK is a support library for accessing this metric.</li>
</ol>
<p ><img src="http://software.intel.com/file/41168" /><br /><br /><strong>Figure 1:</strong> Conceptualized drawing of Intel Energy Checker setup with Instrumented Application, Power Meter and Environmental probes attached</p>
<h2 class="sectionHeading">Intel Energy Checker Extras</h2>
There are two companion tools that are bundled with the Intel Energy Checker SDK in addition to those already mentioned. The PL GUI Monitor is a user interface that displays Productivity Link (PL) counters in a running program that has already been instrumented with the Intel Energy Checker API. The PL CSV Logger<sup>3</sup> is an application that can collect and write PL counters to a CSV file for later analysis in a variety of spreadsheet applications.<br /><br />Included with the Intel Energy Checker SDK is the <em>Intel® Energy Checker SDK Companion Application User Guide</em> that discusses the features and capabilities of both of these tools.<br /><br />
<p ><img src="http://software.intel.com/file/41169" /><br /><br /><strong>Figure 2:</strong> PL GUI Monitor running while a picture is being rendered</p>
The entire Intel Energy Checker SDK includes other build, scripting, interoperability, and monitoring tools to help developers instrument code and collect energy metrics.<br /><br />A white paper entitled “<em>How Green Is Your Software?</em>” is available for download from the SDK site. This paper discusses approaches for making software power efficient. Look for it in the “Code, Resources and Documentation” section of the <a href="http://software.intel.com/en-us/articles/intel-energy-checker-sdk/">Intel Energy Checker SDK page</a>. Several blogs about Intel Energy Checker that were written by Intel Engineer Jamel Tayeb will also be helpful:<br /><br /><a href="http://software.intel.com/en-us/blogs/2010/04/15/using-the-intel-energy-checker-sdk-at-home/?wapkw=(Energy+Checker)">Using the Intel® Energy Checker SDK at Home</a><br /><br /><a href="http://software.intel.com/en-us/blogs/2010/02/19/creating-a-simple-device-library-for-intel-energy-checker-sdk/?wapkw=(Energy+Checker)">Creating a Simple Device Library for Intel® Energy Checker SDK</a><br /><br /><a href="http://software.intel.com/en-us/blogs/2010/03/30/measuring-the-energy-consumed-by-a-command-using-the-intel-energy-checker-sdk/?wapkw=(Energy+Checker)">Measuring the energy consumed by a command using the Intel® Energy Checker SDK</a><br /><br />All of these resources allow a developer to get started in gathering helpful information.<br /><br />
<h2 class="sectionHeading">Optimizing Applications for Ultrabooks</h2>
Once a program has been instrumented to collect counter information and an energy collection plan is in place (either simulation or power meter), the setup is complete. The developer will then be able to gather information about the application’s energy usage profile and to incorporate optimizations to improve results.<br /><br />There are several areas of optimization the Ultrabook developer can select for improvements:<br /><br />
<div >Consider modifying the application to be aware of the power status and changing usage to reduce energy consumption when the system is on battery.<br /><br />Check the hardware and software system power management possibilities to choose a balanced power setting. This could be a recommended setting suggested in application documentation.<br /><br />Reduce power usage while the application is actively running or doing work. Compute intensive parts of the program will likely benefit from multi-threading and vectorization techniques.<br /><br />Reduce power usage while the application is idle. Being able to minimize the timer tick rate or setting up periodic actions to happen within the same wakeup period are examples of how to reduce idle application power usage.</div>
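The first suggestion above, making the application aware of the power status, can be sketched as follows. This is only an illustration under stated assumptions: it reads the Linux <em>power_supply</em> sysfs tree, the function name <em>power_source</em> and the polling intervals are hypothetical, and an application on another operating system would query that system's own power API instead.

```shell
#!/bin/sh
# Sketch: report "ac" or "battery" from a Linux power_supply sysfs tree.
# $1 = sysfs directory (defaults to /sys/class/power_supply).
power_source() {
    dir="${1:-/sys/class/power_supply}"
    for supply in "$dir"/*; do
        [ -r "$supply/type" ] || continue
        # A "Mains" supply with online=1 means the system is on AC power.
        if [ "$(cat "$supply/type")" = "Mains" ] &&
           [ "$(cat "$supply/online" 2>/dev/null)" = "1" ]; then
            echo ac
            return 0
        fi
    done
    # No online mains supply found: assume battery, the conservative choice.
    echo battery
}

# Hypothetical policy: poll less often when running on battery.
case "$(power_source)" in
    ac)      interval=1 ;;
    battery) interval=10 ;;
esac
echo "polling every ${interval}s"
```

Treating an unknown power state as battery is a design choice of this sketch; it errs on the side of saving energy.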
<br /><br />
<h2 class="sectionHeading">Summary</h2>
With the growth of Ultrabook devices, it will benefit program designers and developers to take a look at ways to save energy while providing a great user experience on an Ultrabook. Intel Energy Checker SDK can provide the means to identify the key areas of focus and confirm the positive results achieved after optimization. Long live Ultrabook!<br /><br />
<h2 class="sectionHeading">About the Author</h2>
<img src="http://software.intel.com/file/41170"  /> Judy Hartley is a Software Applications Engineer who has been working in the Software and Services Group since 2005. She has contributed to many software products and written about her experiences through blogs and whitepapers. Recently Judy has been working on Graphics and Power tools and training for future Intel processors.<br /><br  />
<hr />
<br /><sup>1</sup> Ultrabook is a trademark of Intel Corporation in the U.S. and/or other countries.<br /><br /><sup>2</sup> A Productivity Link is a term used by Intel Energy Checker to represent an arbitrary or logical collection of counters.<br /><br /><sup>3</sup> CSV is the acronym for Comma Separated Values.<br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/ultrabook-and-the-intel-energy-checker-sdk/</link>
      <pubDate>Tue, 24 Jan 2012 00:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/ultrabook-and-the-intel-energy-checker-sdk/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/ultrabook-and-the-intel-energy-checker-sdk/</guid>
      <category>Mobility</category>
      <category>What If Experimental Software</category>
      <category>Tools</category>
      <category>Intel Software Network communities</category>
      <category>Intel SW Partner program</category>
      <category>Code &amp; Downloads</category>
      <category>Power Efficiency</category>
      <category>Resources For Software Developers</category>
      <category>Ultrabook</category>
    </item>
    <item>
      <title>How to build ffmpeg to run under Moblin 2</title>
      <description><![CDATA[ <div class="Section1">
<p class="MsoNormal"><span class="sectionBodyText"><br />The application ‘ffmpeg’ consists of three executables and five libraries. All the sources can be downloaded using the following svn command:</span></p>
<p class="MsoNormal" ><span >Ø<span >      </span></span><span class="sectionBodyText">svn checkout svn://svn.ffmpeg.org/ffmpeg/trunk ffmpeg </span></p>
<p class="MsoNormal"><span > </span><strong class="sectionHeading">Required patch</strong></p>
<p class="MsoNormal"><span class="sectionBodyText">One revision which I tried (download: svn checkout svn://svn.ffmpeg.org/ffmpeg/trunk ffmpeg -r 17944) has a known problem with the sources. If you try to build the code, you will get a compile-time error:</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">libswscale/swscale.c:488: error: ‘PIX_FMT_YUV420PLE’ undeclared</span></p>
<p class="MsoNormal"><span class="sectionBodyText"></span></p>
<p class="MsoNormal"><span class="sectionBodyText">The problem is solved by copying the contents of the libswscale directory from the ffmpeg 0.5 release:</span><span ><span class="sectionBodyText">  </span> <a href="http://www.ffmpeg.org/releases/ffmpeg-0.5.tar.bz2" title="http://www.ffmpeg.org/releases/ffmpeg-0.5.tar.bz2" >http://www.ffmpeg.org/releases/ffmpeg-0.5.tar.bz2</a>    (see </span><span lang="EN" ><a href="http://www.ffmpeg.org/download.html" >http://www.ffmpeg.org/download.html</a>)</span></p>
<p class="MsoNormal"><span class="sectionBodyText"></span></p>
<p class="MsoNormal"><span class="sectionBodyText">Please Note, <b><i>only</i></b> the libswscale directory should be copied <b><i>over the top</i></b> of the existing files.</span></p>
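<p class="MsoNormal"><span class="sectionBodyText">The copy step can be sketched as a small shell function. The function name overlay_libswscale and the directory names in the usage comment are assumptions for illustration; the sketch expects the 0.5 release to be unpacked already and touches nothing except libswscale.</span></p>

```shell
#!/bin/sh
# Sketch: copy only libswscale from an unpacked release tree over a checkout.
# $1 = unpacked release directory (for example ffmpeg-0.5)
# $2 = svn checkout directory (for example ffmpeg)
overlay_libswscale() {
    release="$1"
    checkout="$2"
    [ -d "$release/libswscale" ] || {
        echo "no libswscale directory in $release" >&2
        return 1
    }
    mkdir -p "$checkout/libswscale"
    # The trailing "/." copies the directory contents, overwriting files in place.
    cp -R "$release/libswscale/." "$checkout/libswscale/"
}

# Typical use, with the release tarball downloaded next to the checkout:
#   tar xjf ffmpeg-0.5.tar.bz2
#   overlay_libswscale ffmpeg-0.5 ffmpeg
```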
<h3><span class="sectionHeading" lang="FR">Build Instructions</span></h3>
<p class="MsoNormal"><span class="sectionBodyText"></span></p>
<p class="MsoNormal"><span class="sectionBodyText">The following instructions will help you build ffmpeg under a Linux system, assuming that the tar ball from the download is copied to &lt;install_directory&gt;:<br /></span><br /> Ø      cd  &lt;install_directory&gt;<br /><br /> Ø      tar xvfz  ffmpeg.tar.gz   #Extract the tar ball to your local directory. You will find a directory tree installed under the name ffmpeg. All the required sources are installed within this tree, provided that you are not enabling any additional third party libraries.<br /><br /> Ø      cd ffmpeg</p>
<p class="sectionBodyText"> Ø      ./configure --help &lt;cr&gt; # to show all the options available to rebuild ffmpeg. For a reference build with <b>gcc</b> there is no need to provide any parameters to configure. However, if you would like to enable any third party libraries like ‘libfaad’, ‘libx264’ etc., then you need to add --enable-libfaad --enable-libx264 etc. as parameters to configure. In that case, make sure that these sources are available in your development environment and that the third party libraries are rebuilt with the selected compiler / compiler options.</p>
<p class="sectionBodyText"> Ø      ./configure &lt;parameters as below&gt;</p>
<p class="MsoNormal"><span class="sectionBodyText"> </span>Ø      make &lt;cr&gt; #recommended sequence is ‘make clean’ followed by ‘make’. The rebuild will take several minutes (depending on your development environment). Two versions of each executable are produced: ex. ‘ffmpeg’ and ‘ffmpeg_g’. The ‘*_g’ contains the executable with debug information while the ‘*’ is the stripped version. In addition ‘ffplay, ffplay_g, ffserver and ffserver_g’ are produced as executables and libavutil, libavcodec, libavformat, libavdevice and libswscale are rebuilt.</p>
<p class="sectionHeadingText"> <b><span >Icc build</span></b>:</p>
<p class="MsoNormal"><span class="sectionBodyText">Depending on which version of icc you are using, different configure parameters may be necessary.</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">Using icc v10.1.xxx:</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span>Ø      ./configure --cc=icc --extra-cflags="-xL -O3" --extra-libs=-lsvml &lt;cr&gt; # Without linking in the extra library (libsvml.so) you get several unresolved references.</p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">Using icc v11.0.xxx:</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span>Ø      ./configure --cc=icc --extra-cflags="-xSSE3_ATOM -O3" --extra-libs=-lsvml &lt;cr&gt; #  The option -xSSE3_ATOM requires that you run the code on a processor that supports the ‘movbe’ instruction. To test the code on a processor that does not support ‘movbe’, add the option ‘-minstruction=nomovbe’ to the extra-cflags part of the command line above.<br />Ø      Make sure that the LD (linker) options in the config.mak file do not contain ‘-march=generic’. This option causes an error from the compiler.<br /><span class="sectionBodyText"><br />Using icc v11.1.xxx:<br /></span><br />Ø      To avoid a run-time erratum, the following source change is needed: open the file ./libavcodec/x86/dsputil_mmx.c and, on line 2944, insert // before || __ICC &gt; 1100. The line is highlighted below.</p>
<p class="MsoNormal" ><span ><img height="151" width="576" src="http://software.intel.com/file/22666" /></span><br /><br />Ø      After the change, the line reads: #if ARCH_X86_64 || ! ( __ICC) // || __ICC &gt; 1100</p>
<p class="MsoNormal" ><span class="sectionBodyText"></span></p>
<p class="sectionBodyText" >Ø      ./configure --cc=icc --extra-cflags="-xSSE3_ATOM -O3" &lt;cr&gt; # This simple configure works, but you get better performance if you also add ‘--extra-libs=-lsvml’. Additional compiler switches can be used to improve performance, for example ‘-no-prec-div’ and ‘-vec-’. Because -no-prec-div affects floating-point calculations, please refer to the compiler documentation before using it. In addition, with this compiler version the ‘-xSSE3_ATOM’ switch no longer requires a processor with ‘movbe’ support.</p>
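<p class="sectionBodyText">Two of the conditions mentioned for the ICC builds, ‘movbe’ support on the test machine and the absence of ‘-march=generic’ in config.mak, can be checked before building. The sketch below assumes a Linux system, where CPU flags are listed in /proc/cpuinfo; the function names has_movbe and config_mak_ok are hypothetical.</p>

```shell
#!/bin/sh
# Sketch: pre-flight checks for the ICC builds described above.

# Succeeds if the CPU flags file lists the movbe instruction.
# $1 = cpuinfo file (defaults to /proc/cpuinfo on Linux).
has_movbe() {
    grep -qw movbe "${1:-/proc/cpuinfo}"
}

# Succeeds if config.mak exists and does not mention -march=generic.
# $1 = path to config.mak.
config_mak_ok() {
    [ -f "$1" ] || return 2
    # -e keeps grep from parsing the pattern as an option.
    ! grep -q -e '-march=generic' "$1"
}

# Usage on the machine that will run the binary, and in the ffmpeg tree:
#   has_movbe || echo "consider -minstruction=nomovbe with icc 11.0"
#   config_mak_ok config.mak || echo "remove -march=generic from config.mak"
```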
<p class="sectionHeadingText"> Cross compilation.</p>
<p class="MsoNormal"><span class="sectionBodyText">It is standard practice to build the library and executables on a development machine and then copy the resultant files to the MOBLIN target.  You can use the option --enable-cross-compile in this case.</span></p>
<p class="MsoNormal"><span class="sectionBodyText">Once you have completed the configure stage, run ‘make clean’ followed by ‘make’. Expect the build to produce additional warnings.</span></p>
<p class="MsoNormal"><span class="sectionHeading"></span></p>
<p class="MsoNormal"><b><span ><span class="sectionHeading">Building ffmpeg with third party libraries:</span></span></b></p>
<p class="MsoNormal"><b><span ><span class="sectionHeading"></span></span></b></p>
<p class="MsoNormal"><span class="sectionBodyText">The application can support 21 different external 3<sup>rd</sup> party libraries. Each is selected during the configure process by adding the option ‘--enable-&lt;library-name&gt;’. </span></p>
<p class="MsoNormal"><span class="sectionBodyText">Below is an example of using some of the different external libraries available:</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span>Ø      ./configure &lt;other options as above&gt; --enable-libfaac --enable-libfaad --enable-libmp3lame --enable-libtheora --enable-libx264 --enable-libxvid<span class="sectionBodyText"> </span></p>
<p class="MsoNormal"><span class="sectionBodyText">The invocation above assumes that you will be using 6 external 3<sup>rd</sup> party libraries:</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">- libfaac       # for example you can download ‘faac-1.28.tar.gz’ or later version from the web.</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">- libfaad       # for example you can download ‘faad2-2.7.tar.gz’ or later version from the web.</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">- libmp3lame # for example you can download ‘lame-398-2.tar.gz’ or later version from the web.</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">- libtheora    # for example you can download ‘libtheora-1.0.tar.tar’ or later version from the web. Please note that you can not use ICC v 10.1 to rebuild this library due to a compiler issue. This has been fixed with 11.1 version of the compiler.</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">- libx264      # for example you can download ‘x264-snapshot-20090117-2245.tar.bz2’ or later version from the web.</span></p>
<p class="MsoNormal"><span class="sectionBodyText"> </span><span class="sectionBodyText">- libxvid      # for example you can download ‘xvidcore-1.2.1.tar.gz’ or later version from the web.<br /></span><span class="sectionBodyText"><br /><strong>Performance Notes:</strong> <br /><br />Best option combination for ICC: "-xSSE3_ATOM -O3 -vec- -static -no-prec-div -ansi_alias". Also make sure you add --extra-libs=-lsvml to your configure invocation.<br /><br />Two ways of improving the performance of ffmpeg:<br /><br />1) Use the supported external 3rd party libraries; more than 20 are available. They are optimized for their specific field. Select the suitable libraries, download the code, and build them using ICC. <br /><br />2) Use PGO (profile-guided optimization). Here it is important to produce a set of .dyn files for the conversions you will use most often. Code size shrinks slightly and performance improves, provided that you run one of the conversions for which you generated a .dyn file. If you select a totally different conversion, you could actually get slower performance than with both the gcc build and the generally optimized icc build. For those who are not familiar with PGO, it is a three-step procedure:<br /><br />a. Add the option -prof-gen to the extra-cflags section of the configure invocation. Build ffmpeg. <br /><br />b. Run the resulting binary on your target. Make sure you use representative input files and output formats. You may want to repeat this process several times. Each run generates a .dyn file on the target. Copy these files over to your development system.<br /><br />c. Now use the options -prof-use and -prof-dir &lt;directory path where you have the .dyn files which you copied in step b&gt; (make sure to remove the -prof-gen option). Generate a new make file with configure and build ffmpeg again. 
You may see a number of warnings of the form ‘missing .dpi information for &lt;file name&gt;’. These are informational only and can be ignored. The final ffmpeg binary should be both smaller and faster. <br /><br /><br />
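The configure invocations for steps a and c can be sketched as follows. The function name pgo_configure_cmd, the base flags, and the /tmp/ffmpeg-dyn directory are illustrative assumptions; the sketch only prints the commands so they can be reviewed before being run, and it uses the --cc=icc spelling that ffmpeg's configure accepts for selecting the compiler.<br /><br />

```shell
#!/bin/sh
# Sketch: print the two configure invocations of the PGO cycle.
# BASE_CFLAGS and the profile directory are illustrative assumptions.
BASE_CFLAGS="-xSSE3_ATOM -O3"

pgo_configure_cmd() {
    # $1 = phase: "gen" (instrumented build) or "use" (optimized build)
    # $2 = directory holding the copied .dyn files (only needed for "use")
    case "$1" in
        gen) flags="$BASE_CFLAGS -prof-gen" ;;
        use) flags="$BASE_CFLAGS -prof-use -prof-dir $2" ;;
        *)   echo "usage: pgo_configure_cmd gen|use [dyn-dir]" >&2
             return 1 ;;
    esac
    printf './configure --cc=icc --extra-cflags="%s" --extra-libs=-lsvml\n' "$flags"
}

pgo_configure_cmd gen                   # step a: instrumented build
pgo_configure_cmd use /tmp/ffmpeg-dyn   # step c: build using the profiles
```

Step b, running the instrumented binary on the target and copying the .dyn files back, happens between the two invocations and is not automated here.<br /><br />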
<table cellpadding="5" cellspacing="0" rules="none" border="1">
<tbody>
<tr>
<th align="left" valign="middle" >Optimization Notice</th>
</tr>
<tr bgcolor="#ccecff">
<td>
<p>Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p align="right">Notice revision #20110804</p>
</td>
</tr>
</tbody>
</table>
<br /><br /><br /><br /><br /></span></p>
</div> ]]></description>
      <link>http://software.intel.com/en-us/articles/how-to-build-ffmpeg-to-run-under-moblin-2/</link>
      <pubDate>Wed, 23 Sep 2009 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/how-to-build-ffmpeg-to-run-under-moblin-2/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/how-to-build-ffmpeg-to-run-under-moblin-2/</guid>
      <category>Mobility</category>
      <category>Open Source</category>
      <category>Tools</category>
      <category>Intel® AppUp(SM) Developer Community</category>
      <category>MID</category>
      <category>Intel® Software Development Tool Suites for Intel® Atom™ Processor Knowledge Base</category>
    </item>
    <item>
      <title>Building the Moblin Clutter Library with the Intel Compiler</title>
      <description><![CDATA[ <p><b>Building the Moblin Clutter Library with the Intel Compiler</b></p>
<p><b>Introduction</b></p>
<p><b>Build environment</b></p>
<p>The build instructions below assume that the build environment is a Fedora 10 installation.</p>
<p>A copy of the installation disks can be found here. <a href="http://fedoraproject.org/get-fedora">http://fedoraproject.org/get-fedora</a></p>
<p>It is important that the GCC tools are selected when Fedora is installed.</p>
<p><b>Cross-development</b></p>
<p>The Clutter libraries and test environment are first built on a Fedora installation; the resultant executables and libraries can then be copied over to a MOBLIN2 platform.</p>
<p><b>Virtualisation</b></p>
<p>The steps below can be undertaken on a virtual machine. The only restriction is that not all virtual machines support video acceleration.</p>
<p><b>The tests</b></p>
<p>There are a number of tests that can be used to benchmark the clutter library. The benchmark performance is measured by how many Frames Per Second (FPS) are achieved.<b><br clear="all" /></b></p>
<p><b>Building the Clutter Libraries</b></p>
<p><b>Downloading the sources</b></p>
<p>The sources for the Clutter library can be obtained with the following commands.</p>
<p>In a development directory of your choosing, download the sources:</p>
<p>   git clone git://git.clutter-project.org/clutter</p>
<p>   git clone git://git.clutter-project.org/clutter-box2d</p>
<p>Progress for each download should be reported similar to the following:</p>
<p> </p>
<p>$ git clone git://git.clutter-project.org/clutter<br /><br />Initialized empty Git repository in /home/intel/dv/clutter/clutter/.git/<br />remote: Counting objects: 25225, done.<br />remote: Compressing objects: 100% (10261/10261), done.<br />remote: Total 25225 (delta 20575), reused 18319 (delta 14944)<br />Receiving objects: 100% (25225/25225), 6.99 MiB | 104 KiB/s, done.<br />Resolving deltas: 100% (20575/20575), done.</p>
<p><b>Building the Sources</b></p>
<p>Directory structure should look similar to this:</p>
<p>             &lt;dev-dir&gt;/clutter/<br />            &lt;dev-dir&gt;/clutter-box2d</p>
<p> </p>
<p>In each of these sub directories build the libraries as follows:</p>
<p><b>Choosing which compiler </b></p>
<p><b>Building with GCC</b></p>
<p>For our installation, we set the environment variable $PREFIX to a custom directory so as not to overwrite any existing installation:</p>
<p>     export PREFIX=/opt/custom/gcc<br />    ./autogen.sh --prefix=$PREFIX</p>
<p><b>Building with ICC</b></p>
<p>To build the library with the Intel compiler, use the following commands:</p>
<p>    export CC=icc<br />    export CXX=icc</p>
<p>   export PREFIX=/opt/custom/icc<br />  ./autogen.sh --prefix=$PREFIX</p>
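<p>The two variants above differ only in the compiler variables and the prefix, so they can be wrapped in one helper. This is a sketch: the function name select_toolchain is hypothetical, and setting CC=gcc and CXX=g++ explicitly for the GCC case is an assumption (the GCC instructions above rely on the defaults).</p>

```shell
#!/bin/sh
# Sketch: choose compiler and install prefix in one place.
# $1 = "gcc" or "icc"
select_toolchain() {
    case "$1" in
        gcc) CC=gcc; CXX=g++ ;;   # explicit defaults, an assumption of this sketch
        icc) CC=icc; CXX=icc ;;   # as in the ICC instructions above
        *)   echo "unknown toolchain: $1" >&2
             return 1 ;;
    esac
    # Separate prefixes keep the two builds from overwriting each other.
    PREFIX="/opt/custom/$1"
    export CC CXX PREFIX
}

select_toolchain icc
echo "CC=$CC CXX=$CXX PREFIX=$PREFIX"
# Then, in each source directory:
#   ./autogen.sh --prefix="$PREFIX"
```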
<p><b>Continuing the build</b></p>
<p>When autogen has completed you should get a message similar to the display below </p>
<p>  <br />                                  Clutter    0.9.7<br />                         ====================<br />                                  prefix:   /opt/custom/gcc<br />                                Flavour:   glx/gl<br />                                 XInput:   no<br />                          GL headers:   GL/gl.h<br />                    Image backend:   gdk-pixbuf<br />                       Target library:   libclutter-glx-0.9.la<br />               Clutter debug level:   yes<br />                 COGL debug level:   minimum<br />                      Compiler flags:   -Wall -Wshadow -Wcast-align -Wno-uninitialized -Wno-strict-aliasing -Wempty-body -Wformat-security -Winit-self<br />        Build API documentation:   no<br />  Build manual documentation:   no<br />         Build introspection data:   auto</p>
<p>  </p>
<p>Now build the source by calling make</p>
<p>The progress of the make will be reported; the last few lines should look similar to this:</p>
<p>   Making all in tools<br />     CC    disable-npots.o<br />     LINK  libdisable-npots-static.la<br />     LINK  libdisable-npots.la <br />   Making all in po</p>
<p><b>Installing the new library</b></p>
<p>To install the new library call:</p>
<p>make install</p>
<p>Note, depending on the permissions of the installation directory (set with the $PREFIX variable in the previous steps) you may need to do this as root.</p>
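<p>Whether root is needed can be tested up front by checking write permission on the prefix, or on its nearest existing parent directory if the prefix does not exist yet. The function name needs_root is hypothetical, and this is only a sketch of the check, not part of the Clutter build system.</p>

```shell
#!/bin/sh
# Sketch: decide whether "make install" needs root for a given prefix.
# Walks up to the nearest existing ancestor and tests write permission.
# $1 = installation prefix
needs_root() {
    dir="$1"
    while [ ! -d "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        dir=$(dirname "$dir")
    done
    # Succeeds (status 0) when the directory is NOT writable by this user.
    [ ! -w "$dir" ]
}

# Usage:
#   if needs_root "$PREFIX"; then
#       echo "run 'make install' as root"
#   else
#       make install
#   fi
```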
<p> </p>
<p><b>Building the tests</b></p>
<p>The test directory has a number of tests. The README describes the tests as follows </p>
<p><i>"The conform/ tests should be non-interactive unit-tests that verify a single feature is behaving as documented. See conform/ADDING_NEW_TESTS for more details.</i></p>
<p><i> </i><i>The micro-bench/ tests should be focused performance test, ideally testing a single metric. Please never forget that these tests are synthetic and if you are using them then you understand what metric is being tested. They probably don't reflect any real world application loads and the intention is that you use these tests once you have already determined the crux of your problem and need focused feedback that your changes are indeed improving matters. There is no exit status requirements for these tests, but they should give clear feedback as to their performance. If the frame rate is the feedback metric, then the test should forcibly enable FPS debugging.</i></p>
<p><i> </i><i>The interactive/ tests are any tests who's  status can not be determined without a user looking at some visual output, or providing some manual input etc. This covers most of the original Clutter tests. Ideally some of these tests will be migrated into the conformance/ directory so they can be used in automated nightly tests."</i></p>
<p>To build the tests, run the following from the top level of the test directory:</p>
<p>    make</p>
<p>The build will report:</p>
<p>    Making all in data <br />    Making all in conform<br />    Making all in interactive<br />    Making all in micro-bench<br />    Making all in tools</p>
<p><b>Running the tests</b></p>
<p>Having built the tests, each test can be run from the command line.</p>
<p><b>Automatic running of  tests.</b></p>
<p>Many of the tests do not run to completion but keep running until they are stopped. The script in the appendix below gives an example of how the tests can be started automatically and killed after a predefined time.</p>
<b><br clear="all" /></b>
<p><b>Appendix 1 -  Script to drive test cases</b></p>
<p>#!/usr/bin/perl<br /># list of tests <b>TODO: EDIT THESE TO YOUR REQUIREMENTS<br /></b>@Tests=("test-actors", "test-behave", "test-clip","test-cogl-offscreen","test-cogl-primitives","test-cogl-tex-convert","test-cogl-tex-foreign","test-cogl-tex-getset","test-cogl-tex-polygontest","test-cogl-tex-tile","test-depth","test-layout","test-multistage","test-pixmap","test-project","test-random-text","test-rotate","test-scale","test-script","test-sharder","test-text","test-texture-quality","test-textures","test-threads","test-unproject","test-viewport");<br /><br />#@Tests=("test-actors", "test-behave");<br />@Results=();<br /><br /># for each test run it for 10 seconds<br />my $Timeout = 10;<br />## chdir interactive;<br />my $bStarted = 0;<br /><br />foreach my $Test (@Tests)<br />{<br />    # print "executing $Test\n ";<br />    eval {<br />        local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required<br />        alarm $Timeout;<br />        # system("./$Test --clutter-show-fps &gt; res.$Test.txt");<br /><br />        # mark end of prev test<br />        push @Results, "END" if $bStarted;<br />        push @Results, "TEST:$Test\n-----------------------\n";<br />        $bStarted = 1;<br />        open (RUN,"./$Test --clutter-show-fps &amp;|");<br />        while(&lt;RUN&gt;)<br />        {<br />            push @Results, $_;<br />        }<br />        alarm 0;<br />    };<br />    if ($@) <br />    {<br />        die unless $@ eq "alarm\n"; # propagate unexpected errors<br />        # timed out<br />        system("killall -9 lt-test-interactive");<br />     }<br />}<br /># mark the end of the last test<br />push @Results, "END" if $bStarted;<br /><br /># process the results<br />my $NumTests = 0;<br />my $Total = 0;<br />my $Max = 0;<br />my $Min = 9999;<br />my $TestName="";<br /><br />print "\nName Min Max NumTests Average\n";<br />foreach my $Line (@Results)<br />{<br />    if($Line=~/^TEST:(.*)/)<br />    {<br />        # print "IN TEST $Line";<br />      
  $TestName= $1;<br />        chomp $TestName;<br />        $NumTests = 0;<br />        $Total = 0;<br />        $Max = 0;<br />        $Min = 9999;<br />    }<br /><br />    # match only the END marker itself, not test output that happens to contain "END"<br />    if($Line=~/^END$/)<br />    {<br />        # print "IN END :$Line";<br />        my $Average = 0;<br />        $Average = $Total/$NumTests if $NumTests &gt; 0;<br />        # no FPS samples were seen: report 0 instead of the 9999 placeholder<br />        $Min = 0 if $NumTests == 0;<br />        print "$TestName $Min $Max $NumTests $Average\n";<br />    }<br /><br />    if($Line=~/\*\*\* FPS: (.*) \*\*\*/)<br />    {<br />        # print "IN FPS: $Line";<br />        $NumTests++;<br />        $Total = $Total + $1;<br />        $Max = $1 if $1 &gt; $Max;<br />        $Min = $1 if $1 &lt; $Min;<br />    }<br />}<br /><br />print "exiting ..\n";<br /> </p>
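<p>For readers who prefer Python, the run-and-kill idea used by the Perl script above could be sketched roughly as follows. This is an illustration under assumptions, not part of the original appendix: the helper names run_with_timeout and summarize_fps are made up for this sketch, and the pattern assumes the same "*** FPS: n ***" output lines that the Perl script parses.</p>

```python
import re
import subprocess
import sys

# Same line format the Perl script looks for, restricted here to numeric values.
FPS_PATTERN = re.compile(r"\*\*\* FPS: (\d+(?:\.\d+)?) \*\*\*")


def run_with_timeout(command, timeout_seconds):
    """Run command, stop it after timeout_seconds, and return its output lines."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
        output = completed.stdout
    except subprocess.TimeoutExpired as expired:
        # subprocess.run() kills the child when the timeout expires;
        # whatever was captured before that point is attached to the exception.
        output = expired.output or ""
        if isinstance(output, bytes):
            output = output.decode(errors="replace")
    return output.splitlines()


def summarize_fps(lines):
    """Return (minimum, maximum, count, average) of the FPS values, or None."""
    matches = (FPS_PATTERN.search(line) for line in lines)
    values = [float(match.group(1)) for match in matches if match]
    if not values:
        return None
    return min(values), max(values), len(values), sum(values) / len(values)
```

<p>A call such as summarize_fps(run_with_timeout(["./test-actors", "--clutter-show-fps"], 10)) would then correspond to one iteration of the Perl loop. Unlike the Perl version, this sketch stops the process it started itself, so no separate killall step is needed.</p>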
<p> </p>
<b><br clear="all" /></b>
<p><b> Appendix 2 - Essential notes on installing Fedora  </b></p>
<p>When installing Fedora you must install the Software Development tools.</p>
<p> </p>
<p> </p> ]]></description>
      <link>http://software.intel.com/en-us/articles/building-the-moblin-clutter-library-with-the-intel-compiler/</link>
      <pubDate>Sun, 14 Jun 2009 16:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/building-the-moblin-clutter-library-with-the-intel-compiler/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/building-the-moblin-clutter-library-with-the-intel-compiler/</guid>
      <category>Mobility</category>
      <category>Tools</category>
      <category>MID</category>
      <category>Intel® Compilers</category>
      <category>Intel® Software Development Tool Suites for Intel® Atom™ Processor Knowledge Base</category>
    </item>
    <item>
      <title>Developing for Mobile Internet Devices, Part 2: Designing, Coding and Testing a Twitter* Location-Based Application</title>
      <description><![CDATA[ By Paul Ferrill<br /><br />In<strong> </strong><a href="http://software.intel.com/en-us/articles/developing-for-mobile-internet-devices-part-1-tools-choices-and-development-environment-configuration/"><span ><strong>part 1</strong></span></a> of this series, I showed you what it takes to build applications for the Mobile Internet Device (MID) platform. I covered the basic issues of setting up a development workstation, choosing a programming language, and making good choices in terms of external libraries. I also discussed issues like device emulation, the importance of working through a prototype “hello world” application, and common problems to avoid.<br /><br />This installment gets down to the business at hand and walks through the process of designing and coding a working application. I chose a Twitter* application that will report the user’s current location as a way to exercise as many of the MID features as possible. Although it’s a simple application in terms of what it does, it has many of the parts you’d need to build any general-purpose user application.
<p class="sectionHeading"><br />Design Choices</p>
<p>Many of the functional parts of this application are foundational to everyday applications, requiring features such as user authentication, configuration storage and retrieval, network communications, and user interaction. It’s important to modularize the design as much as possible to both break the overall project down into manageable pieces and to make it easier to reuse the code.</p>
<p>The application checks each time it runs to see whether any user credential information has been entered. If not, it presents a typical user name/password login dialog box and stores the information in an encrypted local configuration file. A preferences dialog box makes it possible to change the user name and password after initial setup along with other program options.</p>
<p>For the initial release of this program, I assume that there is an active network connection. This is a bit of a stretch for the Compal JAX10* MID I tested on: it did not have an active 3G radio, so there was no “always-on” Internet connection. I was able to simulate a consistent connection by using Wi-Fi instead. Later versions might check for connectivity and, if currently disconnected, queue up the message to send the next time a connection to the Internet is available.</p>
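<p>The queue-while-disconnected idea mentioned above could look roughly like the sketch below. It is only an illustration: the Outbox class and its send and is_connected callables are hypothetical names, not part of the application described in this article, and a real version would also need to persist the queue between runs.</p>

```python
from collections import deque


class Outbox:
    """Hold messages while offline and send them in order once online."""

    def __init__(self, send, is_connected):
        self._send = send                  # hypothetical callable that delivers one message
        self._is_connected = is_connected  # hypothetical callable, True when online
        self._pending = deque()

    def post(self, message):
        """Queue the message, then try to deliver everything that is waiting."""
        self._pending.append(message)
        return self.flush()

    def flush(self):
        """Send queued messages oldest first; return how many were delivered."""
        delivered = 0
        while self._pending and self._is_connected():
            self._send(self._pending[0])
            # Remove the message only after send() returned without raising.
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```

<p>The application would call post() whenever the user presses an action button and call flush() again whenever it learns that a connection has become available.</p>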
<p>Another design consideration for the mobile form factor is deciding what happens when a button is pressed. With this particular application, there is a consequence for choosing single-key action; every time the user presses one of the action buttons, a message is sent to Twitter. If you assume the user won’t inadvertently press a button or do something like put the device in a back pocket with the application running, it shouldn’t be a problem. Alternatives might include adding a confirmation dialog box to each action (“Really send message?”), adding a button or a timer to lock the screen, or exiting the program by default after the message is sent.</p>
<p class="sectionHeading">Building a UI for the Small Screen</p>
<p>One of the nice things about Python* is the abundance of libraries to do just about anything you need to do. The Compal* MID used for this project has a number of useful libraries installed as a part of the base operating system, and I chose to take advantage of them. The GTK+* user interface (UI) library contains a wealth of resources for building everything from a simple dialog box to a complex data-entry form. PyGTK is a wrapper around the GTK+ library and provides access to virtually every routine through standard Python objects.<br />Building a login dialog using PyGTK consists of creating a simple window with individual text boxes for username and password. You can find a good example and explanation of these techniques on the <a href="http://www.pygtk.org/pygtk2tutorial/sec-TextEntries.html">PyGTK tutorial page</a>. In the following code snippet, the pass_input.set_visibility(False) line causes each character of the password to be displayed as a dot symbol:</p>
<p> </p>
<pre name="code" class="cpp">def login(self):<br />        dialog = gtk.Dialog('Login',<br />                            self.window,<br />                            flags=gtk.DIALOG_MODAL | gtk.DIALOG_DESTROY_WITH_PARENT,<br />                            buttons=(gtk.STOCK_CANCEL, gtk.RESPONSE_REJECT,<br />                                     gtk.STOCK_OK, gtk.RESPONSE_ACCEPT))<br />        dialog.set_default_response(gtk.RESPONSE_ACCEPT)<br />        <br />        userbox = gtk.HBox(False)<br />        <br />        user_label = gtk.Label('Username:')<br />        userbox.pack_start(user_label)<br />        <br />        user_input = gtk.Entry()<br />        user_input.set_activates_default(True)<br />        userbox.pack_start(user_input)<br />        <br />        dialog.vbox.pack_start(userbox)<br />        <br />        passbox = gtk.HBox(False)<br />        <br />        pass_label = gtk.Label('Password:')<br />        passbox.pack_start(pass_label)<br /><br />        pass_input = gtk.Entry()<br />        pass_input.set_activates_default(True)<br />        pass_input.set_visibility(False)  # mask the typed characters<br />        passbox.pack_start(pass_input)<br />        dialog.vbox.pack_start(passbox)<br /></pre>
Another chore for the small screen is creating a finger-friendly interface. For this application, that means buttons appropriately sized and spaced so that an adult’s finger can easily press the button of choice. It is possible to code the button sizes specifically for the Compal screen size and resolution, but a more general approach would be to use the available system information to calculate the appropriate dimensions.
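<p>As a rough sketch of that more general approach, the button dimensions can be derived from the screen size instead of being hard-coded. The function name button_size and its spacing parameter are assumptions made for this illustration. In PyGTK the screen dimensions could come from gtk.gdk.screen_width() and gtk.gdk.screen_height(); here they are passed in as plain numbers so the arithmetic can be checked without a display.</p>

```python
def button_size(screen_width, screen_height, rows, cols, spacing=8):
    """Return (width, height) in pixels for each button in a rows x cols grid.

    spacing is the gap in pixels between buttons and around the grid edge.
    """
    usable_width = screen_width - spacing * (cols + 1)
    usable_height = screen_height - spacing * (rows + 1)
    if not (usable_width > 0 and usable_height > 0):
        raise ValueError("screen is too small for this grid and spacing")
    # Integer division keeps the result in whole pixels.
    return usable_width // cols, usable_height // rows
```

<p>For example, with an illustrative 800 by 480 pixel screen, 3 rows, 4 columns, and the default spacing, button_size(800, 480, 3, 4) returns (190, 149). These numbers are example values, not measurements of the Compal device.</p>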
<p> </p>
<p>A second option with GTK+ is to create a table that fills the screen with a matrix of buttons that essentially takes up all the screen real estate. Although doing so might not be as visually appealing, it does accomplish the task of building a finger-friendly interface in which you can easily “click” the right button. It also provides a few more options for adding descriptive text to the button for the Twitter application. In Python, the matrix would look something like this:</p>
<p> </p>
<pre name="code" class="cpp">def create_table(self):<br />        self.table = gtk.Table(3,4,True) # Create a 3 row by 4 column table<br />        button1 = gtk.Button("Button 1") # Create a button labeled "Button 1"<br />        self.table.attach(button1,0,1,0,1) # columns 0-1, rows 0-1: the top-left cell<br />        button2 = gtk.Button("Button 2")<br />        self.table.attach(button2,1,2,0,1)<br />        button3 = gtk.Button("Button 3")<br />        self.table.attach(button3,2,3,0,1)<br />.<br />.<br />.<br />        button12 = gtk.Button("Button 12")<br />        self.table.attach(button12,3,4,2,3) # columns 3-4, rows 2-3: the bottom-right cell<br /></pre>
<p> </p>
<p>The lines with a single period are meant to indicate that the sequence repeats down to the final definition of button12.</p>
<p class="sectionHeading">Coding Practices</p>
<p>Python is a language that allows you to write code by the brute-force method, as in the lines above, or in a more elegant way using concepts like iteration. If you were to define all 12 buttons in a linear fashion, you would need 26 lines of code. Using the Python for construct, you can accomplish the same task in a mere seven lines of code. That equates to less than one-third of the code for this simple example, but the difference would be substantial for a larger table. Here’s the code in a more “Pythonic” way:</p>
<pre name="code" class="cpp">    def create_table(self):<br />        self.table = gtk.Table(3,4,True)<br />        for row in range(3):<br />            for col in range(4):<br />                name = 'Twitter %i' % (row*4 + col + 1)<br />                button = gtk.Button(name)<br />                self.table.attach(button, col, col+1, row, row+1)<br /></pre>
<br />
<p>Keeping your code manageable is important when scripts start to get large. Python functions are a good way to break down your code into small, manageable pieces. It’s also important to point out that everything in Python is an object. You can see this to some extent in the create_table function through the use of the self construct. create_table is a method: self refers to the object the method is called on, and the new gtk.Table is stored on that object as self.table rather than being returned. There is an abundance of resources on the Web if you’re not familiar with object-oriented programming concepts.</p>
<p>Taking advantage of all the built-in language features and functions is another way to keep your source code manageable. Python has a module for reading and writing configuration files named ConfigParser. This module provides all the tools you need to save and read program configuration information. It supports different sections and creates a file of name–value pairs within each section. There are even individual methods to retrieve specific types, such as getboolean, getint, and getfloat.</p>
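<p>A minimal sketch of saving and reading settings this way is shown below. It is written for Python 3, where the module is named configparser (in Python 2 it is ConfigParser). The section names, option names, and the save_settings and load_settings helpers are invented for this example; the password is left out because the encryption mentioned earlier is beyond the scope of the sketch.</p>

```python
import configparser


def save_settings(path, username, interval_seconds, confirm_send):
    """Write a small settings file with an [account] and an [options] section."""
    config = configparser.ConfigParser()
    config["account"] = {"username": username}
    config["options"] = {
        "interval_seconds": str(interval_seconds),
        "confirm_send": "yes" if confirm_send else "no",
    }
    with open(path, "w") as handle:
        config.write(handle)


def load_settings(path):
    """Read the file back, using the typed getters for non-string values."""
    config = configparser.ConfigParser()
    config.read(path)
    return {
        "username": config.get("account", "username"),
        "interval_seconds": config.getint("options", "interval_seconds"),
        "confirm_send": config.getboolean("options", "confirm_send"),
    }
```

<p>The typed getters spare the application from converting strings by hand: getint returns an integer and getboolean accepts values such as yes/no, true/false, and on/off.</p>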
<p>If you can’t find what you need in the Python standard library, chances are that someone else has already written what you need. A quick Google* search typically returns multiple choices for a specific tool. Python-Twitter is a good example of a helper library to accomplish the heavy lifting of sending messages to the Twitter service. It’s hosted on Google Code* and even comes with several sample applications.</p>
<p class="sectionHeading">Testing and Debugging</p>
<p>You can test code on the Compal MID in several ways. File transfer over a USB port is drop-dead simple and works well. Optionally, you could attach a USB keyboard to the Compal MID and use the vi editor directly on the device for your editing and the Python interpreter for testing. This method works okay for small proof-of-concept efforts but quickly becomes unwieldy for anything larger. Another, similar approach is to use Virtual Network Computing* (VNC) to remotely view the screen on the device through your workstation.</p>
<p>Another way is to use the emulator approach.  The Moblin* project has a tool called the Moblin Image Creator* (MIC) for building platform-specific images. With MIC you can also use the Xephyr* emulator tool to launch an independent session for testing purposes. This method has the advantage of a rapid build / test cycle to help work the bugs out of your code in short order.</p>
<p>A final way might be to test the initial version of the software on your Linux* desktop. The advantage is that you don’t need MIC. The disadvantages are that you may need additional hardware for your desktop (GPS, for example) and that you can’t test MID-specific functionality (such as screen characteristics).</p>
<p class="sectionHeading">Lessons Learned</p>
<p>Don’t get bogged down in the details too early in the process. It’s important to completely flesh out your requirements in the beginning, then make some design decisions based on a clear picture of what you’re trying to accomplish. Get comfortable with your development tools—especially the debugging portion—as you’ll probably use them more than you think.</p>
<p>The easier it is for you to test and debug your code, the quicker you’ll get it running.</p>
<p>Have a convenient way to transfer files to your device that doesn’t require a lot of motion. This could be as simple as keeping an easily accessible USB cable plugged into your workstation. When you get down to squashing bugs, it helps to make the process as smooth and painless as possible, especially if you’re editing all the code on a workstation and have to move it over to the device for testing.</p>
<p class="sectionHeading">Summary</p>
<p>Building a solid application for the MID platform requires the same set of disciplined steps you would use in any software project. Be sure you don’t skip steps like these:</p>
<ul>
<li>Take your time on the design process, and consider alternatives.</li>
<li>Think through the UI design from a user’s perspective before you start coding.</li>
<li>Write your code with testing in mind.</li>
<li>Have a clear set of requirements that you can test.</li>
<li>Use tools such as source code control, and check in your code frequently.</li>
</ul>
<br />
<p class="sectionHeading">About the Author</p>
<p>Paul Ferrill has been writing in the computer trade press for more than 20 years. He got his start writing networking reviews for PC Magazine on products like LANtastic and early versions of Novell Netware. Paul holds both BSEE and MSEE degrees and has written software for more computer platforms and architectures than he can remember.</p>
<br /><span class="sectionHeading"><br />Link to Part 1 of this series</span><br /><br /><a href="http://software.intel.com/en-us/articles/developing-for-mobile-internet-devices-part-1-tools-choices-and-development-environment-configuration/"><span ><strong>Tools, Choices, and Development Environment Configuration</strong></span></a>.<br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/developing-for-mobile-internet-devices-part-2-designing-coding-and-testing-a-twitter-location-based-application/</link>
      <pubDate>Tue, 19 May 2009 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/developing-for-mobile-internet-devices-part-2-designing-coding-and-testing-a-twitter-location-based-application/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/developing-for-mobile-internet-devices-part-2-designing-coding-and-testing-a-twitter-location-based-application/</guid>
      <category>Mobility</category>
      <category>Intel® AppUp(SM) Developer Community</category>
      <category>MID</category>
    </item>
    <item>
      <title>Power Efficiency – Analysis and SW Development Recommendations for Intel® Atom™ based MID platforms</title>
      <description><![CDATA[ <p class="sectionHeading">Download PDF</p>
<p><a href="http://software.intel.com/file/15306"><br />Power Efficiency – Analysis and SW Development Recommendations for Intel® Atom™ processor based MID platforms</a> [PDF | 2MB ]</p>
<p class="sectionHeading">Background</p>
<p>The objective of this paper is to investigate Intel® Atom™ processor based MID (Mobile Internet Device) platform power (including chipset components) characteristics for typical workloads, such as media playback, browsing, idle, etc. The results are aimed at providing recommendations for SW developers on how to best optimize applications for power efficiency.<br />The platform power characteristics were captured using two methods:</p>
<ul>
<li>Instrumented Crown Beach/Menlow Software Development Platform (SDP) (including Intel® Atom™ processor and chipset) enabling discrete component power measurements using Fluke NetDAQ* equipment</li>
<li>Linux* tool, <a href="http://www.lesswatts.org/projects/powertop/">PowerTOP*</a>, to extract information on processor C-state (sleep) residency, P-state (execution) residency and wakeup characteristics.</li>
</ul>
<p>The Intel® Atom™ processor provides low power features such as:</p>
<ul>
<li>New core x86 micro-architecture    
<ul>
<li>In-order execution</li>
<li>Multithreading support: Hyper-Threading Technology (HT), also known as Simultaneous Multi-Threading (SMT). <em>Not available for all Atom SKUs.</em></li>
<li>Enhanced macro-op execution</li>
</ul>
</li>
<li>45nm technology</li>
<li>Enhanced Intel® SpeedStep® Technology </li>
<li>Up to SSSE3 instruction set support</li>
<li>Deep Power-Down Technology - C6 processor sleep state</li>
<li>Dynamic cache sizing</li>
<li>Enhanced dynamic clock gating</li>
<li>Enhanced L2 data pre-fetcher</li>
</ul>
<p>The chipset provides features such as:</p>
<ul>
<li>Graphics HW acceleration (including video acceleration used by media player)</li>
<li>Intel® Display Power Saving Technology</li>
<li>Intel® Rapid Memory Power Management</li>
</ul>
<p>The OS/SW stack used for this investigation is <a href="http://moblin.org/" target="_top">Moblin* Linux</a> with the <a href="https://helixcommunity.org/" target="_top">Helix* media framework</a> using HW accelerated video codecs. Other OS or SW stacks such as RedFlag*, MIDinux*, or Microsoft Windows* Vista are out of scope for this investigation.</p>
<p>The "Recommendations" chapter summarizes the findings from the analysis and provides guidelines on how to develop power efficient SW on the MID platform. The following chapters are divided into "Technical Background", which describes some key technical concepts, "Measurement procedure", which outlines the methods used to capture the data presented in this paper and "Workloads", which breaks down results for each of the workloads analyzed.</p>
<p><em>To read the rest of the article, please download the PDF linked above.</em></p>
<p class="sectionHeading">Recommendations</p>
A number of recommendations on how to design SW for power efficiency on the MID platform were assembled from the Intel® Atom™ processor platform power data analyzed in the following chapters.<br /><br />In essence, the goal is to maintain performance while improving application power efficiency, which extends device use time between charges.<br /><br /><strong>2.1. Reduce application demands for processor wakeups</strong><br /><br />When wakeups are made less frequent (or are coalesced), the processor is able to move to, and reside longer in, deeper sleep states such as C6, which lowers average processor power. Using the C6 deep sleep state, the processor is able to operate at an average power of 80 mW when idle.<br /><br />Optimally, a power-efficient application in idle should cause the same number of processor wakeups per second (or only minimally more) as the system shows when it is idle without the application running. A basic way to check this is to measure "power at the wall" or to verify application wakeup behavior using PowerTOP. PowerTOP provides a very good method for measuring the C/P state residency of the system and for finding the main sources of wakeups.<br /><br />Refer to chapters "5.1.1: Processor sleep behaviour" and "5.2: Video playback" for details.<br /><br /><strong>2.2. Utilize an energy-efficient UI and framework technology</strong><br /><br />It is important to be aware of the UI power implications regardless of whether your application is native or runs on top of a runtime environment or framework. If possible, when developing the application, try to select a UI, runtime and framework that are energy efficient. 
For instance, an application based on Flash will be less power friendly due to the high number of wakeups in idle and the runtime overhead.<br /><br />Although somewhat OEM specific, if possible, utilize a home screen solution (application launcher) with a low average idle power footprint, such as an HTML or OpenGL based solution.<br /><br />Refer to chapter "5.1.2: Processor &amp; Chipset power usage in Idle modes" for details.<br /><br /><strong>2.3. Thread applications well, even for single core processors</strong><br /><br />When HT is enabled, an additional thread can execute per processor core when the first thread is stalled. In many cases this improves performance for well threaded workloads. Additionally, threaded video workloads often achieve a higher frame rate due to the increased performance.<br /><br />The use of HT for threaded workloads increases average power, but because of the improved performance, workloads complete faster, resulting in lower net energy compared to executing the workload with HT disabled. In essence, if an application exploits multithreading well, energy can be decreased significantly, improving battery life.<br /><br />Observe that some of the Intel® Atom™ processor SKUs do not feature HT.<br /><br />Refer to chapter "5.3: Benefits of HT on threaded workloads" for details.<br /><br /><strong>2.4. Media Workloads: Take full advantage of hardware accelerated codecs</strong><br /><br />Media frameworks, such as the Helix framework, provide HW accelerated codecs for video standards such as H.264. HW acceleration greatly improves average power footprint and device up-time. Another benefit of HW acceleration is that the processor is offloaded, allowing it to perform other tasks.<br /><br />Higher definition video scales well and does not significantly increase the average power footprint of the chipset and processor. 
It is important to note, however, that the average power required by memory increases rapidly for higher definition content due to greatly increased memory access.<br /><br />SW codecs are not recommended for Intel® Atom™ processor based platforms, as the processor gets maxed out even at low resolutions (480p and below), resulting in very high average power.<br /><br />Refer to chapter "5.2: Video playback" for details.<br /><br />HW accelerated codecs for the Linux MID platform are available through the Helix community, <a href="https://rp4mid.helixcommunity.org/" target="_blank">https://rp4mid.helixcommunity.org/</a>.<br /><br /><strong>2.5. General power efficiency recommendations</strong><br /><ol>
<li>Be aware that one power inefficient application in Idle is enough to cripple battery life for the whole system.</li>
<li>Develop context aware applications that adapt to system state changes. Refer to <a href="http://software.intel.com/en-us/articles/energy-efficient-software-developing-power-aware-apps">http://software.intel.com/en-us/articles/energy-efficient-software-developing-power-aware-apps</a> for articles on developing for context awareness.    
<ul>
<li>Power state change (system sleep/wake transitions): The application shall handle system power state change gracefully and not incur unnecessary delays. The application shall not prevent system from changing power state unless absolutely necessary. </li>
<li>Battery state change: Adapt behavior when the system switches between AC and battery power. Some features, such as automatic updates, could potentially be deactivated to save power when powered by battery.</li>
<li>Network state change: Adapt application behavior to network connect/disconnect events. Some features could potentially be deactivated, saving power when not connected to network (flight mode). </li>
</ul>
</li>
<li>Develop "Data Efficient" applications that minimize data movement to/from external storage or RAM. Also investigate if data movement can be grouped or batched to decrease frequency of drive spin up/down.</li>
<li>Utilize extended SIMD instruction sets such as SSE3, improving performance and indirectly reducing application energy usage.</li>
<li>Utilize tools to achieve power optimizations or identify power optimization opportunities   
<ul>
<li>Utilize the Intel® Compiler for MID to compile the code into a binary specifically adapted to the Intel® Atom™ micro-architecture. This has the potential to greatly improve performance and in some cases also energy efficiency</li>
<li>Utilize PowerTOP to find components waking up the system excessively</li>
<li>Utilize Application Energy Toolkit to graph/analyze energy consumption</li>
</ul>
</li>
</ol><strong>2.6. Platform configuration recommendations</strong><br />Note that the following recommendations are somewhat OEM SW/OS specific and sometimes not applicable to application development.<br /><ol>
<li>Make sure the system is configured to power the screen off if there is no user input for a suitable time interval. This reduces the average power in the chipset and allows additional power savings from LCD power down.<br /><br />Refer to chapter "5.1.2: Processor &amp; Chipset power usage in Idle mode" for details.</li>
<li>Using the on-demand governor to regulate P-state according to the workload improves system power efficiency.</li>
</ol><strong>2.7. Suggested power optimization process</strong><br /><br /><ol>
<li>Optimize to meet performance goals</li>
<li>Establish baseline before power optimizations    
<ul>
<li>Gather data using PowerTOP in single threaded mode (PowerTOP is currently not as accurate in HT mode) <ol>
<li>Wakeups/s </li>
<li>Cx state residency</li>
<li>Px state residency</li>
</ol></li>
<li>If possible utilize external power meter to measure average power during application idle compared to system idle and during key workload execution</li>
</ul>
</li>
<li>Investigate behavior during application idle compared to system idle. Explore differences in wakeups/s and Cx/Px state residency. Optimally, an application in idle should have no or minimal impact on total system idle power.</li>
<li>Use Intel® VTune™ for MID to identify functions that show high C0 residency running key application workloads. Utilize processor "unhalted ref clock" counter to assess C0 residency. Note that "Unhalted core clock" is unreliable due to frequency change (if Intel® SpeedStep® Technology is enabled) and that "Time stamp counter" is unreliable at and below C4 due to disabled PLL. "unhalted ref clock" / wall clock time can be used to compute true C0 residency.</li>
<li>Re-architect/re-code    
<ul>
<li>Look for activities that force processor into C0 state (interrupts)</li>
<li>Coalesce or remove unnecessary activities to reduce wakeups</li>
<li>Look into data efficiency bottlenecks (disk, memory, network)</li>
<li>Note that optimizing for speed may increase C0% which might be OK since overall energy reduces due to performance speed up</li>
</ul>
</li>
</ol>
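<p>The C0 residency calculation from step 4 can be illustrated with a small worked sketch. The paper gives the ratio as "unhalted ref clock" / wall clock time; to express it as a fraction between 0 and 1, this sketch additionally assumes that the counter advances at a known, fixed reference frequency. The function name c0_residency, the ref_clock_hz parameter, and the numbers in the usage note are made up for the illustration.</p>

```python
def c0_residency(unhalted_ref_cycles, wall_seconds, ref_clock_hz):
    """Return the fraction of wall-clock time spent in C0 (0.0 to 1.0).

    unhalted_ref_cycles: "unhalted ref clock" count over the interval
    wall_seconds:        length of the measurement interval in seconds
    ref_clock_hz:        assumed fixed frequency of the reference clock
    """
    if not (wall_seconds > 0 and ref_clock_hz > 0):
        raise ValueError("wall_seconds and ref_clock_hz must be positive")
    # Cycles that would have elapsed had the core never halted.
    possible_cycles = wall_seconds * ref_clock_hz
    return unhalted_ref_cycles / possible_cycles
```

<p>With hypothetical readings of 50,000,000 unhalted reference cycles over 2 seconds and an assumed 100 MHz reference clock, c0_residency(50000000, 2.0, 100000000) evaluates to 0.25, meaning the core was executing for a quarter of the interval.</p>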
<p>For further details on the above and related topics please find detailed articles at Intel Software Network (ISN): Energy Efficient Software <a href="http://software.intel.com/en-us/articles/energy-efficient-software/">http://software.intel.com/en-us/articles/energy-efficient-software/</a>.</p>
<p class="sectionHeading">Technical Background</p>
<p>This chapter explains some of the technologies in focus for this paper.</p>
<strong>3.1. C6 - Deep Power Down State</strong><br /><br />The Intel® Atom™ processor features a new deep sleep state named C6. The new sleep state enables the processor to move into deeper sleep during inactive intervals between executions. The benefit of the new state is a much lower average power in idle mode in comparison to the C4 sleep state.<br /><br /><img src="http://software.intel.com/file/15733" alt="" /><br /><br />The above illustration shows the impact on the processor when C6 is active. For instance, in deep power down the core voltage is substantially decreased (~0.3V) and the caches are turned off (flushed), leading to significant power savings. Observe that the bars do not depict exact values but illustrate each value relative to the other states.<br /><br />The deeper sleep state has a side effect: the processor has a much longer wake-up time compared to wake-up from C4. The benefit of C6 therefore varies with the frequency of processor wakeups. On average, if the processor has ~200 wakeups per second, the benefits of C6 are lost due to the latency/power cost of sleep state transitions. For clear benefits of C6 we recommend &lt;100 wakeups per second.<br /><br />The applications running on the platform strongly affect the frequency of processor wakeups. For instance, by carefully selecting periodic timers with the longest possible interval, an application allows the processor to move to, and stay longer in, deeper sleep states.<br /><br /><strong>3.2. Hyper-Threading Technology (HT)</strong><br /><br />Hyper-Threading Technology (also Simultaneous Multi-threading (SMT)) improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle. 
The benefit is especially useful for a single core processor such as the Intel® Atom™ processor, as it enables execution of an additional thread while execution is stalled for the first thread, for instance due to a cache miss.<br /><br />Observe that the OS scheduler perceives the processor as having two cores when HT is enabled.<br /><br />HT may improve the performance of multithreaded code running on the processor. The exact performance benefit depends on the specific workload and on how well threaded the code is.<br /><br />
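As a rough illustration of the wakeup guidance in section 3.1, the following Python sketch estimates how many wakeups per second a set of periodic timers could cause and compares the result with the figures quoted above. Only the ~100 and ~200 wakeups per second thresholds come from this paper; the timer intervals, the function names and the "reduced" label for the range in between are assumptions made for the example. The estimate also assumes that no timers fire together, so it is an upper bound rather than a measurement.<br /><br />

```python
# Thresholds quoted in section 3.1: clear C6 benefit below ~100 wakeups/s,
# benefit lost at roughly 200 wakeups/s.
C6_CLEAR_BENEFIT_MAX = 100.0
C6_BENEFIT_LOST_AT = 200.0


def wakeups_per_second(timer_intervals_ms):
    """Upper-bound wakeup rate if every periodic timer fires on its own."""
    return sum(1000.0 / interval for interval in timer_intervals_ms)


def classify_c6_benefit(rate):
    """Map a wakeup rate onto the rough guidance from section 3.1."""
    if rate >= C6_BENEFIT_LOST_AT:
        return "lost"
    if rate >= C6_CLEAR_BENEFIT_MAX:
        return "reduced"  # label assumed for the range between the two figures
    return "clear"


# Hypothetical application timers (milliseconds between firings).
aggressive = [10, 20, 50]    # 100 + 50 + 20 = 170 wakeups/s
relaxed = [100, 250, 1000]   # 10 + 4 + 1 = 15 wakeups/s

for name, timers in (("aggressive", aggressive), ("relaxed", relaxed)):
    rate = wakeups_per_second(timers)
    print(name, round(rate, 1), classify_c6_benefit(rate))
```

Lengthening the three hypothetical timers moves the application from the range where the C6 benefit is eroding into the range where the paper expects a clear benefit. Actual wakeup rates should still be confirmed by measurement on the target system.<br /><br />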
<p class="sectionHeading">Measurement Procedure</p>
<p>Note that this paper does not analyze all component parts of a typical MID form factor device. MID devices will feature a range of wireless communication components such as WiFi, WiMax, WWAN or BT all increasing the average platform power footprint. Extended analysis of form factor devices might include networked workloads such as audio/video streaming, Chat or Email, VOIP, P2P solutions, connected browsing and gaming.</p>
<strong>4.1. System setup</strong><br /><br />The system BIOS menu provides access to settings for several key processor features, such as turning HT on/off and selecting the "deepest" sleep state (C4 and C6 were used for this study).<br /><br />After system start-up, the "on-demand" power governor was activated, enabling the processor to throttle P-state depending on the performance needs of the workload.<br /><br />Refer to chapter 7: "Appendix B" for details on system configuration and setup.<br /><br /><strong>4.2. NetDAQ measurements</strong><br /><br />Using Fluke NetDAQ equipment we were able to measure voltage and current for discrete components on the instrumented Crown Beach/Menlow platform board. The board was instrumented by connecting wires from 4 NetDAQ measurement modules to sense resistors on the board. This enabled us to capture voltage and current for components such as processor, chipset, RAM, PATA and PCI-bus, and to calculate the power usage per component or group of components.<br /><br />Several measurements were collected for every workload; abnormal captures were omitted and the most representative set of measured data was selected.<br /><br />Refer to chapter 7: "Appendix B" for details on NetDAQ setup.<br /><br /><strong>4.3. PowerTOP measurements</strong><br /><br />PowerTOP (<a href="http://www.lesswatts.org/projects/powertop/" target="_blank">http://www.lesswatts.org/projects/powertop/</a>) is a Linux tool that measures how well the system uses various hardware power-saving features. It also highlights culprit software components that prevent optimal usage of hardware power savings and provides tuning suggestions. Furthermore, the tool provides measured C (sleep) and P (execution) state residency, and the number of wakeups/s for the processor as a whole and for the top software component culprits.<br /><br />For this investigation some automated scripts were created to launch the workload and then capture PowerTOP data over an interval of time. 
As with the NetDAQ captures, several measurements were collected for every workload; abnormal captures were omitted and the most representative set of measured data was selected.<br /><br />The PowerTOP measurements were not synchronized with measurements gathered using the NetDAQ, as a direct correlation technique was not available.<br /><br />To examine the actual processor C-state residency it is important to understand the mapping between the ACPI C-state (presented by PowerTOP) and the actual processor C-state. Unfortunately ACPI in its current form does not provide greater granularity. Refer to the table below for the mapping from ACPI C-state to processor C-state (depending on the lowest sleep mode selected in BIOS).<br /><br /> 
<table class="tableformat1" border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>ACPI: C-state</td>
<td>Processor: C-state (with C6 activated in BIOS)</td>
<td>Processor: C-state (with C6 deactivated in BIOS)</td>
</tr>
<tr>
<td>C1</td>
<td>C1</td>
<td>C1</td>
</tr>
<tr>
<td>C2</td>
<td>C2</td>
<td>C2</td>
</tr>
<tr>
<td>C3</td>
<td>C4</td>
<td>C4</td>
</tr>
<tr>
<td>C4</td>
<td>C6</td>
<td>-</td>
</tr>
</tbody>
</table>
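<br />The mapping in the table can be applied mechanically when post-processing PowerTOP output. The Python sketch below is one possible way to do that, not the scripts used for this study: the dictionary is copied from the table, while the function name, the sample residency values and the choice to raise an error for the unavailable combination are assumptions made for illustration.<br /><br />

```python
# ACPI C-state -> processor C-state, copied from the table above.
# None marks the combination the table shows as "-" (not available).
ACPI_TO_PROCESSOR = {
    True: {"C1": "C1", "C2": "C2", "C3": "C4", "C4": "C6"},   # C6 activated in BIOS
    False: {"C1": "C1", "C2": "C2", "C3": "C4", "C4": None},  # C6 deactivated in BIOS
}


def processor_residency(acpi_residency, c6_enabled):
    """Relabel ACPI C-state residency percentages with processor C-state names."""
    mapping = ACPI_TO_PROCESSOR[c6_enabled]
    result = {}
    for acpi_state, percent in acpi_residency.items():
        processor_state = mapping.get(acpi_state)
        if processor_state is None:
            raise ValueError(f"{acpi_state} has no processor mapping in this BIOS mode")
        result[processor_state] = percent
    return result


# Hypothetical residency percentages keyed by ACPI C-state, as PowerTOP labels them.
sample = {"C1": 0.5, "C2": 4.0, "C3": 10.0, "C4": 80.0}
print(processor_residency(sample, c6_enabled=True))
```

With C6 activated, the residency that PowerTOP reports under ACPI C4 is attributed to processor C6, and ACPI C3 to processor C4.<br />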
<br />Below is an example of a typical PowerTOP dialog, where C0 is computed by PowerTOP from the captured C1-C4 (ACPI) data.<br /><br /><img src="http://software.intel.com/file/15734" alt="" /><br /><br /><em>In the above example, PowerTOP indicates that, over the measurement interval, the system was in C2 87.1% of the time and at the lowest P-state (600MHz) 89.4% of the time. Wakeups per second during the interval averaged 81.2. PowerTOP also suggests that turning off Bluetooth will save power.<br /><br /></em><a href="http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms-2/">Continue to Part 2 of this article.</a> ]]></description>
      <link>http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms/</link>
      <pubDate>Tue, 05 May 2009 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms/</guid>
      <category>Mobility</category>
      <category>Intel® AppUp(SM) Developer Community</category>
    </item>
    <item>
      <title>An Edge By Design for MID Applications</title>
      <description><![CDATA[ <em>Design research and design-driven best practices can help your application development get an edge.<br /><br />By Knut Graf.  Knut is a Principal Designer at <a href="http://www.frogdesign.com/" target="_blank">frog design</a>, a global innovation firm. <br /></em><br /><span class="sectionHeading">Abstract<br /></span><br />The Mobile Internet Device (MID) platform has to establish a following of users and a set of high quality applications, which presents a kind of chicken and egg challenge. The lack of a track record to refer to, along with other factors that are common to emerging platforms, makes application development on MIDs challenging. Yet, high-quality applications are critical to the platform's success. Design, as expertise and as a set of practices, can address many of the risk factors that characterize this situation.<br /><br />Design ensures that the product vision is relevant to the user by gathering knowledge about the user and the application's usage environment through informal discovery activities and through formal research. Based on the knowledge gained in this discovery phase, a crisp feature definition lays the groundwork for a great user experience, which is built out through a compelling look and feel. A prototype confirms the product vision amongst team members and guides implementation and testing, thereby reducing the need for lengthy and tedious traditional requirements documentation.<br /><br />The guidance and streamlining that design brings to an application development process allows the core development team members to devote proper amounts of attention to technical challenges and quality, greatly improving the odds that the project will be completed and successful.<br /><br /><span class="sectionHeading">Introduction</span><br /><br />The software industry has established a track record over the last few decades of slowly making its development practices more efficient and effective. 
Even so, software projects are still burdened with high failure risks. The quality of projects that make it to market is hard to predict. For example, in consumer-facing applications, for every well-rounded product, there is another one full of bugs and quirks. Properly introducing "design" into a software project is one way to increase its odds of success.<br /><br />Many processes and best practices address risk and quality concerns in software projects. In practice, however, such practices compete with established habits and the temptation to take shortcuts.<br /><br />Design has a variety of effects. Approached naively, design can raise expectations and then confuse and complicate the actual process, inundating unprepared developers with distractions and implicit requirements. On the other hand, a proper approach to harnessing design for software development can ensure that the end product is both relevant and beautiful. Along the way, the collaborative aspects of the design process can align stakeholders' expectations and even save development time by using prototypes to communicate project goals.<br /><br />Design - whether it be "Interaction Design," "User Interface Design," or "Experience Design" - is not retrofitting a screen with pretty graphics. Applying design to an existing project provides limited value. Design in software development is much more valuable when applied as a way to discover, refine, and execute proper success criteria.<br /><br /><span class="sectionHeading">Addressing the MID Form Factor</span><br /><br />The Mobile Internet Device, or MID, is a low-power mobile device with many of the capabilities of a full PC. It employs a small, high-resolution touch screen and a reduced keypad. The MID platform provides unique new usage opportunities and advantages that translate into an exciting new market opportunity. 
MIDs offer new capabilities in terms of computing power, display quality, touch-input quality, and connectivity, along with a high amount of energy invested in the software tools. <br /><br />The way the platform's promise translates into value to the end user is through the software running on it. Software turns abstract capabilities into user-facing features. Software applications will decide the fate of the platform: Compelling usefulness will give mobile internet devices a place in our everyday lives, bringing success to the platform. Falling short, MIDs will join our Palm* Pilots in the gadgets drawer, awaiting their fate at the next garage sale.<br /><br />The platform's promise does not guarantee success. MIDs face some specific challenges in terms of application development - challenges that design can help address.<br /><br /><strong>Thinking Small<br /></strong><br />With or without all the power delivered by a MID, the small device form factor limits the gamut of interactions that people will want to engage in. Comparing the same task on a small device and a large device, the small device requires more dexterity and concentration from the user. Simplifying tasks for small devices means challenging the user's habits and expectations.<br /><br /><strong>Lean Teams<br /></strong><br />The limited out-of-the-gate user base constrains the economically sensible investment to be made. Development teams will be lean and schedules will be short, even as the application development community is still familiarizing itself with the MID development environment.<br /><br /><strong>Porting</strong><br /><br />Many projects are ported from existing implementations on other platforms. 
Considerable updates to such projects are often required to allow them to take full advantage of the specific conditions offered by MIDs.<br /><br /><strong>Device Definition</strong><br /><br />Smart phones, with ever-growing capabilities, and the booming netbook contest the space for the MID platform. Acceptance for MIDs depends on unique value which neither of these other device classes can offer.<br /><br /><span class="sectionHeading">Establishing a Target Design</span><br /><br />In software application development, challenges are not unusual. Any software development project, regardless of the platform, comes with challenges. The way to mitigate the risk associated with these challenges is to decide upon a focused, valuable target, and to stay on this target. <br /><br />At the heart of a software application is the basic idea, "does it do something useful?" For the application to be successful, the answer must be "yes." A software project starts with existing ideas for specific features, or with general goals. These existing goals are formalized, to arrive at an implementable specification. During this process of formalization, the goals are examined, refined and adjusted.<br /><br />The first step towards the feature definition is learning who the end user is, what the end user likes, and what the end user does.<br /><br />A certain amount of general knowledge about users can be derived from the MID platform goals as a whole, as well as from interpreting the usage opportunities offered by target devices appearing on the market. Beyond this common sense approach, design provides a set of tools to get a clear picture of the end user and of the opportunities for application usage afforded to the user.<br /><br /><strong>User Personas</strong><br /><br />A user persona provides a description of an imaginary, but specific person who would use the application. A user persona is meant to be realistic and robust. 
A user persona describes the habits, preferences and environment of the target character, to provide a framework from which the character would make judgments and decisions. The user persona is a conceptual simulator for the practical value of new features.<br /><br /><strong>Idealized Personas<br /></strong><br />The practical use of an application is not the only question to consider. An application can be an enabler to the user, providing functionality that is socially attractive and desirable, regardless of its usefulness in a traditional sense. <em>Idealized personas</em> provide a handle on defining this type of functionality. Idealized personas are traditionally used in the discipline of marketing, not in the more "down to earth" context of usability. Actual people aspire to the attributes the idealized persona embodies. The idealized persona is more sophisticated than a real person, bound by fewer constraints. Some stereotypes might be used to define this persona. The idealized persona provides a sounding board for empowering, uncommon feature possibilities.<br /><br /><strong>Scenarios</strong><br /><br />A scenario describes a situation, and a sequence of events, in which the device and the software application running on it are used. The scenario is meant to be realistic, to provide a background to judge the value of the planned application and its features. The more detailed the scenario is, the more it can serve as a source and validation point for specific ideas.<br /><br />Other, less standardized forms of documentation can be used as a basis for feature definition. The assumptions and conclusions in any feature documentation can be tested and strengthened by personas and scenarios as verification points.<br /><br />Personas and scenarios are speculative in nature: they are based on assumptions. To add a grounding in realism to these assets, we recommend that design research activities be performed. 
<br /><br /><strong>Mining Reality for Knowledge: Research</strong><br /><br />Design research mines real life for project-relevant information. Common methods include surveys, user interviews, and contextual inquiry, in which the researcher engages with a user one-on-one to observe activities. These methods range from simple to very involved, and the value of the research results grows with the depth of user engagement. <br /><br />Design research requires preparation time and careful analysis of data. The schedule impact must be weighed against the benefits to the project. Consider that often the benefit is a drastic reduction of risk. <br /><br />For applications that provide narrow specialized feature sets for vertical markets, the value of research is fairly obvious: it provides knowledge of the problem domain to the team. For common applications with mass-market, mainstream functionality, the value proposition is more subtle: while the team is familiar with the domain, the deeper examination can provide insights into key differentiators that escape the naked eye.<br /><br /><strong>Generating the Design<br /></strong><br />Equipped with a picture of a target user, and solid knowledge of the real-world context, designers unearth feature opportunities for the application. Some opportunities will be obvious from looking at a scenario. Interpreting research data and applying abductive reasoning to the problem space knowledge uncovers further opportunities.<br /><br />Designers define features by mapping the opportunities and constraints to an actual software structure. To arrive at valid results, design principles are applied to guide this exercise. In the absence of specific native design principles for MIDs, proven design principles from other small platforms can be used.<br /><br />The actual feature definition is expressed as a vision of the resulting program, and this is done in the form of diagrams, wireframes, and visual mock-ups. 
Abstract features are given a concrete expression. This documentation provides a moment of truth for project stakeholders. It serves as the first concrete shared picture of "the design" and a glance at the project outcome. As the project continues, collaborative discussions will adjust this picture. <br /><br />The design addresses core features first, and then it takes on secondary features such as peripheral details. One after another, parts of the application are addressed, designed, documented, and adjusted.<br /><br /><span class="sectionHeading">Design Principles for Mobile Platforms<br /></span><br />Table 1 lists a set of design principles that apply to mobile platforms and are therefore relevant to the MID form factor.<br /><br />
<table class="tableformat1" border="0" cellspacing="0" cellpadding="10">
<tbody>
<tr>
<td valign="top">Mobile Platforms Design Principle</td>
<td valign="top">Description</td>
</tr>
<tr>
<td valign="top"><strong>Immediate Results</strong></td>
<td valign="top">Small, mobile devices offer themselves to be used spontaneously, whenever opportunity beckons. Such an opportunity for using the device, especially when occurring in a mobile situation, may not last long, as other things compete for the user's attention. To ensure a successful user experience, the application accessed by the user must waste no time in providing results. The tasks offered by the application must be straightforward, leading to immediate results. Long sessions that require continuous user attention are better suited for a full-size PC.<br /><br />The home screen application of the HP* Mini 1000 MI (Mobile Internet) Edition netbook is an example of a focus on immediate results. The HP MI Edition is the Linux* version of this product. Its home screen accumulates recent e-mail, web shortcuts and thumbnails for favorite music and photos. The traditional "launch and dig" sequence to get to content is eliminated. In contrast to this approach, most other netbooks just offer collections of program icons on their home screen.</td>
</tr>
<tr>
<td valign="top"><strong>Adequate Feature Density</strong></td>
<td valign="top">Besides being short, the path to results must also be obvious. Few users will tolerate forced way-finding exercises. Given the constrained screen real-estate, this principle is applied by carefully designing decision trees to present clear decision points, and by limiting the amount of inputs asked of the user. Full-size PC applications have more leeway for less-clear structure. Feature density is an important consideration especially when porting existing full-size applications to a MID.</td>
</tr>
<tr>
<td valign="top"><strong>Adequate Information Density</strong></td>
<td valign="top">While the small, high-resolution screen is a brilliant display, it is still, and foremost, simply small. The number of concurrently displayed content items should be more limited, to avoid intolerably microscopic text and graphics.</td>
</tr>
<tr>
<td valign="top"><strong>Flow</strong></td>
<td valign="top">Since not every interaction can be reduced to a few buttons on a few screens, the user will inevitably spend time completing non-trivial tasks. Those tasks may be necessary sequences, such as forms that must be filled, or voluntary sequences, such as meandering through a media library. In either case, the immersion that is achieved by presenting a steady stream of simple choices creates a level of comfort. The immersion must not be interrupted trivially by modal alerts or dead-end flows. This is a general design principle, but on a small form factor platform, the temptation to present such interruptions is higher than on full-size PC applications, because fewer opportunities are available to communicate secondary information in more subtle ways.</td>
</tr>
<tr>
<td valign="top"><strong>Interruptability</strong></td>
<td valign="top">It is a common full-size PC experience ritual to launch an application, and then to open a document, or to perform some other initiation procedure, before getting to the actual task at hand, and to save and quit the application when done. On a MID device, where application state management is not on the user's mind, and usage session durations are short, this ritual gets in the way. Dealing with any sort of sequential task becomes more feasible, even attractive, once it is possible to effortlessly "Pick up where you left off".</td>
</tr>
<tr>
<td valign="top"><strong>Progressive Disclosure</strong></td>
<td valign="top">Progressive disclosure, the hiding of secondary information on secondary screens, is a well-known, valuable design principle. Applied to a MID, it can be read slightly differently: high density of information, as encountered in an existing PC application, can be preserved, behind lower-density summary screens, when the application is ported to a MID. Such summary screens, presented as the initial entry points of an application, provide the right small-device scale and satisfy many scenarios. More traditional, higher-density screens underneath, can provide unexpected horsepower for full-featured PC functionality. Handled with care, progressive disclosure is a valuable principle for porting existing applications to MIDs.</td>
</tr>
</tbody>
</table>
<br /><strong>Table 1: Design Principles for Mobile Platforms</strong><br /><br /><span class="sectionHeading">Look and Feel<br /></span><br />Expressing features as parts of a software application is not a mechanical procedure. It’s a subjective process, in which the designer makes personal decisions. As a result, the application gains a unique personality. This personality is expressed in the “look and feel,” which has a great impact on shaping the user experience. The user experience, literally the string of moments the user lives through while interacting with the product, determines the user's memory and judgment about the device.<br /><br />Look and feel ensures usability through design patterns that string user-facing objects together in ways that make sense. These patterns are reused across the application, providing comfort and predictability.<br /><br />Look and feel is concerned with the details of the experience: the central moments when the user pays detailed attention, and those less important moments that lie in between and help string the important ones together. All these moments benefit from the beauty of well-executed emphasis and guidance, through visual means like composition, contrasts and harmonies on a pixel level.<br /><br />The user’s experience of an application is shaped over time, as the application changes state during use, bringing up one screen after another, or updating the content that is shown on the screen. This temporal aspect of the experience is part of look and feel too. It can be shaped to provide a good flow and to actively help the user track the application’s state through meaningful transitions.<br /><br />Iconic examples of strong, experience-shaping look and feel are the “Fluent” interface of Microsoft* Office 2007, or the category-defining appearance of the Adobe* Lightroom application. Such strong statements can be controversial at first, as they force users to jettison old habits. 
But they usually establish a following and invite imitations.<br /><br /><strong>Implementation Complexity as a Concern<br /></strong><br />While working on the ‘look and feel’, the designer must be aware of the implementation platform, and its constraints and opportunities as they relate to the user interface. For MIDs, Hildon* is the primary choice platform, and it has some very specific characteristics. It cannot be emphasized enough that the design must be implementable on the given platform, by the developers tasked with the job, in the timeframe available.<br /><br />To avoid surprises, it is essential that the actual development team members who are responsible for the UI execution provide their perspective to the designer during the look and feel work. A professional designer is prepared to listen.<br /><br />With a design team on board, there usually is a temptation to go for a totally custom UI, to maximize usability, beauty, flow, and branding. These are noble intentions, but must be checked against the realities of schedule and resource constraints. A simpler look and feel that is actually executable might be a better way to go. After all, the measure of success is the real product that makes it to market.<br /><br />To ensure that the project is on track, the design must be truly understood by all stakeholders. A particular asset can communicate the design much better than any combination of diagrams, wireframes and specification documents: the prototype.<br /><br /><span class="sectionHeading">Expressing Living Features: A Prototype</span><br /><br />As the regular implementation preparation proceeds, a design team can provide a valuable contribution that typical project constituents can’t: a prototype. A developer, sometimes called a design technologist, puts the prototype together. This prototype shows core moments and aspects of the application. The ideas shown are those of the project team as a whole, as captured from the stakeholders. 
The design team shapes these ideas, infusing them with a user-facing structure that eventually evolves into a coherent look and feel. The main value of a prototype is its ability to play out concurrent changes from multiple separate angles of the project. Valuable time can be saved this way, and the project team can cover a lot of ground in the process. <br /><br />The prototype is an antidote to analysis paralysis. Application structure, functionality, and look and feel come together as a unit and are refined together. The in-progress result is visible to the entire team, serving as a validation point. The aspect of validation can be taken further, by presenting the prototype to users, who provide both usability feedback and emotional responses, without the need for preparation of additional test assets.<br /><br />The prototype neither has to be perfect nor complete, as long as it provides meaningful insights. Bugs and dead ends are allowed and expected, so the developer can focus on relevant details.<br /><br />As the prototype is built out, it takes on the role of a living specification, replacing much of the traditional specification documentation that otherwise must be written and maintained. Traditional specifications still play a supporting role, filling in detail for areas that the prototype doesn’t cover. The application developers refer directly to the prototype as a reference for the implementation. The QA team can do the same, informing a traditional QA process by referencing the prototype, bringing efficiencies to the testing process.<br /><br /><span class="sectionHeading">To Do This, Do You Need a Designer?<br /></span><br />The answer to the above question is, “it depends.” Do you have enough knowledge of your problem space to perform a well-grounded feature definition? Do you have the time to take care of the details for a good follow-through? 
Will your stakeholders take care of this for you?<br /><br /><strong>Common Sense…</strong><br /><br />It is evident that not every good piece of software is produced with the involvement of a dedicated designer or design team. Much useful work, in terms of crisp feature definition and high execution quality, can be done by “simply” assuming a design perspective.<br /><br />This reliance on common sense thrives when a self-motivated developer is driving the work and making her own decisions, within a manageable scope of work. But the efficiency of this approach does drop off, experience suggests, as the complexity of the challenge grows and more stakeholders come to the table. This is not at all a matter of malevolence by any stakeholder: all parties do want the best outcome. It is more a matter of entropy: as more perspectives and opinions come into play, more forces are at work. In such a situation, any perspective that is not formally represented will not be heard. Ask yourself: who is representing the end user?<br /><br /><strong>…And Addressing Complexity</strong><br /><br />As project size increases, the complexity of the design-addressable challenges also grows, quickly reaching a point where common sense falls short. Here, the benefits of a professional perspective become obvious. Just as the development team should not be handling accounting on the side, it is not in a position to be handling design. The design team’s role is not meant to weaken other project contributors’ design ideas. On the contrary, the design team synthesizes everyone’s contributions, amplifying the good ones. This synthesis is essential when many different perspectives need to be heard. On large projects without formal design participation, design will go wrong.<br /><br />Engaging a design team can bring otherwise unattainable results to a software project: a unique execution, with an outstanding look and feel. 
This can turn out to be the single, critical differentiator in the marketplace. <br /><br /><span class="sectionHeading">Change the Odds</span><br /><br />Full-size desktops and laptops have long given up their position as the only form factors for personal computing. After small form factors had been pioneered by the PDA, the mobile phone matured to provide ad-hoc information access and media consumption. The laptop as sole choice for mobile content creation and productivity has made room for the netbook, its smaller, less powerful sibling. Each of these form factors thrives on useful, attractive software. That’s evident when you consider the variety of software being used on near-identical hardware. <br /><br />The MID form factor is in a position to be the next step in this march of personal computing to ubiquity. New form factors start as underdogs. To fuel their possible success, great applications first have to be created, under developmental conditions that are more challenging than those on established platforms. As the software industry comes to terms with the platform, design can make a contribution to the quality of individual software experience offerings, which is what counts most at the current moment in the lifecycle of the platform.<br /><br /><span class="sectionHeading">About the Author</span><br /><br />Knut Graf is a Principal Designer at <a href="http://www.frogdesign.com/" target="_blank">frog design</a>, a global innovation firm. Frog Design* works with the world’s leading companies, helping them create and bring to market meaningful products, services, and experiences. Knut joined frog in 1997 and has since served as Developer, as Content Architect and as Senior Design Analyst before assuming the role of Principal Designer. Knut has worked with clients such as SAP*, Microsoft*, and HP*. 
His recent work includes contributing to the well-received HP Mini 1000 MI edition user interface as a design lead on the project.<br /><br />Knut's work focuses on software user interfaces, from idea development to implementation. He guides software designs through the time and resource constraints that real-world software projects bring, ensuring that the user experience of the end product is of the highest quality possible.<br /><br />In his work, Knut brings a programmer’s view of software development to a German design education background, to create innovative solutions that empower both the end user and the development team that has to meet shipping deadlines. In Knut’s opinion, design work must be judged by the quality of the end product. It is the actual result that matters, not the unrealized potential. ]]></description>
      <link>http://software.intel.com/en-us/articles/an-edge-by-design-for-mid-applications/</link>
      <pubDate>Thu, 30 Apr 2009 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/an-edge-by-design-for-mid-applications/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/an-edge-by-design-for-mid-applications/</guid>
      <category>Mobility</category>
      <category>Intel® AppUp(SM) Developer Community</category>
      <category>MID</category>
    </item>
    <item>
      <title>Power Efficiency – Analysis and SW Development Recommendations for Intel® Atom™ based MID platforms 2</title>
      <description><![CDATA[ This is part 2 of an article. You can <a href="http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms/">read part 1 here</a>.<br /><br />This article explores the power characteristics of the Intel® Atom™ platform geared for the MID (Mobile Internet Device) category of devices and provides recommendations for SW developers on how best to optimize applications for power efficiency on this new line of Intel processors/platforms. Some topics also apply to Intel® Atom™ processor based NetBooks.<br /><br />
<p class="sectionHeading">Workloads</p>
The following MID workload categories were analyzed:<br /> 
<ul>
<li>Idle behaviour in typical system idle states</li>
<li>Video playback using Moblin Media Player &amp; Helix framework</li>
<li>Multi-threaded Video decode, audio transcode and Flash workloads used for HT impact analysis</li>
<li>Browsing using Moblin Browser</li>
</ul>
<br />All workload measurements are performed in steady state unless otherwise noted. Linux* kernel version 2.6.22 was used.<br /><br /><strong>5.1. Idle modes</strong><br />Power data was captured for different processor configurations (modified via BIOS):   
<ul>
<li>HT on/off</li>
<li>C6 on/off</li>
</ul>
The following idle modes were analyzed:<br /> 
<ul>
<li>Home Screen - HTML </li>
<li>Home Screen - HTML - Screen off</li>
<li>Home Screen - Clutter (OpenGL)</li>
<li>Home Screen - Flash (UI created in Flash, embedded in HTML)</li>
<li>XTerm</li>
<li>Moblin Browser with default home page loaded</li>
</ul>
Note: As the Flash Home Screen feature was broken, no Flash application icons were visible, just a Flash shell. This was considered an acceptable approximation. After launch, a regular terminal shell was invoked (ctrl-alt-F1), the "on-demand" governor was activated, and measurements were made.<br /><br />Observe that this is by no means an exhaustive list. Additional measurements are needed to provide a more complete overview of possible application/launcher user interfaces.<br /><br /><strong>5.1.1. Processor sleep state behaviour</strong><br /><br />The following NetDAQ data was captured with HT turned on. C6 or C4 (C6 off) was configured as the lowest possible processor sleep state.<br /><br />The data below was captured while the system was idle on the HTML Home Screen.<br /><br /> 
<table class="tableformat1" border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>C-State Configuration</td>
<td>Average CPU Idle Power</td>
</tr>
<tr>
<td>C6 off</td>
<td>~160mW</td>
</tr>
<tr>
<td>C6 on</td>
<td>~80mW</td>
</tr>
</tbody>
</table>
<br />Note the ~80mW difference in average power for C6 on vs. off, roughly a 50% reduction. It is clear that the C6 state has an important positive impact on average power when the processor is experiencing low load or is mostly idle. In essence, the new C6 deep sleep state significantly reduces average Intel Atom™ processor power in idle.<br /><br />Refer to chapter 6: "Appendix A" for details on Intel Atom™ sleep states and their characteristics.<br /><br /><strong>5.1.2. Processor &amp; Chipset power usage in Idle modes</strong><br /><br />The following NetDAQ data was captured with C6 and HT turned on. Idle power behaviour for the different Home Screens (application launchers) was compared to the Browser and Xterm in idle.<br /><br />The Home Screens tested represent various UI technologies used for displaying application icons and launching applications. The UI technologies tested were HTML, OpenGL (Clutter) and Flash<sup>1</sup>.<br /><br /><img src="http://software.intel.com/file/15735" alt="" /><br /><br />The above graph illustrates the normalized power behaviour for the 6 targeted idle workloads.<br /><br />Contrary to what might be expected, the Clutter home screen (OpenGL) does not lead to increased chipset average idle power. The HTML and OpenGL UI home screen solutions used are quite power friendly.<br /><br />Automatically turning the screen off after an interval of no user input lowers chipset average power significantly (~13.5% below HTML idle). More importantly, powering off the screen also saves LCD power.<br /><br />The Moblin browser in idle mode on the default home page consumes slightly higher average processor/chipset power (~3.5% above HTML idle). Observe that the default page did not have any advanced content such as Flash* or Ajax*.<br /><br />When idle on the Flash home screen, a completely different pattern is revealed. 
Due to the high number of interrupts, 250 wakeups/s (captured by PowerTOP, see below), the benefit of the C6 sleep state is not fully utilized (even though the processor sometimes moves into C6). This is apparent from the much higher average power usage for processor and chipset (33% above HTML idle). <br /><br />An alternative view of the power behaviour is available from the data captured by PowerTOP. Wakeups/s and deep C-state residency are captured in the graphs below.<br /><br /><img src="http://software.intel.com/file/15736" alt="" /><br /><br />From the above data it is easy to see the impact of a high number of wakeups/s on processor deep sleep state residency. For instance, the Flash home screen wakes up the processor ~250 times/s, resulting in ~60% of the time spent in C4-C6, while the HTML home screen wakes up the processor 35 times/s, resulting in ~98% of the time spent in C4-C6 and significantly lower power.<br /><br />Recent measurements using the latest versions of Flash 9 and 10 reveal an improved Flash wakeup pattern, resulting in improved power characteristics. Still, even during playback of the simplest Flash content with a frame rate of 1 fps, the idle wakeup rate does not drop below ~100 wakeups/s.<br /><br />For all workloads, the processor spent &gt; 95% of the time in the lowest frequency mode (LFM, 800MHz) execution state (P-state). <br /><br /><strong>5.2. Video playback</strong><br /><br />Power measurements were captured while Moblin Media player (utilizing the Helix framework) played back video of the following formats:<br /><br /> 
<table class="tableformat1" border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>Video standard</td>
<td>Fps</td>
<td>Rate (kbps)</td>
<td>Resolution(WxH)</td>
</tr>
<tr>
<td>CIF</td>
<td>30</td>
<td>510</td>
<td>352x288</td>
</tr>
<tr>
<td>480p</td>
<td>30</td>
<td>1480</td>
<td>720x480</td>
</tr>
<tr>
<td>720p</td>
<td>30</td>
<td>4400</td>
<td>1280x720</td>
</tr>
<tr>
<td>1080p</td>
<td>30</td>
<td>9800</td>
<td>1920x1080</td>
</tr>
</tbody>
</table>
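<br />As a rough cross-check of the "x more data" figures quoted for these formats in section 5.2.1, the sketch below compares each format against CIF using the resolutions and bitrates from the table above. The article does not say how its figures were derived, so treating "data" as pixels per frame (or as encoded bitrate) is an assumption, and the names <code>FORMATS</code>, <code>pixel_ratio</code> and <code>bitrate_ratio</code> are hypothetical.<br /><br />

```python
# Resolutions and bitrates copied from the video format table above.
FORMATS = {
    "CIF": {"width": 352, "height": 288, "kbps": 510},
    "480p": {"width": 720, "height": 480, "kbps": 1480},
    "720p": {"width": 1280, "height": 720, "kbps": 4400},
    "1080p": {"width": 1920, "height": 1080, "kbps": 9800},
}


def pixel_ratio(name, baseline="CIF"):
    """Pixels per frame of `name` relative to the baseline format."""
    fmt, base = FORMATS[name], FORMATS[baseline]
    return (fmt["width"] * fmt["height"]) / (base["width"] * base["height"])


def bitrate_ratio(name, baseline="CIF"):
    """Encoded bitrate of `name` relative to the baseline format."""
    return FORMATS[name]["kbps"] / FORMATS[baseline]["kbps"]


if __name__ == "__main__":
    for name in FORMATS:
        print(f"{name}: {pixel_ratio(name):.1f}x pixels, "
              f"{bitrate_ratio(name):.1f}x bitrate vs. CIF")
```

Under the pixel-count assumption, 480p carries about 3.4x and 1080p about 20.5x the pixels of CIF, which is close to the ~3.5x and 20x figures quoted in section 5.2.1.<br />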
<br />The media was encoded with H.264 for the video stream and AAC for the audio stream. Furthermore, media playback was measured both with and without Helix HW acceleration.<br /><br />Observe that 1080p is only supported for some of the Intel Atom™ SKUs.<br /><br />Note that other media frameworks such as GStreamer* also feature HW acceleration of video for the Intel® Atom™ platform.<br /><br />NetDAQ data was captured with C6 turned on and HT was toggled on/off depending on the target measurement.<br /><br /><strong>5.2.1. SW codecs vs. HW accelerated codecs</strong><br /><br />The normalized graphs below compare average normalized power and C0 state residency during SW codec media playback.<br /><br /><img src="http://software.intel.com/file/15737" alt="" /><br /><br />Decoding 480p (~3.5x more data handled than for CIF) results in almost 2x the average processor power compared to CIF. Note that when using SW codecs, the processor is heavily utilized even for low resolution playback, as can be seen from the C0 residency graph.<br /><br />There is clearly a need for HW acceleration to allow playback of higher resolution video workloads. The following graph compares average normalized power and C0 state residency during playback with SW codecs vs. playback using HW accelerated codecs.<br /><br /><img src="http://software.intel.com/file/15738" alt="" /><br /><br />From the above graphs it is clear that HW accelerated codecs have great benefits with regard to average power during high definition video playback. The required processor power scales gracefully for increased video resolutions. For instance, using HW acceleration, the platform is able to process 20x more data playing back 1080p vs. CIF with only a minor increase (~25%) in processor power. 
The average chipset power using HW acceleration also scales well, as will be illustrated below.<br /><br />With regard to C0 state residency, the processor shows moderate utilization during high resolution playback using HW acceleration.<br /><br /><strong><em>Note: The release of the Helix framework used was not optimized for the MID platform and some kernel bottlenecks were identified. These issues have been addressed in more recent Moblin releases, leading to far fewer wakeups (reaping the benefits of the C6 sleep state) and much improved C0 state residency, all of which lower the average power footprint for HW accelerated video playback.</em></strong><br /><br />Note that using SW codecs, media with resolution greater than 480p cannot be played back due to performance limitations.<br /><br />For all HW accelerated workloads the processor spent &gt; 90% of the time in the lowest frequency mode (LFM, 800MHz) P-state. Using SW codecs, the P-state residency indicates very high processor load. For instance, during 480p playback the processor spent just 3% of the time in LFM.<br /><br />The normalized graph below compares average chipset power for HW accelerated codec media playback vs. SW codec media playback.<br /><br /><img src="http://software.intel.com/file/15739" alt="" /><br /><br />From the graph it is clear that not only does HW acceleration enable playback of 1080p, it also improves the average chipset power. As can be seen in the graph, the average chipset power scales gracefully for higher resolution content. Also note that playing back 1080p using HW acceleration requires about the same average chipset power as playing back CIF content using SW codecs.<br /><br />Data collected with PowerTOP indicates that the processor wakes up 300-600 times/s during video content playback, depending on the media format. The processor therefore benefits very little from the C6 sleep state.<br /><br /><strong>5.2.2. 
Memory load impact on power</strong><br /><br />Another important aspect of media playback is how frequently data is read from or written to memory. Large volumes of data transferred to/from memory result in increased average system power. The normalized graph below illustrates the increased playback power use for various definitions of video content.<br /><br /><img src="http://software.intel.com/file/15740" alt="" /><br /><br />The platform memory subsystem uses on average 2.5x more power for 1080p playback vs. CIF playback.<br /><br />The RAM is exercised to approximately the same degree for both HW accelerated playback and playback using SW codecs.<br /><br />Note that several memory access improvements have since been introduced to the framework, and recent Linux kernels such as 2.6.28 have led to improved average RAM power.<br /><br /><strong>5.3. Benefits of HT on threaded workloads</strong><br /><br />The processor data below was captured with HT turned on/off while running various multi-threaded SW video decode and audio transcode workloads. Observe that HW acceleration was not used for the following workloads.<br /><br />The following three graphs show typical relative processor performance, power and energy for the workloads with HT turned on/off. Analysis included a range of video workloads with various resolutions, one audio workload (transcoding "wav" to "mp3") and four Flash animation (no Flash video) workloads. The workloads tested were all multithreaded to take advantage of multithreaded processor architectures.<br /><br />Decoding was performed at the highest possible rate (disregarding the specified media rate) so that each workload completed as fast as possible. Turning HT on generally resulted in higher power during workload execution, faster completion and thereby overall energy savings.<br /><br /><img src="http://software.intel.com/file/15741" alt="" /><br /><br />The graph clearly shows the performance benefit of the HT feature. 
Over the measured workloads we see a 32% geomean performance gain. The gain in performance naturally comes at the expense of average power. The graph below details the relative power overhead when HT is turned on vs. off.<br /><br /><img src="http://software.intel.com/file/15742" alt="" /><br /><br />From the measured data we see a geomean power overhead of 15%. Due to the increased performance the workloads generally completed faster, which has an impact on the energy used by the processor. The graph below details the calculated processor energy benefits of the HT feature. <br /><br /><img src="http://software.intel.com/file/15743" alt="" /><br /><br />From the above graph we see 14% energy savings for the measured workloads.<br /><br />The benefits of increased performance can also be seen for workloads such as Flash video playback. Unlike the previous workloads, the following workloads do not complete faster with HT turned on. Instead, the performance gained with HT enabled translates into an increased frame rate (note that Flash will play back the media at the highest possible frame rate, up to the specified media frame rate). <br /><br />The graph below shows the measured frame rate for three different Flash video workloads with HT turned on/off.<br /><br /><img src="http://software.intel.com/file/15744" alt="" /><br /><br />From the above graph we measured a geomean frame rate gain of 14%. Besides the frame rate gains, the use of HT also improves average power, as the graph below illustrates.<br /><br /><img src="http://software.intel.com/file/15745" alt="" /><br /><br />From the measured data we found a geomean power saving of 19%, mainly due to extended time spent in lower P-states.<br /><br />The decreased power for Flash video with HT enabled is due to the increased ability to move to lower P-states. 
Note that this behaviour is dependent on power management policy and workload.<br /><br />In summary, for the workloads tested, the overall performance gain with HT activated compared to HT disabled was ~32% (ranging from 4-76%) while the overall power overhead with HT activated compared to HT disabled was ~15% (ranging from 5-26%). Due to the increased performance with HT activated, the workload completion time was shortened, leading to net energy savings of ~14% (ranging from 0-28%). Flash video workloads showed a frame rate gain of 14% and power savings of 19%.<br /><br /><strong>5.4. Browsing</strong><br /><br />The following NetDAQ data was captured with both C6 and HT enabled.<br /><br />Moblin browser, based on Firefox 3, was used. A very simple Flash content (swf) media file, displaying small flashing text, was opened in the browser and measured in "idle". The processor and chipset power was measured and compared to Browser idle power behaviour.<br /><br /><img src="http://software.intel.com/file/15746" alt="" /><br /><br />From the above normalized graph it is clear that even for the simplest Flash content the processor is very active. Additional data captured with PowerTOP reveals that the processor wakes up ~340 times/s compared to ~75 times/s when the Browser is idle on the default home page. <br /><br />If Flash content is heavily used during browsing, frequent processor activity will cause a significant decrease in the amount of time available on one battery charge.<br /><br /><em>As the Flash engine evolves, make sure to use the latest Flash release, as newer releases do feature improved power efficiency.</em><br /><br />
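The HT summary in section 5.3 links a performance gain and a power overhead to an energy saving. The sketch below works through that arithmetic under the assumption that a run-to-completion workload's energy is its average power multiplied by its completion time, and that completion time scales inversely with performance. The function name <code>relative_energy</code> is hypothetical; the inputs are the geomean figures reported above.<br /><br />

```python
def relative_energy(perf_gain, power_overhead):
    """Energy with HT on relative to HT off for a run-to-completion workload.

    Assumes energy = average power x completion time, and that completion
    time scales as 1 / performance. Both arguments are fractions,
    e.g. 0.32 for a 32% gain.
    """
    return (1.0 + power_overhead) / (1.0 + perf_gain)


if __name__ == "__main__":
    # Geomean figures reported in section 5.3.
    ratio = relative_energy(perf_gain=0.32, power_overhead=0.15)
    print(f"relative energy: {ratio:.3f}, savings: {(1 - ratio) * 100:.1f}%")
```

With the rounded geomean inputs this gives savings of roughly 13%, in line with the ~14% the article reports from its per-workload measurements.<br /><br />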
<p class="sectionHeading">Appendix A - Atom™ processor specifications</p>
Details on available Intel® Atom™ processor SKUs including data on clock speed, TDP, idle power, FSB, sleep state details and more can be found on the Intel® Atom™ Processor Technology resource site. <a href="http://www.intel.com/products/atom/index.htm">http://www.intel.com/products/atom/index.htm</a><br /><br />
<p class="sectionHeading">Appendix B - HW setup and NetDAQ power measurement setup</p>
<strong>System setup overview</strong><br /><br /><img src="http://software.intel.com/file/15747" alt="" /><br /><br /><strong>Platform specification and configuration</strong><br /><br /> 
<table class="tableformat1" border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>Intel® Atom™ processor</td>
<td>B1</td>
</tr>
<tr>
<td>Intel® Poulsbo chipset (SCH)</td>
<td>C0</td>
</tr>
<tr>
<td>FSB implementation</td>
<td>CMOS type</td>
</tr>
<tr>
<td>RAM</td>
<td>1 GB</td>
</tr>
<tr>
<td>LFM</td>
<td>800 MHz</td>
</tr>
<tr>
<td>HFM</td>
<td>1600 MHz</td>
</tr>
<tr>
<td>BIOS revision</td>
<td>70</td>
</tr>
<tr>
<td>PSB drivers</td>
<td></td>
</tr>
<tr>
<td>Drivers</td>
<td>0.9</td>
</tr>
<tr>
<td>Video</td>
<td>0.15</td>
</tr>
<tr>
<td>Xpsb</td>
<td>0.7</td>
</tr>
<tr>
<td>Helix codecs</td>
<td>0.262 beta 3</td>
</tr>
<tr>
<td>Resolution</td>
<td>1024x600</td>
</tr>
<tr>
<td>Screen brightness</td>
<td>Default</td>
</tr>
<tr>
<td>Processor frequency governor</td>
<td>On-demand</td>
</tr>
</tbody>
</table>
<br />The SDP processor used is considered equivalent to the <strong>Intel® Atom™ Z530</strong> SKU.<br /><br /><strong>NetDAQ configuration</strong><br /><br /> 
<table class="tableformat1" border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>Sample interval</td>
<td>0.05s (20 times/s)</td>
</tr>
<tr>
<td>Measurement modules</td>
<td>4</td>
</tr>
<tr>
<td>Measurement points</td>
<td>36</td>
</tr>
<tr>
<td>Key power measurement objects</td>
<td></td>
</tr>
<tr>
<td>Total processor</td>
<td>VCC, VCC_CORE, VTT_processor</td>
</tr>
<tr>
<td>Total SCH (main)</td>
<td>VTT_SCH, SCH_VCORE, SCH(2), SCH_SUS, SM_SCH</td>
</tr>
<tr>
<td>RAM</td>
<td>DIMM</td>
</tr>
<tr>
<td>Total (main)</td>
<td>Total processor + Total SCH (main) + RAM</td>
</tr>
<tr>
<td>Other SCH</td>
<td>PCIE, DVLDS, SDVO, DPLLA, DPLLB, PCIEPLL, HPLL, AUSBPLL</td>
</tr>
<tr>
<td>Other board</td>
<td>MINIPCIE, DDR2, PWH, CH, KBC, PATA, USB, IMVP, processor_PHASE</td>
</tr>
<tr>
<td>Total</td>
<td>Total (main) + Other SCH + Other board</td>
</tr>
</tbody>
</table>
<br />The instrumented SDP was connected to the NetDAQ for measurements via the 4 modules connected to the sense resistors on the board. The NetDAQ was in turn connected via an Ethernet loopback cable to the host PC, where the measurements were collected. The SDP was additionally connected to an external LCD kit via USB and to a keyboard via PS/2.<br /><br />The Fluke NetDAQ collects measured current and voltage from the board sense resistors and transfers the data to the NetDAQ SW tool on the host machine, which calculates power, adjusts for board specific offsets and accumulates the power data according to the power measurement objects listed above.<br /><br />
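As an illustration of the accumulation step described above, the following sketch derives per-rail power from sense resistor readings and sums it per measurement object. It is not the NetDAQ SW tool: the names <code>rail_power</code>, <code>group_power</code> and <code>GROUPS</code> are hypothetical, deriving current from the voltage drop across the sense resistor is an assumption, the board specific offsets are omitted, and the sample readings are invented for the example.<br /><br />

```python
# Rail groupings taken from the "Key power measurement objects" table above.
GROUPS = {
    "Total processor": ["VCC", "VCC_CORE", "VTT_processor"],
    "RAM": ["DIMM"],
}


def rail_power(rail_voltage, sense_drop, sense_resistance):
    """Power in watts for one rail.

    Assumption: current is derived from the voltage drop across the
    sense resistor (I = V_drop / R), then P = V_rail * I.
    """
    current = sense_drop / sense_resistance
    return rail_voltage * current


def group_power(samples, groups):
    """Sum rail power per measurement object.

    `samples` maps rail name -> (rail voltage, sense drop, sense resistance).
    """
    return {
        group: sum(rail_power(*samples[rail]) for rail in rails)
        for group, rails in groups.items()
    }


if __name__ == "__main__":
    # Invented readings for illustration only, not measured data.
    samples = {
        "VCC": (1.0, 0.002, 0.01),
        "VCC_CORE": (1.0, 0.001, 0.01),
        "VTT_processor": (1.0, 0.0, 0.01),
        "DIMM": (1.8, 0.001, 0.01),
    }
    for group, watts in group_power(samples, GROUPS).items():
        print(f"{group}: {watts:.3f} W")
```

Averaging such per-sample values over the 0.05 s sample interval would give the average power figures used throughout the article.<br /><br />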
<p class="sectionHeading">Acronyms</p>
<table class="tableformat1" border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><strong>CIF</strong></td>
<td>Common Intermediate Format. Video media format of resolution 352x288</td>
</tr>
<tr>
<td><strong>C-state</strong></td>
<td>Processor Sleep state. C1 - C6 sleep states are available on Intel® Atom™. (Observe that C0 should be considered running state)</td>
</tr>
<tr>
<td><strong>FSB</strong></td>
<td>Front Side Bus. Interface between processor and SCH</td>
</tr>
<tr>
<td><strong>HDD</strong></td>
<td>Hard Disk Drive</td>
</tr>
<tr>
<td><strong>Helix</strong></td>
<td>Multimedia framework used as part of Moblin stack. https://helixcommunity.org/ https://rp4mid.helixcommunity.org/</td>
</tr>
<tr>
<td><strong>HFM</strong></td>
<td>Highest Frequency Mode</td>
</tr>
<tr>
<td><strong>HTT</strong></td>
<td>Hyper Threading Technology. Also Simultaneous Multi-Threading (SMT)</td>
</tr>
<tr>
<td><strong>LFM</strong></td>
<td>Lowest Frequency Mode</td>
</tr>
<tr>
<td><strong>MID</strong></td>
<td>Mobile Internet Device. Category of mobile devices based on the Intel® Atom™ processor.</td>
</tr>
<tr>
<td><strong>Moblin</strong></td>
<td>Open Source community for sharing and creating Linux reference stack for MID. http://moblin.org/</td>
</tr>
<tr>
<td><strong>NetDAQ</strong></td>
<td>Networked Data Acquisition Unit. HW from Fluke used to measure discrete component data such as Voltage and Current. http://us.fluke.com/usen/products/NetDAQ.htm?catalog_name=FlukeUnitedStates</td>
</tr>
<tr>
<td><strong>OpenGL</strong></td>
<td>Open Graphics Library. Used by the Clutter reference UI.</td>
</tr>
<tr>
<td><strong>PATA</strong></td>
<td>Parallel ATA (Advanced Technology Attachment). Interface for connecting storage devices such as HD or CD/DVD</td>
</tr>
<tr>
<td><strong>PowerTOP</strong></td>
<td>Measures system/application use of various hardware power-saving features. http://www.lesswatts.org/projects/powertop/</td>
</tr>
<tr>
<td><strong>PSB</strong></td>
<td>Intel® Atom™ MID chipset (also SCH)</td>
</tr>
<tr>
<td><strong>P-state</strong></td>
<td>Processor Execution state. LFM -&gt; HFM</td>
</tr>
<tr>
<td><strong>SCH</strong></td>
<td>System Controller Hub, a.k.a. Poulsbo.</td>
</tr>
<tr>
<td><strong>SDP</strong></td>
<td>Software Development Platform. For the targeted platform also named Crown Beach.</td>
</tr>
<tr>
<td><strong>SIMD</strong></td>
<td>Single Instruction Multiple Data allowing data level parallelism</td>
</tr>
<tr>
<td><strong>SSE</strong></td>
<td>Streaming SIMD Extensions. Extended instruction set</td>
</tr>
<tr>
<td><strong>SMT</strong></td>
<td>Simultaneous Multi-Threading. Also Hyper Threading Technology (HTT)</td>
</tr>
<tr>
<td><strong>SSD</strong></td>
<td>Solid State Disk</td>
</tr>
<tr>
<td><strong>TDP</strong></td>
<td>Thermal Design Power. Represents the maximum amount of power the thermal solution is required to dissipate</td>
</tr>
<tr>
<td><strong>Workload</strong></td>
<td>Isolated execution object with well defined behavior</td>
</tr>
</tbody>
</table> ]]></description>
      <link>http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms-2/</link>
      <pubDate>Tue, 21 Apr 2009 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms-2/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/power-efficiency-analysis-and-sw-development-recommendations-for-intel-atom-based-mid-platforms-2/</guid>
      <category>Mobility</category>
      <category>Intel® AppUp(SM) Developer Community</category>
    </item>
    <item>
      <title>Intel® Hardware Accelerated High Definition Video Playback Power Analysis</title>
      <description><![CDATA[ <p>Recent media technologies like Blu-ray* have driven increased viewing of HD content on mobile computers. Interest has correspondingly grown in finding further power-saving opportunities and in creating context-aware power management that can extend battery life. This white paper validates two hours or more of HD video playback on an Intel-based platform and focuses on the impact of the playback software's local power profiles while running various HD Blu-ray* codecs. The analysis provides recommendations to help create energy-efficient HD software and make mobile systems more power aware.<br /><br /><span class="sectionHeading">Download Article</span><br /><br /><a href="http://software.intel.com/file/14396">Download Intel® Hardware Accelerated High Definition Video Playback Power Analysis</a> [PDF 1.9MB]<br /><br /><span class="sectionHeading">Background</span><br /><span class="sectionHeadingText"><br />Platform Power Profile<br /></span>As a precursor to our research, it is important to understand how energy is being consumed in the mobile computer. The power profile provides a model of the various components on a mobile computer (mobile platform). Measurement results vary depending on the usage model. For example, the relative contribution of processor power to the overall platform power will be significant in a CPU-intensive workload, but it will not be a dominant factor while the platform is idling. Furthermore, it may also vary depending on whether hardware acceleration is enabled or disabled, as well as on the type of codec used in the case of video playback. These cases are studied in this paper.<br />Figure 1 shows how the power profile can vary across usage models. For this particular profile, the CPU, memory, and file system tests were run using SiSandra benchmarks (<a href="http://www.sisoftware.co.uk">http://www.sisoftware.co.uk</a>). 
Note that the platform power in Figure 1 does not include the LCD, which we excluded from our analyses because the monitor has its own external power supply. (Others include WLAN, HD-Audio, mini-card, ICH, and other peripherals.)<br /><br /><strong>Figure 1. Platform Power Profile<br /></strong><br /><img src="http://software.intel.com/file/10857" alt="" /><br /><br /><span class="sectionHeading">Testing Methodology</span><br /><br />Two HD video playback applications were tested on our in-house Software Development Platform (SDP) (Intel® Core™2 Duo Mobile Penryn Processor T9400). The SDP was instrumented for power measurement/characterization. The playback applications were characterized while playing three different video titles, each with a different high definition encoding format. For each workload, we also tested the two applications with video Hardware Acceleration enabled via the Video Hardware Configuration mode within the applications. <br /><br /><span class="sectionHeadingText">White Paper Goals</span><br />The goals of this paper are:</p>
<ul>
<li>To validate 2+ hours of HD Video playback on an Intel–based "Cantiga" platform equipped with hardware accelerated video decode.</li>
<li>To understand the impact of software playback application local power plans on creating energy efficient software and extending the battery life.</li>
</ul>
<br /><span class="sectionHeadingText">The Workload</span><br />The workloads used for power analyses on the Intel® SDP were two common HD video playback applications, each playing one of the following video titles with the listed encoding:<br />
<ul>
<li>"RV" – encoded in MPEG-2 &amp; Bitrate: 40.000 Mbps</li>
<li>"300" – encoded in VC-1 &amp; Bitrate: 29.999 Mbps</li>
<li>"Casino Royale" – encoded in H.264 &amp; Bitrate: 33.000 Mbps</li>
</ul>
<br />All titles use the following video configuration:<br />
<ul>
<li>Framerate: 23.976 Hz</li>
<li>Resolution: 1920x1080</li>
<li>Aspect ratio: 16:9</li>
<li>xvYCC Stream: No</li>
</ul>
<br /><span class="sectionHeadingText">Configuring the Experiment</span><br /><br />The experiment configuration settings were as follows:<br />
<ul>
<li>Storage: Blu-ray* discs</li>
<li>Optical Drive: Panasonic* Notebook SATA Blu-ray* drive</li>
<li>Memory: 2GB (2X1 GB) DDR3 1066MHz</li>
<li>CPU: T9400 @2.53 GHz Intel® Core™2 Duo Mobile (Penryn) Processor</li>
<li>Chipset Video mode was set to VLD (Default Setting).</li>
<li>HDD 80 GB SATA* Mobile HD 7200 RPM</li>
<li>Operating System: Microsoft Windows Vista* 32bit</li>
<li>Vista* Power Plan: Balanced</li>
<li>Screen mode: Full Screen</li>
<li>Battery Level: 100% at start of each test</li>
<li>Testing Time &amp; Nature: 5+ minutes per title starting at Chapter 1 for each test run</li>
</ul>
<br /><span class="sectionHeading">HD Video Playback Applications<br /></span><br />The graphs in each video playback application section below describe the power characterization results for the applications while running on the Intel® engineering system. Time was measured in milliseconds. Power measurements were acquired using a Fluke NetDAQ* system and corresponding software (v4.0), which reports average power (in watts [W]); this was then converted to total energy using application run-time data. <br /><br /><span class="sectionHeadingText">HD Video Playback Application-1</span><br />Each study describes our analysis of a leading video playback application. Application-1 comes with a mobile specific pack. Tests were performed under the app’s Maximum Battery and Maximum Performance local application power settings. The hardware acceleration mode in this particular software uses two configurations: 1) Hardware Decode Acceleration and 2) Color Acceleration. We found that operating with hardware acceleration switched on delivered the anticipated power benefits, particularly within the CPU, and that the savings varied by codec. <br /><br />Figure 2 shows the relative average power consumption of various components in the platform with Intel® integrated graphics hardware acceleration switched on. This configuration clearly shows a reduction in the energy consumption of the CPU (resulting in platform level power savings). The data also shows the impact of two major power plan settings and the impact of various codecs on platform components, in particular the CPU and Blu-ray* drive. The specific data for Figure 2 is shown in Table 1.<br /><br /><strong>Figure 2. Application-1 Blu-ray* HD Video Playback<br /><br /><img src="http://software.intel.com/file/10858" alt="" /><br /><br />Table 1. Power Consumption during Application-1 Blu-ray* HD Video Playback<br /><br />
<table class="tableformat1" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td rowspan="2" width="109" valign="bottom">
<p><strong></strong></p>
</td>
<td colspan="2" width="181" valign="bottom">
<p><strong>H.264</strong></p>
</td>
<td colspan="2" width="188" valign="bottom">
<p><strong>VC-1</strong></p>
</td>
<td colspan="2" width="163" valign="bottom">
<p><strong>MPEG2</strong></p>
</td>
</tr>
<tr>
<td width="105" valign="bottom">
<p><strong>MaxBattery</strong></p>
</td>
<td width="76" valign="bottom">
<p><strong>MaxPerf</strong></p>
</td>
<td width="109" valign="bottom">
<p><strong>MaxBattery</strong></p>
</td>
<td width="79" valign="bottom">
<p><strong>MaxPerf</strong></p>
</td>
<td width="94" valign="bottom">
<p><strong>MaxBattery</strong></p>
</td>
<td width="68" valign="bottom">
<p><strong>MaxPerf</strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>Blu-ray Drive</strong><strong></strong></p>
</td>
<td width="105" valign="bottom">
<p align="right"><strong>3.32</strong><strong> </strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>3.57</strong><strong> </strong></p>
</td>
<td width="109" valign="bottom">
<p align="right"><strong>3.74</strong><strong> </strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>3.74</strong><strong> </strong></p>
</td>
<td width="94" valign="bottom">
<p align="right"><strong>2.56</strong><strong> </strong></p>
</td>
<td width="68" valign="bottom">
<p align="right"><strong>1.40</strong><strong> </strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>HDD</strong></p>
</td>
<td width="105" valign="bottom">
<p align="right"><strong>1.66 </strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>1.88 </strong></p>
</td>
<td width="109" valign="bottom">
<p align="right"><strong>1.60 </strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>1.61 </strong></p>
</td>
<td width="94" valign="bottom">
<p align="right"><strong>1.69 </strong></p>
</td>
<td width="68" valign="bottom">
<p align="right"><strong>1.55 </strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>CPU</strong></p>
</td>
<td width="105" valign="bottom">
<p align="right"><strong>2.23 </strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>2.29 </strong></p>
</td>
<td width="109" valign="bottom">
<p align="right"><strong>3.26 </strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>3.22 </strong></p>
</td>
<td width="94" valign="bottom">
<p align="right"><strong>2.21 </strong></p>
</td>
<td width="68" valign="bottom">
<p align="right"><strong>2.41 </strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>Memory</strong></p>
</td>
<td width="105" valign="bottom">
<p align="right"><strong>2.09 </strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>2.29 </strong></p>
</td>
<td width="109" valign="bottom">
<p align="right"><strong>2.33 </strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>2.32 </strong></p>
</td>
<td width="94" valign="bottom">
<p align="right"><strong>1.83 </strong></p>
</td>
<td width="68" valign="bottom">
<p align="right"><strong>1.91 </strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>GMCH</strong></p>
</td>
<td width="105" valign="bottom">
<p align="right"><strong>4.55 </strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>4.79 </strong></p>
</td>
<td width="109" valign="bottom">
<p align="right"><strong>4.99 </strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>4.88 </strong></p>
</td>
<td width="94" valign="bottom">
<p align="right"><strong>4.52 </strong></p>
</td>
<td width="68" valign="bottom">
<p align="right"><strong>4.55 </strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>Platform</strong></p>
</td>
<td width="105" valign="bottom">
<p align="right"><strong>19.29 </strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>20.06 </strong></p>
</td>
<td width="109" valign="bottom">
<p align="right"><strong>21.18 </strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>21.05 </strong></p>
</td>
<td width="94" valign="bottom">
<p align="right"><strong>16.34 </strong></p>
</td>
<td width="68" valign="bottom">
<p align="right"><strong>17.65 </strong></p>
</td>
</tr>
<tr>
<td width="109" valign="bottom">
<p><strong>LCD Display</strong></p>
</td>
<td colspan="6" width="532" valign="bottom">
<p><strong>Not instrumented</strong></p>
</td>
</tr>
</tbody>
</table>
</strong>
<p> </p>
<br />Table 1 shows only a slight difference in platform power between Application-1’s two local power plan settings, Maximum Battery (MaxBattery) and Maximum Performance (MaxPerf). The greatest total platform power consumption occurred with VC-1 decoding, at 21.18 W with MaxBattery and 21.05 W with MaxPerf. The lowest occurred with MPEG2, at 16.34 W with MaxBattery and 17.65 W with MaxPerf. For H.264, total platform power was 19.29 W with MaxBattery and 20.06 W with MaxPerf. <br /><br /><strong>Observations from this study:</strong> <br />1. Codecs differ considerably in their computational demands and corresponding power consumption. MPEG2 is the easiest to decode, followed by VC-1 and then H.264, which is the most complex encoding format.<br /><br />1.1 There was no significant difference between the H.264 and VC-1 encodings. <br />1.2 VC-1 decoding consumed the most power, even though H.264 is more compute-intensive.<br />1.3 With MPEG2, Blu-ray* drive power consumption was lower than with VC-1 and H.264, which suggests that MPEG2 playback may have been using device content caching or data buffering. <br /><br />2. The power savings due to hardware acceleration come almost completely from the power saved in the CPU. With hardware acceleration on, CPU power consumption was roughly 2.2 W to 3.3 W across all encodings and both power settings, which contributed to a significant reduction in total platform power, to roughly 16.3 W to 21.2 W. Most of the savings therefore came from the CPU, with some additional savings from the memory and chipset. <br /><br />3. Surprisingly, the MaxBattery local power setting had only a marginal impact on platform power components compared to the MaxPerf setting. In the case of VC-1 decoding, MaxBattery even had slightly higher power consumption than MaxPerf (21.18 W versus 21.05 W).<br /><br /><span class="sectionHeading">HD Video Playback Application-2</span><br /><br />Application-2 is another industry-leading video playback application. It also offers special mobile features to enhance battery life for video playback on mobile platforms. Tests were performed with the application’s local power settings, Maximum Battery (MaxBattery) and Maximum Performance (MaxPerf). As indicated earlier, one of the primary aims of this paper is to study the impact of these applications’ local power plans on the platform’s ability to reach 2+ hours of playback. <br /><br />Figure 3 shows the platform power consumption. As with Application-1, all decoding was power hungry, with H.264 and VC-1 decoding consuming the most. Once again, there were no significant differences between Application-2’s local power settings, except in the case of MPEG2. <br /><br /><strong>Figure 3. Application-2 Blu-ray* HD Video Playback<br /><br /></strong><img src="http://software.intel.com/file/10859" alt="" /><br /><br /><strong>Table 2. Average Power Consumption (W) during Application-2 Blu-ray* HD Video Playback</strong><br /><br />
<table class="tableformat1" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td rowspan="2" width="108" valign="bottom">
<p><strong></strong></p>
</td>
<td colspan="2" width="182" valign="bottom">
<p><strong>H.264</strong></p>
</td>
<td colspan="2" width="157" valign="bottom">
<p><strong>VC-1</strong></p>
</td>
<td colspan="2" width="155" valign="bottom">
<p><strong>MPEG2</strong></p>
</td>
</tr>
<tr>
<td width="90" valign="bottom">
<p><strong>MaxBattery</strong></p>
</td>
<td width="92" valign="bottom">
<p><strong>MaxPerf</strong></p>
</td>
<td width="79" valign="bottom">
<p><strong>MaxBattery</strong></p>
</td>
<td width="78" valign="bottom">
<p><strong>MaxPerf</strong></p>
</td>
<td width="79" valign="bottom">
<p><strong>MaxBattery</strong></p>
</td>
<td width="76" valign="bottom">
<p><strong>MaxPerf</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>Blu-ray Drive</strong><strong></strong></p>
</td>
<td width="90" valign="bottom">
<p align="right"><strong>3.55</strong></p>
</td>
<td width="92" valign="bottom">
<p align="right"><strong>3.47</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>3.74</strong></p>
</td>
<td width="78" valign="bottom">
<p align="right"><strong>3.66</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>2.85</strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>1.88</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>HDD</strong></p>
</td>
<td width="90" valign="bottom">
<p align="right"><strong>1.37</strong></p>
</td>
<td width="92" valign="bottom">
<p align="right"><strong>1.54</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>1.28</strong></p>
</td>
<td width="78" valign="bottom">
<p align="right"><strong>1.56</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>1.27</strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>1.57</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>CPU</strong></p>
</td>
<td width="90" valign="bottom">
<p align="right"><strong>3.97</strong></p>
</td>
<td width="92" valign="bottom">
<p align="right"><strong>3.95</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>3.72</strong></p>
</td>
<td width="78" valign="bottom">
<p align="right"><strong>3.78</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>3.87</strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>3.88</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>Memory</strong></p>
</td>
<td width="90" valign="bottom">
<p align="right"><strong>1.99</strong></p>
</td>
<td width="92" valign="bottom">
<p align="right"><strong>1.96</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>1.86</strong></p>
</td>
<td width="78" valign="bottom">
<p align="right"><strong>1.93</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>1.39</strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>1.36</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>GMCH</strong></p>
</td>
<td width="90" valign="bottom">
<p align="right"><strong>4.74</strong></p>
</td>
<td width="92" valign="bottom">
<p align="right"><strong>4.71</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>4.83</strong></p>
</td>
<td width="78" valign="bottom">
<p align="right"><strong>4.84</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>4.38</strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>4.27</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>Platform</strong></p>
</td>
<td width="90" valign="bottom">
<p align="right"><strong>20.88</strong></p>
</td>
<td width="92" valign="bottom">
<p align="right"><strong>21.57</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>21.47</strong></p>
</td>
<td width="78" valign="bottom">
<p align="right"><strong>21.48</strong></p>
</td>
<td width="79" valign="bottom">
<p align="right"><strong>17.67</strong></p>
</td>
<td width="76" valign="bottom">
<p align="right"><strong>19.71</strong></p>
</td>
</tr>
<tr>
<td width="108" valign="bottom">
<p><strong>LCD Display</strong></p>
</td>
<td colspan="6" width="494" valign="bottom">
<p><strong>Not instrumented</strong></p>
</td>
</tr>
</tbody>
</table>
<br /><strong>Observations from this study:</strong> <br />1. There was no significant difference in total power consumption between the H.264 and VC-1 encodings under either local power setting. <br />2. With MPEG2, Blu-ray* drive power consumption was lower than with VC-1 and H.264, which suggests that MPEG2 playback may be using device content caching or data buffering (similar to the observation for Application-1). <br />3. Again, the MaxBattery local power setting had surprisingly little impact on platform power components compared to MaxPerf. The exception was MPEG2, where the difference between the MaxBattery and MaxPerf settings was 2.04 W.<br /><br /><span class="sectionHeading">Summary</span><br /><br />This section summarizes our findings, with particular focus on how total platform power is affected by Intel® integrated graphics, the Intel® Cantiga chipset’s native hardware acceleration of HD Blu-ray* content, the applications’ local power plan settings, and the various decoders. <br /><br /><strong>Figure 4. Total Platform Power Consumption during Blu-ray* HD Video Playback</strong><br /><br /><img src="http://software.intel.com/file/10860" alt="" /><br /><br /><strong>Table 3. Total Platform Power Consumption (W) during Blu-ray* HD Video Playback</strong><br /><br />
<table class="tableformat1" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td rowspan="2" width="67" valign="bottom">
<p><strong></strong></p>
</td>
<td colspan="2" width="197" valign="bottom">
<p align="center"><strong>H.264</strong></p>
</td>
<td colspan="2" width="187" valign="bottom">
<p align="center"><strong>VC-1</strong></p>
</td>
<td colspan="2" width="197" valign="bottom">
<p align="center"><strong>MPEG2</strong></p>
</td>
</tr>
<tr>
<td width="101" valign="bottom">
<p><strong>Application-1</strong></p>
</td>
<td width="96" valign="bottom">
<p><strong>Application-2</strong></p>
</td>
<td width="91" valign="bottom">
<p><strong>Application-1</strong></p>
</td>
<td width="96" valign="bottom">
<p><strong>Application-2</strong></p>
</td>
<td width="101" valign="bottom">
<p><strong>Application-1</strong></p>
</td>
<td width="96" valign="bottom">
<p><strong>Application-2</strong></p>
</td>
</tr>
<tr>
<td width="67" valign="bottom">
<p><strong>Platform-MaxPerf</strong></p>
</td>
<td width="101" valign="bottom">
<p align="right"><strong>20.06</strong><strong> </strong></p>
</td>
<td width="96" valign="bottom">
<p align="right"><strong>21.57</strong></p>
</td>
<td width="91" valign="bottom">
<p align="right"><strong>21.05</strong><strong> </strong></p>
</td>
<td width="96" valign="bottom">
<p align="right"><strong>21.48</strong></p>
</td>
<td width="101" valign="bottom">
<p align="right"><strong>17.65</strong><strong> </strong></p>
</td>
<td width="96" valign="bottom">
<p align="right"><strong>19.71</strong></p>
</td>
</tr>
<tr>
<td width="67" valign="bottom">
<p><strong>Platform-MaxBatt </strong></p>
</td>
<td width="101" valign="bottom">
<p align="right"><strong>19.29</strong><strong> </strong></p>
</td>
<td width="96" valign="bottom">
<p align="right"><strong>20.88</strong></p>
</td>
<td width="91" valign="bottom">
<p align="right"><strong>21.18</strong><strong> </strong></p>
</td>
<td width="96" valign="bottom">
<p align="right"><strong>21.47</strong></p>
</td>
<td width="101" valign="bottom">
<p align="right"><strong>16.34</strong><strong> </strong></p>
</td>
<td width="96" valign="bottom">
<p align="right"><strong>17.67</strong></p>
</td>
</tr>
</tbody>
</table>
<br />Figure 4 and Table 3 compare the total platform power consumed under the local power plan settings (MaxPerf and MaxBattery) of Application-1 and Application-2, with the Intel® integrated hardware accelerator switched on for all tests:<br /><br />
<ul>
<li>Application-1 is the more energy-efficient application under both the MaxPerf and MaxBattery power settings for all three encodings: H.264, VC-1, and MPEG2.</li>
<li>MPEG2 consumed significantly less power than H.264 and VC-1 on both Application-1 and Application-2. This holds true regardless of the application power plan setting.</li>
<li>VC-1 is the most costly encoding in the study on both applications, with one exception: on Application-2 under MaxPerf, H.264 is 0.09 W higher. VC-1 power also varies only slightly across both applications and both power plan settings (21.05 W to 21.48 W), unlike MPEG2 and H.264.</li>
<li>As noted earlier, within each application the local power plan setting makes no significant difference to total platform power for the H.264 and VC-1 encodings. Surprisingly, the two applications also differ little from each other, except in the case of MPEG2.</li>
<li>As indicated before, the LCD was not instrumented in this study. A side power study on the impact of LCD brightness found that full brightness can consume around 5-6 W and 50% brightness around 3-4 W. This also depends on LCD configuration, such as bit rate and size.</li>
<li>The data in Table 3 and Figure 4 shows that total platform power was between 16 W and 22 W for all encodings. This supports the conclusion that two hours of HD video playback can be reached with all encodings on the Intel® Cantiga chipset, in particular with MPEG2, depending on LCD brightness level and battery size.</li>
</ul>
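<p>The MaxBattery versus MaxPerf comparisons in the list above can be re-derived from Table 3. The following Python sketch is illustrative only and was not part of the measurement tooling: the dictionary layout and the function name are assumptions made for this example, and the numbers are copied from Table 3.</p>

```python
# Total platform power in watts, copied from Table 3.
# Layout (an assumption of this sketch):
# TABLE_3[application][encoding] = (max_perf_w, max_battery_w)
TABLE_3 = {
    "Application-1": {
        "H.264": (20.06, 19.29),
        "VC-1": (21.05, 21.18),
        "MPEG2": (17.65, 16.34),
    },
    "Application-2": {
        "H.264": (21.57, 20.88),
        "VC-1": (21.48, 21.47),
        "MPEG2": (19.71, 17.67),
    },
}


def max_battery_saving_w(application, encoding):
    """Return MaxPerf minus MaxBattery platform power, in watts.

    A negative result means MaxBattery drew more power than MaxPerf.
    """
    max_perf_w, max_battery_w = TABLE_3[application][encoding]
    return round(max_perf_w - max_battery_w, 2)


if __name__ == "__main__":
    for application, encodings in TABLE_3.items():
        for encoding in encodings:
            saving = max_battery_saving_w(application, encoding)
            print(f"{application} {encoding}: {saving:+.2f} W")
```

<p>Run as a script, this shows that MaxBattery saves more than 1 W only for MPEG2 (1.31 W on Application-1 and 2.04 W on Application-2), and that the value is negative for VC-1 on Application-1.</p>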
<br />In conclusion, Figure 4 and Table 3 provide a comparative analysis of the applications used in this power study. Application-1 is the more energy-efficient HD video playback application for all tested encodings. Because the analysis shows no significant difference between the two HD applications, Table 4 uses Application-1 to demonstrate that two hours or more of HD video playback can be reached on the Intel® Cantiga chipset, depending on battery capacity and LCD brightness level. <br /><br /><strong>Table 4. Application-1 Anticipated Playback Time for Different Battery Sizes and LCD Brightness Levels during Blu-ray* HD Video Playback</strong><br /><br />
<table class="tableformat1" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="69" valign="bottom">
<p><strong></strong></p>
</td>
<td width="84" valign="bottom">
<p align="center"><strong>1</strong></p>
</td>
<td width="72" valign="bottom">
<p align="center"><strong>2</strong></p>
</td>
<td width="72" valign="bottom">
<p align="center"><strong>3</strong></p>
</td>
<td width="84" valign="bottom">
<p align="center"><strong>4</strong></p>
</td>
<td width="84" valign="bottom">
<p align="center"><strong>5</strong></p>
</td>
<td width="91" valign="bottom">
<p align="center"><strong>6</strong></p>
</td>
<td width="84" valign="bottom">
<p align="center"><strong>7</strong></p>
</td>
</tr>
<tr>
<td width="69" valign="bottom">
<p><strong>Encoding</strong></p>
</td>
<td width="84" valign="bottom">
<p><strong>Application-1 Maximum Battery Total Platform (W) </strong></p>
</td>
<td width="72" valign="bottom">
<p><strong>Total Platform (W) at 100% LCD Brightness (~6 W) </strong></p>
</td>
<td width="72" valign="bottom">
<p><strong>Total Platform (W) at 50% LCD Brightness (~3 W) </strong></p>
</td>
<td width="84" valign="bottom">
<p><strong>Anticipated Time at 54 WHr Battery + 100% Brightness</strong></p>
</td>
<td width="84" valign="bottom">
<p><strong>Anticipated Time at 60 WHr Battery + 100% Brightness</strong></p>
</td>
<td width="91" valign="bottom">
<p><strong>Anticipated Time at 54 WHr Battery + 50% Brightness</strong></p>
</td>
<td width="84" valign="bottom">
<p><strong>Anticipated Time at 60 WHr Battery + 50% Brightness</strong></p>
</td>
</tr>
<tr>
<td width="69" valign="bottom">
<p><strong>H.264</strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>19.29 </strong></p>
</td>
<td width="72" valign="bottom">
<p align="right"><strong>25.29 </strong></p>
</td>
<td width="72" valign="bottom">
<p align="right"><strong>22.29 </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.14</strong><strong> Hr. </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.37</strong><strong> Hr. </strong></p>
</td>
<td width="91" valign="bottom">
<p align="right"><strong>2.42</strong><strong> Hr. </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.69</strong><strong> Hr. </strong></p>
</td>
</tr>
<tr>
<td width="69" valign="bottom">
<p><strong>VC-1 </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>21.18 </strong></p>
</td>
<td width="72" valign="bottom">
<p align="right"><strong>27.18 </strong></p>
</td>
<td width="72" valign="bottom">
<p align="right"><strong>24.18 </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>1.99 Hr. </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.21</strong><strong> Hr. </strong></p>
</td>
<td width="91" valign="bottom">
<p align="right"><strong>2.23</strong><strong> Hr. </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.48</strong><strong> Hr. </strong></p>
</td>
</tr>
<tr>
<td width="69" valign="bottom">
<p><strong>MPEG2</strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>16.34 </strong></p>
</td>
<td width="72" valign="bottom">
<p align="right"><strong>22.34 </strong></p>
</td>
<td width="72" valign="bottom">
<p align="right"><strong>19.34 </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.42</strong><strong> Hr. </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>2.69</strong><strong> Hr. </strong></p>
</td>
<td width="91" valign="bottom">
<p align="right"><strong>2.79</strong><strong> Hr. </strong></p>
</td>
<td width="84" valign="bottom">
<p align="right"><strong>3.10</strong><strong> Hr. </strong></p>
</td>
</tr>
</tbody>
</table>
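<p>The anticipated times in Table 4 are consistent with dividing battery capacity by the sum of measured platform power and estimated LCD power. The Python sketch below reproduces that arithmetic. The function and variable names are illustrative assumptions, and the model assumes a constant power draw and a fully usable battery capacity, which a real system will not quite achieve.</p>

```python
# MaxBattery platform power per encoding, in watts (Table 4, column 1).
APPLICATION_1_MAX_BATTERY_W = {"H.264": 19.29, "VC-1": 21.18, "MPEG2": 16.34}

# LCD power estimates used in Table 4: about 6 W at 100% and 3 W at 50%.
LCD_W = {"100%": 6.0, "50%": 3.0}


def anticipated_playback_hours(platform_w, lcd_w, battery_wh):
    """Estimate playback time in hours as battery capacity / total power.

    platform_w: measured platform power without the LCD, in watts
    lcd_w:      estimated LCD power at the chosen brightness, in watts
    battery_wh: battery capacity in watt-hours
    """
    total_w = platform_w + lcd_w
    if not (total_w > 0):
        raise ValueError("total power must be positive")
    return battery_wh / total_w


if __name__ == "__main__":
    for encoding, platform_w in APPLICATION_1_MAX_BATTERY_W.items():
        for brightness, lcd_w in LCD_W.items():
            for battery_wh in (54.0, 60.0):
                hours = anticipated_playback_hours(platform_w, lcd_w, battery_wh)
                print(f"{encoding} {brightness} LCD {battery_wh:.0f} WHr: {hours:.2f} Hr")
```

<p>For example, anticipated_playback_hours(21.18, 6.0, 54.0) evaluates to about 1.99, matching the VC-1 row of Table 4. This is the one case in the table that falls just short of two hours.</p>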
<br /><span class="sectionHeading">Recommendations</span><br /><br />The study above provides a comparative analysis of the power profiles of HD video playback applications and shows the significant effect of running various HD codecs on Intel® Cantiga with native hardware accelerator support. In particular, it shows the energy saved in the CPU and in the total platform, which helped extend battery life compared with an earlier HD video playback power study. <br /><br />This section makes recommendations that will help make HD video playback applications, hardware, and the OS power aware, reducing energy consumption to help extend HD playback time on mobile computers.<br /><br /><strong>Improving HD Application Power-Awareness:<br /></strong><br />We highly recommend that video playback developers work on the following areas to help create energy-efficient applications: <br />
<ul>
<li><strong>Power Management Schema:</strong> Experiment further with the application’s local power management schema, since this paper surprisingly shows only marginal differences between each application’s local power settings for both H.264 and VC-1 (independent of the Vista* power plan).</li>
<li><strong>Power-Aware Applications:</strong> Applications should respond to the appropriate OS power events and shut down non-essential functions that increase battery energy consumption. For example, many applications use SetTimer() to invoke particular functionality after a specific time interval; on battery power, such periodic work is a candidate to be suspended or run less often.</li>
<li><strong>Explore Device Buffering:</strong> Explore device buffering and content caching for VC-1 and H.264, taking care not to introduce a performance penalty for compute-intensive decoding. Table 1 and Table 2 show that Blu-ray* optical drive power consumption is below 3 W with MPEG2, which may indicate better data efficiency and device buffering.</li>
<li><strong>Optimizing the Application:</strong> Both applications in this paper use SSE2 and SSE3 instructions. We therefore recommend that developers of video playback applications optimize their software decoders to take advantage of hyper-threading, multiple cores, and special instruction sets such as Intel SSE4.x and AVX. Data efficiency techniques such as memory buffering and caching can further improve energy efficiency.</li>
</ul>
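<p>The power-awareness recommendation above can be illustrated with a small scheduling sketch. The Python code below is hypothetical and is not taken from either application: the PeriodicTask type, the tasks_to_run function, and the on_battery flag are assumptions made for this example. On Windows, the flag could be derived from the ACLineStatus field filled in by GetSystemPowerStatus, but that query is left out so the example stays platform independent.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    """A piece of timer-driven work (hypothetical type for this sketch)."""
    name: str
    interval_ms: int
    essential: bool


def tasks_to_run(tasks, on_battery, battery_stretch=4):
    """Return the timer-driven tasks to keep active for the current power source.

    On AC power every task keeps its interval. On battery, essential tasks
    keep their interval, while non-essential tasks run battery_stretch times
    less often. A battery_stretch of 0 (or any negative value, which is
    clamped to 0) disables non-essential tasks entirely.
    """
    if not on_battery:
        return list(tasks)
    stretch = max(0, int(battery_stretch))
    kept = []
    for task in tasks:
        if task.essential:
            kept.append(task)
        elif stretch:
            kept.append(PeriodicTask(task.name, task.interval_ms * stretch, False))
    return kept


if __name__ == "__main__":
    tasks = [
        PeriodicTask("render_frame", 33, essential=True),
        PeriodicTask("update_thumbnail_cache", 1000, essential=False),
    ]
    for task in tasks_to_run(tasks, on_battery=True):
        print(task.name, task.interval_ms)
```

<p>An application would call a function like this whenever the OS reports a power source change, then re-arm its timers with the returned intervals. The stretch factor of 4 is an arbitrary starting point and would need tuning against measured power.</p>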
<strong>Investing in Hardware Power-Awareness:<br /></strong>This paper shows the effect of upgraded hardware such as the battery, chipset, and CPU on power consumption, compared to an earlier power study of HD video playback. OEMs should therefore strongly consider investing in the following areas to design notebook PCs that are not only state of the art but also last longer on battery: <br /><br />
<ul>
<li><strong>Make Better Batteries: <br /></strong>Battery life is a key differentiator in notebook PCs. OEMs should therefore focus on high-quality, high-capacity batteries that allow notebook PCs to run all day on a single charge.</li>
<li><strong>Adopting Energy-Efficient Hardware:</strong> <br />This paper demonstrates the significant power savings of the Intel-based "Cantiga" platform (Penryn CPU), which is equipped with hardware-accelerated video decoders, compared to the Intel-based "Matanzas" platform (Merom CPU). We strongly recommend that OEMs continue to invest in Intel® consumer platforms, both mobile and desktop, that include energy-efficient devices, in particular 45nm processor technology (Nehalem) and dedicated hardware accelerators for high-definition video decoding. These platform architectural changes will reduce CPU and total platform power and boost performance. In addition, Nehalem processors have quad cores to support multi-threading, along with Intel® SSE4 instructions to enhance performance.</li>
</ul>
<strong>Configuring and Utilizing OS Settings Efficiently:</strong><br /><br />Power management schema definitions may vary between platforms, and each platform has a different energy use profile. Some OS configurations and settings can make a noticeable difference to power savings. We therefore recommend the following to help extend battery life: <br /><br />
<ul>
<li><strong>Selecting Appropriate OS Power Plan:</strong> <br />Each platform has power profile settings that the end user can change. These settings trade off CPU performance against energy efficiency by controlling CPU frequency. Before running a workload, we therefore highly recommend selecting the appropriate power plan for the power source (AC or DC), since both affect overall platform performance and power savings. For example, the Windows Vista* "Balanced" power profile allows Intel® SpeedStep™ technology and the OS to change the CPU frequency dynamically on demand.</li>
<li><strong>Shut down unnecessary OS functions or features:</strong> <br />For OEMs, and ultimately for consumers, OS vendors (OSVs) continue to develop OS features and tools that make notebook PCs more energy efficient and power aware. An energy-efficient OS should monitor any running process with respect not only to its performance but also to its power behavior. This paper and other power studies show that video players, web media, screen savers, system updates, scans, disk defragmentation, LCD brightness level, and more can affect overall power consumption. A system delivers the best possible user experience when its tools and gadgets are aware of the power source and its status, so that they can change their behavior dynamically or alert the end user to take early action against unnecessary running tools and features in order to save power.</li>
</ul>
<span class="sectionHeading">Conclusion</span><br /><br />This analysis shows the importance of the Intel® native hardware accelerator. The Intel® mobile chipset "Cantiga", with its hardware-accelerated HD video decode, saves energy and supports two hours or more of HD video playback, depending on battery capacity, LCD brightness level, and software decoder. We have also shown that, despite the difference in software decoders, the HD video playback applications show no significant differences between their local power management profiles, regardless of whether the power profile is set to Maximum Battery or Maximum Performance. The energy efficiency of software decoders can be improved by further work on their local power schemas and by using advanced microarchitecture instructions, multi-threading, efficient algorithms, and data efficiency techniques. With these approaches, together with notebook PCs whose hardware and OS are power aware, playback time on mobile devices can be extended enough to play an entire HD movie on a single charge from a standard battery.<br /><br /><span class="sectionHeading">References</span><br /><br />[1] Tareq Darwish, Rajshree Chabukswar, Kiefer Kuah and Bob Steigerwald, HD Video Playback Power Consumption Analysis, <a href="http://software.intel.com/en-us/articles/hd-video-playback-power-consumption-analysis">http://software.intel.com/en-us/articles/hd-video-playback-power-consumption-analysis</a><br />[2] Rajshree Chabukswar and Jun De Vega, Power Enabling with Windows Vista* on Intel® Laptop Platform, <a href="http://software.intel.com/en-us/articles/power-enabling-with-windows-vista-on-intel-laptop-platforms">http://software.intel.com/en-us/articles/power-enabling-with-windows-vista-on-intel-laptop-platforms</a><br />[3] Bob Steigerwald, Rajshree Chabukswar, Karthik Krishnan and Jun De Vega, Creating Energy-Efficient Software, <a href="http://software.intel.com/en-us/articles/creating-energy-efficient-software-part-1/">http://software.intel.com/en-us/articles/creating-energy-efficient-software-part-1/</a><br />[4] Rajshree Chabukswar, DVD Playback Power Consumption Analysis, <a href="http://software.intel.com/en-us/articles/dvd-playback-power-consumption-analysis/">http://software.intel.com/en-us/articles/dvd-playback-power-consumption-analysis/</a><br />[5] Aleksandr Budik, 45-nm Penryn and Nehalem: architectural details, <a href="http://www.digital-daily.com/cpu/intel_penryn_nehalem/">http://www.digital-daily.com/cpu/intel_penryn_nehalem/</a><br /><br /><span class="sectionHeading">About the Authors</span><br /><br /><strong>Tareq Darwish</strong> is a Software Engineer working on Platform Power Enabling as part of client enabling in the Software Solutions Group. His current focus is on defining tools and technologies to support the development of energy-efficient software for Intel-based mobile platforms. Prior to working at Intel, he worked for nine years with Lexmark International in Lexington, Kentucky as a Software Development Engineer. He earned his MS degree in Applied Computing and Software Engineering at Eastern Kentucky University. His email is <a href="mailto:tareq.h.darwish@intel.com">tareq.h.darwish@intel.com</a>.<br /><br /><strong>Rajshree Chabukswar</strong> is a Software Engineer working on enabling client platforms through software optimizations in the Software Solutions Group. Prior to working at Intel, she obtained a master’s degree in Computer Engineering from Syracuse University, NY. Her email is <a href="mailto:rajshree.a.chabukswar@intel.com">rajshree.a.chabukswar@intel.com</a>.<br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-hardware-accelerated-high-definition-video-playback-power-analysis/</link>
      <pubDate>Wed, 25 Feb 2009 00:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/intel-hardware-accelerated-high-definition-video-playback-power-analysis/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-hardware-accelerated-high-definition-video-playback-power-analysis/</guid>
      <category>Mobility</category>
      <category>Intel® AppUp(SM) Developer Community</category>
    </item>
    <item>
      <title>Keeping Memory in Mind</title>
      <description><![CDATA[ <span class="sectionHeading">Introduction</span>
<p><br />Processing power and memory size in the average computer have increased significantly over the years. The fact that most computers implement virtual memory and have large amounts of physical memory has desensitized many programmers to the need to conserve physical memory. Many programmers may wonder, "Why should I care about memory; the OS takes care of everything for me?" If you are writing programs targeted to "ultra" mobile devices such as mobile phones or mobile internet devices, known as MIDs, it is important that you care. An example of this kind of device is the Apple iPhone*. In the iPhone SDK, Apple states that only one iPhone application can run at a time. This restriction is not imposed by the operating system, since it is based on a Unix multitasking OS. The restriction exists because of the small memory size. Desktops and laptops run multiple applications easily since there is typically a gigabyte or more of RAM, and if an application needs more, the OS will use virtual memory to page in and out of physical memory using "fast enough" storage devices. Mobile devices may have similarly featured multi-process/multi-tasking operating systems and virtual memory systems, but they do not have large amounts of RAM available for applications to use. This paper presents information and specific examples on how to write applications with memory size and usage in mind so that there is no significant performance degradation when multiple applications are running at the same time.</p>
<span class="sectionHeading"><br />Application Analytics</span>
<p><br />In order to understand how to use memory efficiently in an application, one has to understand how the application uses memory. Some people confuse optimizing for memory with optimizing for the size of a data storage device such as a hard drive or solid state storage. I bring this up because I do not want the reader to confuse the size of the program on the mass storage device with the size of the program "in memory". A perfect example of this came when I asked a couple of "C/C++" programmers what they do to minimize the amount of memory used in a program: they said they use the compiler options to "optimize for size". It is true that this can reduce the in-memory footprint of an application, but a smaller file on disk does not necessarily mean that less RAM is used. There is a relationship between on-disk size and in-RAM size, since the executable file on disk is very similar to the in-memory footprint, but I digress. Let us first look at what a program looks like "on disk".</p>
<span class="sectionHeading">Applications on Disk (File Formats)</span>
<p><br />There are predominantly two types of program file formats, Executable and Linking Format (ELF) and Portable Executable (PE) format. ELF files are typically used on Linux and other Unix-like operating systems, while PE files are used on Windows* operating systems. (Apple operating systems use a third format, Mach-O, which is not covered here.) <br /><br /><img src="http://software.intel.com/file/9800" alt="" width="441" height="251" /><br /><br />This paper is not going to go into significant detail on file formats, but it will describe the basics so you can understand how executable files are loaded in memory by the operating system and how to minimize an application's runtime memory footprint. <br /><br />The reason it is important to understand the file format and what the loader has to do is that the executable file on disk is very similar to the in-memory footprint after the OS has loaded it. This is intentional, so that the program loader has to do a minimal amount of work when loading the program into memory. The program loader will take the binary files and load them as specified in the header sections. Once the program is loaded in memory, for the most part the OS can treat it like any other memory-mapped file.<br /><br />Having identified file formats, let us look at the types of binary objects that compilers and linkers generate to make up the PE or ELF content. The three binary file types generated by a compiler/linker are executables, static libraries, and shared libraries. Executables and shared libraries are loaded by the operating system at runtime, as opposed to static libraries, which are included in the other two file types directly and "fixed at link time".</p>
<span class="sectionHeading"><br />Static Libraries or Shared Libraries</span>
<p><br />Partitioning software into classes and libraries is in general good coding practice. However, the programmer needs to be aware of the implementation. For example, the use of static libraries in memory can be a problem. Static libraries will be duplicated in physical memory in separately linked binary objects. I have seen many applications that use static libraries duplicating code several times in the same or multiple processes. For example, an executable and a .dll or .so, that both use the same static library, will duplicate the code since they are both independently linked.</p>
<span class="sectionHeading"><br />Shared Libraries</span>
<p><br />A shared library’s implementation is different from that of a static library and it has to be treated differently by the loader; hence, the ELF and PE file formats have to account for the difference. The ELF file format distinguishes the two by providing a Relocation Header Table, whereas the PE file format embeds the information in the individual sections. The most common sections of the ELF/PE formats are the following:</p>
<ul>
<li>.text – Contains all of the "code"</li>
<li>.data – Contains all of the initialized data</li>
<li>.idata – Contains import data – names of other files and functions called</li>
<li>.edata – Contains export data – names of files and functions available to other modules</li>
<li>.reloc – Contains relocation information</li>
</ul>
<p>The sections of interest are the .text (code) and .data sections, both of which may require fix-ups.</p>
<span class="sectionHeading"><br />Share and Share Kind of Alike (.dll and .so files)</span>
<p><br />On Windows, a shared library is known as a .dll or dynamic link library, and on Linux a shared library is known as a .so or shared object. Although the implementations differ, the characteristics of these libraries or objects are similar. From a programmer’s point of view, shared libraries (.dll/.so files) provide the code-sharing convenience of static libraries without much of the memory duplication. Multiple applications may use the same set of libraries without increasing the size of the text when they are running at the same time. The OS may report each application that uses the same .dll/.so as owning some of the same memory, but in reality that memory is shared with other processes. There is only one copy of the read-only memory in physical memory, but its usage is charged against every process that has loaded the dynamic library. The actual memory in use can be calculated by subtracting out all shared memory that is counted more than once. If you are interested in calculating exact numbers, there are utilities that parse the ELF or PE sections and identify which sections are shareable.</p>
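<p>The subtraction described above can be sketched in a few lines of C++. This is only an illustration of the bookkeeping: the SharedRegion type and the actual_memory_kb function are hypothetical names, and the sizes would have to come from whatever tool reports per-process usage on your platform.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One read-only region (for example a library's .text) mapped by several processes.
struct SharedRegion {
    std::uint64_t size_kb;  // size of the single physical copy
    unsigned sharers;       // number of processes that have it mapped
};

// Sum of the per-process figures, minus every extra time a shared region was counted.
std::uint64_t actual_memory_kb(const std::vector<std::uint64_t>& charged_kb,
                               const std::vector<SharedRegion>& shared) {
    std::uint64_t total = 0;
    for (std::uint64_t kb : charged_kb) {
        total += kb;
    }
    for (const SharedRegion& region : shared) {
        assert(region.sharers >= 1);  // a region nobody maps should not be listed
        total -= region.size_kb * (region.sharers - 1);
    }
    return total;
}
```

<p>For example, two processes charged 100 KB and 80 KB that both map the same 30 KB library occupy 150 KB of physical memory, not 180 KB.</p>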
<p>So why would you ever use static libraries? Shared libraries are nice but they may come with a cost in startup performance. In order to use shared libraries the operating system needs to do the following when the binary is loaded:</p>
<ul>
<li>Locate the shared library on disk</li>
<li>Check to determine if the shared library is already loaded in the process space</li>
<li>Allocate memory for the shared library</li>
<li>Resolve fix-ups for the .text and .data sections</li>
</ul>
<br />
<p>This all takes time and space. Both Windows and Linux also provide explicit dynamic linking routines: LoadLibrary() and GetProcAddress() on Windows, and dlopen() and dlsym() on Linux. These routines effectively call the same routines that the implicit linker calls.</p>
<p>On Linux the only shared library that is statically linked is glibc. Linux uses a pre-link virtual address for all other shared libraries. A compiler switch, -fPIC, helps with fix-ups by telling the compiler to generate position-independent code that avoids referencing data by absolute address as much as possible. A developer asked me, "Why should we care? Doesn’t the operating system take care of all this?" The operating system will take care of any fix-ups for you, but the result is very inefficient if you don’t use the right options and create a shared library just for the sake of creating a shared library.</p>
<p>For example, a text relocation is a fix-up to a memory address in the "read-execute" text segment of a shared library. Say a non-PIC text segment calls into a memory location that needs to be "fixed up" by the runtime. On Linux this is performed by ld.so in glibc during the startup of the dynamically linked executable. If the developer didn’t design the code correctly, there will be significant memory and fix-up penalties associated with the code. For example, a non-PIC compiled libmpg3 library has roughly 6000 memory locations left inside the shared library to point to some 300 functions and data referred to by the instructions.</p>
<p>So why not use -fPIC all the time? There may be some cases where you do not want to use PIC. Setting up the PIC register (typically ebx) takes about three instructions, plus an additional 1 to 2 instructions per symbol accessed in the data object. In addition, because the PIC register is in use, the compiler is not free to use it for other purposes, which limits the number of registers available and may result in less than optimal code performance. On Linux, to test whether a shared object requires relocations in its text segment, run a tool such as "readelf -d binary.so" and inspect the output for any TEXTREL entry. The presence of a TEXTREL entry indicates that text relocations exist.</p>
<p>On Windows, code that is compiled into a DLL uses the _WINDLL define to give the compiler hints to avoid position-dependent code, minimizing fix-ups. Windows also allows a DLL to be rebased, which improves the load time of the shared code and minimizes the size of the image directory table, since the loader will first try to load the DLL at the address specified by the rebase address.</p>
<p>So what is a "fix-up"? Fix-ups are adjustments to specific addresses that are not relocatable. For example, say I have an STL string:</p>
<p>string MyString="Initial Value";</p>
<p>The loader has to allocate the string "Initial Value" in the data segment and initialize a pointer to the value. If the shared library needs to be relocated or rebased, the value of the pointer needs to be fixed up at runtime to point to the new address, creating a pointer to a pointer. This also happens to functions.</p>
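<p>A simpler case than the STL string shows the same effect with plain C strings. The sketch below is an illustration, not a description of any particular toolchain: the names greeting_ptr, greeting_arr and same_text are made up, and the exact sections and relocations produced depend on the compiler and its options. The general idea is that a global pointer stores an address that the loader must patch, while a character array stores the text itself and contains no address.</p>

```cpp
#include <cassert>
#include <cstring>

// A global pointer: the pointer variable lives in writable data and holds the
// address of the literal, so that stored address has to be fixed up when the
// library is loaded somewhere other than its preferred base.
const char* greeting_ptr = "Initial Value";

// A global array: the characters are stored in place, so the object itself
// contains no address for the loader to patch.
const char greeting_arr[] = "Initial Value";

// Both forms give the program the same text to work with.
bool same_text() {
    assert(greeting_ptr != nullptr);
    return std::strcmp(greeting_ptr, greeting_arr) == 0;
}
```

<p>In a library with many such constants, preferring the array form where it fits is one way to reduce the number of data fix-ups.</p>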
<p>In general, a well-designed application can save significant memory by using shared libraries. Let us look at an example of an Adobe Air application using the common runtime. It runs in a working set of 58,508 KB with 14,348 KB of shareable memory unshared. Now let us launch a second Air application. It runs in a 43,864 KB working set with 13,748 KB of shareable memory unshared. When these two applications run by themselves, there is some sharing with the operating system already, but when they run together the amount of memory shared increases by 12,880 KB, a memory savings of 22 percent in the first application and 29 percent in the second application.</p>
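<p>The two percentages follow directly from the working-set figures quoted above. As a quick check, here is the arithmetic written out in C++; the function and constant names are only for illustration, and the division truncates to a whole percent.</p>

```cpp
#include <cstdint>

// Whole-percent share of a working set that is saved by sharing `shared_kb`.
// The caller is expected to pass a non-zero working set.
constexpr std::uint64_t percent_saved(std::uint64_t shared_kb,
                                      std::uint64_t working_set_kb) {
    return shared_kb * 100 / working_set_kb;  // integer division truncates
}

// Figures quoted in the text, in KB.
constexpr std::uint64_t kNewlyShared = 12880;
constexpr std::uint64_t kFirstWorkingSet = 58508;
constexpr std::uint64_t kSecondWorkingSet = 43864;

static_assert(percent_saved(kNewlyShared, kFirstWorkingSet) == 22, "first application");
static_assert(percent_saved(kNewlyShared, kSecondWorkingSet) == 29, "second application");
```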
<p>Although it seems obvious to use shared libraries to minimize memory usage, here are some things to think about as part of your application design. The more shared libraries you use, the more fragmented your memory space can become because of code alignment, which may result in wasted memory. You can dynamically load and unload shared libraries, which may slow down performance when doing specific tasks but can significantly reduce memory usage. For example, if you have code that performs a specific task infrequently, such as converting a file from one format to another, explicitly load a shared library to do the conversion only when the user requests it and then unload the library.</p>
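<p>On Linux, the load-on-demand pattern described above can be sketched with the POSIX dlopen(), dlsym() and dlclose() calls. Everything specific here is an assumption made for illustration: the ConvertFn signature, the convert_on_demand helper, and any library or symbol name you pass in are hypothetical, and a real converter would define its own interface. On Windows the same shape applies with LoadLibrary(), GetProcAddress() and FreeLibrary().</p>

```cpp
#include <dlfcn.h>  // dlopen, dlsym, dlclose (POSIX; older glibc versions need -ldl)

// Hypothetical signature of the rarely used routine exported by the library.
using ConvertFn = int (*)(const char* input_path, const char* output_path);

// Loads `library` only for the duration of one conversion, then unloads it.
// Returns false if the library or the symbol cannot be found; otherwise stores
// the routine's return value in *status and returns true.
bool convert_on_demand(const char* library, const char* symbol,
                       const char* input_path, const char* output_path,
                       int* status) {
    void* handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        return false;  // dlerror() would describe the failure
    }

    ConvertFn convert = reinterpret_cast<ConvertFn>(dlsym(handle, symbol));
    const bool found = (convert != nullptr);
    if (found) {
        *status = convert(input_path, output_path);
    }

    dlclose(handle);  // drop the mapping as soon as the rare task is done
    return found;
}
```

<p>The cost of this design is that every conversion pays the load and fix-up time again, which is the performance tradeoff mentioned above; it makes sense only for code that really is used rarely.</p>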
<p>Now we know we should look very closely at using shared libraries if possible. This includes using the C runtime libraries as much as possible, since they are most likely already loaded. If your application does not dynamically load and unload the shared library, sharing may not reduce the amount of memory your application appears to take when running by itself, but it will significantly help the platform run multiple applications at a time. In the example of the two Air applications, the 29 percent savings may make the difference between running several applications and placing an artificial limitation of one application at a time on a system. So what are some other things you can do to optimize your application for memory? Let us look at specific optimizations that will help reduce the size of the application, starting with the earlier comment about "just setting the compiler to optimize for size".</p>
<span class="sectionHeading"><br />Compiler Optimizations for Size</span>
<p><br />Most compilers come with the option of optimizing for size. Typically, what this does is some or all of the following:</p>
<ul>
<li>Disables function, jump, loop and label alignment (removes gaps in memory due to the alignment)</li>
<li>Disables pre-fetch loop arrays</li>
<li>Does not perform loop unrolling</li>
<li>Disables inline of functions</li>
</ul>
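<p>To get a feel for the first item in the list, the gap left by alignment is simply the distance from the end of one block to the next aligned boundary. The helper below is a small sketch with a made-up name, padding_bytes; it works for any non-zero alignment, whether that is a 16-byte function boundary or a 4096-byte page.</p>

```cpp
#include <cstddef>

// Bytes of padding needed to round `size` up to the next multiple of `alignment`.
// `alignment` must be non-zero.
constexpr std::size_t padding_bytes(std::size_t size, std::size_t alignment) {
    return (alignment - size % alignment) % alignment;
}

static_assert(padding_bytes(4096, 4096) == 0, "already aligned, nothing wasted");
static_assert(padding_bytes(4097, 4096) == 4095, "one byte over wastes almost a page");
static_assert(padding_bytes(100, 16) == 12, "100 bytes round up to 112");
```

<p>Summed over every function, loop and segment in a binary, these gaps are what the alignment-related size options remove.</p>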
<br />
<p>What I have found with 10 different client applications of varying types is that code size isn’t significantly different between the "optimize for size" and "full optimization" options: the difference is typically less than 5 percent, with a few exceptions where significant vectorization or loop unrolling takes place. What I did find is that compilers vary widely in the size of the code they generate. In some cases I saw one compiler’s code come out as much as twice the size of another’s. The main reason for the code size swings is the performance optimizations listed earlier. I compared the binary that was twice the size with the smaller one, and it was almost 4 times as fast: a size versus speed tradeoff. Many compilers provide optimizations for specific processors. If you know your target machine, target the processor you are running on. This reduces code size by eliminating the code paths optimized for other processors. I guess the lesson learned here is to look at the compiler and options you select, and don’t be afraid to look at other compilers and options. There is no magic dust that works in all cases for all code. The bottom line is that not all compilers are the same. So what about linkers?</p>
<span class="sectionHeading">Check for Incremental Linking and Debug Information</span>
<p><br />Some linkers provide an option to link files incrementally. Although incremental linking is typically turned on for debug builds and off for release builds, make sure it is off in your release code. Incremental linking is nice for developers who are changing code all of the time but is very bad when it comes to releasing code. Incremental linking works by padding each segment of code with int 3 instructions, so that if the code is changed the linker only has to re-link the affected section of code up to the padded int 3 region. In large executables the padded int 3 regions can easily add up to hundreds of kilobytes or even megabytes. Also, remember to remove any debug symbol settings that may have been left behind in the linker options inadvertently.</p>
<span class="sectionHeading">Conclusion</span>
<p><br />So where does this leave us? I hope that you are a little more informed on how to write programs with tight memory requirements. Designing your applications to take full advantage of shared memory seems to give the biggest benefit. Determine which functions are needed in multiple places and share them. Where possible, take advantage of explicit loading of shared libraries for large sections of code that rarely get used. You should not only experiment with different compiler options, but also look at different compilers and see what they do with your specific code. There is no one-size-fits-all answer here. The following list contains a few of my findings. As you design your application, don’t forget to think about memory.</p>
<ul>
<li>
<ul>
<li>Share as much memory as possible within and outside of your application. I have found this usually saves more memory than all of the other optimization techniques. 
<ul>
<li>Look at how you partition functions or classes. Try to make use of reusable code as much as possible and use shared libraries as the package implementation.</li>
<li>Determine how best to take advantage of other shared libraries that are already loaded on the system. Many applications use a lot of the same functionality that you do and may be running simultaneously. For example, a C runtime is most likely already loaded into memory. If you are considering an application that uses other runtimes such as a Java runtime or Adobe Air runtime, some of the runtime code may be shared with other applications running simultaneously as well. </li>
</ul>
</li>
<li>Experiment with the compiler and its options 
<ul>
<li>As stated earlier, there is no one-size-fits-all answer for compilers and their options. Most compilers have an optimize-for-size option; however, what I have found is that in many cases full optimization produces very similar code, perhaps only slightly larger and slightly faster. You almost have to take it on a case-by-case basis.</li>
<li>When selecting compiler options, if you know the target you are compiling for and do not plan on sharing the binary across platforms, set the compiler option for the targeted platform. This provides about a 5 percent improvement in size and can significantly improve the performance of the application, since the compiler can be more specific in generating code optimized for the target platform.</li>
<li>Watch for options that may take up significant memory, such as loop unrolling, or linker options that are designed for debugging, such as incremental linking. Loop unrolling may not cost much memory in some applications, but in others the cost may be significant. Moreover, from what I have seen, incremental linking is always an expensive option and provides no runtime benefits.</li>
<li>Watch out for segment alignments. Some compilers align segments of text on large boundaries sometimes up to page sizes of 4096 bytes. This could end up wasting a lot of memory just to align a text or data segment.</li>
</ul>
</li>
<li>Use the stack often, but be careful of static or unnecessary variable initializations. Remember that if you initialize a variable, in many cases a copy of the initial value has to be kept in a read-only data section. </li>
</ul>
</li>
</ul> ]]></description>
      <link>http://software.intel.com/en-us/articles/keeping-memory-in-mind/</link>
      <pubDate>Thu, 29 Jan 2009 00:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/keeping-memory-in-mind/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/keeping-memory-in-mind/</guid>
      <category>Mobility</category>
    </item>
  </channel></rss>
