<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Wed, 25 Nov 2009 06:10:39 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/intel-cluster-ready-kb/type/performance-and-optimization/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles feed</title>
    <link>http://software.intel.com/en-us/articles/intel-cluster-ready-kb/performance-and-optimization/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Configuration of threshold values in the HPCC test module</title>
      <description><![CDATA[ <p>The HPC Challenge (HPCC) benchmark suite is a common way to gauge the performance of a cluster.  HPCC consists of seven benchmarks that measure a spectrum of system characteristics.  The hpcc module for Intel® Cluster Checker runs the HPCC benchmark suite on the cluster and reports 'Succeeded' or 'Failed' based on the outcome of the tests.</p>
<p>This article does not cover descriptions or definitions of the individual HPCC benchmarks.  For more information about the HPCC benchmarks, see <a href="http://icl.cs.utk.edu/hpcc/">http://icl.cs.utk.edu/hpcc/</a>.</p>
<p class="sectionHeading">Module configuration affects the results of the hpcc tests</p>
<p>Whether the hpcc module succeeds or fails depends on how the module is configured in the Intel® Cluster Checker configuration file.  The module executes the HPCC benchmark suite over each network fabric that is configured in the hpcc module block of the input configuration file.  For each configured network fabric, each individual HPCC benchmark can optionally be given a performance threshold value that must be achieved for a successful result.  If a performance threshold is not set, the success of a test is based solely on the benchmark running to completion.</p>
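<p>As a rough sketch of how this might look in a configuration file, the fragment below defines two fabric blocks.  It reuses the tag names from the example configuration later in this article.  The rdma device value and the threshold numbers are illustrative assumptions rather than recommended settings, and the remaining hpcc settings (paths, process and thread counts) are omitted.</p>
<blockquote>
<p>&lt;hpcc&gt;</p>
<p>        &lt;fabric&gt;</p>
<p>              &lt;device&gt;sock&lt;/device&gt;</p>
<p>        &lt;/fabric&gt;</p>
<p>        &lt;fabric&gt;</p>
<p>              &lt;device&gt;rdma&lt;/device&gt;</p>
<p>              &lt;bandwidth&gt;0.5&lt;/bandwidth&gt;</p>
<p>              &lt;latency&gt;10&lt;/latency&gt;</p>
<p>        &lt;/fabric&gt;</p>
<p>&lt;/hpcc&gt;</p>
</blockquote>
<p>Under these assumptions, the run over the sock fabric would succeed as long as every benchmark completes, while the run over the rdma fabric would also have to satisfy the bandwidth and latency thresholds.</p>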
<p class="sectionHeading">Results when threshold values are configured</p>
<p>When threshold values are set, each benchmark result must be at least as good as the configured value.  Depending on the benchmark, that means a result that is equal to or greater than the configured threshold (where higher is better) or a result that is equal to or less than the configured threshold (where lower is better), as summarized in the table below.</p>
<table border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td width="139" valign="top">
<p><b>hpcc module configuration tag</b></p>
</td>
<td width="114" valign="top">
<p><b>Measurement unit</b></p>
</td>
<td width="192" valign="top">
<p><b>Output characteristics</b></p>
</td>
<td width="145" valign="top">
<p><b>Passing result</b></p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>bandwidth</p>
</td>
<td width="114" valign="top">
<p>GB/s</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>dgemm</p>
</td>
<td width="114" valign="top">
<p>GFLOPS</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>fft</p>
</td>
<td width="114" valign="top">
<p>GFLOPS</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>hpl</p>
</td>
<td width="114" valign="top">
<p>TFLOPS</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>latency</p>
</td>
<td width="114" valign="top">
<p>µs</p>
</td>
<td width="192" valign="top">
<p>Lower is better</p>
</td>
<td width="145" valign="top">
<p>Equal or less</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>ptrans</p>
</td>
<td width="114" valign="top">
<p>GB/s</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>randomaccess</p>
</td>
<td width="114" valign="top">
<p>GUPs</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
<tr>
<td width="139" valign="top">
<p>stream</p>
</td>
<td width="114" valign="top">
<p>GB/s</p>
</td>
<td width="192" valign="top">
<p>Higher is better</p>
</td>
<td width="145" valign="top">
<p>Equal or greater</p>
</td>
</tr>
</tbody>
</table>
<p><br />If one of the benchmarks does not meet the configured threshold value, the module will report a failing result identifying the network fabric and the individual failing benchmark(s).  For example, using the following configuration, the hpcc module reported the following failure:</p>
<blockquote>
<p>&lt;hpcc&gt;</p>
<p>        &lt;cc-path&gt;/opt/intel/cce/11.0.069/&lt;/cc-path&gt;</p>
<p>        &lt;fabric&gt;</p>
<p>              &lt;bandwidth&gt;0.003&lt;/bandwidth&gt;</p>
<p>              &lt;device&gt;sock&lt;/device&gt;</p>
<p>              &lt;dgemm&gt;5.76&lt;/dgemm&gt;</p>
<p>              &lt;fft&gt;0.4&lt;/fft&gt;</p>
<p>              &lt;hpl&gt;0.04&lt;/hpl&gt;</p>
<p>              &lt;latency&gt;40&lt;/latency&gt;</p>
<p>              &lt;ptrans&gt;0.10&lt;/ptrans&gt;</p>
<p>              &lt;randomaccess&gt;0.008&lt;/randomaccess&gt;</p>
<p>              &lt;stream&gt;1.4&lt;/stream&gt;</p>
<p>        &lt;/fabric&gt;</p>
<p> </p>
<p>        &lt;mkl-path&gt;/opt/intel/cmkl/10.1.0.015/&lt;/mkl-path&gt;</p>
<p>        &lt;mpi-path&gt;/opt/intel/impi/3.2/&lt;/mpi-path&gt;</p>
<p>        &lt;process-number&gt;8&lt;/process-number&gt;</p>
<p>        &lt;thread-number&gt;1&lt;/thread-number&gt;</p>
<p>&lt;/hpcc&gt;</p>
</blockquote>
<p> </p>
<blockquote>
<p>HPC Challenge Benchmark (Intel(R) C++ Compiler, Intel(R) MPI</p>
<p>Library, Intel(R) Math Kernel Library), (hpcc)</p>
<p>Attention: this check may take a long time to complete......FAILED</p>
<p>subtest 'PTRANS, GB/s (device = sock)' failed</p>
<p>  - failing All hosts returned: '0.0817186'</p>
</blockquote>
<p> <br />The module reported a failure because the PTRANS result was 0.0817186 GB/s, which is below the configured threshold of 0.10 GB/s.</p>
<p class="sectionHeading">What do failures to meet thresholds mean?</p>
<p>Many system characteristics affect the results of the HPCC benchmark suite, and a reported test failure does not necessarily indicate an under-performing or malfunctioning cluster.  Processor speeds, network characteristics, and memory architecture, for instance, all factor into the measured results.  Changes in the characteristics of any of those components or sub-systems can affect the outcome of the tests.  Therefore, a failure to meet a threshold may be the result of a value configured too high for the characteristics of a particular cluster.  The thresholds can be reset to levels that are more appropriate for the specific system to resolve the issue.</p>
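<p>As an illustration of resetting a threshold, the fragment below shows only the ptrans entry in the fabric block of the example configuration above; the other tags in the block are unchanged and omitted here.  The value 0.08 is an assumption, chosen because it sits just below the reported result of 0.0817186 GB/s.  Whether that level is appropriate depends on what the specific cluster is expected to deliver.</p>
<blockquote>
<p>        &lt;fabric&gt;</p>
<p>              &lt;device&gt;sock&lt;/device&gt;</p>
<p>              &lt;ptrans&gt;0.08&lt;/ptrans&gt;</p>
<p>        &lt;/fabric&gt;</p>
</blockquote>
<p>Because PTRANS passes on a result equal to or greater than the threshold, a run that again measures 0.0817186 GB/s would succeed with this setting, although it leaves little room for run-to-run variation.</p>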
<p>If a cluster that has historically passed the hpcc module with threshold values configured begins to fail consistently, that may indicate a problem with one or more components in the system.  First make sure that Intel® Cluster Checker is the only application running on the system; other applications running concurrently are likely to affect the measured results of the benchmarks.  If failures to meet thresholds persist and there have been no changes to the hardware characteristics of the cluster, then there may be an issue causing degraded performance that should be resolved.</p>
<p>Intermittent failures to meet threshold values may be the result of threshold levels that are set too high to account for the natural fluctuations in performance of the system.  For example, with the PTRANS configuration above, the threshold is set to 0.10.  A given cluster may exhibit performance that routinely yields 0.11 GB/s but has fluctuations ranging from 0.095 to 0.12 GB/s.  Any fluctuations that dip below the 0.10 threshold will be flagged as a failure.  Threshold values should be configured to account for some fluctuations in results, so a better threshold for this example may be 0.09 GB/s.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/configuration-of-threshold-values-in-the-hpcc-test-module</link>
      <pubDate>Mon, 23 Nov 2009 12:40:48 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/configuration-of-threshold-values-in-the-hpcc-test-module#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/configuration-of-threshold-values-in-the-hpcc-test-module</guid>
      <category>Intel® Cluster Checker Knowledge Base</category>
      <category>Intel® Cluster Ready Knowledge Base</category>
    </item>
    <item>
      <title>Setting the Processor Frequency for Intel® Cluster Checker Performance Tests</title>
      <description><![CDATA[ <strong>Note</strong>: This article is only applicable to clusters on which Enhanced Intel SpeedStep® Technology is enabled.<br /><br />Enhanced Intel SpeedStep Technology allows systems to achieve very high performance while also conserving power. However, clusters using Enhanced Intel SpeedStep Technology may experience intermittent performance failures when checked with Intel Cluster Checker.  Because the cluster nodes can dynamically adjust processor frequency, performance results may be lower than expected.  This issue affects nearly all the checks that verify cluster performance, including <span style="font-family:courier new,monospace;">mflops_intel_mkl</span>, <span style="font-family:courier new,monospace;">hpcc</span>, and <span style="font-family:courier new,monospace;">imb_pingpong_intel_mpi</span>.<br /><br />The recommended way to resolve this issue without disabling Enhanced Intel SpeedStep Technology is to manually set the processor frequency on all cluster nodes.  The processor frequency should remain statically configured until Intel Cluster Checker execution is complete. Most versions of Linux provide utilities to control dynamic processor frequency. Two common utilities are <span style="font-family:courier new,monospace;">cpuspeed</span> and <span style="font-family:courier new,monospace;">powersaved</span>.<br /><br />Follow these steps:<br /><ol>
<li>Manually set all processors in the cluster to maximum frequency. This is normally done by executing the CPU frequency scaling utility with the appropriate options. The command must be executed on all nodes using the cluster-wide execution capability required by Intel Cluster Ready.</li>
<li>Run Intel Cluster Checker.</li>
<li>Once Intel Cluster Checker has completed successfully, restore dynamic CPU frequency scaling on all nodes.</li>
</ol>For example, to set maximum processor frequency on a cluster using <span style="font-family:courier new,monospace;">cpuspeed</span> and <span style="font-family:courier new,monospace;">pdsh</span>:<br />
<blockquote>pdsh -a "killall -SIGUSR1 cpuspeed"</blockquote>
<br />To return the processor frequency scaling to automatic mode:<br />
<blockquote>pdsh -a "killall -SIGHUP cpuspeed"</blockquote>
<br />With the introduction of Intel® Turbo Boost Technology in Intel Core™ i7 processors, processor frequency can exceed the base operating frequency.  Additional steps may be required to force the processors to run at the nominal base frequency instead of the maximum frequency.<br /><br />More information on Enhanced Intel SpeedStep Technology can be found at: <br /><a href="http://software.intel.com/en-us/articles/enhanced-intel-speedstepr-technology-and-demand-based-switching-on-linux/">http://software.intel.com/en-us/articles/enhanced-intel-speedstepr-technology-and-demand-based-switching-on-linux/</a><br /><br />The following article discusses how to implement processor frequency control for high performance computing clusters: <br /><a href="http://software.intel.com/en-us/articles/using-enhanced-intel-speedstep-features-in-hpc-clusters/">http://software.intel.com/en-us/articles/using-enhanced-intel-speedstep-features-in-hpc-clusters/</a><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/setting-the-processor-frequency-for-intel-cluster-checker-performance-tests</link>
      <pubDate>Thu, 13 Aug 2009 09:54:41 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/setting-the-processor-frequency-for-intel-cluster-checker-performance-tests#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/setting-the-processor-frequency-for-intel-cluster-checker-performance-tests</guid>
      <category>Intel® Cluster Checker Knowledge Base</category>
      <category>Intel® Cluster Ready Knowledge Base</category>
    </item>
  </channel></rss>