<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Fri, 25 May 2012 13:06:38 -0700 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/intel-cluster-checker-kb/type/known-issues/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles Feed</title>
    <link>http://software.intel.com/en-us/articles/intel-cluster-checker-kb/type/known-issues/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>The mflops_intel_mkl test module hangs during execution on Scyld ClusterWare 5.4</title>
      <description><![CDATA[ <p><span class="sectionHeading">Symptom</span><br /><br />Intel® Cluster Checker hangs during the execution of the <span >mflops_intel_mkl</span> test module on clusters running Penguin Computing* Scyld ClusterWare* 5.4.  <br /><br />In addition, inactive or zombie processes named dgemm_mflops may be present on the nodes. The dgemm_mflops binary is a DGEMM benchmark optimized with the Intel® Math Kernel Library and packaged with Intel® Cluster Checker.<br /><br />Debug output provides no other information.<br /><br /><span class="sectionHeading">Cause</span><br /><br />The root cause of this error has not been determined. It appears to be an incompatibility between the prebuilt dgemm_mflops binary included with Intel® Cluster Checker and Scyld ClusterWare 5.4. The hang occurs only when dgemm_mflops is executed over non-interactive SSH.<br /><br /><span class="sectionHeading">Resolution</span><br /><br />Configure Intel® Cluster Checker to build dgemm_mflops from source rather than use the prebuilt binary. To do so, add the <span >&lt;build/&gt;</span> configuration tag.<br /><br />Either the Intel Math Kernel Library must be in the linker path, or the <span >&lt;mkl-path&gt;</span> option must be set. The <span >&lt;mkl-path&gt;</span> option can also be set using the <a href="http://software.intel.com/en-us/articles/CLCK-global-configuration-options/">global configuration capability</a>. The GNU C compiler (gcc) must also be present, so it is recommended to add the gcc test module as a dependency.<br /><br />The following is an example of the updated test module configuration that works around the issue:<br /><br />
<blockquote>&lt;mflops_intel_mkl&gt;<br />  &lt;add_dependency&gt;gcc&lt;/add_dependency&gt; <br />  &lt;build/&gt;<br />  &lt;mkl-path&gt;/opt/intel/cmkl/10.1&lt;/mkl-path&gt;<br />&lt;/mflops_intel_mkl&gt;</blockquote>
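<br /><br />To check a node for the leftover dgemm_mflops processes described under Symptom, the following minimal Python sketch parses the text printed by <span >ps -eo pid,stat,comm</span> and reports entries whose state begins with Z (zombie). The function name is hypothetical and is not part of Intel® Cluster Checker. The sketch looks only for zombies, because the article does not define which process states count as "inactive".<br /><br />

```python
def find_zombie_dgemm(ps_output, name="dgemm_mflops"):
    """Return (pid, stat) pairs for zombie processes called `name`.

    Hypothetical helper, not part of Intel Cluster Checker.
    `ps_output` is assumed to be the text printed by `ps -eo pid,stat,comm`.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the PID/STAT/COMMAND header
        fields = line.split(None, 2)          # pid, stat, rest of the command column
        if len(fields) != 3:
            continue
        pid, stat, comm = fields
        # A zombie's name may be followed by a "defunct" marker, so compare
        # only the first word of the command column.
        if comm.split()[0] == name and stat.startswith("Z"):
            zombies.append((int(pid), stat))
    return zombies


sample = """  PID STAT COMMAND
 4242 Z    dgemm_mflops
 4243 S    dgemm_mflops
    1 Ss   init
"""
print(find_zombie_dgemm(sample))  # [(4242, 'Z')]
```

On a live node, the input text can be captured with Python's <span >subprocess.run(["ps", "-eo", "pid,stat,comm"], capture_output=True, text=True).stdout</span>.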
</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/the-mflops_intel_mkl-test-module-hangs-during-execution-on-scyld-clusterware-54/</link>
      <pubDate>Wed, 03 Feb 2010 22:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/the-mflops_intel_mkl-test-module-hangs-during-execution-on-scyld-clusterware-54/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/the-mflops_intel_mkl-test-module-hangs-during-execution-on-scyld-clusterware-54/</guid>
      <category>Intel® Cluster Checker Knowledge Base</category>
      <category>Intel® Cluster Ready Knowledge Base</category>
    </item>
    <item>
      <title>Troubleshooting the dmidecode check: Wake-up Type</title>
      <description><![CDATA[ <span class="sectionHeading">Symptom</span><br /><br />The <span >dmidecode</span> module of Intel® Cluster Checker reports that the “Wake-up Type” value is inconsistent across nodes. The error may look similar to the following example:<br /><br />
<blockquote>SMBIOS/DMI Uniformity,(dmidecode).............................................FAILED<br />subtest 'System Information (0x0002): Wake-up Type' failed<br />- failing hosts node-1, node-2 returned: 'AC Power Restored'<br />- failing host node-0 returned: 'Power Switch'</blockquote>
<br />Modifying the Power Restore Policy or Restore on AC Power Loss setting, using either the IPMI/BIOS configuration interface or the Save and Restore System Configuration utility (syscfg), has no effect on this error.<br /><br /><span class="sectionHeading">Cause</span><br /><br />The “Wake-up Type” value does not represent the current power restore or power-on setting. It records the event that last powered on the system (for example, a power switch press or the restoration of AC power), regardless of the restore settings. The value may therefore change after each boot, depending on how the individual node was powered on.<br /><br /><span class="sectionHeading">Resolution</span><br /><br />Because this value holds dynamic data, compute nodes are not expected to report identical values for it, and it can be ignored for the purposes of Intel® Cluster Ready certification and verification.<br /><br />Exclude the value by adding an <span >exclude</span> element to the <span >dmidecode</span> block in the Intel® Cluster Checker configuration file. For example:<br /><br />
<blockquote>&lt;dmidecode&gt;<br />  &lt;exclude&gt;System Information (0x0002): Wake-up Type&lt;/exclude&gt;<br />&lt;/dmidecode&gt;</blockquote>
<br />In Intel® Cluster Checker 1.4, the <span >exclude</span> element supports regular expression matching, so the following shorter pattern is sufficient:<br /><br />
<blockquote>&lt;dmidecode&gt;<br />  &lt;exclude&gt;Wake-up Type&lt;/exclude&gt;<br />&lt;/dmidecode&gt;</blockquote>
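<br />The following Python sketch illustrates why a short pattern can match the full subtest name. It assumes, without confirmation from the product documentation, that the exclude value is applied as an unanchored regular expression search in the style of Python's <span >re.search</span>; the actual matching rules of Intel® Cluster Checker 1.4 may differ. The <span >is_excluded</span> function is hypothetical.<br /><br />

```python
import re

SUBTEST = "System Information (0x0002): Wake-up Type"


def is_excluded(pattern, subtest):
    """Hypothetical check. Assumed semantics: unanchored search, like re.search."""
    return re.search(pattern, subtest) is not None


print(is_excluded("Wake-up Type", SUBTEST))  # True
# Unescaped, "(0x0002)" is a regex group that matches "0x0002" without the
# parentheses, so the literal subtest name is not found.
print(is_excluded("System Information (0x0002): Wake-up Type", SUBTEST))  # False
print(is_excluded(re.escape(SUBTEST), SUBTEST))  # True
```

Under this assumption, the parentheses in the full subtest name act as a regular expression group rather than literal characters, which is one reason the shorter pattern is the more convenient choice.<br />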
<br />For more information on troubleshooting the <span >dmidecode</span> check, see <a href="http://software.intel.com/en-us/articles/troubleshooting-the-dmidecode-check/">this article</a>.  ]]></description>
      <link>http://software.intel.com/en-us/articles/troubleshooting-the-dmidecode-check-wake-up-type/</link>
      <pubDate>Wed, 16 Dec 2009 22:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/troubleshooting-the-dmidecode-check-wake-up-type/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/troubleshooting-the-dmidecode-check-wake-up-type/</guid>
      <category>Intel® Cluster Checker Knowledge Base</category>
      <category>Intel® Cluster Ready Knowledge Base</category>
    </item>
    <item>
      <title>Random fabric errors on Red Hat Enterprise Linux 5.4</title>
      <description><![CDATA[ <p><b>Problem:</b></p>
<p>The Intel® MPI Library fails intermittently when run over the <code>RDSSM</code> or <code>RDMA</code> devices. Approximately 5-10% of runs fail on Red Hat Enterprise Linux (RHEL) 5.4; the problem does not occur on earlier RHEL versions.</p>
<p>The debug output shows the following error during Intel MPI Library operations:</p>
<blockquote>setup_listener Cannot assign requested address</blockquote>
<br />
<p><b>Environment:</b></p>
<p>Red Hat Enterprise Linux 5.4 only</p>
<p><b>Root Cause:</b></p>
<p>This error occurs with the Intel MPI Library and the version of OFED (Open Fabrics Enterprise Distribution) included with RHEL 5.4. The Intel MPI Library can conflict with RDS (Reliable Datagram Sockets) over port space, and when such a conflict occurs, uDAPL does not resolve it correctly.</p>
<p>By default, the Intel MPI Library derives its port number from its process ID. On RHEL 5.4, the process ID can occasionally match a port number that the RDS driver has already allocated, which creates a port space conflict. uDAPL then returns an incorrect status code to the Intel MPI Library, and communication fails.</p>
<p><b>Resolution:</b></p>
<p>As a temporary workaround, set the following environment variable on all nodes:</p>
<blockquote>$ export I_MPI_RDMA_CREATE_CONN_QUAL=0</blockquote>
<p>With this variable set to 0, the Intel MPI Library no longer derives its port number from its process ID.</p>
<p>This error is fixed in DAPL 2.0.25, which is to be included in Open Fabrics Enterprise Distribution (OFED) 1.5. The status of the fix can be found in <a href="http://www.openfabrics.org/downloads/dapl/" target="_blank">the latest OFED release notes</a>.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/random-fabric-errors-on-rhel5U4/</link>
      <pubDate>Tue, 01 Dec 2009 22:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/random-fabric-errors-on-rhel5U4/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/random-fabric-errors-on-rhel5U4/</guid>
      <category>Intel® Cluster Checker Knowledge Base</category>
      <category>Intel® Cluster Toolkit for Linux* Knowledge Base</category>
      <category>Intel® MPI Library for Linux* Knowledge Base</category>
    </item>
    <item>
      <title>How to Deal with file_tree issues in Intel® Cluster Checker 1.3</title>
      <description><![CDATA[ <p><em>Note: this article describes behavior in Intel® Cluster Checker version 1.3 Update 2 or earlier.</em><br /><br />The file_tree Intel® Cluster Checker test module verifies that a consistent set of files is present on all cluster nodes. This consistency is a requirement of the Intel® Cluster Ready Specification. However, some files are expected to contain data unique to each node, such as the hostname and IP address. The file_tree test automatically excludes many common examples of such files from its consistency check, but some special cases are not handled automatically (see the end of this article for a list).  <br /><br />If the file_tree test identifies files that are not consistent across the cluster, apply the following rules to determine manually whether each reported difference must be resolved:</p>
<ul>
<li>Does the reported file contain information unique to each node, such as a time stamp or network configuration?  If so, manually verify that, apart from the node-unique data, the file is identical on all nodes.  </li>
<li>Does the file name itself or the path to the file contain node-unique information?  If so, are similar files present on all the nodes?</li>
<li>Is the file generated or modified on each node, for example a log file, a file compiled from source on each node, or a binary modified by the prelink utility?  If so, a time stamp or other node-unique information may be embedded.</li>
</ul>
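<p>The first rule can be partly automated. The following Python sketch, whose function names are hypothetical and not part of Intel® Cluster Checker, masks the strings you have identified as node-unique (for example, the hostname and IP address) in each node's copy of a file and then compares SHA-256 digests. It assumes you have already collected each node's copy of the file and its node-unique strings. Because it uses plain substring replacement, a node-unique value that also appears inside unrelated text would be masked as well.</p>

```python
import hashlib


def normalized_digest(text, node_values):
    """SHA-256 of `text` after masking each node-unique string. Hypothetical helper."""
    # Mask longer values first, so that a hostname does not partially mask
    # a longer value that contains it (such as the node's fully qualified name).
    for value in sorted(node_values, key=len, reverse=True):
        text = text.replace(value, "NODE_UNIQUE")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def same_apart_from_node_data(files_by_node, unique_by_node):
    """True if every node's copy is identical once its node-unique strings are masked.

    files_by_node:  node name -> file contents
    unique_by_node: node name -> list of strings unique to that node
    """
    digests = {
        normalized_digest(text, unique_by_node[node])
        for node, text in files_by_node.items()
    }
    return len(digests) == 1


files = {
    "node-0": "HOSTNAME=node-0\nIPADDR=10.0.0.10\nONBOOT=yes\n",
    "node-1": "HOSTNAME=node-1\nIPADDR=10.0.0.11\nONBOOT=yes\n",
}
unique = {"node-0": ["node-0", "10.0.0.10"], "node-1": ["node-1", "10.0.0.11"]}
print(same_apart_from_node_data(files, unique))  # True
```

<p>A result of True supports treating the reported difference as immaterial; a result of False means that some difference remains beyond the masked values and still needs manual review.</p>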
<p>If you can determine why the file differs between nodes and verify that the differences are not material, you may ignore the reported errors. If you are certifying a cluster design as Intel® Cluster Ready, include a brief explanation of why you believe the reported errors are immaterial with your submitted Intel® Cluster Checker output logs. Document the files you resolved manually along with your cluster design, so that other engineers and technicians at your company know they can ignore file_tree errors for these files but should not ignore errors reported for other files.  <br /><br />Forthcoming versions of Intel® Cluster Checker will have the capability to configure which files should be automatically excluded.<br /><br /><span class="sectionHeading">Files known to differ from node to node that are not automatically handled by Intel® Cluster Checker 1.3</span></p>
<ul>
<li>/opt/mlnx-ofed/src/* </li>
<li>/opt/rocks/lib/graphviz/config </li>
<li>/opt/torque/lib64/xpbs* </li>
<li>/opt/torque/mom_logs/* </li>
<li>/usr/java/jdk1.6.0_14/jre/lib/servicetag/registration.xml </li>
<li>/usr/java/jdk1.6.0_14/register.html </li>
<li>/usr/java/jdk1.6.0_14/register_ja.html </li>
<li>/usr/java/jdk1.6.0_14/register_zh_CN.html </li>
<li>/usr/java/i386/jre1.6.0_12/lib/i386/client/classes.jsa </li>
</ul> ]]></description>
      <link>http://software.intel.com/en-us/articles/how-to-deal-with-file_tree-issues-in-intel-cluster-checker-13/</link>
      <pubDate>Mon, 28 Sep 2009 22:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/how-to-deal-with-file_tree-issues-in-intel-cluster-checker-13/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/how-to-deal-with-file_tree-issues-in-intel-cluster-checker-13/</guid>
      <category>Intel® Cluster Checker Knowledge Base</category>
      <category>Intel® Cluster Ready Knowledge Base</category>
    </item>
  </channel></rss>
