The mflops_intel_mkl test module hangs during execution on Scyld ClusterWare 5.4

Symptom



Intel® Cluster Checker hangs during the execution of the mflops_intel_mkl test module on clusters running Penguin Computing* Scyld ClusterWare* 5.4.

In addition, inactive or zombie processes named dgemm_mflops may be present on the nodes. The dgemm_mflops binary is a DGEMM benchmark optimized with the Intel® Math Kernel Library and is packaged with Intel® Cluster Checker.

Debug output provides no other information.
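
To confirm the symptom on a node, the process table can be searched for leftover dgemm_mflops processes. The following is a minimal sketch in Python, not part of Intel® Cluster Checker. It assumes a Linux node with a procps-style ps command, and the helper names (find_dgemm_processes, zombie_pids, scan_local_node) are hypothetical.

```python
import subprocess


def find_dgemm_processes(ps_output, name="dgemm_mflops"):
    """Return (pid, state) pairs for processes whose command name is `name`.

    `ps_output` is expected to hold one process per line in the form
    "<pid> <state> <command>", as printed by the ps call in scan_local_node().
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, state, command = fields
        # Zombie entries may be listed as "dgemm_mflops <defunct>",
        # so compare only the first word of the command column.
        if command.split()[0] == name:
            matches.append((int(pid), state))
    return matches


def zombie_pids(processes):
    """Return the PIDs whose state code starts with 'Z' (zombie)."""
    return [pid for pid, state in processes if state.startswith("Z")]


def scan_local_node():
    """Run ps on the local node and return the matching processes.

    Each column is requested with its own -o option and an empty header
    so that the output contains data lines only.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "stat=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return find_dgemm_processes(result.stdout)
```

Calling scan_local_node() on each compute node and passing the result to zombie_pids() lists the dgemm_mflops processes that remain in the zombie state. Processes reported with other state codes are the inactive ones.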

Cause



The root cause of this error is undetermined. It appears to be an incompatibility between the prebuilt dgemm_mflops binary included with Intel® Cluster Checker and Scyld ClusterWare 5.4. The hang occurs only when the binary is executed over non-interactive SSH.

Resolution



Configure Intel® Cluster Checker to build dgemm_mflops from source rather than using the prebuilt binary. This is accomplished using the <build/> configuration tag.

The Intel® Math Kernel Library must be in the linker path; otherwise, the <mkl-path> option must also be set. The <mkl-path> option can also be set using the global configuration capability. The GNU C Compiler (gcc) must also be present, so it is recommended to add the gcc test module as a dependency.
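
The two build prerequisites described above can be checked on a node before the test module is rerun. The following Python sketch is illustrative only and rests on several assumptions: that "linker path" refers to the LD_LIBRARY_PATH environment variable, that the Intel® Math Kernel Library installs a file whose name starts with libmkl_core.so, and that gcc is found through PATH. Directories configured through the system linker cache are not examined, and all function names are hypothetical.

```python
import os
import shutil


def linker_path_dirs(env=os.environ):
    """Return the non-empty directories listed in LD_LIBRARY_PATH."""
    value = env.get("LD_LIBRARY_PATH", "")
    return [d for d in value.split(os.pathsep) if d]


def has_mkl_library(directories, marker="libmkl_core.so"):
    """Return True if any directory holds a file whose name starts with `marker`.

    The marker file name is an assumption; adjust it to match the
    installed Intel Math Kernel Library version.
    """
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        if any(name.startswith(marker) for name in os.listdir(directory)):
            return True
    return False


def check_prerequisites(env=os.environ):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if shutil.which("gcc") is None:
        problems.append("gcc was not found in PATH")
    if not has_mkl_library(linker_path_dirs(env)):
        problems.append(
            "no MKL library found in LD_LIBRARY_PATH; set <mkl-path> instead"
        )
    return problems
```

If check_prerequisites() reports that no library was found in LD_LIBRARY_PATH, setting the <mkl-path> option covers that case.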

The following is an example of the updated test module configuration to work around the issue:

<mflops_intel_mkl>
  <add_dependency>gcc</add_dependency>
  <build/>
  <mkl-path>/opt/intel/cmkl/10.1</mkl-path>
</mflops_intel_mkl>