Symptom
Intel® Cluster Checker hangs during the execution of the mflops_intel_mkl test module on clusters running Penguin Computing* Scyld ClusterWare* 5.4.
In addition, inactive or zombie processes named dgemm_mflops may be present on the nodes. The dgemm_mflops binary is a DGEMM benchmark optimized with the Intel® Math Kernel Library and is packaged with Intel® Cluster Checker.
Debug output provides no other information.
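To confirm the second symptom, the process table on a node can be inspected by hand. The following is a minimal sketch, assuming a Linux node with the procps version of ps; the helper name list_dgemm_mflops is hypothetical and is not part of Intel® Cluster Checker.

```shell
# Filter `ps` output (columns: PID STAT COMMAND) down to dgemm_mflops
# processes and print the PID and state code of each one.
# A state beginning with Z is a zombie; S or D indicates a sleeping process.
list_dgemm_mflops() {
    awk '$3 == "dgemm_mflops" { print $1, $2 }'
}

# Typical use on a node:
#   ps -eo pid=,stat=,comm= | list_dgemm_mflops
```

Any PID reported in state Z long after the test module was started is consistent with the hang described here.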
Cause
The root cause of this error is undetermined. It appears to be an incompatibility between the prebuilt dgemm_mflops binary included with Intel® Cluster Checker and Scyld ClusterWare* 5.4. The hang only occurs when the binary is executed over non-interactive SSH.
Resolution
Configure Intel® Cluster Checker to build dgemm_mflops from source rather than using the prebuilt binary. To do so, add the <build/> tag to the mflops_intel_mkl test module configuration.
The Intel® Math Kernel Library must be in the linker path; otherwise, the <mkl-path> option must also be set, either in the test module configuration or through the global configuration capability. The GNU C Compiler (gcc) must also be present, so it is recommended to add the gcc test module as a dependency.
The following is an example of the updated test module configuration to work around the issue:
<mflops_intel_mkl>
  <add_dependency>gcc</add_dependency>
  <build/>
  <mkl-path>/opt/intel/cmkl/10.1</mkl-path>
</mflops_intel_mkl>
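Before rerunning Intel® Cluster Checker, the two build prerequisites described above can be checked by hand on a node. This is a rough sketch only: check_build_prereqs is a hypothetical helper, not a Cluster Checker command, and it merely tests that a compiler is on the PATH and that a directory exists. It does not validate the contents of the Intel® Math Kernel Library installation.

```shell
# Return 0 if the named compiler is on PATH and the given MKL directory exists.
# Prints a message to stderr for each missing prerequisite.
check_build_prereqs() {
    compiler=$1
    mkl_dir=$2
    status=0
    if ! command -v "$compiler" >/dev/null 2>&1; then
        echo "missing compiler: $compiler" >&2
        status=1
    fi
    if [ ! -d "$mkl_dir" ]; then
        echo "missing MKL directory: $mkl_dir" >&2
        status=1
    fi
    return "$status"
}

# Typical use, with the path from the example configuration:
#   check_build_prereqs gcc /opt/intel/cmkl/10.1
```

If the directory is reported missing, adjust <mkl-path> to match the location where the library is installed on the cluster.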