November 22, 2009 9:00 PM PST
The HPC Challenge (HPCC) benchmark suite is a common way to gauge the performance of a cluster. HPCC consists of seven benchmarks that together measure a broad range of system characteristics. The hpcc module for Intel® Cluster Checker runs the HPCC benchmark suite on the cluster and reports 'Succeeded' or 'Failed' based on the outcome of the tests.
This article does not cover descriptions or definitions of the individual HPCC benchmarks. For more information about the HPCC benchmarks, see http://icl.cs.utk.edu/hpcc/.
Module configuration affects the results of the hpcc tests
Whether the hpcc module succeeds or fails depends on how the module is configured in the Intel® Cluster Checker configuration file. The module executes the HPCC benchmark over each network fabric configured in the hpcc module block of the input configuration file. For each configured fabric, each individual HPCC benchmark can optionally be assigned a performance threshold that must be achieved for a successful result. If no threshold is set for a benchmark, success is based solely on whether that benchmark runs to completion.
Results when threshold values are configured
When threshold values are set, each benchmark must meet the configured performance value. Because some measurements improve as they increase and others as they decrease, meeting the threshold means a result equal to or greater than the threshold for most benchmarks, but a result equal to or less than the threshold for latency, as the following table shows.
| hpcc module configuration tag | Measurement unit | Output characteristic | Passing result |
|---|---|---|---|
| bandwidth | GB/s | Higher is better | Equal to or greater than threshold |
| dgemm | GFLOPS | Higher is better | Equal to or greater than threshold |
| fft | GFLOPS | Higher is better | Equal to or greater than threshold |
| hpl | TFLOPS | Higher is better | Equal to or greater than threshold |
| latency | µs | Lower is better | Equal to or less than threshold |
| ptrans | GB/s | Higher is better | Equal to or greater than threshold |
| randomaccess | GUP/s | Higher is better | Equal to or greater than threshold |
| stream | GB/s | Higher is better | Equal to or greater than threshold |
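The pass/fail rule in the table can be summarized in a few lines of code. The following Python sketch is only an illustration of the rule described in this article, not Intel® Cluster Checker's implementation; the function name subtest_passes and the use of None for "did not run to completion" and "no threshold configured" are assumptions made for the example.

```python
from typing import Optional

# Direction of "better" for each hpcc module configuration tag, taken from the
# table: latency is the only lower-is-better measurement.
HIGHER_IS_BETTER = {
    "bandwidth": True,     # GB/s
    "dgemm": True,         # GFLOPS
    "fft": True,           # GFLOPS
    "hpl": True,           # TFLOPS
    "latency": False,      # microseconds
    "ptrans": True,        # GB/s
    "randomaccess": True,  # GUP/s
    "stream": True,        # GB/s
}


def subtest_passes(tag: str, result: Optional[float], threshold: Optional[float]) -> bool:
    """Return True if one benchmark result counts as a pass (hypothetical helper).

    result is None when the benchmark did not run to completion.
    threshold is None when no threshold is configured for the tag.
    """
    if result is None:
        return False  # a benchmark that did not complete fails
    if threshold is None:
        return True   # no threshold: running to completion is enough
    if HIGHER_IS_BETTER[tag]:
        return result >= threshold  # "equal to or greater than"
    return result <= threshold      # "equal to or less than"


print(subtest_passes("ptrans", 0.0817186, 0.10))  # False
print(subtest_passes("latency", 35.0, 40))        # True
print(subtest_passes("stream", 1.2, None))        # True
```

Keeping the direction in a single lookup table means a new measurement only needs one entry rather than a new comparison branch.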
If one of the benchmarks does not meet its configured threshold, the module reports a failing result that identifies the network fabric and the individual failing benchmark(s). For example, with the configuration below, the hpcc module reported the failure that follows it:
<hpcc>
  <cc-path>/opt/intel/cce/11.0.069/</cc-path>
  <fabric>
    <bandwidth>0.003</bandwidth>
    <device>sock</device>
    <dgemm>5.76</dgemm>
    <fft>0.4</fft>
    <hpl>0.04</hpl>
    <latency>40</latency>
    <ptrans>0.10</ptrans>
    <randomaccess>0.008</randomaccess>
    <stream>1.4</stream>
  </fabric>
  <mkl-path>/opt/intel/cmkl/10.1.0.015/</mkl-path>
  <mpi-path>/opt/intel/impi/3.2/</mpi-path>
  <process-number>8</process-number>
  <thread-number>1</thread-number>
</hpcc>
HPC Challenge Benchmark (Intel(R) C++ Compiler, Intel(R) MPI
Library, Intel(R) Math Kernel Library), (hpcc)
Attention: this check may take a long time to complete......FAILED
subtest 'PTRANS, GB/s (device = sock)' failed
- failing All hosts returned: '0.0817186'
The module reported a failure because the PTRANS test measured 0.0817186 GB/s, which is below the configured threshold of 0.10 GB/s.
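The check behind this failure can be reproduced by hand. The Python sketch below reads the thresholds from the fabric block of the example configuration with the standard xml.etree.ElementTree module and compares them with measured results. It is an illustration, not how Intel® Cluster Checker works internally: the helper names fabric_thresholds and failing_subtests are hypothetical, the path elements of the hpcc block are omitted because they carry no thresholds, and the sketch assumes that latency is the only lower-is-better tag, as in the table above.

```python
import xml.etree.ElementTree as ET

# The <fabric> portion of the example hpcc block (path elements omitted).
CONFIG = """<hpcc>
  <fabric>
    <bandwidth>0.003</bandwidth>
    <device>sock</device>
    <dgemm>5.76</dgemm>
    <fft>0.4</fft>
    <hpl>0.04</hpl>
    <latency>40</latency>
    <ptrans>0.10</ptrans>
    <randomaccess>0.008</randomaccess>
    <stream>1.4</stream>
  </fabric>
</hpcc>"""

LOWER_IS_BETTER = {"latency"}  # every other tag is higher-is-better


def fabric_thresholds(xml_text):
    """Map each fabric's <device> name to a {tag: threshold} dictionary."""
    root = ET.fromstring(xml_text)
    fabrics = {}
    for fabric in root.findall("fabric"):
        device = fabric.findtext("device")
        fabrics[device] = {
            child.tag: float(child.text)
            for child in fabric
            if child.tag != "device"
        }
    return fabrics


def failing_subtests(thresholds, results):
    """Return the tags whose measured result misses its configured threshold."""
    failures = []
    for tag, value in results.items():
        limit = thresholds.get(tag)
        if limit is None:
            continue  # no threshold configured: completion alone decides
        ok = value <= limit if tag in LOWER_IS_BETTER else value >= limit
        if not ok:
            failures.append(tag)
    return failures


thresholds = fabric_thresholds(CONFIG)["sock"]
# Only the PTRANS result is known from the module output above.
print(failing_subtests(thresholds, {"ptrans": 0.0817186}))  # ['ptrans']
```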
What do failures to meet thresholds mean?
Many system characteristics affect the results of the HPCC benchmark suite, and a reported test failure does not necessarily indicate an under-performing or malfunctioning cluster. Processor speed, network characteristics, and memory architecture, for instance, all factor into the measured results, and a change in any of those components or sub-systems can change the outcome of the tests. A failure to meet a threshold may therefore simply mean that the threshold is too demanding for the characteristics of a particular cluster (set too high or, in the case of latency, too low). In that case, resolve the issue by resetting the threshold to a level appropriate for the specific system.
If a cluster with configured thresholds has historically passed the hpcc module tests but begins to fail them consistently, one or more components in the system may have a problem. First make sure that Intel® Cluster Checker was the only application running on the system; other applications running concurrently are likely to affect the measured benchmark results. If failures to meet thresholds persist and the hardware characteristics of the cluster have not changed, the system may be exhibiting degraded performance, and the underlying cause should be investigated and resolved.
Intermittent failures to meet threshold values may mean that the thresholds are set too close to typical results to absorb the natural fluctuations in system performance. For example, in the configuration above, the PTRANS threshold is set to 0.10 GB/s. A given cluster may routinely yield 0.11 GB/s but fluctuate between 0.095 and 0.12 GB/s; every run that dips below 0.10 GB/s is flagged as a failure. Threshold values should leave room for such fluctuations, so a better threshold for this example may be 0.09 GB/s.
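One way to build in that tolerance is to start from the worst result seen across several runs on a healthy, otherwise idle cluster and back off by a small margin. The Python sketch below is a hypothetical helper, not a Cluster Checker feature; the 5% margin, the rounding, and the sample values are assumptions. The PTRANS samples mirror the range described above, while the latency samples are made up purely to show the lower-is-better direction.

```python
import math


def threshold_with_margin(samples, margin, higher_is_better=True, ndigits=2):
    """Suggest a threshold that tolerates the spread seen in past runs.

    samples: results from repeated runs on a healthy, otherwise idle cluster.
    margin:  fractional head-room beyond the worst observed run (0.05 = 5%).
    ndigits: decimal places kept after rounding toward the lenient side.
    """
    if not samples:
        raise ValueError("need at least one sample")
    scale = 10 ** ndigits
    if higher_is_better:
        # Back off below the worst (lowest) result, then round down.
        return math.floor(min(samples) * (1 - margin) * scale) / scale
    # Lower is better (latency): back off above the worst (highest) result, round up.
    return math.ceil(max(samples) * (1 + margin) * scale) / scale


# Hypothetical PTRANS results in GB/s, spanning 0.095 to 0.12 as in the example.
print(threshold_with_margin([0.11, 0.095, 0.12, 0.105], 0.05))  # 0.09
# Hypothetical latency results in microseconds.
print(threshold_with_margin([32.0, 35.5, 38.0], 0.05, higher_is_better=False, ndigits=0))  # 40.0
```

Basing the margin on the worst observed run rather than the typical one is what keeps an occasional slow run, like the 0.095 GB/s dip, from being reported as a failure.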
This article applies to: Intel® Cluster Checker Knowledge Base, Intel® Cluster Ready Knowledge Base
Brock Taylor (Intel)
Scott McMillan (Intel)

