How to Complete Certification on Heterogeneous Clusters with Intel® Cluster Checker 3.0.1

This article summarizes the steps needed to complete Intel® Cluster Ready certification using Intel® Cluster Checker v3.0.1 on a heterogeneous cluster, that is, a cluster whose compute nodes do not share an identical hardware, software, or fabric configuration. Minor variations between nodes may be better handled by suppressing the corresponding messages.

Starting with Intel® Cluster Checker v3.0.1, compute nodes that share an identical configuration can be grouped into subclusters. Intel® Cluster Checker then automatically groups all data collection and analysis by subcluster.

For general Intel® Cluster Ready certification instructions, refer to the Intel® Cluster Ready Certification Instructions with Intel® Cluster Checker v3.0 document.

Procedure

On a heterogeneous cluster, the following steps are used to complete Intel® Cluster Ready certification.

  1. Create a node list that contains all cluster nodes (as described in Intel® Cluster Ready Certification Instructions with Intel® Cluster Checker v3.0). Identify the head node with the "role" keyword and assign nodes to each subcluster with the "subcluster" keyword. For example:

    frontend # role: head
    node1
    node2
    node3
    node5 # subcluster: mic
    node6 # subcluster: mic
    node7 # subcluster: mic
    
  2. Edit the data provider configuration file /opt/intel/clck/3.0.1/etc/clckd.xml so that it complies with the Intel® Cluster Ready architecture specification requirements. The head node and the individual subclusters may need different configurations; however, all nodes within a given subcluster use the same configuration file.

  3. Run data collection using the node list that contains all nodes:

    clck-collect -a -f <node_list>
    
  4. For the head node, and for any subcluster that contains only a single node, add the following suppressions to the analyzer configuration file /opt/intel/clck/3.0.1/etc/clck.xml, because some data providers cannot produce data without at least one node pair. Replace "NODENAME" with the name of the affected node(s).

    <suppressions>
      <suppress>
        <id>hpl-data-missing</id>
        <node_id>NODENAME</node_id>
      </suppress>
      <suppress>
        <id>mpi_internode-data-missing</id>
        <node_id>NODENAME</node_id>
      </suppress>
      <suppress>
        <id>imb_pingpong-data-missing</id>
        <node_id>NODENAME</node_id>
      </suppress>
    </suppressions>
    
  5. Run the analysis:

    clck-analyze -c <analyzer_config_file> -f <node_list>
    
  6. When the analysis completes successfully, submit your results by following the instructions in the Intel® Cluster Ready Certification Instructions with Intel® Cluster Checker v3.0 document.
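
On a larger cluster, finding every node that needs the step 4 suppressions by hand is error-prone. The following is a minimal POSIX shell sketch, not part of Intel® Cluster Checker, that reads a node list in the step 1 format on standard input and prints a <suppressions> block with one <suppress> entry per message ID and node, covering each node marked "role: head" and every subcluster that contains exactly one node. The function name gen_suppressions is hypothetical, and the sketch assumes that compute nodes without a "subcluster" annotation form a single group and that each line carries at most one "#" annotation.

```sh
#!/bin/sh
# Hypothetical helper (not part of Intel(R) Cluster Checker): reads a node list
# in the step 1 format on stdin and prints a <suppressions> block covering the
# head node(s) and every subcluster that contains exactly one node.
gen_suppressions() {
  awk '
    # Print one <suppress> entry for message id "id" on node "n".
    function emit(id, n) {
      printf "  <suppress>\n    <id>%s</id>\n    <node_id>%s</node_id>\n  </suppress>\n", id, n
    }
    {
      # Separate the node name from its "#" annotation, if any.
      line = $0
      note = ""
      hash = index(line, "#")
      if (hash > 0) {
        note = substr(line, hash + 1)
        line = substr(line, 1, hash - 1)
      }
      if (split(line, f, " ") == 0) next  # skip blank and comment-only lines
      node = f[1]
      if (note ~ /role:[ \t]*head/) {      # head nodes always get suppressions
        targets[++nt] = node
        next
      }
      group = "default"  # assumption: unannotated compute nodes form one group
      if (match(note, /subcluster:[ \t]*[^ \t,]+/)) {
        group = substr(note, RSTART, RLENGTH)
        sub(/subcluster:[ \t]*/, "", group)
      }
      count[group]++
      last[group] = node
    }
    END {
      # A group with a single node has no node pair, so suppress its messages.
      for (g in count)
        if (count[g] == 1) targets[++nt] = last[g]
      if (nt == 0) exit
      print "<suppressions>"
      for (i = 1; i <= nt; i++) {
        emit("hpl-data-missing", targets[i])
        emit("mpi_internode-data-missing", targets[i])
        emit("imb_pingpong-data-missing", targets[i])
      }
      print "</suppressions>"
    }
  '
}
```

For example, if the node list from step 1 is saved as nodelist (a hypothetical file name), gen_suppressions < nodelist prints entries for frontend only, because the other groups each contain three nodes. The order in which single-node subclusters are listed is not defined, which does not matter for the XML.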

Version 3.0.1 has a known issue when subclusters are used: the data collector does not capture more than one subcluster for the mpi_internode test. Correct MPI functionality is still evaluated by other modules, such as hpl. To avoid spurious mpi_internode missing-data messages, add the following suppression inside the <suppressions> element of the clck.xml configuration file:

    <suppress>
      <id>mpi_internode-data-missing</id>
    </suppress>
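
Whether this workaround is needed depends on how many subclusters the node list defines. The following minimal shell sketch, again not part of Intel® Cluster Checker, prints the suppression above only when the node list on standard input defines more than one subcluster. The function name mpi_internode_workaround is hypothetical, and treating compute nodes without a "subcluster" annotation as one group is an assumption rather than documented Intel® Cluster Checker behavior.

```sh
#!/bin/sh
# Hypothetical helper (not part of Intel(R) Cluster Checker): prints the
# mpi_internode-data-missing suppression only when the node list on stdin
# defines more than one subcluster, which is when the 3.0.1 known issue applies.
mpi_internode_workaround() {
  if awk '
    {
      # Separate the node name from its "#" annotation, if any.
      line = $0
      note = ""
      hash = index(line, "#")
      if (hash > 0) {
        note = substr(line, hash + 1)
        line = substr(line, 1, hash - 1)
      }
      # Skip blank lines, comment-only lines, and the head node.
      if (split(line, f, " ") == 0 || note ~ /role:[ \t]*head/) next
      group = "default"  # assumption: unannotated compute nodes form one group
      if (match(note, /subcluster:[ \t]*[^ \t,]+/)) {
        group = substr(note, RSTART, RLENGTH)
        sub(/subcluster:[ \t]*/, "", group)
      }
      if (!(group in seen)) {
        seen[group] = 1
        groups++
      }
    }
    # Exit status 0 means more than one subcluster was found.
    END { if (groups > 1) exit 0; exit 1 }
  '; then
    printf '%s\n' '<suppress>' '  <id>mpi_internode-data-missing</id>' '</suppress>'
  fi
}
```

Under the assumption above, the node list from step 1 defines two groups (the unannotated nodes and "mic"), so mpi_internode_workaround prints the suppression for it.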

 

For more complete information about compiler optimizations, see our Optimization Notice.