User Guide

  • 2021.1
  • 01/08/2021
  • Public Content

Data Collection

Before Intel® Cluster Checker can identify issues, it must first gather data from the cluster. Intel® Cluster Checker uses providers to collect data from the system and stores that data in a database. Framework definitions determine what data to collect by defining a set of data providers to run.
Running Data Collection
The clck program triggers data collection followed immediately by analysis. The clck-collect program only triggers data collection.
A typical invocation of the collect command is:
clck-collect <options>
By default, Intel® Cluster Checker will collect and analyze data to evaluate the health of the cluster using the health_base framework definition.
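For example, assuming a nodefile named nodefile (a hypothetical name) in the current directory, the following command collects data and analyzes it with the default health_base framework definition:
clck -f nodefile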
Avoid running data collection as root where possible; limiting runs as root will prevent problems with some data providers. There may be cases in which running as root is necessary, such as when a provider needs to access a tool that is not available to a non-privileged user on a system, but limiting running as root as much as possible is recommended.
Framework Definitions
Framework Definitions, further detailed in the Framework Definitions chapter, can be used to select which providers run when running clck or clck-collect. Framework Definitions can be specified through the command line by using the -F / --framework-definition command line option. For example, to run myFramework.xml, the following command can be used:
clck-collect <options> -F /path/to/myFramework.xml
Custom Framework Definitions can also be specified in the configuration file /opt/intel/clck/20xy/etc/clck.xml or /opt/intel/oneapi/clck/2021.x/etc/clck.xml. The following example shows how to declare the use of two custom Framework Definitions:
<configuration>
  <plugins>
    <framework_definitions>
      <framework_definition>/path/to/CustomFWD1.xml</framework_definition>
      <framework_definition>/path/to/CustomFWD2.xml</framework_definition>
    </framework_definitions>
  </plugins>
  ...
</configuration>
For more information about Framework Definitions, see the Framework Definitions section in the Reference.
Selecting Nodes
The nodefile contains a list of line-separated cluster node hostnames. For compute nodes, the nodefile is a simple list of nodes. For instance, the nodefile provided by a cluster resource manager typically contains just compute nodes and may be used as-is. Intel® Xeon Phi™ coprocessors should be included in the nodefile as independent nodes.
The nodefile is specified using the -f <file> command line option.
However, in some cases, nodes in the nodefile need to be annotated. The # symbol may be used to introduce comments in a nodefile. Annotations are specially formatted comments containing an annotation keyword followed by a colon and a value. Annotations may alter the data collection behavior.
If no nodefile is specified for data collection (via clck or clck-collect), a Slurm query will be used to determine the available nodes.
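As an illustration (assuming the standard Slurm sinfo utility is available), a nodefile listing one hostname per line can also be generated manually with:
sinfo -N -h -o "%N" | sort -u > nodefile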
Node Roles
The role annotation keyword is used to assign a node to one or more roles. A role describes the intended functionality of a node. For example, a node might be a compute node. If no role is explicitly assigned, by default a node is assumed to be a compute node. The role annotation may be repeated to assign a node multiple roles.
For example, the following nodefile defines 4 nodes: node1 is a head and compute node; node2, node3, and node4 are compute nodes; and node5 is disabled.
node1 # role: head role: compute
node2 # role: compute
node3 # implicitly assumed to be a compute node
node4
#node5
Some data providers will only run on nodes with certain roles. For example, data providers that measure performance typically only run on compute or enhanced nodes.
Valid node role values are described below.
  • boot - Provides software imaging / provisioning capabilities.
  • compute - Is a compute resource (mutually exclusive with enhanced).
  • enhanced - Provides enhanced compute resources, for example, contains additional memory (mutually exclusive with compute).
  • external - Provides an external network interface.
  • head - Alias for the union of boot, external, job_schedule, login, network_address, and storage.
  • job_schedule - Provides resource manager / job scheduling capabilities.
  • login - Is an interactive login system.
  • network_address - Provides network address to the cluster, for example, DHCP.
  • storage - Provides network storage to the cluster, like NFS.
Subclusters
Some clusters contain groups of nodes, or subclusters, that are homogeneous within the subcluster but differ from the rest of the cluster. For example, one subcluster may be connected with Intel® Omni-Path Host Fabric Interface while the rest of the cluster uses Ethernet.
The subcluster annotation keyword is used to assign a node to a subcluster. A node may only belong to a single subcluster. If no subcluster is explicitly assigned, the node is placed into the default subcluster. The subcluster name is an arbitrary string.
For example, the following nodefile defines 2 subclusters, each with 4 compute nodes:
node1 # subcluster: eth
node2 # subcluster: eth
node3 # subcluster: eth
node4 # subcluster: eth
node5 # subcluster: ib
node6 # subcluster: ib
node7 # subcluster: ib
node8 # subcluster: ib
By default, cluster data providers will not span subclusters. To override this behavior, use the following clck-collect command line option:
-S / --ignore-subclusters
Ignore subclusters when running cluster data providers. That is, cluster data providers will span subclusters. The default is not to span subclusters.
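For example, assuming the nodefile shown above, the following command runs cluster data providers across both subclusters:
clck-collect -S -f nodefile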
Collect Missing or Old Data
A fully populated database is necessary for a complete analysis. However, the database may be partially populated, in which case it is unnecessary to run a full data collection. To avoid re-collecting valid data, and instead collect only data that is missing or old, use the data re-collection feature.
To use this feature, run clck-collect or clck with the -C or --re-collect-data command line option. This option takes no parameters and causes Intel® Cluster Checker to only collect data that is missing or old. This option is useful to avoid running a full data collection when the database is already populated while still ensuring that all data is present and up to date. If data is missing or old for one or more nodes, that data will be re-collected on all specified (or detected) nodes.
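For example, assuming a hypothetical nodefile named nodefile, the following command collects only missing or old data and then runs the analysis:
clck -C -f nodefile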
Note on deprecation: Intel® Cluster Checker will deprecate the re-collect functionality available in the command line or through the configuration file. Rather than only collecting old or missing data, Cluster Checker will run the full data collection phase for the associated framework definitions (FWD).
Environment Propagation
Intel® Cluster Checker will automatically propagate the environment it is run in when using certain collect extensions. This is currently supported by:
  • pdsh
This is done by copying and exporting all environment variables except the following:
  • HOST
  • HOSTTYPE
  • HOSTNAME
  • MACHTYPE
  • OSTYPE
  • PMI_RANK
  • PMI_SIZE
  • PMI_FD
  • MPI_LOCALRANKID
  • MPI_LOCALNRANKS
  • DISPLAY
  • SHLVL
  • BASH_FUNC
  • PWD
  • _
This feature can be turned off:
  • through the environment, by running:
    • export CLCK_TURN_OFF_ENV_PROPAGATION=true
  • through clck.xml (or whichever configuration file is used), by setting:
    • <turn-off-environment-propagation>on</turn-off-environment-propagation>
  • or by running with the -e flag
Configuration File Options
The following variables alter the behavior of data collection as options in the configuration file.
Extensions
Collect extensions determine how Intel® Cluster Checker collects data. To change which collector extension is used, edit the file
/opt/intel/clck/<version>/etc/clck.xml
or
/opt/intel/oneapi/clck/<version>/etc/clck.xml
. The syntax for selecting a collect extension is as follows:
<collector>
  <extension>mpi.so</extension>
</collector>
Currently, Intel® Cluster Checker uses pdsh by default. The available collect extensions are pdsh (pdsh.so) and Intel® MPI Library or MPICH (mpi.so), both of which are located at /opt/intel/clck/2019x/collect/intel64 or /opt/intel/oneapi/clck/2021.x.y/collect/intel64.
Use of MPI varieties other than Intel® MPI Library and MPICH is not expected to work.
Note: when you choose a specific MPI, that MPI will be used both for launching Cluster Checker and for running any MPI workloads in the requested framework definitions. Using MPICH to run framework definitions containing the Intel MPI Benchmarks (IMB) or the HPCG benchmarks will not work. The IMB benchmarks are found in framework definitions starting with 'imb_' and in a handful of other framework definitions that run benchmarks, such as 'health_extended_user' or 'select_solutions_sim_mod_benchmarks_plus_2018.0'.
In order for Intel® MPI Library or MPICH to be successfully used, the mpi.so extension must be uncommented in the clck.xml file, and the $PATH and $LD_LIBRARY_PATH information for the desired MPI must be correct. For Intel® MPI Library, ensure the appropriate vars.sh/.csh script is sourced, i.e.
source /opt/intel/oneapi/setvars.sh
or similar. For MPICH (advanced), ensure PATH and LD_LIBRARY_PATH are configured as defined by the MPICH Installer's Guide, i.e. for Bash:
export PATH=/path/to/mpich/bin:$PATH
export LD_LIBRARY_PATH=/path/to/mpich/libraries:$LD_LIBRARY_PATH
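As an optional sanity check (not a step required by Intel® Cluster Checker), you can confirm which MPI launcher is first in the PATH before collecting data, for example:
which mpiexec
mpiexec --version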
CLCK_COLLECT_DATABASE_BUSY_TIMEOUT
Specify the amount of time to wait for a database lock to become available.
Environmental variable syntax:
CLCK_COLLECT_DATABASE_BUSY_TIMEOUT=value
where value is the number of milliseconds to wait for a database lock to become available before giving up. The value must be greater than 0. The default value is 60,000 milliseconds.
When inserting a new row into the database, the database is locked and any concurrent write attempts are prevented. This value specifies the amount of time that the concurrent write(s) should wait for the database to be unlocked before giving up. If the timeout expires and the database is still locked, the concurrent write(s) will not be successful and the data will be lost.
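For example, to allow concurrent writes to wait up to 120,000 milliseconds (120 seconds) for the lock:
export CLCK_COLLECT_DATABASE_BUSY_TIMEOUT=120000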
CLCK_COLLECT_DATABASE_CLOSE_DELAY
Specify the amount of time to wait after data collection has finished for data to arrive.
Environmental variable syntax:
CLCK_COLLECT_DATABASE_CLOSE_DELAY=value
where value is the number of seconds to wait after data collection has finished for any remaining data to be accumulated. The value must be greater than 0. The default value is 1 second.
All data that is in the accumulate queue will always be written to the database, but some data may still be on the wire when data collection has finished. This option provides a method to wait an additional amount of time for data to be received by the accumulate server before exiting. Clusters with very slow networks or a very large number of nodes may need to increase this value from the default.
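For example, to wait 10 seconds for late-arriving data on a slow network or very large cluster:
export CLCK_COLLECT_DATABASE_CLOSE_DELAY=10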
CLCK_COLLECT_DATABASE_VFS_MODULE
Specify the SQLite* VFS module.
Environmental variable syntax:
CLCK_COLLECT_DATABASE_VFS_MODULE=value
where value is:
  • unix
    Uses POSIX advisory locks when locking the database. Note that the implementation of POSIX advisory locks on some filesystems, for example, NFS, is incomplete and/or buggy. This value should usually only be selected when the database is located on a local filesystem.
  • unix-dotfile
    Uses dot-file locking when locking the database. This value usually works around filesystem implementation issues related to POSIX advisory locks. This is the default value.
  • unix-excl
    Obtains and holds an exclusive lock on the database file. All concurrent database operations will be prevented while the lock is held. This value may help in the event of database errors during data collection or if collected data is missing from the database.
  • unix-none
    No locking is used. This option should only be used if there is a guarantee that only a single writer will modify the database at any given time. Otherwise, this value can easily result in database corruption if two or more processes are writing to the database concurrently.
The SQLite* OS interface layer, or VFS, can be selected at runtime. The VFSes differ primarily in the way they handle file locking. See http://www.sqlite.org/vfs.html for more information.
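For example, if database errors occur during data collection, switching to the exclusive-lock VFS may help:
export CLCK_COLLECT_DATABASE_VFS_MODULE=unix-excl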
CLCK_COLLECTION_TIMEOUT
Specify the amount of time to wait for a collect extension to finish before closing.
Environmental variable syntax:
CLCK_COLLECTION_TIMEOUT=value
where value is the number of seconds to wait for the extension to finish. The value must be greater than 0. The default value is 1 week.
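For example, to allow a collect extension at most 24 hours (86,400 seconds) to finish:
export CLCK_COLLECTION_TIMEOUT=86400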
Custom libfabric Provider
Some of the framework definitions use Intel® MPI Library to run MPI benchmarks; for example the Select Solutions framework
select_solutions_sim_mod_benchmarks_base_2018.0
or IMB frameworks such as
imb_pingpong_fabric_performance
. In some scenarios it may be desirable to use a different Libfabric OFI provider when running your MPI application, including those run through Cluster Checker.
To override what Cluster Checker selects, you can set the environment variable I_MPI_OFI_PROVIDER to a specific libfabric provider. A list of the providers your server supports can be discovered by running the fi_info command. We suggest setting I_MPI_OFI_PROVIDER in your .bashrc file or job submission script:
export I_MPI_OFI_PROVIDER=sockets
where the value 'sockets' is replaced by a libfabric provider listed in the output of the fi_info command.
By default Intel® Cluster Checker and Intel® MPI Library will choose an optimized fabric provider, but there are scenarios where it is worthwhile to override those defaults for testing.
Note: Intel® TrueScale InfiniBand - TrueScale IB is only supported by Intel® MPI Library 2018; newer versions of Intel® MPI Library do not support TrueScale. TrueScale also has limited support on newer operating systems. When collecting data with TrueScale IB, be sure to have the environment variables I_MPI_FABRICS=tmi and I_MPI_TMI_PROVIDER=psm present, either in the Slurm script or exported.
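For example, these variables can be exported before invoking clck on a TrueScale system:
export I_MPI_FABRICS=tmi
export I_MPI_TMI_PROVIDER=psm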
