User Guide

  • 2021.1
  • 01/08/2021
  • Public Content

Configuring Intel® Cluster Checker

To run Intel® Cluster Checker, run the command clck, which performs data collection followed by immediate analysis. This is the primary suggested usage of Cluster Checker. There may be reasons to perform data collection and analysis separately by running clck-collect or clck-analyze. All of these commands can be configured using command line options and the configuration file.
Environment Variables
Many of the options for configuring Intel® Cluster Checker, both below and in other sections of this guide, have environment variable counterparts that can also be set. Examples of environment variables used by Intel® Cluster Checker are:
  • CLCK_ROOT, which is set when clckvars.sh/csh is run. This variable points to the tool’s top-level directory.
  • CLCK_SHARED_TEMP_DIR, which is commonly set when running the tool with a higher privilege level. This variable sets the path to a temporary directory that is accessible by all the nodes.
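As an illustration, a typical shell setup might look like the following; the install path and shared directory are placeholders, and the exact location of clckvars.sh may differ on your system:

# Source the environment script shipped with the tool; this sets CLCK_ROOT.
# (Script location may vary by release; adjust <version> for your install.)
source /opt/intel/oneapi/clck/<version>/bin/clckvars.sh
# When running with elevated privileges, point the tool at a temporary
# directory reachable from every node (placeholder path).
export CLCK_SHARED_TEMP_DIR=/shared/tmp/clck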
Command Line Options
There are three ways of running Intel® Cluster Checker from the command line.
  • Collection of data only: clck-collect
  • Analysis of existing data: clck-analyze
  • Combined collection and analysis: clck.
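For example (the nodefile name here is illustrative), the combined and the two-step workflows look like this:

# Combined collection and analysis in one step (the common case):
clck -f ./nodefile
# Or collect first, then analyze the stored data separately:
clck-collect -f ./nodefile
clck-analyze -f ./nodefile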
-c / --config=FILE: Specifies a configuration file. The default configuration file is CLCK_ROOT/etc/clck.xml.
-C / --re-collect-data: Attempts to re-collect any missing or old data for use in analysis. This option only applies to data collection.
-D / --db=FILE: Specifies the location of the database file. This option works in clck-analyze and in clck, but not currently in clck-collect.
-e / --environment-propagation-off: Turns off environment variable propagation for select collector extensions.
-f / --nodefile: Specifies a nodefile containing the list of nodes, one per line. See The Nodefile. If a nodefile is not specified for clck or clck-collect, a Slurm query will be used to determine the available nodes. If no nodefile is specified for clck-analyze, the nodes already present in the database will be used.
-F / --fwd=FILE: Specifies a framework definition. See the framework definition section in the Reference for more details. If a framework definition is not specified, the health framework definition is used. This option can be used multiple times to specify multiple framework definitions. To see a list of available framework definitions, use the command line option -X list.
-h / --help: Displays the help message.
-l / --log-level: Specifies the output level. Recognized values are (in increasing order of verbosity): alert, critical, error, warning, notice, info, and debug. The default log level is error.
-M / --mark-snapshot: Takes a snapshot of the data used in an analysis. The string used to mark the data cannot contain the comma character “,” or spaces. This option only applies to analysis.
-n / --node-include: Displays only the specified nodes in the analyzer output.
-o / --logfile: Specifies a file where the results from the run are written. By default, results are written to clck_results.log.
-r / --permutations: Number of permutations of nodes to use when running cluster data providers. By default, one permutation will run. This option only applies to data collection.
-S / --ignore-subclusters: Ignores the subcluster annotations in the nodefile. This option only applies to data collection.
-t / --threshold-too-old: Sets the minimum number of days since collection that will trigger a data-too-old error. This option only applies to data analysis.
-v / --version: Prints the version and exits.
-X / --FWD_description: Prints a description of the framework definition, if available. If the value passed is “list”, it prints a list of found framework definitions.
-z / --fail-level: Specifies the lowest severity level at which found issues fail. Recognized values are (in increasing order of severity): informational, warning, and critical. The default level at which issues cause a failure is warning.
--sort-asc: Organizes the output in ascending order of the specified field. Recognized values are “id”, “node”, and “severity”.
--sort-desc: Organizes the output in descending order of the specified field. Recognized values are “id”, “node”, and “severity”.
The clck command line accepts multiple option inputs on a single command line, but it does not accept comma-separated input. As a result, an option that needs multiple values, such as -F, must be repeated. For example:
clck -F health_user -F opa_user -F mpi_prereq_user
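As a further illustration, several of the options above can be combined in one hypothetical run (the file names are placeholders):

clck -f ./nodefile -F health_user -l debug -o ./myrun.log -z critical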
For more information about the available command line options and their uses, run Intel® Cluster Checker with the -h option, or see the man pages.
The Configuration File
Intel® Cluster Checker provides a main configuration file in XML format to allow for more detailed configuration. Settings applied on the fly with command line options can also be set and saved in a config file, along with more complicated options such as:
  • suppressing certain types of output
  • setting output format overrides
  • setting network interfaces
  • various collect and analysis environment variables
The configuration file is located at /opt/intel/oneapi/clck/<version>/etc/clck.xml. Intel® Cluster Checker uses this file by default, or you can pass your own config file using the -c command line option.
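For example, to run against a private copy of the configuration file (paths are illustrative):

cp /opt/intel/oneapi/clck/<version>/etc/clck.xml $HOME/my_clck.xml
clck -c $HOME/my_clck.xml -f ./nodefile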
Configuring the Database
You can specify a datastore configuration file in the main configuration file using the tags:
<datastore_extensions>
  <group path="datastore/intel64/">
    <entry config_file="default_sqlite.xml">libsqlite.so</entry>
  </group>
</datastore_extensions>
To use ODBC instead of SQLite3, enter libodbc.so instead of libsqlite.so. Multiple entry tags allow you to specify multiple databases through multiple datastore configuration files.
The datastore configuration file, by default, is located at /opt/intel/oneapi/clck/<version>/etc/datastore/default_sqlite.xml and takes the following format:
<configuration>
  <instance_name>clck_default</instance_name>
  <source_parameters>read_only=false|source=$HOME/.clck/<version>/clck.db</source_parameters>
  <type>sqlite3</type>
  <source_types>data</source_types>
</configuration>
The ‘instance_name’ tag defines a database source name. This value must be unique.
The ‘source_parameters’ tag determines whether or not to open the database in read-only mode and indicates which database to use.
The ‘type’ tag specifies what type of database to use. Currently, the only accepted value is ‘sqlite3’.
The ‘source_types’ tag specifies what source type to use. Currently, the only accepted value is ‘data’.
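For example, an existing database in a non-default location can be analyzed with the -D option described above (the path is illustrative):

clck-analyze -D /shared/clck/clck.db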
Configuring the Default Framework Definition
You can specify a default framework definition in the main configuration file using the tags:
<framework_definitions>
  <framework_definition>clock</framework_definition>
  <framework_definition>health_base</framework_definition>
</framework_definitions>
The ‘framework_definition’ tag defines which framework(s) will be run by default. If it is not present, the default framework definition will be ‘health_base’. If the -F option is used, it overrides the default list of frameworks run.
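With the example configuration above, a plain run would execute the clock and health_base framework definitions, while passing -F overrides that default for a single run (the nodefile name is illustrative):

# Uses the defaults listed in clck.xml:
clck -f ./nodefile
# Overrides the configured defaults for this run only:
clck -F health_user -f ./nodefile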
Using Framework Definitions
Intel® Cluster Checker checks can be divided into categories. We bundle related sets of providers and their analyzer counterparts in config files called ‘framework definitions’. A few examples of areas these bundles can cover are:
  • hardware
  • software
  • networking and fabrics
  • performance
  • memory
These bundles can be run from the command line with the -F <framework name> option. A full list of available framework definitions can be obtained with the command line flag -X list. More on framework definitions can be found in the framework definition section of the User Guide.
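For example, to discover what is available and then run a specific bundle (health_base is used here for illustration, and the nodefile name is a placeholder):

# Print all framework definitions found on this system:
clck -X list
# Print the description of one of them, then run it:
clck -X health_base
clck -F health_base -f ./nodefile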
The Nodefile
As demonstrated earlier in the Getting Started section, Intel® Cluster Checker can be configured with nodes in various ways. By default, a nodefile is required to run Intel® Cluster Checker, specified with the -f <nodefile name> option. If launching Cluster Checker through Slurm, no nodefile is required; Slurm will provide Cluster Checker with the assigned node list.
The nodefile format is a single server name per line. The server name must be resolvable and should be the value returned by the hostname command. This means the server is pingable and accessible by the given name; for example, ping node1 would correctly ping a server named node1.
Note:
The use of the name ‘localhost’ in the nodefile is not currently supported; instead, use the server’s resolvable hostname.
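A minimal sketch for sanity-checking a nodefile before a run, assuming bash and a nodefile named ./nodefile; role and subcluster annotations after '#' are stripped before pinging:

while read -r line; do
  node="${line%%#*}"               # drop '#role: ...' annotations
  node="${node//[[:space:]]/}"     # trim whitespace
  [ -z "$node" ] && continue       # skip blank lines
  if ping -c 1 -W 2 "$node" > /dev/null 2>&1; then
    echo "OK   $node"
  else
    echo "FAIL $node"
  fi
done < ./nodefile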
Nodefiles can also be used to define what role a node plays within a cluster (head, compute, login, etc.). This is not a requirement, but it can be helpful if your cluster has differently configured servers. Providers can be configured to act differently on, or ignore, nodes with particular roles. For example, on a cluster with a login or head node distinct from the compute nodes, marking that node with a non-compute role would exclude it from benchmarking it is not set up to handle.
More on node roles can be found in the Data Collection section of the User Guide.
Similarly to node roles, nodes can also be part of a subcluster. This is set in the nodefile with a syntax similar to setting roles.
More on subclustering can be found in the subcluster section of Data Collection.
Example nodefile:
head   #role: head
login  #role: login role: compute
node1  #role: compute subcluster: A
node2  #role: compute subcluster: A
node3  #role: enhanced subcluster: A
node4  #role: enhanced subcluster: A
node5  #role: compute subcluster: B
node6  #role: compute subcluster: B
node7  #role: enhanced subcluster: B
node8  #role: enhanced subcluster: B

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.