• 4.0.0
  • 04/10/2020
  • Public Content

Given a populated database (see the Data Collection chapter), Intel® Cluster Checker analyzes the data to identify issues, diagnose problems, and in some cases, provide recommendations on how to repair the cluster. Invoke the clck-analyze program to perform analysis or clck to perform both collection and analysis. The analysis evaluates the collected data using an embedded expert system.
Running Analysis
There are two ways to analyze data using Intel® Cluster Checker. The command clck first collects data and then analyzes it, and clck-analyze analyzes data in an existing database without collecting new data. If not given any command line options, Intel® Cluster Checker will analyze all nodes in the database by default. To analyze a subset of nodes or to assign node roles, provide a nodefile using the -f command line option. For more information about writing a nodefile, see the Selecting Nodes section in the Data Collection chapter. For details about the available command line options, see Configuring Intel® Cluster Checker. A typical use of the analysis command is:
clck-analyze -f nodefile
The output to the screen will provide a brief summary of any issues found. Further details will be written to the log file.
Each issue has a category, a message id, a severity, and a list of relevant nodes. It may also have a database row id and a remedy. The message id is a unique identifier for the issue type. All issues have a primary id; some issues may also have an optional sub-id appended with a colon, in the form id:sub-id. The message id can be used to suppress the issue (see the Suppressions section).
A list of nodes will be displayed with the issue, indicating the nodes in the system to which the issue applies. Node names displayed in parentheses indicate that the issue applies to a pair of nodes, such as MPI latency between a pair of nodes.
The database row id is a list of database entries containing the raw data that led to the issue. Database row ids are only included when debug output is enabled (see Configuring Intel® Cluster Checker).
Some issues recommend a suggested remedy to resolve the issue. Some remedies may require privileged cluster access.
Issues fall into one of two categories:
  • Diagnoses
    • Diagnoses describe the root cause of an issue. For example, MPI performance is substandard because some network setting is misconfigured. A diagnosis is typically reached by combining one or more observations. In this example, the diagnosis combines an observation of substandard MPI performance with an observation of a misconfigured network setting.
  • Observations
    • An observation is an objective fact about the cluster based on collected data. For example, a cluster’s memory may not be uniform.
Each reported issue should be investigated and either resolved or suppressed (see the Suppressions section). Once the issue is resolved, fresh data should be collected and the analysis repeated. When no issues are reported, the cluster has been successfully verified with Intel® Cluster Checker.
Selecting Nodes
By default, clck-analyze will analyze all nodes in the database, while clck may use either Slurm to auto-detect nodes or a nodefile. If a nodefile is supplied, then the list of nodes contained in the nodefile will be used instead of all available nodes in the database. Optional nodefile annotations can also be specified and may alter the analysis output (see the Selecting Nodes section in the Data Collection chapter for more details). For example, some rules may only apply to compute nodes and ignore non-compute nodes.
Framework Definition (FWD) Selection
Framework Definitions, further detailed in the Framework Definitions chapter, can be used to select which group of providers will run during data collection and which analyzer extensions and knowledge base modules will run during analysis.
Framework Definitions can be specified through the command line by using the -F / --framework-definition command line option.
-F FWD / --framework-definition FWD
For instance, the following command would run myFramework.xml:
clck -F /path/to/myFramework.xml
Custom FWDs can also be specified in the configuration file /opt/intel/clck/201n/etc/clck.xml. The following example shows how to declare the use of two custom definitions:
<configuration>
  <analyzer>
    <framework_definitions>
      <framework_definition>/path/to/CustomFWD1.xml</framework_definition>
      <framework_definition>/path/to/CustomFWD2.xml</framework_definition>
    </framework_definitions>
    ...
    ...
  </analyzer>
  ...
</configuration>
For more information about Framework Definitions, see the Framework Definitions section in the Reference.
Suppressions
In some cases, while the issue may be correct, the behavior is actually intended and should not be flagged. Such issues can be suppressed by adding an entry to the configuration file.
The base suppression format is:
<configuration>
  <analyzer>
    ...
    <suppressions>
      <suppress>
        <id>string</id>
        <node_id>hostname</node_id>
        <severity>num</severity>
      </suppress>
      ...
    </suppressions>
    ...
  </analyzer>
  ...
</configuration>
Multiple suppressions may be specified.
<id>string</id>
Suppress all issues matching the specified message id string. The default is empty, meaning suppress all message ids that match the other tags. If the message id includes a sub-id and only the primary id is used, then all messages with the same primary id will be suppressed regardless of the sub-id.
<node_id>hostname</node_id>
Suppress all issues corresponding to the specified node. The default is empty, meaning suppress all nodes that match the other tags.
If a tag is omitted, then the default value is used. There is implicit AND logic among tags within a suppression.
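The matching rules above can be sketched in Python. This is only an illustrative model, not the tool's implementation: the Issue and Suppression classes and the is_suppressed function are hypothetical names, and only the id and node_id tags are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issue:
    message_id: str  # primary id, optionally followed by ":sub-id"
    node: str


@dataclass
class Suppression:
    id: Optional[str] = None       # None models an omitted <id> tag
    node_id: Optional[str] = None  # None models an omitted <node_id> tag


def id_matches(rule_id, message_id):
    """Apply the primary-id / sub-id rule described above."""
    if rule_id is None:
        return True  # omitted tag: matches every message id
    if ":" in rule_id:
        return rule_id == message_id  # rule names one specific sub-id
    # Rule names only a primary id: every sub-id of that id matches.
    return message_id.split(":", 1)[0] == rule_id


def is_suppressed(issue, suppressions):
    """Tags within one suppression are ANDed; separate suppressions are ORed."""
    return any(
        id_matches(s.id, issue.message_id)
        and (s.node_id is None or s.node_id == issue.node)
        for s in suppressions
    )


rules = [Suppression(node_id="node4"), Suppression(id="network:eth1")]
print(is_suppressed(Issue("network:eth1", "node1"), rules))  # True
print(is_suppressed(Issue("network:eth3", "node1"), rules))  # False
```

A rule with only a node_id suppresses everything on that node, while a rule with a full id:sub-id leaves other sub-ids of the same primary id visible.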
The following example will suppress all issues from node4, any issues with message id example-id and with a confidence level of less than 50% on any node, as well as any issues with message id network:eth1 or network:eth2 but not other sub-id values.
<configuration>
  <analyzer>
    ...
    <suppressions>
      <suppress>
        <node_id>node4</node_id>
      </suppress>
      <suppress>
        <confidence>50</confidence>
        <id>example-id</id>
      </suppress>
      <suppress>
        <id>network:eth1</id>
      </suppress>
      <suppress>
        <id>network:eth2</id>
      </suppress>
    </suppressions>
    ...
  </analyzer>
  ...
</configuration>
Configuration Options
Intel® Cluster Checker contains both command line options and a configuration file to allow for configuration of the tool. The chapter Configuring Intel® Cluster Checker contains a complete list of command line options and an explanation of the config file.
The config file is in an XML format, and a variety of XML tags are available to configure the behavior of Intel® Cluster Checker. Below is a list of configuration tags that affect analysis.
cluster-mode-uniformity-threshold
Specify the threshold ratio for checking the uniformity of cluster mode entries across the cluster.
XML syntax:
<config>
  <cluster-mode-uniformity-threshold>NUMBER</cluster-mode-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same cluster mode entry value is above the value specified for the cluster-mode-uniformity-threshold tag, then that value is considered uniform in that cluster. If the percentage of nodes that share the same cluster mode entry value is below the uniformity threshold, then a sign is generated.
data-age-threshold
Specify the maximum age of data points, in seconds, before a data point is considered too old for relevant analysis.
XML syntax:
<config>
  <data-age-threshold>NUMBER</data-age-threshold>
</config>
  • The value should be an integer value greater than 0. The default value is 604800 seconds (1 week).
data-source-time-difference
Specify the maximum time difference allowed between timestamps for two data sources that contribute to the same analysis sign.
XML syntax:
<config>
  <data-source-time-difference>NUMBER</data-source-time-difference>
</config>
  • Currently this is only enabled for the dgemm sign substandard-dgemm-due-to-offline-cores.
  • The value should be an integer value greater than 0. The default value is 900 seconds (15 minutes) for dgemm.
dgemm-number-of-mad
Specify the number of median absolute deviations (MADs) allowed before a dgemm value is considered an outlier.
XML syntax:
<config>
  <dgemm-number-of-mad>NUMBER</dgemm-number-of-mad>
</config>
  • The value should be an integer value greater than 0.
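As a rough illustration of what a number-of-MAD limit means, the sketch below flags values that lie more than a given number of median absolute deviations from the median. It is an assumption-laden model, not the tool's code: the function name mad_outliers is hypothetical, and the test is applied on both sides of the median even though the tool may only flag substandard values.

```python
from statistics import median


def mad_outliers(samples, number_of_mad):
    """Return the samples further than number_of_mad MADs from the median."""
    med = median(samples)
    # Median absolute deviation: the median of the distances to the median.
    mad = median(abs(x - med) for x in samples)
    limit = number_of_mad * mad
    return [x for x in samples if abs(x - med) > limit]


# Hypothetical per-node benchmark results; one node is clearly slower.
results = [100, 101, 99, 102, 98, 60]
print(mad_outliers(results, number_of_mad=3))  # [60]
```

Here the median is 99.5 and the MAD is 1.5, so with a limit of 3 MADs any value more than 4.5 away from the median is reported. The same reading applies to the other tags ending in -number-of-mad.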
dgemm-peak-fraction
Specify the minimum value of the ratio between the measured dgemm performance and theoretical peak performance value.
XML syntax:
<config>
  <dgemm-peak-fraction>NUMBER</dgemm-peak-fraction>
</config>
  • Any value below this will generate a sign.
  • The value should be a floating point value between 0 and 1.
environment-blacklist
Specify the environment variable patterns that will be ignored for uniformity comparison across the cluster.
XML syntax:
<config>
  <environment-blacklist>
    <entry>PATTERN</entry>
    <entry>PATTERN</entry>
  </environment-blacklist>
</config>
  • The value within each entry tag is interpreted as a POSIX matching regular expression. If this value is not a valid POSIX regular expression, then no filtering will be done.
  • The entry tag can be repeated multiple times.
  • Note that to exactly match meta characters, (^[.*(${()+|?<>), they should be escaped.
hpl-number-of-mad
Specify the number of median absolute deviations (MADs) allowed before an HPL value is considered an outlier.
XML syntax:
<config>
  <hpl-number-of-mad>NUMBER</hpl-number-of-mad>
</config>
  • The value should be an integer value greater than 0.
imb-pingpong-number-of-mad
Specify the number of median absolute deviations (MADs) allowed before a PingPong latency or bandwidth value is considered an outlier.
XML syntax:
<config>
  <imb-pingpong-number-of-mad>NUMBER</imb-pingpong-number-of-mad>
</config>
  • The value should be an integer value greater than 0.
iozone-number-of-mad
Specify the number of median absolute deviations (MADs) allowed before an iozone value is considered an outlier.
XML syntax:
<config>
  <iozone-number-of-mad>NUMBER</iozone-number-of-mad>
</config>
  • The value should be an integer value greater than 0.
kernel-blacklist
Specify the kernel parameter patterns that will be ignored for uniformity comparisons across the cluster.
XML syntax:
<config>
  <kernel-blacklist>
    <entry>PATTERN</entry>
    <entry>PATTERN</entry>
  </kernel-blacklist>
</config>
  • The value within each entry tag is interpreted as a POSIX matching regular expression. If this value is not a valid POSIX regular expression, then no filtering will be done.
  • The entry tag can be repeated multiple times.
  • Note that to exactly match meta characters, (^[.*(${()+|?<>), they should be escaped.
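The effect of a blacklist entry, and of escaping meta characters, can be illustrated with a short Python sketch. Several assumptions apply: Python's re module is used in place of a POSIX regex engine (the two dialects differ in places), a pattern is taken to match anywhere in the name, an invalid pattern is simply skipped, and the function names are hypothetical. The same idea carries over to the environment and lshw blacklists.

```python
import re


def compile_blacklist(patterns):
    """Compile the patterns, skipping any that are not valid expressions."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error:
            continue  # assumption: an invalid entry filters nothing
    return compiled


def filter_names(names, patterns):
    """Return the names that are NOT matched by any blacklist pattern."""
    blacklist = compile_blacklist(patterns)
    return [n for n in names if not any(rx.search(n) for rx in blacklist)]


params = ["kernel.random.uuid", "kernel.random.boot_id", "net.ipv4.ip_forward"]

# Escaped dots match literal dots only.
print(filter_names(params, [r"^kernel\.random\."]))  # ['net.ipv4.ip_forward']

# An unescaped dot matches any character, so this pattern is broader than intended.
print(filter_names(["kernelXrandomXuuid"], ["^kernel.random."]))  # []
```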
kernel-param-uniformity-threshold
Specify the threshold ratio for checking the uniformity of kernel parameters across the cluster, that is, sysctl entries.
XML syntax:
<config>
  <kernel-param-uniformity-threshold>NUMBER</kernel-param-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same kernel parameter entry value is above the value specified for the kernel-param-uniformity-threshold tag, then that value is considered uniform in that cluster. If the percentage of nodes that share the same kernel parameter entry value is below the uniformity threshold, then a sign is generated.
  • The value should be a floating point value between 0 and 1.
logical-cores-uniformity-threshold
Specify the threshold ratio for checking the uniformity of logical cores across the cluster.
XML syntax:
<config>
  <logical-cores-uniformity-threshold>NUMBER</logical-cores-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same setting is above the value specified for the logical-cores-uniformity-threshold tag, then it is considered uniform on the cluster. If the percentage of nodes that share the same number of logical cores is below the uniformity threshold, then a sign is generated.
  • The value should be a floating point value between 0 and 1. The default value is 0.9.
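The uniformity thresholds in this chapter all follow the same pattern, which the following Python sketch models for the logical core count. It is an illustration under stated assumptions rather than the tool's logic: the function name check_uniformity is hypothetical, and a fraction exactly equal to the threshold is treated as uniform because the text does not say which way that case goes.

```python
from collections import Counter


def check_uniformity(values_by_node, threshold=0.9):
    """Return (majority_value, fraction, is_uniform) for one setting.

    values_by_node maps a node name to the value collected on that node.
    """
    if not values_by_node:
        raise ValueError("no nodes to compare")
    counts = Counter(values_by_node.values())
    majority_value, count = counts.most_common(1)[0]
    fraction = count / len(values_by_node)
    # Assumption: a fraction exactly equal to the threshold counts as uniform.
    return majority_value, fraction, fraction >= threshold


# Seven nodes report 48 logical cores, one reports 24.
logical_cores = {f"c{i:02d}": 48 for i in range(1, 8)}
logical_cores["c08"] = 24
print(check_uniformity(logical_cores))  # (48, 0.875, False)
```

With the default of 0.9, seven matching nodes out of eight (0.875) are not enough for the value to be considered uniform; lowering the threshold to 0.8 would accept it.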
lshw-blacklist
Specify the lshw output patterns that will be ignored for uniformity comparison across the cluster.
XML syntax:
<config>
  <lshw-blacklist>
    <entry>PATTERN</entry>
    <entry>PATTERN</entry>
  </lshw-blacklist>
</config>
  • The value within each entry tag is interpreted as a POSIX matching regular expression. If this value is not a valid POSIX regular expression, then no filtering will be done.
  • The entry tag can be repeated multiple times.
  • Note that to exactly match meta characters, (^[.*(${()+|?<>), they should be escaped.
lshw-uniformity-threshold
Specify the threshold ratio for checking the uniformity of lshw entries across the cluster.
XML syntax:
<config>
  <lshw-uniformity-threshold>NUMBER</lshw-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same lshw entry value is above the value specified for the lshw-uniformity-threshold tag, then that value is considered uniform in that cluster. If the percentage of nodes that share the same lshw entry value is below the uniformity threshold, then a sign is generated.
  • The value should be a floating point value between 0 and 1.
memory-mode-uniformity-threshold
Specify the threshold ratio for checking the uniformity of memory mode entries across the cluster.
XML syntax:
<config>
  <memory-mode-uniformity-threshold>NUMBER</memory-mode-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same memory mode entry value is above the value specified for the memory-mode-uniformity-threshold tag, then that value is considered uniform in that cluster. If the percentage of nodes that share the same memory mode entry value is below the uniformity threshold, then a sign is generated.
  • The value should be a floating point value between 0 and 1.
memory-uniformity-threshold
Specify the maximum allowable deviation, in bytes, from the median memory size before a memory size is considered non-uniform.
XML syntax:
<config>
  <memory-uniformity-threshold>NUMBER</memory-uniformity-threshold>
</config>
  • Any value greater than 0 can be used for this tag. The default value is 268435456 bytes (256 MB).
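Unlike the ratio-based thresholds, this tag is an absolute distance in bytes from the median. A minimal Python sketch of that comparison follows; the function name non_uniform_memory and the sample node data are hypothetical.

```python
from statistics import median

DEFAULT_THRESHOLD_BYTES = 268435456  # default stated above (256 * 1024 * 1024)


def non_uniform_memory(memory_by_node, threshold=DEFAULT_THRESHOLD_BYTES):
    """Return the nodes whose memory size deviates too far from the median."""
    med = median(memory_by_node.values())
    return sorted(
        node for node, size in memory_by_node.items()
        if abs(size - med) > threshold
    )


GIB = 1024 ** 3
MIB = 1024 ** 2
memory = {
    "c01": 192 * GIB,
    "c02": 192 * GIB,
    "c03": 192 * GIB,
    "c04": 192 * GIB - 100 * MIB,  # small deviation, within the threshold
    "c05": 768 * GIB,              # far from the median
}
print(non_uniform_memory(memory))  # ['c05']
```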
ntp-offset-threshold
Specify the maximum offset value an NTP peer can have before a sign is generated.
XML syntax:
<config>
  <ntp-offset-threshold>NUMBER</ntp-offset-threshold>
</config>
  • Any floating point value can be used for this tag.
outlier-max-median-mad-dist
Specify the maximum distance, in orders of magnitude, between the median and median absolute deviation (MAD) for the MAD outlier algorithm to be used.
XML syntax:
<config>
  <outlier-max-median-mad-dist>NUMBER</outlier-max-median-mad-dist>
</config>
  • If the allowable distance is exceeded, then the MAD outlier algorithm is disabled and a fallback algorithm (controlled by the outlier-median-pct tag) is used for outlier rules.
The following describes the test controlled by the outlier-max-median-mad-dist tag:
if ( |median - MAD| < 10^outlier-max-median-mad-dist )
then <use MAD outlier algorithm>
else <use fallback outlier algorithm>
  • Any value greater than 0 can be used for this tag. The default value is 2.5.
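The selection test above can be written out in Python as follows. This is a direct transcription of the pseudocode for illustration; the function name use_mad_algorithm and the sample data are hypothetical.

```python
from statistics import median


def use_mad_algorithm(samples, max_median_mad_dist=2.5):
    """Return True if the MAD outlier algorithm would be selected."""
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    # 10 ** 2.5 is roughly 316.2 with the default setting.
    return abs(med - mad) < 10 ** max_median_mad_dist


print(use_mad_algorithm([100, 101, 99, 102, 98]))  # True  (|100 - 1| = 99)
print(use_mad_algorithm([10000, 10010, 9990]))     # False (|10000 - 10| = 9990)
```

The second data set is tightly clustered relative to its magnitude, so the MAD is tiny compared with the median and the fallback algorithm would be used instead.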
outlier-median-pct
Percentage of the median used to calculate outliers by the fallback algorithm.
XML syntax:
<config>
  <outlier-median-pct>NUMBER</outlier-median-pct>
</config>
  • The outlier-median-pct determines the distance from the median that a sample value is allowed to be before it is considered an outlier in the fallback outlier algorithm. The outlier-median-pct value is divided by 100 and multiplied by the median to get an allowable distance. If the sample value is further away from the median than the allowable distance, the sample value is considered an outlier.
The following describes the fallback outlier algorithm controlled by the outlier-median-pct tag:
if ( |median - sample_value| > ( median * ( outlier-median-pct / 100 ) ) )
then <the sample_value is an outlier>
else <the sample_value is not an outlier>
  • Any value between 0 and 100 can be used. The default value is 5.
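For illustration, the fallback test can be transcribed into Python as shown below. The function name fallback_outliers and the sample values are hypothetical, and the sketch follows the pseudocode literally, so it assumes a positive median.

```python
from statistics import median


def fallback_outliers(samples, median_pct=5):
    """Return the samples further from the median than median_pct percent of it."""
    med = median(samples)
    allowed = med * (median_pct / 100)  # allowable distance from the median
    return [x for x in samples if abs(med - x) > allowed]


values = [10000, 10010, 9990, 9000]
print(fallback_outliers(values))  # [9000]
```

The median of the sample is 9995, so with the default of 5 the allowable distance is 499.75, and only the value 9000 (995 away) is reported.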
preferred-cluster-mode
Specify the preferred cluster mode for Intel® Xeon Phi™ processor.
XML syntax:
<config>
  <preferred-cluster-mode>MODE</preferred-cluster-mode>
</config>
  • Valid values for MODE are All2All, SNC2, SNC4, Hemisphere and Quadrant.
preferred-memory-mode
Specify the preferred memory mode for Intel® Xeon Phi™ processor.
XML syntax:
<config>
  <preferred-memory-mode>MODE</preferred-memory-mode>
</config>
  • Valid values for MODE are Flat, Cache, Hybrid25 and Hybrid50.
preferred-tickless-cores
Specify the list of cores for the nohz_full kernel parameter for the Intel® Xeon Phi™ processor.
XML syntax:
<config>
  <preferred-tickless-cores>core list</preferred-tickless-cores>
</config>
  • Examples of valid values: 128-255; 1,2,7-9; 1,6,9.
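The core list notation combines single core numbers and inclusive ranges separated by commas. The following Python sketch expands such a list; it reflects one reading of the format based on the examples, and the function name parse_core_list is hypothetical.

```python
def parse_core_list(text):
    """Expand a list such as "1,2,7-9" into sorted core numbers."""
    cores = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            cores.update(range(start, end + 1))  # ranges are inclusive
        else:
            cores.add(int(part))
    return sorted(cores)


print(parse_core_list("1,2,7-9"))       # [1, 2, 7, 8, 9]
print(len(parse_core_list("128-255")))  # 128
```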
preferred-turbo-status
Specify the preferred Intel® Turbo Boost Technology status for the processor.
XML syntax:
<config>
  <preferred-turbo-status>STATUS</preferred-turbo-status>
</config>
  • The valid values are enabled and disabled.
rpm-uniformity-threshold
Specify the threshold ratio for checking whether each rpm file installed on a node is uniform across the cluster.
XML syntax:
<config>
  <rpm-uniformity-threshold>NUMBER</rpm-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same rpm file is above the value specified for the rpm-uniformity-threshold tag, then that rpm file is considered uniform on the cluster. If the percentage of nodes that share the same rpm file is below the uniformity threshold, then a sign is generated.
  • The value should be a floating point value between 0 and 1.
storage-max-used-pct
Specify the maximum percentage of space that can be used on a disk partition.
XML syntax:
<config>
  <storage-max-used-pct>NUMBER</storage-max-used-pct>
</config>
  • If the percentage is exceeded on a disk partition, then a sign is emitted.
  • Any value between 0 and 100 can be used for this tag. The default value is 85.
stream-number-of-mad
Specify the number of median absolute deviations (MADs) allowed before a stream value is considered an outlier.
XML syntax:
<config>
  <stream-number-of-mad>NUMBER</stream-number-of-mad>
</config>
  • The value should be an integer value greater than 0.
threads-per-core-uniformity-threshold
Specify the threshold ratio for checking the uniformity of threads available per core across the cluster.
XML syntax:
<config>
  <threads-per-core-uniformity-threshold>NUMBER</threads-per-core-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same setting is above the value specified for the threads-per-core-uniformity-threshold tag, then it is considered uniform on the cluster. If the percentage of nodes that share the same threads available per core is below the uniformity threshold, then a sign is generated.
  • The value should be a floating point value between 0 and 1. The default value is 0.9.
turbo-status-uniformity-threshold
Specify the threshold ratio for checking the uniformity of Intel® Turbo Boost Technology status (enabled or disabled) on a set of nodes within the cluster.
XML syntax:
<config>
  <turbo-status-uniformity-threshold>NUMBER</turbo-status-uniformity-threshold>
</config>
  • If the percentage of nodes that share the same Intel® Turbo Boost Technology status is above the value specified for the turbo-status-uniformity-threshold tag, then Intel® Turbo Boost Technology status is considered uniform on the cluster; otherwise a sign is generated.
  • The value should be a floating point value between 0 and 1. The default value is set to 0.9.
The Node Group Feature
Node group is a beta feature. The node group feature enables finer-grained control of the analysis capabilities of Intel® Cluster Checker on collected data by defining groups of nodes and assigning tests to these node groups through individual framework definitions (FWDs).
The intent of this feature is to enable analysis by common features or attributes of nodes in a heterogeneous cluster, for example when there are different groups of nodes with the same type or speed of processor, the same speed, type, or size of memory, or the same communications fabric. While Intel® Cluster Checker can collect data for all of the nodes in a heterogeneous cluster at the same time, it can separate the analysis by the compute nodes' common attributes using a node group configuration file. Instead of reporting the differences between all nodes of all different groups, individual groups can now be analyzed by their specific characteristics.
Data collection by Intel® Cluster Checker is unchanged; this feature only changes the analysis of the collected data, according to the user-defined grouping in a “node group” file. An example of a node group configuration file is provided below.
The Node Group Command Line Option
This feature is enabled by running the command clck (or clck-analyze) with the command line option -g/--groupfile <nodegroupconfig> and an appropriate node group configuration file. For example:
clck -F health_admin -g path/to/my/nodegroupconfig.xml
as a privileged user, or
clck -F health_extended_user -g path/to/my/nodegroupconfig.xml
as a non-privileged user.
If analysis is run separately from data collection by clck-analyze, the FWDs used in the analysis must have been included in the collection stage to provide sufficient data to later analyze the node group.
An example node group configuration file can be found at:
$CLCK_HOME/etc/example_groups_cpu_mem.xml
The Node Group Configuration File
A node group configuration file defines a collection of compute nodes whose data is analyzed together based upon specific FWDs.
The node group configuration file is an XML file that lists group definitions and test definitions:
  • Group definitions specify node group names and which nodes belong in each group.
  • Test definitions specify which set of tests (framework definitions) are assigned to which groups.
Here is an example node group configuration file body, listing two groups and one FWD:
<?xml version="1.0" encoding="UTF-8"?>
<node_group_config>
  <nodegroup name="A"> ... </nodegroup>
  <nodegroup name="B"> ... </nodegroup>
  <fwd name="memory_uniformity"> ... <group>A</group> ... </fwd>
</node_group_config>
When a node group configuration file is applied to Intel® Cluster Checker analysis (-g/--groupfile), the tests of the listed FWDs limit their comparisons to the nodes in the groups specified for each test. This way, uniformity tests (hardware, firmware, network, software, performance, etc.) can reflect the actual configuration of a heterogeneous cluster.
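Because test definitions refer to groups by name, it can be useful to check that every group named in a test definition has a matching group definition. The Python sketch below performs that cross-check with the standard library XML parser. It is a hypothetical helper, not part of Intel® Cluster Checker; the function name undefined_groups and the sample document are illustrative, and the element layout assumed here (group names listed under a nodegroups element inside each fwd element) follows the examples in this chapter.

```python
import xml.etree.ElementTree as ET


def undefined_groups(xml_text):
    """Map each fwd name to the group names it uses that are not defined."""
    root = ET.fromstring(xml_text)
    defined = {g.get("name") for g in root.findall("nodegroup")}
    missing = {}
    for fwd in root.findall("fwd"):
        names = [g.text.strip() for g in fwd.findall("nodegroups/group") if g.text]
        unknown = [n for n in names if n not in defined]
        if unknown:
            missing[fwd.get("name")] = unknown
    return missing


SAMPLE = """
<node_group_config>
  <nodegroup name="A"><nodes><node>c01</node></nodes></nodegroup>
  <fwd name="cpu_user"><nodegroups><group>A</group></nodegroups></fwd>
  <fwd name="memory_uniformity"><nodegroups><group>B</group></nodegroups></fwd>
</node_group_config>
"""
print(undefined_groups(SAMPLE))  # {'memory_uniformity': ['B']}
```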
Example of Defining a Node Group
This section of the node group configuration file contains one or more defined node groups. The node group section allows you to assign a group name to multiple servers, which can then be used for grouped analysis. Here is a single node group example:
<nodegroup name="A">
  <nodefiles>
    <path>Path/to/a/nodefile</path>
  </nodefiles>
  <nodes>
    <node>c01</node>
    <node>c04</node>
  </nodes>
</nodegroup>
In this example, nodes c01 and c04, along with any nodes specified in the nodefile (same format as with the -f option: a single server name per line, matching the hostname output), are assigned to group A. Nodes can be assigned to multiple groups.
Names of node groups must have no spaces. The group name All is reserved for the group containing all nodes being run. If a FWD is assigned to specific node groups, but not all nodes are included by its group assignments, a corresponding group with the remaining nodes is added automatically, named All-except-nodegroup-<group-name>.
If a node that is not being analyzed is included in a group, it will be ignored. Groups that contain no analyzed nodes in an actual run will also be ignored.
The <nodegroup> XML section accepts nodefiles, individually listed nodes, or a combination of both, as shown in the example above.
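When node groups are derived from an inventory, the group definitions can be generated instead of written by hand. The following Python sketch builds one nodegroup element with the standard library. The function name build_nodegroup is hypothetical, and rejecting the reserved name All is a choice made in this sketch rather than documented tool behavior.

```python
import xml.etree.ElementTree as ET


def build_nodegroup(name, nodes=(), nodefiles=()):
    """Build a <nodegroup> element from node names and/or nodefile paths."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("group names must be non-empty and contain no spaces")
    if name == "All":
        raise ValueError('"All" is a reserved group name')
    group = ET.Element("nodegroup", name=name)
    if nodefiles:
        nodefiles_el = ET.SubElement(group, "nodefiles")
        for path in nodefiles:
            ET.SubElement(nodefiles_el, "path").text = path
    if nodes:
        nodes_el = ET.SubElement(group, "nodes")
        for node in nodes:
            ET.SubElement(nodes_el, "node").text = node
    return group


group = build_nodegroup("A", nodes=["c01", "c04"], nodefiles=["Path/to/a/nodefile"])
print(ET.tostring(group, encoding="unicode"))
```

The printed element has the same content as the hand-written example above, serialized on a single line.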
Example of Using Node Groups for a Specific Test
This section of the XML configuration file allows the user to define which node groups should be associated with which specific tests (or Framework Definition, fwd) during the analysis phase. This is an example of assigning node groups to specific tests:
<fwd name="cpu_user">
  <nodegroups>
    <group>A</group>
    <group>B</group>
    <group>C</group>
  </nodegroups>
</fwd>
Here, the tests specified in the framework definition cpu_user are analyzed within each of the groups A, B and C. The grouping of analysis is applied to any of the explicitly specified FWDs. When a FWD that is not specified for grouping includes a FWD that is explicitly listed for grouping, grouping is applied to the included FWD. For example, -F health_extended_user, which includes the FWD cpu_user, will analyze all uniformity tests on the single group of All nodes, except the tests of cpu_user, which will be analyzed only within the three individual groups A, B and C.
Note that <fwd name="xyz"> must name a valid framework definition, either one included with Intel® Cluster Checker or a user-defined one. In the example above, cpu_user is a valid framework definition. The name xyz used in this paragraph is not valid by default and, as a result, would not be used.
All three groups will analyze the test cpu_user independently of each other. If there are remaining nodes, a group consisting of all remaining nodes (called All-except-nodegroups-A-B-C) will be created and analyzed separately. This means nodes in group A will only be compared against other nodes in group A. To test the nodes of groups A and B together, either add the nodes in node group B to node group A or create a completely independent node group just for this test. Groups can be assigned to more than one test.
Tests on FWDs in the node group configuration file that are not run through the command line (the -F option) or the configuration file (default: $CLCK_HOME/etc/clck.xml), and are not included in a framework that is run, will be ignored. If a test does not have any nodes included in the groups assigned to it, analysis will ignore that entry.
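The grouping rules for a single test can be modeled with a short Python sketch: nodes that are not being analyzed are dropped from their groups, empty groups are ignored, and the nodes not covered by any assigned group form an automatically created remainder group. This is an illustrative model only; the function name groups_for_fwd is hypothetical, and the remainder group name follows the All-except-nodegroups-A-B-C form used in this section.

```python
def groups_for_fwd(analyzed_nodes, groups, assigned):
    """Return the effective groups for one FWD as {group name: [nodes]}."""
    effective = {}
    covered = set()
    for name in assigned:
        # Nodes listed in a group but not being analyzed are ignored.
        members = [n for n in groups.get(name, []) if n in analyzed_nodes]
        if members:  # groups with no analyzed nodes are ignored
            effective[name] = members
            covered.update(members)
    remaining = [n for n in analyzed_nodes if n not in covered]
    if remaining:
        effective["All-except-nodegroups-" + "-".join(assigned)] = remaining
    return effective


nodes = [f"c{i:02d}" for i in range(1, 9)]
groups = {"A": ["c01", "c02"], "B": ["c03", "c04"], "C": ["c05", "c09"]}
for name, members in groups_for_fwd(nodes, groups, ["A", "B", "C"]).items():
    print(name, members)
```

In this example c09 is not among the analyzed nodes, so group C shrinks to c05, and nodes c06 to c08 end up in the remainder group.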
Complete Example File with Output
The following is a complete example for a simple cluster of 8 compute nodes with hostnames c[01-08], two types of processors, and two sizes of memory:
  • compute nodes with Intel® Xeon 6148 processors
    • 4 nodes with 192GB memory
  • compute nodes with Intel® Xeon 6258R processors
    • 2 nodes with 192GB memory
    • 2 nodes with 768GB memory
The console output shown below was produced with the following command line, run as a privileged user:
# clck -F health_admin -g path/to/my/nodegroupconfig.xml
A regular user could apply the same node group configuration file to any of their Intel® Cluster Checker tests in the same way, for example:
$ clck -F health_extended_user -g path/to/my/nodegroupconfig.xml
This is the example node group configuration file for the above configuration:
<?xml version="1.0" encoding="UTF-8"?>
<node_group_config>
  <!-- Node group files can be specified with the '-g <file path/name>' option. -->
  <!-- This option is incompatible with subclustering defined in the node file. -->
  <!-- List group name (no spaces) and included nodes. -->
  <!-- The name "All" is a reserved node group name and will be ignored. -->
  <nodegroup name="xeon6148">
    <nodes>
      <node>c01</node>
      <node>c02</node>
      <node>c03</node>
      <node>c04</node>
    </nodes>
  </nodegroup>
  <nodegroup name="xeon6258R">
    <nodes>
      <node>c05</node>
      <node>c06</node>
      <node>c07</node>
      <node>c08</node>
    </nodes>
  </nodegroup>
  <nodegroup name="xeon6148_mem192GB">
    <nodes>
      <node>c01</node>
      <node>c02</node>
      <node>c03</node>
      <node>c04</node>
    </nodes>
  </nodegroup>
  <nodegroup name="xeon6258R_mem192GB">
    <nodes>
      <node>c05</node>
      <node>c06</node>
    </nodes>
  </nodegroup>
  <nodegroup name="xeon6258R_mem768GB">
    <nodes>
      <node>c07</node>
      <node>c08</node>
    </nodes>
  </nodegroup>
  <!-- List which Framework definitions will run on which groups -->
  <!-- Frameworks not being run with or included by -F or -c config will be skipped. -->
  <fwd name="cpu_base">
    <nodegroups>
      <group>xeon6148</group>
      <group>xeon6258R</group>
    </nodegroups>
  </fwd>
  <fwd name="dgemm_cpu_performance">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="sgemm_cpu_performance">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="avx512_performance_ratios_user">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="avx512_performance_ratios_priv">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="hpl_cluster_performance">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="stream_memory_bandwidth_performance">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="syscfg_settings_uniformity">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="kernel_parameter_uniformity">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="lshw_hardware_uniformity">
    <nodegroups>
      <group>xeon6148_mem192GB</group>
      <group>xeon6258R_mem192GB</group>
      <group>xeon6258R_mem768GB</group>
    </nodegroups>
  </fwd>
  <fwd name="memory_uniformity">
    <nodegroups>
      <group>mem192GB</group>
      <group>mem768GB</group>
    </nodegroups>
  </fwd>
</node_group_config>
Here is the console output of this command:
-g/--groupfile is a beta feature currently in development.
Intel(R) Cluster Checker 2021.1 Beta 6 (build 20200403)

Running Collect
................................................................................
Running Analyze

SUMMARY
  Command-line:   clck -f nodelist -g example_groups_cpu_mem.xml -F health_admin
  Tests Run:      health_admin
  Overall Result: 6 issues found - FUNCTIONALITY (2), HARDWARE UNIFORMITY (2), SOFTWARE UNIFORMITY (2)
--------------------------------------------------------------------------------
  8 nodes tested:         c[01-08].skl
  0 nodes with no issues:
  8 nodes with issues:    c[01-08].skl
--------------------------------------------------------------------------------
Intel(R) Cluster Checker completed analysis with the following groups:

  User-Configured Groups (Defined in example_groups_cpu_mem.xml)
    1. Group "mem192GB"
       Nodes: c[01-06].skl
       Tests: memory_uniformity
    2. Group "mem768GB"
       Nodes: c[07-08].skl
       Tests: memory_uniformity
    3. Group "xeon6148"
       Nodes: c[01-04].skl
       Tests: cpu_base
    4. Group "xeon6148_mem192GB"
       Nodes: c[01-04].skl
       Tests: dgemm_cpu_performance, stream_memory_bandwidth_performance
    5. Group "xeon6258R"
       Nodes: c[05-08].skl
       Tests: cpu_base
    6. Group "xeon6258R_mem192GB"
       Nodes: c[05-06].skl
       Tests: dgemm_cpu_performance, stream_memory_bandwidth_performance
    7. Group "xeon6258R_mem768GB"
       Nodes: c[07-08].skl
       Tests: dgemm_cpu_performance, stream_memory_bandwidth_performance

  Automatically Configured Groups
    1. Group "All"
       Nodes: c[01-08].skl
       Tests: health_admin
--------------------------------------------------------------------------------
FUNCTIONALITY
The following functionality issues were detected:

  Group "mem192GB": c[01-06].skl
    No issues detected.
  Group "mem768GB": c[07-08].skl
    No issues detected.
  Group "xeon6148": c[01-04].skl
    No issues detected.
  Group "xeon6148_mem192GB": c[01-04].skl
    No issues detected.
  Group "xeon6258R": c[05-08].skl
    No issues detected.
  Group "xeon6258R_mem192GB": c[05-06].skl
    No issues detected.
  Group "xeon6258R_mem768GB": c[07-08].skl
    No issues detected.
  Group "All": c[01-08].skl
    1. Intel(R) Turbo Boost Technology is disabled.
       1 node: c03.skl
    2. The Intel(R) Cluster Checker requires the Intel(R) Omni-Path tool 'opasmaquery'.
       1 node: c01.skl

HARDWARE UNIFORMITY
The following hardware uniformity issues were detected:

  Group "mem192GB": c[01-06].skl
    No issues detected.
  Group "mem768GB": c[07-08].skl
    No issues detected.
  Group "xeon6148": c[01-04].skl
    No issues detected.
  Group "xeon6148_mem192GB": c[01-04].skl
    No issues detected.
  Group "xeon6258R": c[05-08].skl
    No issues detected.
  Group "xeon6258R_mem192GB": c[05-06].skl
    No issues detected.
  Group "xeon6258R_mem768GB": c[07-08].skl
    No issues detected.
  Group "All": c[01-08].skl
    1. The Intel(R) Turbo Boost Technology status 'disabled', is not uniform. 12% of nodes in the same grouping have the same Intel(R) Turbo Boost Technology status.
       1 node: c03.skl
    2. The Intel(R) Turbo Boost Technology status 'enabled', is not uniform. 88% of nodes in the same grouping have the same Intel(R) Turbo Boost Technology status.
       7 nodes: c[01-02,04-08].skl

PERFORMANCE
No issues detected.

SOFTWARE UNIFORMITY
The following software uniformity issues were detected:

  Group "mem192GB": c[01-06].skl
    No issues detected.
  Group "mem768GB": c[07-08].skl
    No issues detected.
  Group "xeon6148": c[01-04].skl
    No issues detected.
  Group "xeon6148_mem192GB": c[01-04].skl
    No issues detected.
  Group "xeon6258R": c[05-08].skl
    No issues detected.
  Group "xeon6258R_mem192GB": c[05-06].skl
    No issues detected.
  Group "xeon6258R_mem768GB": c[07-08].skl
    No issues detected.
  Group "All": c[01-08].skl
    1. The Energy/Performance Bias BIOS setting, '6.00', is not uniform. 88% of nodes in the same grouping have the same Energy/Performance Bias setting. Intel(R) MPI Library works best with these values being consistent.
       7 nodes: c[01-05,07-08].skl
    2. The Energy/Performance Bias BIOS setting, '7.00', is not uniform. 12% of nodes in the same grouping have the same Energy/Performance Bias setting. Intel(R) MPI Library works best with these values being consistent.
       1 node: c06.skl

See the following files for more information: clck_results.log, clck_execution_warnings.log
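The report abbreviates node lists with a bracketed range notation such as c[01-02,04-08].skl. When post-processing the output, it can help to expand that notation into individual node names. The following Python sketch is one way to do so; the helper name expand_nodes is hypothetical and is not part of Intel® Cluster Checker, and the grammar it accepts is inferred only from the examples in the output above.

```python
import re

def expand_nodes(expr):
    """Expand a bracketed node expression such as 'c[01-02,04-08].skl'.

    Hypothetical helper: the syntax handled here is inferred from the
    sample console output, not from a documented specification.
    """
    match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", expr)
    if match is None:
        return [expr]  # plain name such as 'c03.skl'
    prefix, body, suffix = match.groups()
    names = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start          # single index such as '[03]'
        width = len(start)          # preserve zero padding, e.g. '01'
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names

all_nodes = expand_nodes("c[01-08].skl")
enabled = expand_nodes("c[01-02,04-08].skl")
disabled = expand_nodes("c03.skl")

# Fraction of the "All" group that shares each Turbo Boost status.
enabled_share = len(enabled) / len(all_nodes)
disabled_share = len(disabled) / len(all_nodes)
```

Here 7 of 8 nodes is 87.5% and 1 of 8 is 12.5%, which is consistent with the 88% and 12% figures shown in the uniformity messages. How the tool rounds these values is not stated in this section.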
Note: The group "All" runs every test selected by -F health_admin on all of the nodes, except for the tests that the groupfile assigns to the listed groups. The "All" group skips those tests.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804