node1  # role: head role: compute
node2  # role: compute
node3  # implicitly assumed to be a compute node
node4  #
node5
- boot * Provides software imaging / provisioning capabilities.
- compute * Is a compute resource (mutually exclusive with enhanced).
- enhanced * Provides enhanced compute resources, for example, contains additional memory (mutually exclusive with compute).
- external * Provides an external network interface.
- head * Alias for the union of boot, external, job_schedule, login, network_address, and storage.
- job_schedule * Provides resource manager / job scheduling capabilities.
- login * Is an interactive login system.
- network_address * Provides network addresses to the cluster, for example, DHCP.
- storage * Provides network storage to the cluster, for example, NFS.
node1  # subcluster: eth
node2  # subcluster: eth
node3  # subcluster: eth
node4  # subcluster: eth
node5  # subcluster: ib
node6  # subcluster: ib
node7  # subcluster: ib
node8  # subcluster: ib
Ignore subclusters when running cluster data providers; that is, cluster data providers will span subclusters. By default, data providers do not span subclusters.
- Currently, Intel® Cluster Checker uses pdsh by default. The available collect extensions are pdsh (pdsh.so) and Intel® MPI Library (mpi.so), both of which are located at /opt/intel/clck/20.x.y/collect/intel64.
- Environment variable syntax: CLCK_COLLECT_DATABASE_BUSY_TIMEOUT=value
- where value is the number of milliseconds to wait for a database lock to become available before giving up. The value must be greater than 0. The default value is 60,000 milliseconds.
- When inserting a new row into the database, the database is locked and any concurrent write attempts are prevented. This value specifies the amount of time that the concurrent write(s) should wait for the database to be unlocked before giving up. If the timeout expires and the database is still locked, the concurrent write(s) will not be successful and the data will be lost.
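As a sketch, the timeout could be raised for a cluster whose shared filesystem responds slowly (the value of 120000 ms is an arbitrary example, not a recommendation):

```shell
# Wait up to 2 minutes (120,000 ms) for a database lock instead of the
# default 60,000 ms; useful when many nodes write results concurrently.
export CLCK_COLLECT_DATABASE_BUSY_TIMEOUT=120000
```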
- Environment variable syntax: CLCK_COLLECT_DATABASE_CLOSE_DELAY=value
- where value is the number of seconds to wait after data collection has finished for any remaining data to be accumulated. The value must be greater than 0. The default value is 1 second.
- All data that is in the accumulate queue will always be written to the database, but some data may still be on the wire when data collection has finished. This option provides a method to wait an additional amount of time for data to be received by the accumulate server before exiting. Clusters with very slow networks or a very large number of nodes may need to increase this value from the default.
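For instance, a large cluster with a slow network might allow extra time for in-flight data to reach the accumulate server (10 seconds is an illustrative value):

```shell
# Allow 10 seconds, instead of the default 1 second, for late data to
# arrive after collection finishes.
export CLCK_COLLECT_DATABASE_CLOSE_DELAY=10
```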
- unix * Uses POSIX advisory locks to lock the database. Note that the implementation of POSIX advisory locks on some filesystems, for example, NFS, is incomplete and/or buggy. This value should usually only be selected when the database is located on a local filesystem.
- unix-dotfile * Uses dot-file locking to lock the database. This value usually works around filesystem implementation issues related to POSIX advisory locks. This is the default value.
- unix-excl * Obtains and holds an exclusive lock on the database file. All concurrent database operations are prevented while the lock is held. This value may help in the event of database errors during data collection or if collected data is missing from the database.
- unix-none * No locking is used. This option should only be used if there is a guarantee that only a single writer will modify the database at any given time. Otherwise, this value can easily result in database corruption if two or more processes write to the database concurrently.
- The SQLite* OS interface layer, or VFS, can be selected at runtime. The VFSes differ primarily in the way they handle file locking. See http://www.sqlite.org/vfs.html for more information.
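The dot-file scheme used by the default locking mode can be illustrated with plain shell. This is a standalone sketch of the idea, not clck or SQLite code; the filenames are hypothetical, and `mkdir` (which is atomic) stands in for the lock file SQLite actually creates:

```shell
#!/bin/sh
# Dot-file locking: a lock is "held" while a sentinel exists next to the
# database file. Only one process can create the sentinel, so only one
# process at a time may write.
DB=./example.db          # hypothetical database file
LOCK="${DB}.lock"

if mkdir "$LOCK" 2>/dev/null; then
    echo "lock acquired; writing to $DB"
    printf 'row\n' >> "$DB"      # ...exclusive work happens here...
    rmdir "$LOCK"                # release the lock
else
    echo "database busy; another process holds $LOCK"
fi
```

Because the sentinel lives on the same filesystem as the database, this scheme works even where POSIX advisory locks are unreliable, at the cost of leaving a stale lock behind if a process dies before releasing it.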