Intel® Cluster Checker

Legal Information

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Running Intel® Cluster Checker

Intel® Cluster Checker executes in two phases. In the data collection phase, Intel® Cluster Checker collects data from the cluster for use in analysis. In the analysis phase, Intel® Cluster Checker analyzes the data in the database and produces the results of analysis. It is possible to invoke these phases together or separately and to customize their scope. By default, Intel® Cluster Checker verifies the overall health of the cluster using the health framework definition.

Introduction

Clusters are complex systems, and it can be difficult to identify issues when something goes wrong. Intel® Cluster Checker aims to lower this complexity barrier and make debugging easier. It collects data from the cluster, analyzes that data, and produces a clear list of the issues it finds. Using Intel® Cluster Checker, you can resolve issues quickly and move on to actually using your cluster.

Getting Started

Before using Intel® Cluster Checker for the first time, set up the runtime environment. Two files are included for this purpose: clckvars.sh for shells with Bourne syntax and clckvars.csh for shells with csh syntax. For example, in a shell with Bourne syntax, type:

source /opt/intel/clck/201n/bin/clckvars.sh

Data Collection

Before Intel® Cluster Checker can identify issues, it must first gather data from the cluster. Intel® Cluster Checker uses providers to collect data from the system and stores that data in a database. Framework definitions determine what data to collect by defining a set of data providers to run.
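
The relationship between providers, framework definitions, and the database can be illustrated with a small toy model. This is only a conceptual sketch and not the actual implementation or API of Intel® Cluster Checker: the provider functions, the framework names ("health", "inventory"), the table layout, and the node names are all hypothetical, and the provider output is fixed illustrative data rather than real measurements.

```python
import json
import sqlite3

# Hypothetical providers: each returns one piece of data about a node.
def provider_hostname(node):
    return {"hostname": node}

def provider_cpu_count(node):
    # Fixed illustrative values, not real measurements.
    return {"cpus": 4 if node == "node2" else 8}

PROVIDERS = {
    "hostname": provider_hostname,
    "cpu_count": provider_cpu_count,
}

# Hypothetical framework definitions: each one names the providers to run.
FRAMEWORKS = {
    "health": ["hostname", "cpu_count"],
    "inventory": ["hostname"],
}

def collect(db, framework, nodes):
    """Run every provider listed by the framework definition and store the output."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS data (node TEXT, provider TEXT, payload TEXT)"
    )
    for node in nodes:
        for name in FRAMEWORKS[framework]:
            payload = json.dumps(PROVIDERS[name](node))
            db.execute("INSERT INTO data VALUES (?, ?, ?)", (node, name, payload))
    db.commit()

# Collect data for two nodes into an in-memory database.
db = sqlite3.connect(":memory:")
collect(db, "health", ["node1", "node2"])
rows = db.execute(
    "SELECT node, provider, payload FROM data ORDER BY node, provider"
).fetchall()
for row in rows:
    print(row)
```

In this sketch, choosing a different framework definition (for example "inventory") changes which providers run, and therefore what ends up in the database, without changing the collection code itself.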

Knowledge Base

Intel® Cluster Checker is an expert system. A classic definition of an expert system is "an intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solutions" (Edward A. Feigenbaum, "Knowledge Engineering in the 1980s", Stanford University Computer Science Department, 1982). The problem that Intel® Cluster Checker solves is diagnosing system-level issues in Beowulf-style clusters.
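
The expert-system idea, knowledge expressed as rules plus an inference step that applies those rules to collected facts, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the knowledge base or rule format that Intel® Cluster Checker actually uses: the facts, the "uniformity" rule, and all function names are hypothetical.

```python
from collections import Counter

# Facts as they might come out of a data collection step (illustrative values).
facts = {
    "node1": {"cpus": 8, "memory_gb": 64},
    "node2": {"cpus": 4, "memory_gb": 64},
    "node3": {"cpus": 8, "memory_gb": 32},
}

def majority(all_facts, key):
    """Return the most common value of `key` across all nodes."""
    return Counter(f[key] for f in all_facts.values()).most_common(1)[0][0]

def uniformity_rule(key):
    """Build a rule encoding the knowledge that nodes should agree on `key`."""
    def rule(node, all_facts):
        expected = majority(all_facts, key)
        actual = all_facts[node][key]
        if actual != expected:
            return f"{node}: {key} is {actual}, expected {expected}"
        return None
    return rule

# Hypothetical knowledge base: a list of rules.
RULES = [uniformity_rule("cpus"), uniformity_rule("memory_gb")]

def diagnose(all_facts):
    """Inference step: apply every rule to every node and gather the issues."""
    issues = []
    for node in sorted(all_facts):
        for rule in RULES:
            message = rule(node, all_facts)
            if message is not None:
                issues.append(message)
    return issues

issues = diagnose(facts)
for issue in issues:
    print(issue)
```

Keeping the rules separate from the inference loop means new knowledge can be added as another rule without touching the code that applies it.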
