by John Sharp, Content Master Ltd
Clusters offer a scalable means of linking computers together to provide an expansive environment for hosting enterprise applications. The Intel® Xeon® processor and Itanium® processor are cost-effective platforms that are well-suited to implementing clusters, providing advanced features such as parallel architecture, large addressable memory spaces, and three-level cache for fast access to critical data.
OSCAR* (Open Source Cluster Application Resources) is an open-source project comprising software for building high-performance clusters. Currently, OSCAR is available for Linux*. OSCAR is managed by the Open Cluster Group, an informal group of professionals from a variety of establishments and organizations. The goal behind the OSCAR project is to make clustering a freely available and easily configurable option for systems based on Linux, helping to bring Linux into the mainstream of enterprise computing.
OSCAR contains a number of facilities, including cluster-management tools, a message passing interface based on the MPI standard, a job queuing system, and a batch scheduler. System images can be built on an OSCAR server and downloaded to OSCAR clients for execution using the OSCAR Installation Wizard. The environment is straightforward to install and configure, providing good scalability at minimal cost.
This paper provides an overview of OSCAR, describing how to use it to deploy an application cluster that can be used as a platform for high-performance computing.
Introduction to OSCAR
To many architects and designers, the term "clustering" often implies a complex network of expensive hardware running specialized, difficult-to-manage software. A major aim of OSCAR is to dispel the myth that establishing and managing a cluster is difficult and costly. OSCAR is designed to operate using standard, off-the-shelf computers based on either 32-bit or 64-bit processors.
OSCAR runs under RedHat Linux 7.1 and later or Mandrake Linux 8.2 and later. As a result, OSCAR can take full advantage of Hyper-Threading Technology in the Intel Xeon processor. OSCAR provides GUI wizards to guide administrators through the process of configuring clusters. Many common tasks are automated, increasing the consistency among cluster nodes while reducing the time and expertise needed.
OSCAR is an example of a high-performance compute cluster. In this model, multiple clients, or compute nodes, run programs in parallel. A server, or head node, drives the compute nodes, distributing the work to be performed and accumulating the results.
Other forms of clustering exist (e.g., storage clusters, database clusters, load-leveling clusters, Web-service clusters, and high-availability clusters), but OSCAR is not currently intended to address these needs. A working group within the Open Cluster Group is looking at Thin-OSCAR, which would provide support for diskless clients.
OSCAR is an ongoing open-source project that employs a number of verified tools. Many of these tools have been enhanced for integration into the OSCAR environment. The latest source code and binaries for the OSCAR suite can be freely downloaded from the main OSCAR Web page*. A variety of commercial and research organizations, including Intel, IBM, Dell, SGI, the NCSA, Indiana University, and the Oak Ridge National Laboratory, have made significant contributions to the project.
Apart from the operating system, the OSCAR suite of software contains everything needed to install, build, maintain, and use a modestly sized Linux cluster. OSCAR is currently aimed at clusters containing up to 64 nodes, although larger installations are possible.
Once OSCAR is installed on a head node, you can use the OSCAR Installation Wizard to build an image for the client nodes. The computers used as client nodes should initially be completely clean – OSCAR will automatically install an appropriate version of the operating system on the client nodes from the image that is constructed on the server. The client image comprises standard Linux packages, together with OSCAR client software. Note that OSCAR currently requires that all computers in the cluster are homogeneous, as all clients will be installed using the same image.
The principal components of OSCAR include the following:
- System Installation Suite (SIS). SIS performs the initial installation of the compute nodes using the client image built on the head node. SIS bootstraps each client either over the network using PXE (Preboot eXecution Environment) or, if the client computer has a BIOS that does not support PXE, from a floppy disk created using the OSCAR Wizard. Once the image has been downloaded and installed by SIS, the client computer can be restarted. It will then be a constituent node in the cluster.
- Environment Switcher. A common problem when installing system software is ensuring that the environment of each user is configured correctly. Often this involves editing hidden "dot" files, and although the process is usually straightforward, it can be repetitive and error-prone. Furthermore, faults caused by a misconfigured environment can be very difficult to track down and correct. The Environment Switcher provides a safer mechanism for manipulating configuration information held in "dot" files, ensuring that consistency is maintained. The OSCAR Wizard provides a graphical interface for configuring some common options, and a command-line interface is also available.
- OSCAR Database (ODA). The ODA is a repository used by the OSCAR Wizard and OSCAR packages to store configuration information. The ODA is a MySQL database (MySQL will be installed automatically if it is not present on the head node computer).
- Cluster Command and Control toolset (C3). C3 is a set of command-line tools that allow a user or administrator to execute tasks, manage files, and query configuration information across the entire cluster or on specified nodes. C3 also contains a utility for shutting down and restarting the cluster in a controlled manner.
- Open Portable Batch System (OpenPBS). OpenPBS is a workload-management system, designed to submit and execute tasks on networked multi-platform environments such as a cluster. OpenPBS comprises three parts: the PBS server, which runs on the head node and controls jobs; the Maui scheduler, which determines when and in what order jobs will be executed; and a daemon process that executes on each compute node as it performs the specified tasks.
- Parallel Virtual Machine (PVM). This component provides an abstraction of the cluster, making it appear as one large virtual parallel computer. PVM includes a library of functions that developers can incorporate into applications to exploit this environment by performing tasks in parallel.
- Parallel Virtual File System (PVFS). PVFS consolidates disks belonging to the nodes in the cluster, maximizes throughput by striping files across disks hosted by different nodes and implements a parallel I/O mechanism. PVFS gives applications access to the same consistent view of data and files, regardless of which node they run on.
- Message Passing Interface (MPI). Message passing is a commonly used technique for spreading parallel processing across multiple processors.
Processes execute tasks on individual processors and communicate with each other by sending messages. Processes can operate in a semi-autonomous manner, performing distinct computations that form part of a larger job, sharing data, and synchronizing with each other when required.
Message-passing systems assume a distributed memory model, in which each process executes in a different memory space from the other processes. This scheme works well, whether the processors are part of the same computer or spread across a range of heterogeneous machines spanning a network. MPI is a specification of a standard set of functions that support message passing. OSCAR includes two implementations of MPI: MPICH and LAM/MPI. You can use the Environment Switcher to choose which one to use before building the client image.
- OSCAR Password Installer and User Management (OPIUM). User account details must be maintained in a synchronized manner across all nodes in the cluster. OPIUM manages these tasks, and ensures that users can traverse every node in the cluster without needing to supply a password once they have logged in to the head node.
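The striping that PVFS performs, as described in the component list above, can be pictured with a small model. The following is a minimal sketch of generic round-robin striping, not PVFS's actual on-disk layout: the function name, the stripe size, and the node count are all illustrative assumptions, since this paper does not give PVFS's parameters.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    node: int          # index of the node whose disk holds the byte
    local_offset: int  # offset within that node's portion of the file


def locate(offset: int, stripe_size: int, node_count: int) -> StripeLocation:
    """Map a byte offset in a striped file to (node, local offset), round-robin."""
    if offset < 0 or stripe_size <= 0 or node_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and node_count must be > 0")
    stripe_index = offset // stripe_size         # which stripe the byte falls in
    node = stripe_index % node_count             # stripes are dealt out round-robin
    stripes_before = stripe_index // node_count  # full stripes already on that node
    local_offset = stripes_before * stripe_size + offset % stripe_size
    return StripeLocation(node, local_offset)
```

With hypothetical 64 KiB stripes across four nodes, consecutive stripes land on nodes 0, 1, 2, 3, and then wrap back to node 0, so a large sequential read can be served by all four disks at once.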
OSCAR provides some basic monitoring facilities, but for a more comprehensive set of tools, you should download Ganglia*, a real-time cluster monitoring tool, prior to installing OSCAR (the OSCAR Installation Wizard can be used to install Ganglia and to ensure that it is configured correctly on all nodes in the cluster). Ganglia can collect a variety of statistics from the nodes in the cluster and present them as a report using command-line utilities, or in a Web-based GUI format.
Security in a clustered environment is an important issue, but the current release of OSCAR was not built with security in mind. It is therefore preferable to install OSCAR on a private network with very limited outside access. OSCAR installs a firewall called pfilter on each node. The firewall is preconfigured to allow unrestricted network communications between machines in the cluster but to tightly control access with the outside world. Future releases of OSCAR will incorporate more robust security.
Installing OSCAR is a matter of performing a vanilla Linux build (using the predefined RedHat Workstation installation is sufficient), downloading the OSCAR binaries from the Web, unpacking these binaries, and then running the OSCAR Installation Wizard by executing a script called install_cluster. The Wizard runs further scripts that examine the operating system configuration and automatically install, configure, and enable any additional Linux services required before presenting a GUI that allows you to install and configure OSCAR as shown in Figure 1. It is recommended that you do not apply any patches to the operating system, as OSCAR supplies its own (using non-OSCAR updates to the operating system can cause internal errors and conflicts within the various OSCAR components).
The OSCAR Installation Wizard requires access to the operating system packages (in RPM format) needed to install Linux. Prior to running the OSCAR Installation Wizard, these files should simply be copied directly from the same CDs used to set up Linux on the head node to the directory /tftpboot/rpm. OSCAR expects the head node and compute nodes to use the same version of Linux.
You will need approximately 2GB of free disk space to hold the RPMs for RedHat Linux. You will also need a further 2GB of disk space for holding the client image generated by the OSCAR Installation Wizard.
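The preparation step described above (checking free disk space and copying the RPM files into /tftpboot/rpm) could be scripted along the following lines. This is only a sketch: the function names are hypothetical, and the location of the RPM files on the installation CDs is an assumption that varies by distribution.

```python
import shutil
from pathlib import Path


def has_free_space(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


def copy_rpms(source_dir: str, dest_dir: str) -> int:
    """Copy every *.rpm file from source_dir into dest_dir; return the number copied."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for rpm in sorted(Path(source_dir).glob("*.rpm")):
        shutil.copy2(rpm, dest / rpm.name)  # copy2 also preserves timestamps
        copied += 1
    return copied
```

A typical use would be to call has_free_space("/", 2 * 1024**3) first, then call copy_rpms once per installation CD with the CD's RPM directory (for example a hypothetical /mnt/cdrom/RedHat/RPMS) as the source and /tftpboot/rpm as the destination.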
Installing OSCAR is a matter of stepping through the stages presented by the OSCAR Installation Wizard:
- Select OSCAR Packages to Install. This button displays a panel that allows you to select the OSCAR packages to install (Figure 2). Some packages are described as core, meaning they cannot be deselected, while others are optional. However, unless you have good reason, it is recommended that you leave all packages selected.
- Configure Selected OSCAR Packages. This button displays another panel allowing you to configure those OSCAR packages that have modifiable options (Figure 3). Use the Environment Switcher option to specify the MPI library to use, and the kernel_picker option to specify which kernel image to use when building the client image. Other options may be available, depending on which OSCAR packages have been selected. In most cases, the default configuration should be sufficient.
- Install OSCAR Server Packages. This button installs the selected OSCAR packages on the server. You can monitor the progress of the installation using the messages displayed in the console window used to start the OSCAR Installation Wizard.
- Build OSCAR Client Image. This button displays a panel allowing you to specify options (image name, location, and so on) for SIS as it creates the client image (Figure 4). If you are using DHCP, select an appropriate IP Assignment Method. The Post Install Action option specifies what action will be taken on the client computer after successful installation of the image. Click Build Image to create the image (again, you can follow the progress in the console window).
- Define OSCAR Clients. This button displays a panel that allows you to specify the names and network addresses of the client computers (Figure 5). OSCAR gives each client computer a name comprising a base name (oscarnode by default) and a number – the first node is called oscarnode1, the second is called oscarnode2, and so on. You must also specify the number of client computers that will make up the cluster. You can add or remove clients later. Like the computer names, the IP addresses of each node are generated sequentially, and you can specify the starting point. Click Add Clients to add the details to the SIS image.
Figure 5. Adding Clients to an Image
- Setup Networking. This button displays another panel you can use to indicate which client should be assigned to which node address (Figure 6). Clients are identified by the MAC (Media Access Control) address of their network cards. This panel provides two options for retrieving MAC addresses: either by scanning the network or by importing them from a text file. When the MAC addresses have been collected and assigned, use Setup Network Boot to create a network-boot image for clients that can use PXE, or use Build Autoinstall Floppy for clients that cannot use PXE.
MAC address assignment is only required if clients have addresses assigned using DHCP. For clients with static IP addresses, you can use the OSCAR mkautoinstalldiskette utility from the command line and provide a configuration file containing the IP address and other details of the client. A different installation diskette will be required for each client.
At this point, you should boot each client computer, either from the network or using the appropriate boot diskette. Each client will connect to the OSCAR server, download and install the client image, and enroll itself in the cluster. When each client has finished installing, it should be rebooted.
- Complete Cluster Setup. Click this button when all the client computers have successfully installed and configured themselves. A number of post-installation scripts will run that finalize the installation. You can follow the progress of these scripts in the console window.
- Test Cluster Setup. Click this button to ensure that the cluster has been configured successfully. A console window will open (Figure 7) and a number of tests will be performed that check the connectivity of each node and the installation of key components such as PBS, PVFS, PVM, and MPI.
The remaining buttons in the OSCAR Installation Wizard allow you to add and remove client nodes from the cluster. The buttons in the Add OSCAR Nodes panel (Figure 8) perform the same tasks as the equivalent buttons in the main panel.
The Delete OSCAR Nodes screen allows you to remove selected nodes from the cluster (Figure 9).
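The sequential naming and addressing applied in the Define OSCAR Clients step can be illustrated with a short sketch. The base name oscarnode comes from the description above; the function name and the starting IP address are hypothetical, and the sketch does not attempt to reproduce how the Wizard itself treats subnet boundaries or reserved addresses.

```python
import ipaddress


def plan_clients(count, base_name="oscarnode", start_number=1, start_ip="192.168.0.2"):
    """Return [(hostname, ip), ...] with names and addresses assigned sequentially."""
    first = ipaddress.IPv4Address(start_ip)
    return [
        (f"{base_name}{start_number + i}", str(first + i))  # address arithmetic carries across octets
        for i in range(count)
    ]
```

For example, plan_clients(3) returns oscarnode1 at 192.168.0.2, oscarnode2 at 192.168.0.3, and oscarnode3 at 192.168.0.4.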
Using the Cluster
You can execute commands and applications on nodes in the cluster using several OSCAR tools, including C3, PBS, MPI, and PVM.
C3 provides several commands that an administrator can use to control and manage nodes in a cluster, as well as the cexec command. The cexec command will execute a specified command on all compute nodes in the cluster, on the head node, or on nodes specified in a configuration file or as command-line arguments. For example, Figure 10 shows the command ls -l /tmp being performed on all nodes in the cluster, listing the files in the /tmp directory (the cluster shown contains only two compute nodes).
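Conceptually, a tool such as cexec fans one command out to many nodes and collects the output from each. The sketch below models that idea only; it is not how C3 is implemented. The function names are hypothetical, and the use of passwordless ssh as the transport is an assumption.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(node, command):
    """Run `command` on `node` over ssh and return its standard output (assumes passwordless ssh)."""
    result = subprocess.run(["ssh", node, command], capture_output=True, text=True, check=False)
    return result.stdout


def run_on_nodes(nodes, command, runner=ssh_runner):
    """Run `command` on every node concurrently and return {node: output}."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map preserves input order, so outputs line up with nodes
        outputs = list(pool.map(lambda node: runner(node, command), nodes))
    return dict(zip(nodes, outputs))
```

Calling run_on_nodes(["oscarnode1", "oscarnode2"], "ls -l /tmp") would gather one directory listing per node. The runner parameter is there so that the transport can be replaced, for example with a stub when testing.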
C3 is useful for performing interactive commands, but PBS should be used for performing non-interactive tasks that do not require any form of user input or response. You can submit jobs to PBS using the qsub command. The qsub command expects a number of parameters, including a script of commands to be performed by the job, and schedule information indicating when the task should be performed. Output can be directed to a file. Jobs submitted using qsub are placed in a queue and executed at the appropriate time on any available node. However, a user can arrange for tasks to be performed in parallel across all nodes using the PBS pbsdsh command inside a qsub script. Apart from qsub, PBS also provides the qstat command to display the status of jobs, and qdel, which can be used to delete a job from the queue.
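As an illustration of the kind of script that qsub accepts, the following sketch generates a small PBS job script as text. The helper name and its parameters are hypothetical; the #PBS directives shown (-N for the job name, -l nodes= for the node count, -o for the output file, and -j oe to merge standard error into standard output) are common OpenPBS options, but the exact set supported should be checked against the installed version.

```python
def build_pbs_script(job_name, nodes, output_file, command, parallel=False):
    """Return the text of a PBS job script suitable for submission with qsub."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # pbsdsh runs the command on every node allocated to the job
    body = f"pbsdsh {command}" if parallel else command
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",     # job name, as displayed by qstat
        f"#PBS -l nodes={nodes}",  # number of nodes requested
        f"#PBS -o {output_file}",  # file that receives standard output
        "#PBS -j oe",              # merge standard error into standard output
        body,
    ]
    return "\n".join(lines) + "\n"
```

The text returned by build_pbs_script("hello", 2, "hello.out", "/bin/hostname", parallel=True) could be saved to a file and passed to qsub; qstat would then show the job while it is queued or running.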
Developers building applications for execution on the cluster can make use of the MPI and PVM runtime support included with OSCAR.
MPI allows multiple processors spread across the nodes of the cluster to perform parallel execution of code. Programs should be written using the MPI API and linked with the MPI runtime libraries. After compilation, the program should be copied to all nodes in the cluster using the C3 cpush command. The mpirun utility can then be used to execute the application. The MPI runtime will automatically distribute the tasks defined by the application across the nodes.
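The message-passing style that MPI programs follow (scatter the work, compute independently, gather the results) can be sketched without MPI itself. The example below uses only the Python standard library and is not the MPI API: threads stand in for MPI processes, and queues stand in for messages. Unlike real MPI processes, threads share an address space, so the sketch imitates the distributed memory model by exchanging data only through messages. All names are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive a chunk of numbers, compute a partial sum, and send it back."""
    chunk = inbox.get()             # blocking receive of this worker's share
    outbox.put((rank, sum(chunk)))  # send the partial result to the coordinator


def parallel_sum(values, worker_count):
    """Scatter `values` to workers by message, then gather and combine the partial sums."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    inboxes = [queue.Queue() for _ in range(worker_count)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for rank, inbox in enumerate(inboxes):
        inbox.put(values[rank::worker_count])  # scatter: each rank gets a strided slice
    partials = dict(results.get() for _ in range(worker_count))  # gather, keyed by rank
    for thread in threads:
        thread.join()
    return sum(partials.values())
```

In an MPI program the same pattern would be expressed with the library's own send, receive, and collective operations, and the processes would be started across the nodes by mpirun rather than created by the program.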
PVM delivers an alternative platform for running parallel applications (PVM actually predates MPI, but it is still popular). PVM provides facilities for defining a virtual machine that encompasses the nodes in the cluster. The pvm utility enables an administrator to add cluster nodes to the virtual machine and to monitor the progress of jobs running on this virtual machine. Applications built using the PVM APIs and compiled with the PVM libraries can then be executed on this virtual machine; tasks will be distributed across nodes in the virtual machine. Further discussion of PVM is beyond the scope of this paper, but more information is available from the Oak Ridge National Laboratory Web site*.
The OSCAR suite contains a core set of packages for building and maintaining an OSCAR cluster. Third-party developers can use the OSCAR Installation API to create additional packages that can be installed using the OSCAR framework. Approved packages are staged by SourceForge* and made available for download. The same mechanism is used for providing updated versions of the core OSCAR packages.
The OSCAR Package Downloader (OPD) is an OSCAR utility that connects to the OSCAR package repository at SourceForge and downloads selected packages. These packages can then be configured and installed using the OSCAR Installation Wizard.
OSCAR provides a suite of tools for quickly building compute clusters based on commodity hardware. The Intel Xeon processor and Itanium processor provide an eminently suitable, cost-effective platform for hosting an OSCAR cluster.
OSCAR incorporates the best-known methods for creating, programming, and using clusters, and it combines a range of proven open-source technologies. OSCAR supplies wizards and tools that can automate many repetitive tasks and guide an administrator through the process of implementing a cluster, minimizing the scope of configuration errors that could otherwise occur.
OSCAR contains tools and libraries that allow developers to take full advantage of the cluster environment. MPI and PVM can be used as a platform for applications that exploit the parallelism inherent in a cluster. PBS provides a means to schedule tasks for execution in a synchronized manner across the cluster, and the C3 tools enable command-line operations to be performed on all nodes.
OSCAR is extensible. Additional OSCAR packages can be downloaded from an appropriate repository and incorporated into the OSCAR environment. The OSCAR Installation Wizard can be used to install these packages to all nodes in the cluster.
The following resources provide additional detail about OSCAR and related topics:
- OSCAR (Open Source Cluster Application Resources)*
- System Installation Suite* (http://www.sisuite.org/)
- Project C3: Cluster Command and Control*
- Portable Batch System*
- Maui Scheduler*
- LAM/MPI Parallel Computing*
- MPICH – A Portable Implementation of MPI*
- PVM (Parallel Virtual Machine)*
- The Parallel Virtual File System*
- Ganglia distributed monitoring and execution system*