This question was asked during the webcast An Introduction to High Performance Computing: Parallel Computing Issues. Here is Tom Lehman's answer.
There are several that are available for free. The two that we play with the most within my group are OSCAR, which is available from SourceForge, and Rocks, a common cluster package from the San Diego Supercomputer Center. Both will allow you to build a cluster relatively easily. They take care of sorting out all of the communication paths between the members of your cluster, and basically I can build a Rocks cluster of, say, 256 nodes in about four hours. Of course, if you don't happen to have 256 nodes and you're only doing four, it'll take you about 45 minutes, max.

Those packages usually also include management packages such as Ganglia or another package from NCSA called CluMon. These give you an overall picture of the health of the software on your cluster. They show you what the load on any given processor is. You can see historical data showing where your load has been and where it's probably going to be going, et cetera. Also being built into these cluster monitors are some monitors for the hardware, so that you can determine that you've got nodes that perhaps have fan failures and should be taken out of operation as soon as possible, or nodes that have flat-out failed because maybe they lost a power supply. So that's one form of cluster management.

Another form of cluster management is workload management. Most clusters are operated as a batch processing system: you submit your job to the master node, as they did in days gone by, and a queuing system puts you into the proper queue. Once the necessary processors are available, it executes your job and sends the results back to an appropriate place. Those packages are also part of OSCAR and Rocks. Again, they're automatically installed, and you just start using them after you've put your cluster together.
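
To make the batch queuing idea concrete, here is a minimal sketch of a first-come, first-served queue, assuming a single queue and a fixed pool of processors. It is a toy simulation, not the scheduler that ships with OSCAR or Rocks; the names Job and run_batch and the sample jobs are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int    # processors the job requests
    runtime: int  # time steps the job needs once started


def run_batch(jobs, total_procs):
    """Simulate a strict FIFO batch queue.

    Returns a dict mapping job name to (start_time, end_time).
    Hypothetical helper for illustration only.
    """
    queue = deque(jobs)   # jobs waiting, in submission order
    running = []          # (end_time, job) pairs currently executing
    free = total_procs
    clock = 0
    schedule = {}

    while queue or running:
        # Release processors held by jobs that have finished.
        still_running = []
        for end, job in running:
            if end <= clock:
                free += job.procs
            else:
                still_running.append((end, job))
        running = still_running

        # Start jobs from the head of the queue while they fit.
        while queue and queue[0].procs <= free:
            job = queue.popleft()
            free -= job.procs
            running.append((clock + job.runtime, job))
            schedule[job.name] = (clock, clock + job.runtime)

        if running:
            # Jump ahead to the next job completion.
            clock = min(end for end, _ in running)
        elif queue:
            # Nothing is running, yet the head job still does not fit.
            raise ValueError(f"{queue[0].name} requests more processors than exist")

    return schedule


if __name__ == "__main__":
    jobs = [Job("A", 2, 3), Job("B", 2, 2), Job("C", 4, 1), Job("D", 1, 1)]
    for name, (start, end) in run_batch(jobs, total_procs=4).items():
        print(name, start, end)
```

In this sketch job C waits until all four processors are free, and job D waits behind it even though a processor is idle earlier. Production queuing systems typically add priorities, multiple queues, and backfilling to reduce that kind of idle time.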
Message Edited by hagabb on 11-01-2004 11:18 AM