January 9, 2009 8:52 AM PST
by TW Burger
A process can be run faster by being divided into subtasks (threads) that are run on two or more interconnected computers in parallel. The more computers, the more interdependent processes can be run at the same time. Although cluster-based computing has made very high-speed processing possible on a small budget, some computational problems demand so much processing that there is no reasonable way to fund the project using dedicated machines. Grid computing hopes to overcome budget and infrastructure constraints by using the spare CPU time of thousands or even millions of networked computers. When these computers are not in use or are operating under capacity, they can allow big problems to be solved in small pieces. This paper explores the latest trends in distributed computing and provides examples of its uses.
Distributed computing is becoming an ever more common methodology for solving highly complex computing problems that would traditionally be solved using a supercomputer. It is used to process information more quickly and/or efficiently using available resources. Using a distributed operating system, a collection of computers can be interconnected through a network into a cluster.
Distributed computing is based on the concept that most CPUs are not fully utilized and can be used to run tasks sent to them. Distributed computing differs from cluster computing, as in a 'Beowulf Cluster', in that machines in a distributed network are not dedicated to the tasks sent them.
Distributed computing can be defined in many different ways. In most general terms it is a system to permit distributed processing of data and objects across a network of connected systems through the sharing of resources on that system. Distributed computing can encompass desktop PCs, powerful workstations, servers, and even mainframes and supercomputers interconnected through a network. Many scholastic, entrepreneurial and government efforts have developed numerous initiatives and architectures taking advantage of the power inherent in distributed computing.
The terms 'Parallel processing', 'parallelization' or 'distributed programming' all refer to the system where a complex task is broken up into many subtasks that are to be run in parallel. Each subtask is then assigned to a CPU on the network and the results are combined.
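The split, assign and combine pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split`, `sum_of_squares`, `parallel_sum_of_squares`) are hypothetical, the workload is a toy sum of squares, and a local thread pool stands in for the CPUs on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut a list into `parts` nearly equal contiguous chunks (the subtasks)."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work one worker performs on its own subtask."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Assign each chunk to a worker, then combine the partial results."""
    # Assumption: a thread pool stands in for networked CPUs in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, split(data, workers)))
    return sum(partials)
```

Calling `parallel_sum_of_squares(list(range(1000)))` gives the same answer as the sequential sum. In a real distributed system each chunk would be shipped to a different machine, and the cost of moving the data would have to be weighed against the time saved.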
Distributed programming uses a collection of computers connected over a network to solve a single problem. Programming multi-computers requires models which are different from normal systems. The programmer must be able to transfer data between different parts of the program through a shared memory space and to coordinate efforts through an inter-process communications system capable of communication between interconnected CPUs.
Distributed programs achieve the following:
- Increased processing speed by using more than one computer at a time.
- Potential for improved reliability when additional computers can compensate for the failure of one
- Allowance for some problems, like remote data acquisition, to succeed in a distributed environment
To run a distributed application, there are several issues that will need to be addressed. To begin, it must be possible to start processes on remote computers and the necessary data for these processes must be provided before they can do any work. Some mechanism for synchronizing these processes, such as 'inter-process Semaphores', should be available, so that they know when to access the data and produce any results. Starting a program on another computer is not very hard using programs like 'telnet' or 'rsh'. Exchanging data and synchronizing, however, can be quite difficult and complicated. These problems can distract the programmer from his original project and can be the source of numerous bugs. Linux* already has some mechanisms for processes in the same computer to exchange data and synchronize between themselves. This is called Inter-Process Communication (IPC). One prominent example is the System V IPC, first introduced in AT&T's System V UNIX.
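The role a semaphore plays in telling a process when data is ready can be illustrated with a small sketch. It is not System V IPC: as a simplifying assumption it uses two threads inside one Python process and the standard `threading.Semaphore`, and the function name `run_handoff` is hypothetical. The signalling idea is the same one an inter-process semaphore provides.

```python
import threading


def run_handoff(values):
    """A producer publishes values; a consumer waits on a semaphore for each one."""
    values = list(values)
    buffer = []                            # shared data area
    consumed = []                          # what the consumer has read so far
    items_ready = threading.Semaphore(0)   # counts items available to read
    lock = threading.Lock()                # protects the shared buffer

    def producer():
        for v in values:
            with lock:
                buffer.append(v)
            items_ready.release()          # signal: one more item is available

    def consumer():
        for _ in range(len(values)):
            items_ready.acquire()          # block until the producer has signalled
            with lock:
                consumed.append(buffer.pop(0))

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

The consumer is started first on purpose: it blocks on the semaphore rather than reading an empty buffer, which is exactly the "know when to access the data" guarantee the paragraph asks for.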
Distributed computing first used machines connected in a finite physical network. These were PCs similar in both hardware and software. Most such networks are not big enough to solve massive computational problems. Grid computing is the answer to this problem.
Grid offers a way to solve Grand Challenge problems like:
- Protein folding
- Drug discovery
- Financial modeling
- Earthquake simulation
- Climate / weather modeling
Grids offer a way of using the information technology resources optimally in an organization. They also offer a means to offer information technology as a utility bureau for commercial clients -- clients pay only for what they use, as with electricity or water.
Grid computing uses the Internet to borrow unused CPU cycles and storage from millions of systems across a worldwide network. This flexible, readily accessible pool can then be harnessed by anyone who needs it, much as power companies and their users share the electrical grid. Grid computing leans more to dedicated tasks, such as single large medical and engineering problems, rather than for general, everyday jobs. Sun defines a computational grid as "a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to computational capabilities." The computers on a Grid can be of many different OS and hardware platforms.
Grid computing is made up of computational and data intensive problems. The computational aspect focuses on reducing execution time of applications that require large amounts of computer processing cycles. Data intensive problems require large scale data management methods to transfer the data needed for solving the problem to the machine assigned to solve it. Data intensive applications such as High Energy Physics and Bioinformatics require both computational and data management solutions to be present in Grid computing solutions.
- Grid computing offers a model for solving massive computational problems using large numbers of computers arranged as groups, embedded in a distributed telecommunications infrastructure.
- Grid computing has the design goal of solving problems too big for any single supercomputer, while retaining the flexibility to work on many smaller problems providing a multi-user environment.
- Grid computing involves sharing heterogeneous resources. Computers that are part of the grid will be of different hardware/software architectures, operating systems and computer languages. These computers will be located in different places belonging to different administrative domains over a network using open standards.
- Grid computing is the virtualizing of computing resources.
- Generally grids are classified by function:
- Computational Grids (including CPU scavenging grids) - Cycle-scavenging systems use machines purchased for other purposes to run batch jobs at night, on weekends, and at other idle times. Computational Grids typically gain and lose machines at unpredictable times as interactive users start or stop using their machines, new machines are purchased, and machines are removed from the network or break down. Cycle-scavengers move jobs from machine to machine as necessary to allow the smooth running of both the job and the network being scavenged.
- Data Grids - A data grid is a grid computing system that considers access to distributed data as important as access to distributed computational resources. Many distributed scientific and engineering applications require access to large amounts of data, often terabytes or even petabytes.
- It’s expected that in the future applications will require even more widely distributed access to data. Data grids will have to support scientific collaboration in a virtual environment allowing access around the world by many people. A Grid is a distributed collection of computer and storage resources maintained in a Virtual Organization (VO). Any of the authorized users within that VO has access to all or some of these resources, and is able to submit jobs to the Grid and expect responses.
- Grid computing requires the use of software that can divide and farm out pieces of a program to as many as several thousand computers.
- Grid computing can be thought of as distributed and large-scale cluster computing and as a form of network-distributed parallel processing. It can be confined to the network of computer workstations within a corporation or it can be a public collaboration using the Internet.
Grid projects are exercises in the manipulation of huge amounts of data or processing (usually both) by applying the resources of many computers in a network to a single problem simultaneously. Experiments like CMS and ATLAS currently being developed at CERN (European Organization for Nuclear Research) are expected to generate petabytes of scientific information by 2006.
The NASA Advanced Supercomputing Division (NAS) has run genetic algorithms using the Condor* cycle scavenger running on about 350 Sun and SGI workstations. In addition, NASA intends to use United Devices to run genetic algorithms and other codes on the United Devices MetaProcessor*, which cycle scavenges on volunteer PCs connected to the Internet. As of September 2001, the MetaProcessor ran on about 900,000 machines.
A well-known example of grid computing in the public domain is the ongoing SETI@home (Search for Extraterrestrial Intelligence at home) project, in which thousands of people share the unused processor cycles of their PCs in the vast search for signs of "rational" signals from outer space. According to John Patrick, IBM vice-president for Internet strategies, "the next big thing will be grid computing." SETI@home is the best-known cycle scavenging computation and currently the largest computation on the planet. As of September 2001, SETI@home was using more than 3 million computers to achieve a 23.37 teraflops/sec sustained processing cycle rate (with 979 lifetime teraflops).
The US National Technology Grid is prototyping a computational grid for infrastructure and an access grid for people. Sun Microsystems offers Grid Engine* software. Described as a Distributed Resource Management (DRM) tool, Grid Engine allows engineers at companies like Sony and Synopsys to pool the computer cycles on up to 80 workstations at a time. (At this scale, grid computing can be seen as a more extreme case of load balancing.)
The availability and low price of large numbers of networked PCs make the application of distributed computing systems, whether to solve a single large problem (distributed programming) or to share resources amongst users (distributed computing: clusters and grids), likely to become more common. The great potential for savings, efficiency, problem solving, and reliability shows promise for solving large data-intensive problems in the future.
The growth of such processing models has been limited, however, due to a lack of compelling applications and by bandwidth bottlenecks. Significant security, management and standardization challenges also play a role in reducing the number of players willing to take advantage of the powerful benefits of distributed programming and distributed computing.
Recent improvements to bandwidth, infrastructure and PC capabilities combined with the potentials seen in the efforts such as those above have resulted in renewed efforts in this field. Although a set of universal standards is yet to be established for distributed computing, the level of interest from companies like Intel, as well as other major hardware and software vendors suggests that it’s only a matter of time. Organizations like the Linux Clusters are already working to standardize high performance computing using Linux clusters. The Internet Engineering Task Force (IETF), Open Grid Services Infrastructure Working Group, the Global Grid Forum and the Globus Alliance are also working toward internet standards for grid computing.
Although Grid scheduling is similar to distributed scheduling in cluster configuration, there are many greater complexities due to the vast size of the tasks to be run and the intricacy of the web. For example, because a Grid is comprised of various different administrative domains, meaningful collaboration can only be achieved if each domain is allowed to maintain its local scheduling policy.
Tasks submitted by VO members could reach millions of jobs. Unlike cluster scheduling, the large number of jobs, resources and local requirements means that centralized scheduling algorithms are impractical or ineffective for use with Grid computing.
The data-intensive nature of some jobs requires data location be taken into account when determining job placement. Replication of data from primary repositories to other locations is important to reduce the overhead and latency of data movement. Scheduling data intensive tasks is a recent focus of Grid computing activities. In the “Data Grid” environment, effective scheduling mechanisms, considering both computational and data storage resources, must be provided for large scale data intensive applications. In grid computing a scheduling framework is required to allow many users to submit requests for job execution from any one of a large number of VO sites.
One solution that has been suggested utilizes a design where at each node of a VO the following three components are placed.
- An External Scheduler(ES), responsible for determining where to send jobs submitted to that site
- A Local Scheduler(LS), responsible for determining the order in which jobs are executed at that particular site
- A Dataset Scheduler(DS), responsible for determining if and when to replicate data and/or delete local files
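How the three components might cooperate can be sketched as follows. Everything here is an assumption made for illustration: the class and function names, the FIFO local policy, the cost rule used by the External Scheduler (queue length plus a fixed penalty when the data is not local), and the replication threshold used by the Dataset Scheduler. The suggested design only names the three roles and does not prescribe these policies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    dataset: str                                   # data the job needs


@dataclass
class Site:
    name: str
    datasets: set = field(default_factory=set)     # data held locally
    queue: deque = field(default_factory=deque)    # jobs waiting to run here
    requests: dict = field(default_factory=dict)   # dataset -> times needed but missing

    def next_job(self):
        """Local Scheduler (LS): execution order at this site (FIFO assumed)."""
        return self.queue.popleft() if self.queue else None


def external_schedule(job, sites, transfer_penalty=3):
    """External Scheduler (ES): decide which site receives the job.

    Assumed policy: lowest cost, where cost is the queue length plus a
    fixed penalty if the site would have to fetch the dataset first.
    """
    def cost(site):
        missing = job.dataset not in site.datasets
        return len(site.queue) + (transfer_penalty if missing else 0)

    best = min(sites, key=cost)
    if job.dataset not in best.datasets:
        # Remember the miss so the Dataset Scheduler can react to it.
        best.requests[job.dataset] = best.requests.get(job.dataset, 0) + 1
    best.queue.append(job)
    return best


def dataset_schedule(site, threshold=2):
    """Dataset Scheduler (DS): replicate data that was missed often enough."""
    popular = [d for d, n in site.requests.items() if n >= threshold]
    for d in popular:
        site.datasets.add(d)        # stands in for an actual replica transfer
        del site.requests[d]
    return popular
```

With one site holding a dataset and another holding nothing, jobs needing that dataset first pile up at the data-holding site, then spill over once its queue outweighs the transfer penalty, and the Dataset Scheduler at the second site replicates the data after repeated misses. Each site keeps its own queue, so the local policy can differ between administrative domains.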
The Globus Toolkit* has emerged as the de facto standard for grid middleware. The Globus Alliance conducts research and development to create fundamental technologies behind the Grid, which lets people share computing power, databases, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy. Globus has protocols to handle grid resource management. These are:
- Resource management: Grid Resource Allocation and Management (GRAM)
- Information Services: Monitoring and Discovery Service (MDS)
- Data Movement and management: Global Access to Secondary Storage (GASS)
- GridFTP
Most of the grids spanning research and academic communities in North America and Europe utilize the Globus Toolkit as their core middleware. As of 2003, the worlds of Grid computing and Web Services have started to converge to offer Grid as a web service (Grid Service). The Open Grid Services Architecture (OGSA) has defined this environment, which offers several functionalities adhering to the semantics of the Grid Service. The Globus Toolkit forms the basis of OGSA and although the OGSA is OS agnostic it is likely Linux will form a significant basis of the grid infrastructure.
Without proper security it is always possible for sophisticated individuals to feed bogus data to grid computing efforts. Since grid computing involves the running of code on remote computers and major efforts in grid computing like OGSA are open source, the code is well documented and may be reverse engineered to take over computers. Errant data must be detected and ignored and the validity of the code being run must be maintained.
While it is possible that an unauthorized party could modify a VO computer to send incorrect data, the amount of data sent compared to the total population under evolution, would not be significant (a ratio likely to be 1/100,000 or greater). The controlling computer can simply reevaluate the data fitness and data can be sent via SSL to make reverse engineering more difficult. Modification of the database and denial of service attacks could be more problematic. These can be addressed by password protecting the relevant web pages and by limiting access to the machines used by the author of the VO task and her VO collaborators. Generally unauthorized reading of the data is of little concern, as much of it will be published in the open literature.
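The idea that the controlling computer can re-evaluate reported fitness values can be shown with a short sketch. The fitness function below (counting the 1-bits in a candidate) and the names `fitness` and `accept_results` are hypothetical stand-ins; a real task would substitute its own evaluation, which may be too expensive to repeat for every report and might be applied to a sample instead.

```python
def fitness(candidate):
    """Hypothetical fitness function: number of 1-bits in a candidate."""
    return sum(candidate)


def accept_results(reports, tolerance=0.0):
    """Keep only reports whose claimed fitness matches a local re-evaluation.

    `reports` is a list of (candidate, claimed_fitness) pairs sent back by
    remote machines. Anything that does not check out is set aside.
    """
    accepted, rejected = [], []
    for candidate, claimed in reports:
        if abs(fitness(candidate) - claimed) <= tolerance:
            accepted.append((candidate, claimed))
        else:
            rejected.append((candidate, claimed))
    return accepted, rejected
```

A report such as `([1, 1, 1], 99)` is rejected because the locally computed fitness is 3, while an honest report such as `([1, 0, 1], 2)` is kept.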
The potential of computer science has always been hampered by the inability to adequately address massive processing and data volume issues. No matter how fast a CPU is or the data throughput rate, our imaginations come up with new applications that exceed the existing technology or budget.
Grid computing technology has the potential to alleviate processing capacity and cost barriers. A grid can solve problems that can't be approached without enormous computing power. Computers will collaborate rather than being directed by one managing computer. Ultimately, the future may bring pervasive computing, in which computers saturate our environment without our direct awareness. Recent Internet over power grid developments may further increase grid computing use by making high speed connections ubiquitous. They may also act as a catalyst for the creation of the world power grid envisioned by Buckminster Fuller.
By creating a standard for cooperatively and synergistically allowing collaborative computer power to be harnessed by anyone and toward common objectives, science becomes more social and ultimately more human.
Thomas Wolfgang Burger is the owner of Thomas Wolfgang Burger Consulting. He has been a consultant, instructor, analyst and applications developer since 1978.
Distributed Computing Projects
- Genome@Home
- Berkeley NOW Project
- Berkeley Open Infrastructure for Network Computing (BOINC) a distributed computing infrastructure founded and developed by the SETI@home project.
- Distributed.net has many projects, one of which is a search for optimal Golomb rulers. Some will venture that Distributed.net is not a non-profit project since the main RC5-72 project they do is indeed for a cash prize from RSA Labs.
- Folding@Home - A University of Illinois at Urbana-Champaign report on October 22, 2002 confirmed success in simulating protein folding.
- GIMPS - Great Internet Mersenne Prime Search
- Project Dolphin takes a count of the number of keys you press on your keyboard. This is mostly an event made of teams.
- SETI@home, a project searching for signs of extra-terrestrial intelligence (SETI).
- Seventeen or Bust - Attempts to find prime numbers in 17 sequences in order to solve the Sierpinski problem. So far, primes have been found in 5 of the sequences.
- United Devices is the largest commercial distributed computing network.
- Lifemapper - Attempts to build global archive of biological species distributions.
Distributed Project Directories
- Internet-based Distributed Computing Projects - Lists ongoing, future and past projects, edited by Kirk Pearson.
References
- Fran Berman, Anthony J. G. Hey, Geoffrey Fox: Grid Computing: Making The Global Infrastructure a Reality, Wiley, ISBN 0470853190
- I. Foster, C. Kesselman, G. Tsudik, S. Tuecke: A Security Architecture for Computational Grids. Proc. 5th ACM Conference on Computer and Communications Security, pp. 83-92, 1998.
- I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.
- Ian Foster, Carl Kesselman: The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, ISBN 1558604758
Related links
- (www.biosimgrid.org) BioSimGrid: Grid database for biomolecular simulations
- Cluster
- Distributed computing
- EU DataGrid project
- Ganglia
- Global Grid Forum
- Globus Toolkit
- GridForge
- How You Can Fight Against Diseases Using Your Computer
- Grid Computing
- IBM Grid Computing website
- O'Reilly article about grid computing software
- ProActive is a Java library for parallel, distributed, and concurrent computing with mobility and security.
- Render farm
- Supercomputer
- The Condor project
- The Globus™ project
High Performance, Grid and Parallel Computing
- The Grid: Computing without Bounds, by Ian Foster (April 2003 Scientific American). This article by Ian Foster is an excellent read for laymen, scientists, and techies alike. Ian does a terrific job of making most of the abstract parts of Grid computing tangible, making them come alive. He also gives you a peek at the future of Grid computing and how much change it can bring to the way we do business
- 10 Emerging Technologies that will Change the World - Technology Review named Grid computing as one of the ten technologies that will change the world. Article includes photos of Ian Foster and Carl Kesselman.
- Anatomy of the Grid - This white paper by Ian Foster, Carl Kesselman, and Steven Tuecke defines the field of Grid computing. As the title suggests, the authors spend some time naming all of a Grid's constituent parts and defining what they do. Their focus is on Grid architecture.
- Arminius: SCI Coupled Linux-PC's.
- Beowulf at NASA/GSFC
- Beowulf Questionnaire (with results), attempts to gather some information about the Beowulf systems people are using.
- CERN - The CMS and ATLAS projects.
- Fundamentals of Grid Computing - A brief IBM Redpaper that offers a concise technical overview of Grid computing.
- Grid Computing Planet is a Grid Computing Information Portal.
- Grid Computing: Making the Global Infrastructure a Reality, edited by Fran Berman, Geoffrey Fox and Tony Hey. Published March 2003 by Wiley. This 1000+ page tome is filled with articles and essays that examine Grid computing from a variety of science and technical angles, including: history of the Grid, the semantic Grid, an overview of Grid architecture, Grid deployment models, OGSA, peer-to-peer Grid databases, and a lot more.
- Grid Service Specification, which defines the standard interfaces and behaviors of a Grid service, building on a Web services base.
- Grid-dy Determination - When it comes to companies that need a lot of CPU cycles to get their work done, Grid computing is definitely the way to go. Provides some real-world cases in which Grid computing is making all the difference: online gaming, financial number crunching, genome research, aerospace and more.
- High Performance Computing and Networking Center aims to contribute knowledge and new finding in the area of parallel and distributed computing, scientific computing, distributed system, networking and Internet Technology. Moreover, the center seeks to build tools and technology that allow scientists and engineers to use the high performance cluster computing systems to explore new territory in scientific discovery that benefits human society.
- IBM Grid Computing home page
- IBM VP Wladawsky-Berger explains Grid Computing - A great overview of Grid computing and IBM's role in it. He talks about how emerging standards and access to greater bandwidth make the dream of commercially viable Grid computing closer to reality than most people think.
- SGI Linux Networx* (LNXI) provides cluster computing systems that deliver maximum sustained performance and high return on investment to our customers.
- LoBoS, a Beowulf class computer in the Molecular Graphics and Simulation Lab at the National Institutes of Health.
- MPI Linux Cluster Project, at the Max-Planck-Institut für Informatik in Germany.
- Physiology of the Grid - This white paper by Ian Foster, Carl Kesselman, Jeffrey Nick, and Steven Tuecke explains how Grid computing can be put to work in a Web services environment. This is the white paper that presents more details about OGSA and Grid semantics (i.e., services). Together with "Anatomy of the Grid", these two papers provide a fairly detailed overview (albeit a tad academic) about the world of Grid computing.
- Purdue's Adapter for Parallel Execution and Rapid Synchronization
- The Beowulf Central Site
- The Global Grid Forum is a community-initiated forum of researchers and practitioners working on Grid computing. A number of its working groups are producing technical specs, documentation of user experiences, and implementation guidelines. See GGF@WORK for a list of the working groups, including the Open Grid Services Architecture (OGSA) Working Group.
- The Globus Alliance is developing fundamental technologies needed to build computational grids. Grids are persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations.
- The Globus Toolkit - Globus is an open-architecture, open standards tool for building computational Grids. It is widely cited as a solid reference implementation that will get your hands dirty in the world of building, deploying, and managing Grids. Also, look at the Globus FAQ.
- The Grid Computing Information Centre aims to promote the development and advancement of technologies that provide seamless and scalable access to wide-area distributed resources.
- The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid system architecture based on Web services concepts and technologies.
- TOP500 Supercomputer Sites - lists the world's most powerful computers, several of which are Linux Beowulf systems.
