The news is full of high performance computing. Can someone provide me a working definition of what exactly determines a high performance system?
My master's jury asked me that question, and I answered something like this:
"A HPC or supercomputer is a computersystem that can solve problems in an exceptible time that a commodity computer can not,because ofmemory, storage and speed constraints."
This gets into politics immediately. By many definitions, for example, Windows is excluded from HPC, in spite of the existence of mixed Windows/Solaris and Windows/Linux MPI clusters, and Microsoft's renewed vow to support MPI. In some definitions, there is a sharp boundary between "HPC" and "workstation," although you are unlikely to find much agreement on where that boundary lies.
I would loosely define HPC as commodity-based supercomputing. I know it is not true big-iron supercomputing, but for the money you pay, it's _much_ more cost effective.
"What makes a supercomputer?"
Possible correct answers:
The fastest, most powerful machine to solve a problem today.
A supercomputer is one that is only one generation behind what you really need.
Page _one_ of the Linpack-report...
HPC is a fairly generic term though. It does not have to be parallel. An exceptionally large, or parallel database server can also be called HPC, although some will call it High Throughput Computing (HTC).
I would consider just about anything with more than 4 processors HPC: a 16-way Itanium2, a 32-processor SGI, an 8-processor Xeon cluster, or a 33,000-processor vector machine. I would even include a single desktop equipped with several FPGAs.
The term "Beowulf" is slightly more specific in that it requires a cluster of commodity machines running an OpenSource operating system. It could be either HPC or HTC.
A dual Xeon workstation does not quite cut it though...
Everyone seems to focus on the hardware. Certainly this is a tangible object. You can parade visitors through your machine room and point out the "big iron" or the "XYZ cluster." HPC, to me, involves more than just parallel computers or fast networking fabrics with low latency and high bandwidth. I think HPC is an approach to solving a problem from both a hardware and (more importantly?) a software/algorithmic angle.
If I run a climate forecasting application on my desktop workstation, would that qualify as HPC? I think it does since the means used to arrive at the answers will require a large amount of intense computation.
If I run my email reading application on a 512-node cluster, would that qualify as HPC? Probably not, unless I'm getting so much spam that the filtering of incoming mail requires a large fraction of the cluster nodes.
What if I'm developing an email filter application based on neural network technology and have millions of email messages that need to be scanned and analyzed? For me, this last scenario is on the borderline, but I'd think that this is a case where the more HP my computations are, the faster and better my solution is likely to be. If I can divide up the work into independent tasks and run each of those on separate nodes, it looks more like HPC.
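The split-into-independent-tasks idea above can be sketched in a few lines. This is only an illustration: the `score` function and its word list are toy assumptions standing in for a real trained filter, not anyone's actual application.

```python
from multiprocessing import Pool

def score(message):
    # Toy stand-in for a spam filter: fraction of words flagged as suspicious.
    suspicious = {"free", "winner", "offer"}
    words = message.lower().split()
    return sum(1 for w in words if w in suspicious) / max(len(words), 1)

def scan_all(messages, workers=4):
    # Each message is scored independently, so the work divides cleanly
    # across processes -- the "looks more like HPC" case described above.
    with Pool(workers) as pool:
        return pool.map(score, messages)
```

Scoring millions of messages is then just `scan_all(mailbox, workers=N)`, with each core (or, in a cluster setting, each node) handling its own share.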
So, when defining HPC, I look at hardware as merely the means to facilitate the computations that must be done in a highly performing way.
I also think that there should be more emphasis on the software side of HPC. To me it means putting in the work needed so that the software makes the best use of the available machine cycles, whether those cycles are on a single workstation or a hundred-node cluster.
Just as an example, we recently doubled the size of our cluster and got about a 3X increase in throughput (since the new processors were faster). In the same period, I reworked a lot of the old object-oriented code of our main application and got about a 20X overall increase.
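The kind of rework that yields gains like that is usually algorithmic rather than heroic. As a toy sketch (not the poster's actual application), here is the same answer computed two ways, where dropping from O(n²) to O(n) dwarfs any hardware upgrade on large inputs:

```python
def count_duplicates_slow(items):
    # O(n^2): for each item, re-scan everything before it.
    return sum(1 for i, x in enumerate(items) if x in items[:i])

def count_duplicates_fast(items):
    # O(n): a single pass, remembering what has been seen in a set.
    seen, dup = set(), 0
    for x in items:
        if x in seen:
            dup += 1
        seen.add(x)
    return dup
```

Both return the same count; only the amount of work differs.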
Anyone (or at least anyone who has the money) can increase performance by buying more & faster hardware. The work, the benefits, and the fun of HPC are all in the software :-)
Here is the definition from Webopedia.com. It also introduces a new element: software applications, because performance has software implications. In short, we can say:
HPC = Supercomputer (hardware) + Parallel Algorithm (software)
High Performance Computing
n.) A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors.
The citation you quoted says parallel algorithms are a "main area" of HPC. Today, with the inherent human need for speed, I would claim that 99 44/100% of software (and software research) for HPC concentrates on parallelism. There are non-parallel software developments/activities that I would also classify as being in the realm of HPC. Among the more recent and visible examples I could cite are ATLAS (http://www.netlib.org/atlas/), FFTW (http://www.fftw.org), and Prof. Kazushige Goto's High-Performance BLAS (http://www.cs.utexas.edu/users/flame/goto/).
We can make overall computation faster by either improving how the computations perform on processors (serial tuning) or running independent parts of the computation in parallel. While the emphasis has been on the latter for the past decade or more, as long as new processors and architectures are released, the former will still have a place within HPC. In many cases, combining the best serial algorithms in parallel is going to yield the best possible results.
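A minimal sketch of that combination (the sum-of-squares workload and the chunking scheme are illustrative assumptions): tune the per-chunk computation serially, then run the independent chunks in parallel.

```python
from multiprocessing import Pool

def chunk_sum(chunk):
    # Serial tuning: lean on the optimized builtin reduction
    # rather than a hand-written accumulation loop.
    return sum(x * x for x in chunk)

def parallel_sum_squares(data, workers=4):
    # Parallelism: split the data into independent chunks, one per worker,
    # then combine the partial results.
    size = max(len(data) // workers, 1)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(chunk_sum, chunks))
```

The same pattern scales from a multicore workstation to a cluster, with MPI or similar replacing the process pool.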
I totally agree with you on the importance of parallelism to HPC. A week ago, I probably would have agreed with the number 99 44/100%. Now, as I read through The Software Vectorization Handbook by Aart Bik, I can easily point out another software technology that is just as important to HPC applications as parallelism: software pipelining, or vectorization, which includes all the wonderful compiler optimizations happening at the loop level.
I think software technology just tracks hardware technology. In the hardware world, there are two ways to make a computer faster: duplication and pipelining. So at the software level, it would be more complete if we emphasized two areas of HPC applications: parallelism and software pipelining.
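As a small illustration of the vectorization point, here is the classic SAXPY loop written element-at-a-time and as a single array expression (using NumPy as a stand-in for what a vectorizing compiler does to a loop):

```python
import numpy as np

def saxpy_loop(a, x, y):
    # One multiply-add per iteration, element at a time.
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

def saxpy_vectorized(a, x, y):
    # The whole loop expressed as one array operation; NumPy dispatches
    # it to compiled (and typically SIMD-vectorized) code.
    return a * x + y
```

Both compute a*x + y over the whole vector; the second form is what loop-level optimization aims to produce from the first automatically.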