Tera-scale Computing - A Parallel Path to the Future

By Justin R. Rattner
Intel Senior Fellow

Ushering in the Era of Tera

Last fall, Paul Otellini and I announced initial results from Intel’s Tera-scale Research Program and the Polaris processor, the first in a series of many-core research processors. The goal of the Polaris project was to develop design technologies and methodologies tuned toward rapid tera-scale silicon development. The design team created Polaris as 80 cores in a tiled two-dimensional array interconnected through routers built into the silicon. The cores were far simpler than today’s Intel® processors, so we could focus on the challenges of building a lot of cores in a single package. Other objectives of Polaris were to minimize the global clock network design effort, reduce the clocking power budget, and bring fine-grain power management to a many-core processor.
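
To picture what a tiled array with built-in routers implies for on-chip communication, here is a minimal Python sketch. It assumes an illustrative 8 x 10 grid of the 80 tiles and simple dimension-ordered (XY) routing; neither detail should be read as a description of the actual Polaris router design.

```python
# Illustrative sketch: 80 tiles arranged as an 8 x 10 grid, each tile paired
# with a router that forwards packets to its north/south/east/west neighbors.
# Dimension-ordered (XY) routing is assumed here for simplicity; it is not
# necessarily the policy used in the Polaris silicon.

COLS, ROWS = 8, 10  # 8 x 10 = 80 tiles (assumed layout)

def tile_coords(tile_id: int) -> tuple[int, int]:
    """Map a tile number (0..79) to (x, y) grid coordinates."""
    return tile_id % COLS, tile_id // COLS

def xy_route(src: int, dst: int) -> list[int]:
    """Return the sequence of tiles a packet visits under XY routing:
    travel along X until the column matches, then along Y."""
    sx, sy = tile_coords(src)
    dx, dy = tile_coords(dst)
    path = [src]
    x, y = sx, sy
    while x != dx:                      # move east/west first
        x += 1 if dx > x else -1
        path.append(y * COLS + x)
    while y != dy:                      # then move north/south
        y += 1 if dy > y else -1
        path.append(y * COLS + x)
    return path

if __name__ == "__main__":
    route = xy_route(0, 79)             # corner to corner
    print(f"hops: {len(route) - 1}, path: {route}")
```

Even in this toy model, a worst-case corner-to-corner message crosses 16 routers, which hints at why on-die interconnect latency and bandwidth become first-order design concerns at this scale.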

Within two hours of receiving first silicon, Polaris delivered 1 teraFLOPS of performance while consuming less than 62 watts of power – no more than the design power of our latest dual-core server processors. With Polaris, we achieved our first major objectives for a tera-scale processor.

When we announced the Polaris research, there were numerous questions about what someone would do with all those cores, and who needs a TFLOP of computing power.

Tera-scale Computing

As part of our Tera-scale Research Program, we at Intel have spent several years considering what future applications will look like. Tomorrow’s applications will process terabytes of data at TFLOP rates. That demands a level of computing that today exists only in supercomputers – yet it will need to be available at the desktop to support these tera-scale applications.

Recognition, Mining, Synthesis

We’ve categorized a whole new breed of software under what we call Recognition, Mining, and Synthesis (RMS) applications. These are applications that not only benefit from tera-scale computing, they require it. RMS means:

  • Recognition allows computers to examine data and construct mathematical models based on what they identify, like a person’s face in a single picture.
  • Mining extracts one or more instances of a specific model from massive amounts of environmental data, such as finding a person’s face across a large number of picture frames at various resolutions, under various lighting, and so on.
  • Synthesis constructs new instances of the models, allowing what-if scenarios or projecting the model in new environments.

Consider the following example, which is an actual software project one of our research teams developed with RMS and tera-scale computing in mind.

Today, if you want to see sports highlights of your favorite team, you have to wait for the sports segment of your local TV news to come on, or visit a sports website and watch a video playing in a small window. Automated sports summarization currently takes hours: computer vision software must mine hundreds of thousands of video frames to find the short segments of action. With a tera-scale processor, it could be done in real time as the game plays. You decide what to summarize – sport, team, player – the recognition code builds models from a frame, and the mining code finds instances of those models throughout the remaining frames, combining them into a summary ‘reel’ for you.
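
The research code is, of course, far more sophisticated, but the recognize-then-mine structure can be sketched in a few lines. In the Python sketch below, build_model and matches are hypothetical stand-ins for real computer-vision primitives (face or jersey detection, feature matching); only the overall flow is the point.

```python
# Minimal sketch of the recognize/mine flow described above.
# build_model() and matches() are hypothetical stand-ins for real
# computer-vision primitives; here they just compare a trivial feature.

from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    pixels: list[float]   # stand-in for image data

@dataclass
class Model:
    signature: float      # stand-in for a learned appearance model

def build_model(frame: Frame) -> Model:
    """Recognition: construct a model from a single example frame."""
    return Model(signature=sum(frame.pixels) / len(frame.pixels))

def matches(model: Model, frame: Frame, tol: float = 0.005) -> bool:
    """Mining: decide whether this frame contains an instance of the model."""
    score = sum(frame.pixels) / len(frame.pixels)
    return abs(score - model.signature) < tol

def summarize(frames: list[Frame], example: Frame) -> list[Frame]:
    """Build a highlight 'reel': every frame that matches the example."""
    model = build_model(example)                      # recognize once
    return [f for f in frames if matches(model, f)]   # mine the rest

if __name__ == "__main__":
    frames = [Frame(i, [0.2 + 0.01 * (i % 3)] * 4) for i in range(10)]
    reel = summarize(frames, frames[0])
    print("frames in reel:", [f.index for f in reel])
```

Note that the mining step is naturally data-parallel: every frame can be tested against the model independently, which is exactly the kind of work a many-core processor can spread across dozens of cores.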

But what about the synthesis part?

We’ve demonstrated RMS in a motion capture research application that recognizes a person and his movements in a 3D space using four cameras and no markers on the person’s body, extracts a skeletal model of the person, and then uses ray tracing to synthesize the model in an entirely new environment, with lighting, shadows, and a new skin. Today, we have to do this offline. With a tera-scale processor, we could do it all in real time.
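
To make the synthesis step slightly more concrete, the small sketch below takes a handful of captured 3D joint positions and re-projects them from a new virtual viewpoint. Real synthesis involves ray tracing, lighting, and skinning; this fragment, with invented joint data and camera parameters, only illustrates the "new environment" idea.

```python
# Illustrative fragment of the synthesis step: re-render a captured skeletal
# model from a new viewpoint. A real renderer would ray-trace the skinned
# model with lighting; here we only do a pinhole projection.
# All joints and camera parameters below are invented for illustration.

import math

SKELETON = {                       # joint name -> (x, y, z) in meters
    "head":     (0.0, 1.7, 3.0),
    "shoulder": (0.2, 1.5, 3.0),
    "hand":     (0.5, 1.1, 2.8),
    "foot":     (0.1, 0.0, 3.1),
}

def rotate_y(point, angle_rad):
    """Rotate a 3D point about the vertical (Y) axis: the new viewpoint."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def project(point, focal=1.0):
    """Pinhole projection of a camera-space point onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

if __name__ == "__main__":
    view_angle = math.radians(30)          # synthesize a 30-degree orbit
    for name, joint in SKELETON.items():
        u, v = project(rotate_y(joint, view_angle))
        print(f"{name:9s} -> image coords ({u:+.3f}, {v:+.3f})")
```

Each joint (and, in a real renderer, each output pixel) can be computed independently, which again maps naturally onto many cores.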

Really Interesting Applications

Imagine the possibilities of RMS applications on tera-scale computers. These kinds of applications could have profound impacts on education and training, entertainment, scientific research, and birthday parties.

With tera-scale computing and RMS applications:

  • Learners could be immersed in an environment, with their real actions becoming part of the scenario – the ultimate learn-by-doing approach.
  • Game players could become part of the excitement and adrenaline of the story without wearing a motion-sensing device.
  • Fifty years’ worth of photos and home movies could be consolidated into a few minutes for a family member’s birthday celebration, at home, in a short while.

Of course, there are many more possibilities, such as real-time analytics impacting government, energy, and retail; personal health visualization in medicine; and applications in a host of other industries. The really interesting applications for computing have yet to be imagined. Tera-scale computing will enable the innovators.

Intel’s Tera-scale Research Program has taken the first steps. There’s still a long way to go.

Enabling Tera-Scale

Computing at the tera-scale cannot be done with just a few cores, or even a few multi-core processors. Tera-scale computing requires tens to hundreds of cores working in parallel to handle terabytes of data at TFLOP rates. Supporting those cores will require new and unique technologies to keep them from starving for memory access and I/O bandwidth, or waiting for messages to pass among the core array. The Tera-scale Research Program’s teams are working on some of these issues, including a new approach to stacked memory/processor packaging, an integrated network-on-chip, and optical signaling. But the real enabler of tera-scale computing will be the software required to run massively parallel workloads on many-core chips. That means changing the way software is designed today, from BIOS code to virtual machines, operating systems, and end-user applications.

The Future is Parallel

Many-core chips, parallel processing, and tera-scale computing require a paradigm shift. But that shift gives us the next level in what computing can and will do for our world. It places many challenges before us and opens a vast horizon of opportunities. Think in terms of when PCs first entered the marketplace decades ago and the inspiring applications that followed.

What will future tera-scale workloads look like? What part of these workloads can be parallelized? And how will they benefit from a tera-scale processor and platform? The tera-scale research teams at Intel have engaged with industry and academia to explore these topics.

RMS offers some exciting possibilities. At Intel, we’ve developed several RMS research application codes and primitives, and we’re offering some of them for public research use. They will be combined with many codes developed by leading thinkers and software architects interested in tera-scale research.

Beyond RMS, our research also shows significant performance potential for real-time analytics codes in finance. Others see the potential for tera-scale capabilities in AI, machine-learning optimization, and prediction.

Today, some existing codes can be parallelized; many others cannot without major effort. Thinking in terms of massively parallel processing from the very beginning of software development is a requirement for tera-scale computing. But therein lies the challenge: parallelizing code is rarely trivial. It’s an iterative process that will require new tools, optimizers, and compilers. Intel is engaging with researchers, academia, and industry to spur the discovery of new parallel programming techniques, parallelizable algorithms, and tools.
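
As a toy illustration of what "thinking parallel from the beginning" means in practice, the sketch below restructures a serial loop as a data-parallel map using Python’s standard multiprocessing pool. The per-chunk workload is invented; the structural point is that independent units of work can be scheduled across many cores without changing the answer.

```python
# Toy illustration of restructuring a serial loop as data-parallel work.
# The "score" workload is invented; the structural point is that each chunk
# is independent, so the work can be spread across many cores.

from multiprocessing import Pool

def score(chunk: list[float]) -> float:
    """Some per-chunk computation with no dependence on other chunks."""
    return sum(x * x for x in chunk)

def serial(chunks):
    results = []
    for chunk in chunks:                     # one core, one chunk at a time
        results.append(score(chunk))
    return results

def parallel(chunks, workers=4):
    with Pool(processes=workers) as pool:
        return pool.map(score, chunks)       # chunks processed concurrently

if __name__ == "__main__":
    data = [[float(i + j) for j in range(1000)] for i in range(64)]
    assert serial(data) == parallel(data)    # same answer, different schedule
    print("serial and parallel results agree for", len(data), "chunks")
```

Real codes are rarely this tidy: shared state, load imbalance, and serial sections are precisely what make parallelization the iterative process described above.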

Tera-scale computing will require new tera-scale parallel benchmarks to test hardware and software performance. Current benchmarks were not designed with many-core, tera-scale computing in mind.
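
To show what even a minimal throughput benchmark involves, here is a small Python sketch that times a dense matrix multiply and reports achieved GFLOPS using the conventional 2n^3 operation count. It relies on NumPy and is purely illustrative; it is not a proposed tera-scale benchmark.

```python
# Toy FLOP-rate micro-benchmark: time a dense matrix multiply and report
# achieved GFLOPS using the conventional 2*n^3 operation count.
# This is illustrative only, not a proposed tera-scale benchmark.

import time
import numpy as np

def measure_gflops(n: int = 1024, repeats: int = 5) -> float:
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a @ b                           # the kernel under test
        best = min(best, time.perf_counter() - start)
    flops = 2.0 * n ** 3                    # multiply-adds in an n x n x n product
    return flops / best / 1e9

if __name__ == "__main__":
    print(f"achieved ~{measure_gflops():.1f} GFLOPS on this machine")
```

A genuine tera-scale benchmark suite would need to stress far more than raw FLOPS, including memory bandwidth, inter-core communication, and scaling across tens to hundreds of threads.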

These are all areas needing further work to accelerate the development of tera-scale computing.

Tera-scale Research – From Circuits to Solutions

The challenges of developing and fully utilizing future tera-scale platforms can be mind-boggling, in both hardware and software terms. But the opportunities and benefits of putting these computing capabilities into the hands of future customers can be equally mind-bending. The Tera-scale Research Program is preparing for Intel’s future, looking out five to ten years – when tens or hundreds of cores will work together in one system. To be ready, we are committed as a team to making advances in:

  • Microprocessor research – We’re developing highly scalable multi-core architectures; new types of generalized and specialized processing cores; and scalable, reliable on-chip networks, while exploiting state-of-the-art process technology and packaging.
  • Platform research – We’re provisioning commensurate amounts of memory and I/O bandwidth, adjusting the memory hierarchy (multi-level, adaptive caches) to meet the changing needs of hundreds of running threads, and evolving network protocols, virtualization, and trust models to work effectively at the tera-scale.
  • Software research – We’re studying future workloads to drive architectural designs, creating new models of resource allocation and scheduling, and developing new programming tools and techniques to make highly threaded and data-parallel applications easy to write, debug, and tune.

Making the Future Together

We are on the verge of a whole new era of possibilities in computing with many-core processors, new technologies, and parallel applications that are bringing massively parallel processing and tera-scale computing to the realm of the desktop. As an industry, we are beginning to define these possibilities today; there’s still a lot of work to be done.

We need to think ‘parallel.’ We need to more fully understand the future workloads that might run on a tera-scale processor. We need to develop new algorithms, programming techniques, and tools, and more codes and primitives. We need a whole new suite of benchmarks. We need new operating systems and virtualization software that manage the many cores for performance, reliability, and security.

It’s an industry effort to bring tera-scale computing to its fullest potential. Let’s make the future together.

Additional Resources

Please visit the following links for more information on tera-scale computing and Intel’s research into tera-scale processing.

About the author:


Justin Rattner is an Intel Senior Fellow and director of Intel's Corporate Technology Group. He also serves as the corporation's chief technology officer (CTO). He is responsible for leading Intel's microprocessor, communications and systems technology labs and Intel Research.

In 1989, Rattner was named Scientist of the Year by R&D Magazine for his leadership in parallel and distributed computer architecture. In December 1996, Rattner was featured as Person of the Week by ABC World News for his visionary work on the Department of Energy ASCI Red System, the first computer to sustain one trillion operations per second (one teraFLOPS) and the fastest computer in the world between 1996 and 2000. In 1997, Rattner was honored as one of the Computing 200, the 200 individuals having the greatest impact on the U.S. computer industry today, and subsequently profiled in the book Wizards and Their Wonders from ACM Press.

Rattner has received two Intel Achievement Awards for his work in high performance computing and advanced cluster communication architecture. He is a longstanding member of Intel's Research Council and Academic Advisory Council. He currently serves as the Intel executive sponsor for Cornell University where he serves on the External Advisory Board for the School of Engineering.

Rattner joined Intel in 1973. He was named its first Principal Engineer in 1979 and its fourth Intel Fellow in 1988. Prior to joining Intel, Rattner held positions with Hewlett-Packard Company and Xerox Corporation. He received bachelor’s and master’s degrees in Electrical Engineering and Computer Science from Cornell University in 1970 and 1972, respectively.
