Dear Developer,

In this month's issue you will find an engaging interview with Intel's Dr. Michael McCool on the motivation for starting RapidMind.
Also, the Ct team is preparing the following events for SC09 in Portland:
- Monday evening, November 16, through Thursday, November 19, at 4:00 p.m. Ct demos: Members of the Ct team will be in the Intel booth at the developer products tools demo pod (ask for pod #13), demoing Ct examples. Come meet the minds behind Ct technology from Intel's Software Development Products group: Anwar Ghuloum, chief Ct architect; CJ Newburn, Ct architect; Michael McCool, parallel software architect, RapidMind cofounder, and multicore programming expert; and Stefanus Du Toit, Ct developer, RapidMind cofounder, and chief developer. They will all be at SC09 and happy to answer your questions about Ct today, the RapidMind technology integration, and Ct's future.

- Tuesday, November 17, at 8:30 a.m. Keynote: Delivering the Opening Address at SC09 will be Intel's Justin Rattner speaking on "The Rise of the 3D Internet: Advancements in Collaborative and Immersive Sciences."

- Wednesday, November 18, at 3:30 p.m. Ct in the Intel booth theater: We hope you will join us in the Intel booth theater, where Ct engineer CJ Newburn and parallel software architect Michael McCool will deliver a session on what it takes to develop a Ct application.

- Wednesday, November 18, at 5:30 p.m. Ct technology BoF: Join us for a Ct-focused Birds of a Feather session, where we will discuss with attendees our plans for Ct this year and beyond.

- Wednesday, November 18, at 1:30 p.m. Interested in cluster-specific developer productivity and application performance? Stop by this session by William (Bill) Magro.

Last month, we were at the Intel Developer Forum (IDF) with a lively data parallel session delivered by Ct's chief architect Anwar Ghuloum and technical consulting engineer Amanda Sharp. The session was well attended, and more than 90% of attendees said they learned something new! If you missed IDF, you can still review the session content, "Simplifying Data Parallel Applications for Your Manycore Future," and Anwar Ghuloum's IDF interview (included in the Intel® Software Network "Teach Parallel!" video series).

Most readers have applied for the Ct beta program, but if you have not, there is still time. To find out more and register for beta consideration, fill out the registration application on Intel's updated Ct Technology website. We are still accepting and reviewing applicants for potential inclusion in the Ct beta engagement program.

If you have any questions, please do not hesitate to contact us.

Missed our previous newsletter on the Intel Data Parallel website? You can find it here: Intel Ct Newsletter.

Sincerely,

Rita Turkowski
Product Marketing, Parallelization Products
Intel Developer Products Division
http://software.intel.com/en-us/blogs/author/rita-turkowski/



An interview with Dr. Michael McCool:

The RapidMind Cofounder Shares Insights and His New Goals at Intel

What was your vision (along with Stefanus Du Toit) for starting RapidMind?

We founded RapidMind at the beginning of the multicore era. We saw that traditional serial processors were getting less and less benefit from increasing transistor density and that their performance had reached a plateau. At the same time, we saw great performance from parallel processors like GPUs, but in limited domains and with very restrictive programming models. GPU computing was promising but only accessible to graphics or computer architecture experts. We anticipated that mainstream processors would have to go parallel to continue to scale, and that GPUs would likely become more CPU-like, but that there would be an ongoing challenge to write applications that could use both types of processors effectively. It was also clear that there was going to be a big gap between the capabilities of these processors and the ability of most programmers to use them effectively. Our vision was to support the use of these new, massively parallel architectures by a broad range of application developers, and to simplify the process so that developers could focus on algorithms, not mechanisms. We also felt that with the technological approach we had developed, we would be able to enable both efficient code and efficient development of that code.

What industry problem were you trying to solve that, say, TBB, OpenMP, or even CUDA could not solve? I assume this is productivity for mainstream developers?

We often summed up the value of our platform in the "three Ps": performance, portability, and productivity. We were not targeting one of these to the exclusion of the others; we wanted to maximize their combination. We wanted a system that could produce reasonably efficient implementations from relatively naive specifications of computations. We also saw a need for portability between hardware architectures, which is not only an issue for today's deployment, but also for migration of code over time to new processors. Satisfying the three Ps is crucial for adoption by mainstream software developers.

Compared to the alternatives, RapidMind is considerably higher-level than OpenCL or CUDA. Code written in RapidMind, in fact, is often shorter and easier to understand than equivalent serial code, since its interface is designed with the application writer's goals in mind. In contrast, OpenCL and CUDA both require the specification of many hardware-dependent details that make it difficult to understand, port, and maintain programs. Sometimes you do need that control, but with RapidMind we took a layered approach. You could start with a simple implementation and then, if profiling showed it would be useful, drill down and add as much detail as necessary to tune performance. We also tried to design the system so that the simple, obvious way of writing an algorithm would be, as often as possible, the right way for performance. For example, many constructs in RapidMind support good data locality, which is often crucial for scalability and performance.

Relative to TBB and OpenMP, RapidMind provided two benefits. First, the RapidMind platform is data-centric. Many task-parallel programming models manage only the task and assume the data will come along for the ride. However, data management is just as important for scalability as parallelism, and many issues of safety and maintainability revolve around access to data. Also, to support accelerators, which have separate address spaces, a strong data abstraction is required. Second, the RapidMind platform uses dynamic code generation, which turns out to be useful for two reasons. One is that the platform can adapt to the specific instruction-set architecture (and even cache architecture) of the machine the code is running on. Even within x86 processors, and even within Intel's own line, there are many microarchitectural variants and vector instruction set extensions, so different optimizations suit different variants. The other is that in many languages, and especially in C++, it is hard to compile out the overhead of modularity. Developers want to write modular code for maintainability, but things like virtual function pointers in C++ can negatively impact performance. With a "staged compiler," enabled by dynamic code generation, we could let developers have their cake and eat it too.
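
The modularity overhead mentioned above can be illustrated with a minimal C++ sketch. This is not RapidMind or Ct code, and every name in it (`ElementOp`, `Scale`, `Offset`, `run_virtual`, `run_fused`) is hypothetical. It contrasts a pipeline built from objects behind a virtual interface with the same computation expressed as one callable of known type. Templates perform this specialization at build time, which only approximates what a staged compiler with dynamic code generation could do at run time, once the actual pipeline and target processor are known.

```cpp
#include <cassert>
#include <vector>

// Modular style: each per-element operation sits behind a virtual interface,
// so every element pays for an indirect call that the compiler usually
// cannot inline. (All types here are illustrative, not a real library API.)
struct ElementOp {
    virtual ~ElementOp() = default;
    virtual float apply(float x) const = 0;
};

struct Scale : ElementOp {
    float factor;
    explicit Scale(float f) : factor(f) {}
    float apply(float x) const override { return factor * x; }
};

struct Offset : ElementOp {
    float amount;
    explicit Offset(float a) : amount(a) {}
    float apply(float x) const override { return x + amount; }
};

// Applies each operation of the pipeline, in order, to every element,
// using one virtual call per operation per element.
std::vector<float> run_virtual(const std::vector<const ElementOp*>& pipeline,
                               std::vector<float> data) {
    for (float& x : data) {
        for (const ElementOp* op : pipeline) {
            assert(op != nullptr);
            x = op->apply(x);
        }
    }
    return data;
}

// Specialized style: the whole pipeline is a single callable whose concrete
// type is known where the loop is compiled, so the body can be inlined and
// the loop is a candidate for vectorization.
template <typename Fused>
std::vector<float> run_fused(Fused fused, std::vector<float> data) {
    for (float& x : data) {
        x = fused(x);
    }
    return data;
}
```

Calling `run_virtual` with a pipeline holding `Scale(2.0f)` followed by `Offset(1.0f)` and calling `run_fused` with a lambda computing `2.0f * x + 1.0f` yield the same values; only the dispatch mechanism differs.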

In summary, we wanted to enable the development of modular, highly maintainable code by mainstream developers that would still be efficient and make good use of these emerging massively parallel architectures.

How did you convince investors?

To be truthful, it was difficult. First, when we started, the need for mainstream parallel computing was not as obvious as it is today, when the trends toward massive parallelism everywhere are clear to everyone. Second, developer tools, while crucial to the industry as a whole, are not in themselves a large market. We ultimately settled on a business model that combined professional services and library development (that is, solving specific problems in particular domains with pre-packaged solutions) with development and sale of the platform itself. These worked together in that our customers (or we) were able to build solutions that would not have been possible without the platform's capabilities. One key factor that convinced investors was our early success with this business model. We also demonstrated early on that our technical approach provided significant benefits.

Did you have a market application case study early on that you used to showcase the corporate vision, or do you have a great customer success story/testimonial?

Two in particular are worth mentioning.

First, RTT AG in Germany built a real-time raytracer using RapidMind technology nearly four years ago. This was a very advanced raytracer that ran on multiple processors (including GPUs and the Cell processor) and included important capabilities like user-specified programmable shading (which requires dynamic code generation). This raytracer is used for real-time automotive visualization and is crucial in the design of reflective and refractive components like headlamps.

Secondly, we have had a lot of success in medical imaging. One of our clients, Medipattern, has deployed a system using RapidMind that achieves almost a 10x speedup on an ultrasound breast-cancer screening application. This is actually a hybrid solution combining GPU and CPU computing, and involves many stages of image processing, some of them surprisingly irregular. The CPU tends to be good at irregular computations, the GPU at regular computations. With RapidMind, the functions could all be written once and then assigned to the processor on which they would run the best.

What are you hoping to accomplish now that you are at Intel?

I am hoping to tie three things together: applications, hardware design, and the software that bridges the gaps between them. For maximum benefit, the hardware and software stacks need to be designed with the application programmer's point of view and needs in mind. RapidMind had the goal of bridging this gap and I plan to continue working on that bridge.

You can find Dr. McCool's past blog entries here and check out his informative first Intel blog here.
