Parallel Programming Talk - Our First Listener Question Show, "Automatic Parallelization?"

Today was my first solo hosting of the Parallel Programming Talk show. Take a listen and let me know what you thought. It may not be as funny as most episodes, but I hope it was just as jam-packed with news and information for the parallel programming developer community.



Download the show.

The show was our first “listener questions” show, and we selected from the questions that you faithful listeners submitted to parallelprogrammingtalk@intel.com. Keep your questions coming. Clay and I will keep reading, and we look forward to reading your question on a future first-Tuesday listener question episode.

Our top news story for today was that over 1,000 universities have joined the Intel Academic Community. Take a moment to check out the community, which delivers cutting-edge parallel programming and visual computing curriculum to universities worldwide. As you are all aware, most new processors shipped are multi-core, and software development requires a paradigm shift from serial to parallel programming. As this industry shift occurs, Intel is committed to preparing the future software development workforce worldwide to "think parallel" by providing Computer and Information Technology faculty with curriculum, research, training, access to new technologies, and a professional network of academic peers.

We also published a two-part article titled “Best Practices for Developing and Optimizing Threaded Applications” by Shwetha Doss and John O’Neill, Ph.D. The paper starts by discussing how microprocessor design is experiencing a shift away from a predominant focus on pure performance toward a balanced approach that optimizes for power as well as performance. Multi-core processors are capable of greater performance with optimal power consumption by sharing work and executing tasks concurrently on independent execution cores. One technique to fully utilize multi-core processors is to thread the application so that it can run on multiple processor cores. While threading can be a challenge, new software development tools help simplify the process by identifying thread correctness issues and performance opportunities. The two-part paper presents a methodology that has been used to successfully thread many applications and discusses tools that can assist in developing multithreaded applications. Part 1 & Part 2.

Today’s question was submitted by Dean who writes:
“Hi Guys!
I've been doing C++ for quite some time now and in my college days I
was fortunate enough to be able to do research on parallel computing.
I have run into quite some literature about tools that automatically
parallelized code -- some were meant to be parallelizing compilers --
and was wondering if that's still relevant today.

Do you know of some products that actually do automatic
parallelization effectively? And if not, do you think they would be
useful today or in the future?

I understand that there are compilers out there (the Intel compiler
included) which perform automatic vectorization at a low level, but
have you heard of compilers that are able to actually use something
like OpenMP or CUDA/OpenCL to automatically parallelize code?

Thanks, and please keep up the great work with the show!”

Thanks for the great question and kind words for the show. To answer the question I sat down with Ganesh Rao from the Intel Software Developer Products Division Compiler team.

Ganesh reviewed the current auto-parallelization capabilities in the Intel C/C++ compiler and how software developers can take advantage of turning the feature on for “embarrassingly parallel loops,” such as a loop that calculates the log of an array of values. For more information on the Intel C/C++ compiler, please refer to the product page and the user’s guide (which includes details on /Qparallel for Windows and -parallel for Linux and Mac). You are also encouraged to download an evaluation copy of the Intel C/C++ compiler. One interesting fact that I discovered during our conversation is that the auto-parallelization performed by the Intel C/C++ compiler utilizes some of the same optimized libraries as OpenMP. To sum things up: auto-parallelization is great if your code’s performance hot spots are “embarrassingly parallel loops,” but to get truly scalable parallel code, an application needs to be architected from the beginning for parallel hardware. Read more about optimizing embarrassingly parallel loops in the Intel Knowledge Base. I hope this answers your question, Dean. Thanks for your input; we look forward to hearing from you and others in the future.

Our next Parallel Programming Talk show is at 8:00 AM PST (1600 GMT) on March 10th.
Join Clay Breshears for a chat with Michael Mallen, Chief Executive for Marketing & Business Development at Virtual Parallel Systems. VPS produces Cores Unlimited™, a true parallel processing platform bringing dramatic performance gains at both the core and application level. Learn more about the company at http://www.VPSthg.com. Plus the News and all the regular wackiness you've come to expect.










Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors.  In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors.  For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options."  Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors.  While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.


Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.  These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations.  Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.  Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.


While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements.  We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.


Notice revision #20101101

