Square Pegs and Round Holes - Choosing the Right Intel® Software Development Tools


By Ken Strandberg

Download this white paper (PDF 303K)
http://software.intel.com/sites/default/files/m/d/4/1/d/8/Square-Pegs-Round-Holes-White-Paper.pdf

Evolving your serial code to take advantage of task and data parallelism from multiple cores and data parallelism from SIMD capabilities in Intel® processors requires knowing which parallel programming tools to choose for your project. Intel offers many software development tools for parallelism. This article looks at Intel tools and recommends how to choose the right tools for your project and development environment.

Introduction

Parallel application development used to be the domain of expert parallel programmers in specialized compute-intensive industries, such as financials, science and research, and pharmaceuticals. With multi-core processors mainstream for years now, and many-core soon to come, Intel has been helping bridge the gap between these experts and developers for mainstream application markets by working with the software industry to provide and standardize parallel development tools. Intel now offers rich suites of parallel development tools to help developers evolve their serial code.

Intel parallelization tools support both inter-core threading for task and data parallelism and intra-core vectorization for SIMD data parallelism. Vectorization support is specific to Intel® architecture and the processor being compiled for, since it takes advantage of the processor's SSE registers and the SSE and Intel® Advanced Vector Extensions (Intel® AVX) instruction sets.

Since there is a range of development environments and processes, Intel tools cover the gamut to enable you to take advantage of parallelism regardless of your coding environment. But, with all these tool offerings, how do you know which tool or set of tools will fit into your development environment and help you reach your coding objectives? This article looks at those topics.

To Thread or Not To Thread

Most of the time, threading will give you a benefit. But how much? And where do you thread? Intel offers a couple of tools to help you discover your best threading opportunities – and even experiment with guided threading models.

Modeling Parallelism with Intel® Parallel Advisor

  • Languages: C/C++
  • Target OS: Windows*
  • Environment: Microsoft* Visual C++
  • Part of Intel® Parallel Studio for Windows

Do you know whether your code already takes advantage of threads? Most current Intel® processors provide at least two cores, and Intel® compilers will try to take advantage of those cores when compiling, even if you don't explicitly thread.

Whether you're experienced with threading or new to it, whether you're creating Windows applications or working as a system architect, Intel® Parallel Advisor, part of Intel® Parallel Studio, is an ideal tool for your workflow. It is an efficient threading assistant for C- and C++-based Windows applications built with Microsoft Visual C++. This multi-faceted tool does the following:

  • Analyzes your serial code.
  • Looks for opportunities for parallelism.
  • Experiments with parallel models by annotating your code at these promising points.
  • Finds and addresses conflicts, such as race conditions and locks, based on the annotations.
  • Analyzes and reports the benefit of the annotated parallelism, so you can weigh the cost of threading against the gain.
  • Then, if you approve the changes, adds a parallel framework, showing you where to add parallel code.

Intel® Parallel Advisor helps you make better design decisions about threading through experimentation, analysis, and reporting.

Finding Opportunities with Intel® VTune™ Amplifier

  • Languages: C/C++, FORTRAN
  • Target OSs: Windows, Linux
  • Environment: Microsoft Visual C++, Standalone
  • Available as: Intel® Parallel Amplifier and Intel® VTune™ Amplifier XE (components of Intel® Parallel Studio and Intel® Parallel Studio XE)

If you do not develop in a Visual C++ environment, or if you are experienced in threading and need to know where your threading opportunities lie, Intel® VTune™ Amplifier is a widely accepted tool for performance and threading analysis to help you optimize your code. Intel VTune Amplifier profiles your code’s performance, identifies where your threads are locking and/or waiting, and much more.

What’s Your Development Model?

Once you know you need to thread and where to parallelize, which Intel tool is best to consider? That decision depends on several aspects of your coding environment and your objectives, and this article won't consider them all in detail. It focuses on shared-memory models for mainstream computing, written in C or C++ for Windows and Linux operating systems (all Intel tools are compatible with Microsoft Visual C++). However, Intel also offers tool suites for distributed-memory (cluster) and embedded/mobile parallel application development, and Intel products support threading in FORTRAN. All Intel tool suites are listed at the end of this article.

This article summarizes Intel tools based on the development environment (libraries, language extensions, compiler restrictions, standards-based or not), threading model (OpenMP,* MPI* or Intel models), supported compilers, and threading experience needed.

  • Development (threading) environment – Since development environments are governed by their development processes, threading environments can vary. Some do not allow dependencies on libraries and permit only language extensions; others allow the use of libraries; and some let developers choose how they want to write their code, using both.
  • Threading model – Within the environment, threading models can vary, too. Some must be based on industry standards, while others can apply a wide variety of models. Threading models are also tied to the computing model (distributed versus shared memory) and to how much task or data parallelism exists in the problem being threaded and vectorized.
  • Compilers – While Intel compilers are designed to optimize for Intel processors, not every organization lets developers use whichever compiler they would like.
  • Required experience – Developers’ threading experiences vary widely. Not all software experts need to be experienced with threading. Algorithm scientists and mathematicians who focus on creating the algorithm don’t necessarily need to be expert at threading, but might want to make sure their algorithm threads easily enough. Other developers focus on fine tuning and optimizing code for a particular processor architecture, and they would need to have more threading experience. Still others write specific vectorization code using Streaming SIMD Extensions (SSE) and Intel® Advanced Vector Extensions (Intel® AVX).

Standards Only, No Libraries

  • Threading Environment: Standards-based language extension
  • Threading Model: OpenMP*
  • Compilers: Intel® C++ Optimizing Compiler, GCC C++ Compiler, Microsoft Visual C++ Compiler
  • Required experience: Low to moderate

If your environment requires you to use only industry-standard language extensions and to avoid dependencies on other libraries, then OpenMP* is an easy-to-understand, compiler-based, industry-accepted standard for task parallelism. It abstracts threads into tasks and allows you to easily add parallelism to for loops, with implicit controls to localize variables inside loops, plus other operations. The Intel C++ Optimizing Compiler supports OpenMP pragmas for inter-core threading; OpenMP itself does not vectorize your code.

The Intel® Developer Zone offers many resources for OpenMP programming.

Easy, Powerful Threading and Vectorization without Libraries

  • Threading Environment: Language Extensions
  • Threading Model: Intel® Cilk Plus
  • Compilers: Intel C++ Optimizing Compiler, GCC C++ Compiler, Microsoft Visual C++ Compiler
  • Required Experience: Very low to moderate

Intel Cilk Plus is a powerful language extension to C and C++. It's a compiler-based threading model, so you need a compiler that supports the Intel Cilk Plus keywords. When using the Intel C++ Optimizing Compiler, additional extensions allow you to vectorize code for data parallelism on Intel processors.

There are many reasons to use Intel Cilk Plus:

  • Short ramp-up. Its keyword-style syntax spawns threads for inter-core threading, making it easy to quickly take advantage of task parallelism.
  • Scalability. Array extensions for the Intel C++ Optimizing Compiler add vectorization, so you can integrate data parallelism using SIMD operations on Intel processors.
  • Versatility. It's supported across today's compilers: Intel C++ Optimizing Compiler, GCC, and Microsoft Visual C++ Compiler (task parallelism only).

Because it’s easy to get started, yet powerful for high-performance data parallelism, Intel Cilk Plus is good for a range of developers:

  • Developers just learning how to thread, and/or experimenting with evolving their serial or parallel code to achieve more parallel performance without taking a lot of time.
  • Advanced developers in a language-extension-only environment, who want to take full advantage of both the cores and the SSE registers in Intel processors using the Intel C++ Optimizing Compiler.
  • Algorithm scientists, who simply need to make sure their algorithms parallelize well and who do not need to think about or take time to optimize.
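As an illustrative sketch only (it assumes a Cilk Plus-capable compiler; the array-notation line additionally requires the Intel C++ Optimizing Compiler), the keyword syntax and array extensions look like this:

```cpp
#include <cilk/cilk.h>
#include <cstdio>

int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n - 1);  // may run in parallel with the next call
    int y = fib(n - 2);
    cilk_sync;                      // wait for the spawned call to finish
    return x + y;
}

int main() {
    // cilk_for: iterations may be scheduled across worker threads
    float a[8], b[8], c[8];
    cilk_for (int i = 0; i < 8; ++i) { a[i] = i; b[i] = 2.0f * i; }

    // Array notation (Intel compiler only): one statement the compiler vectorizes
    c[:] = a[:] + b[:];

    std::printf("fib(10) = %d, c[7] = %.0f\n", fib(10), c[7]);
    return 0;
}
```

Note how little changed from the serial form: `cilk_spawn`, `cilk_sync`, and `cilk_for` replace explicit thread management entirely.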

Intel Cilk Plus is a component of Intel® Parallel Building Blocks, which is part of Intel® Parallel Composer within the Intel® Parallel Studio, Intel® Parallel Studio XE, and Intel® C++ Studio software development suites.

Powerful, Library-based Threading

  • Threading Environment: Libraries
  • Threading Model: Intel® Threading Building Blocks
  • Compilers: Intel C++ Optimizing Compiler, GCC C++ Compiler, Microsoft Visual C++ Compiler
  • Required Experience: Low to moderate

Intel® Threading Building Blocks (Intel® TBB) is a widely accepted, powerful threading template library for C++. Intel TBB is good for the developer with some basic threading knowledge who understands the concepts: it allows you to effectively manage parallel tasks (instead of the details of threads) with comprehensive tools, though it requires some investment in learning. Once mastered, Intel TBB offers a comprehensive repository for task parallelism.

Intel® TBB does not integrate data parallelism (vectorization) the way Intel® Cilk Plus does, but it does allow you to add vectorization through the following:

  • Pragmas
  • Compiler flags
  • API calls from another template library that will vectorize for you
  • Explicit vectorization (writing your own Intel® AVX/SSE code)

Intel TBB is supported by all major C/C++ compilers.
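As a hedged sketch of the library-based style (it assumes Intel TBB is installed and a compiler with C++ lambda support), `tbb::parallel_for` maps a range of iterations onto tasks that the TBB scheduler distributes across cores:

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<float> a(1000, 1.0f), b(1000, 2.0f), c(1000);

    // parallel_for splits the index range into chunks; each chunk becomes a
    // task that a worker thread executes. No explicit thread code is written.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, c.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                c[i] = a[i] + b[i];
        });

    std::printf("c[0] = %.0f\n", c[0]);
    return 0;
}
```

The `blocked_range` abstraction is what lets the scheduler balance load: chunk sizes adapt to the machine rather than being fixed in your code.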

Intel Threading Building Blocks is a component of Intel® Parallel Building Blocks, which is part of Intel Parallel Composer within the Intel Parallel Studio, Intel Parallel Studio XE, and Intel C++ Studio software development suites.

Powerful Threading and Vectorization in a Single Library

  • Threading Environment: Libraries
  • Threading Model: Intel® Array Building Blocks
  • Compilers: Intel C++ Optimizing Compiler, GCC C++ Compiler, Microsoft Visual C++ Compiler
  • Required Experience: Low

Intel® Array Building Blocks (Intel® ArBB) is both a library and a language extension: a combination of a standard C++ library interface and a powerful runtime. It blends low-level parallelism constructs with a productivity language, so you can specify arbitrary parallel computations, and at runtime it does all the threading and vectorization for you. It is ideal for applications that require data-intensive mathematical computation, such as those found in medical imaging, digital content creation, financial analytics, energy, data mining, science, and engineering.

Threaded Performance Libraries

When all you need is a set of performance libraries already threaded for multi-core and SIMD processing, Intel offers libraries and language extensions for optimizing your parallel code.

The Intel® Math Kernel Library (Intel® MKL) contains threaded and vectorized functions for scientific computing and other operations.
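As a small illustration (it assumes Intel MKL is installed and linked), the standard CBLAS interface that Intel MKL implements lets you call a threaded, vectorized matrix multiply without writing any threading code yourself:

```cpp
#include <cstdio>
#include <mkl.h>   // Intel MKL: CBLAS interface

int main() {
    // C = alpha*A*B + beta*C, with 2x2 row-major matrices
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,      /* m, n, k */
                1.0, A, 2,    /* alpha, A, lda */
                B, 2,         /* B, ldb */
                0.0, C, 2);   /* beta, C, ldc */

    // MKL threads and vectorizes the multiply internally
    std::printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);  // prints: 19 22 43 50
    return 0;
}
```

For matrices this small the library runs serially; the threading pays off as problem sizes grow.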

The Intel® Integrated Performance Primitives (Intel® IPP) provides low-level constructs for data compression, signal processing, and the like.

In addition to the industry standard Streaming SIMD Extensions (SSE), Intel® Advanced Vector Extensions (Intel® AVX) takes advantage of the wider vector registers and other capabilities in the 2nd generation Intel® Core™ processor family architecture. Intel AVX allows you to highly optimize performance for SIMD operations running on 2nd generation Intel® Core™ processors.

Intel Parallel Development Tool Suites

Intel offers different tool suites for developers coding for mainstream applications, high-performance computing (HPC), embedded and mobile devices, Windows applications, Linux systems, Mac OS products, and clusters.

Components

Each of the above tool suites includes a comprehensive set of tools for evolving code. Intel® Parallel Studio for Windows includes an innovative threading assistant, Intel® Parallel Advisor, that experiments with threading your target serial code and reports the results of the experiments, so you understand the cost/benefit before you start threading. Intel Parallel Advisor and the many other tools cover the entire development process: from design and modeling, through coding and compiling, to error checking and optimizing. Whether you do all the coding yourself, or your development process is distributed into specialties, such as algorithm scientists, programmers, and optimizers, the components of these suites apply across the entire development process. Intel tools fit most development models and target operating systems (OSs): language extensions, libraries, or both; C, C++, and FORTRAN; and Windows*, Linux*, Mac OS*, real-time OS (RTOS), and MeeGo*. Table 1 summarizes the different tool components, the Intel product they are part of, applicable Intel® architecture, and target operating systems.

Table 1. Summary of Intel parallel development tools by architecture and OS (a)

Intel® Architecture and OS columns: IA-32 and Intel® 64 OSs (b)(c): W, L, M, R; IA-64 OSs (b)(c): W, L; Intel® Atom™ Processor: Me/L

Threading Assistant
  • Intel® Parallel Advisor (part of Intel® Parallel Studio for Windows)
Compilers
  • Intel® C/C++ compiler
  • Intel® Fortran compiler
  • Intel® Parallel Composer (part of Intel® Parallel Studio and Intel® Parallel Studio XE)
Performance Libraries
  • Intel® Integrated Performance Primitives (part of Intel® Parallel Studio XE and Intel® C++ Studio XE)
  • Intel® Math Kernel Library (part of Intel® Parallel Studio XE and Intel® C++ Studio XE)
Threading Tools
  • Intel® Threading Building Blocks (part of Intel® Parallel Building Blocks/Intel® Parallel Composer)
  • Intel® Array Building Blocks (part of Intel® Parallel Building Blocks/Intel® Parallel Composer)
  • Intel® Cilk™ Plus (part of Intel® Parallel Building Blocks/Intel® Parallel Composer)
  • OpenMP* (non-Intel) (part of Intel® Parallel Studio XE)
  • OpenCL* (non-Intel)
Performance Analyzers
  • Intel® VTune™ Amplifier XE (with Intel® Thread Profiler)
  • Intel® Parallel Amplifier
Analysis Tools
  • Intel® Thread Checker
  • Intel® Memory Checker
  • Static Security Analysis
  • Intel® Parallel Inspector
Cluster Tools
  • MPI Library (part of Intel® Cluster Studio)
  • Intel® Trace Analyzer and Collector (part of Intel® Cluster Studio)
  • Intel® Math Kernel Library Cluster Edition (part of Intel® Cluster Studio)
  • Intel® Cluster Toolkit (part of Intel® Cluster Studio)
Embedded/Mobile Tools
  • Intel® Application Debugger (part of Intel® Embedded Application Development Tools Suite for Intel® Atom Processor)
  • Intel® JTAG Debugger (part of Intel® Embedded Application Development Tools Suite for Intel® Atom Processor)

a All products are Intel products unless indicated otherwise.

b W=Windows*; L=Linux*; M=Mac OS*; R=RTOS; Me/L = Meego*/Linux*

c IA-64 = Intel® Itanium Processor architecture

Further Reading

The Intel® Software Network is a rich source for all things parallel and offers many resources for further reading.

About the Author

Ken Strandberg is principal of Catlow Communications, a technical marketing communications firm (www.catlowcommunications.com). Mr. Strandberg writes a wide range of technical marketing and non-marketing content, video and interactive scripts, and educational programming for emerging technology companies, Fortune 100 enterprises, and multi-national corporations. He writes across a broad range of hardware and software industries. Mr. Strandberg can be reached at ken@catlowcommunications.com.

For more complete information about compiler optimizations, see our Optimization Notice.