SC10 Tutorial: Using Intel® Array Building Blocks for Efficient Development of Multicore Applications

Download the Tutorial [PDF]

October 2012: This WhatIf project has been retired, but this page remains for historical/archival purposes.

This in-depth tutorial was delivered at SC10, the International Conference for High Performance Computing, Networking, Storage, and Analysis.

ABSTRACT

Intel® Array Building Blocks (Intel® ArBB) supports a high-level, generalized, and portable programming model for data-parallel programming. Programmers can express algorithms in terms of operations on collections of data, rather than focusing on low-level implementation details. The deterministic semantics of Intel ArBB avoids race conditions and deadlocks by design, improving reliability and maintainability. In this tutorial, we will introduce Intel ArBB’s programming and execution model. We will provide an in-depth guide to the basic building blocks of Intel ArBB:

  • scalars
  • dense and sparse collections
  • collective operations
  • elemental functions
  • control flow

We describe how Intel ArBB can be used to express different levels of abstraction. Based on real-world scientific codes and other examples, we then show how to construct data-parallel algorithms from these basic building blocks. The tutorial will include a demonstration of performance and scalability as well as performance optimization of Intel ArBB applications.

Comments

In the tutorial (attached PDF), the example programs are not optimized for performance or written in idiomatic C++. For example, on slide 45, the following function definition passes its input arguments by value rather than by reference:

void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c)

This could have been changed to:

void vecsum(const dense<f32>& a, const dense<f32>& b, dense<f32>& c)

This would help improve performance when dealing with large arrays.

On slide 43, my_class defines "operator+" as a member function. It is preferable to define binary operator overloads as non-member friend functions rather than member functions, since they allow more flexible syntax.

Expressions like (1 + my_class_object) and (my_class_object + 1) are both possible if you define non-member functions such as:
my_class operator+(const my_class& val1, int val2);
my_class operator+(int val1, const my_class& val2);

If operator+ is instead defined as a member function of my_class, only (my_class_object + 1) is possible; (1 + my_class_object) is not.
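The difference between the two styles can be sketched in standard C++. The classes below are hypothetical stand-ins (the tutorial's my_class is not reproduced here, and the int payload and the member_only class are assumptions made only for illustration):

```cpp
#include <cassert>

// Hypothetical stand-in for the tutorial's my_class; the int payload is assumed.
class my_class {
public:
    explicit my_class(int v) : value_(v) {}
    int value() const { return value_; }

    // Non-member friend overloads: either operand order is accepted.
    friend my_class operator+(const my_class& lhs, int rhs) {
        return my_class(lhs.value_ + rhs);
    }
    friend my_class operator+(int lhs, const my_class& rhs) {
        return my_class(lhs + rhs.value_);
    }

private:
    int value_;
};

// Contrast case (also hypothetical): operator+ as a member function.
class member_only {
public:
    explicit member_only(int v) : value_(v) {}
    int value() const { return value_; }
    member_only operator+(int rhs) const { return member_only(value_ + rhs); }

private:
    int value_;
};

// Exercises both operand orders and returns the sum of the results.
inline int demo_operator_order() {
    my_class x(10);
    my_class a = x + 1;      // class on the left
    my_class b = 1 + x;      // int on the left: needs the non-member overload
    member_only m(10);
    member_only c = m + 1;   // the member operator handles this order only
    // member_only d = 1 + m;  // does not compile: no operator+(int, member_only)
    return a.value() + b.value() + c.value();
}
```

A member operator always takes the class object as its left operand, which is why the commented-out line is rejected by the compiler.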

I hope that these patterns are not part of the ArBB library and are present only in the tutorial.

Michael McCool (Intel)

Actually, the first example (using by-value container arguments) is correct and performant. When ArBB dense collections are passed by value as shown, only constant-sized headers are copied, not the actual data. These objects are really just "handles"; the storage of the data is managed separately, by reference. The semantics are "as if" the data were passed by value, though, since this is easier to reason about. For example, it is perfectly reasonable (and performant) to declare a dense<T> in a function and "return" it, just like a scalar value.

In general, although the semantics of dense collection assignments are by-value, actual copies are optimized away. The exception to this is the case of aliases in conjunction with random access in map functions. These are automatically detected and copies are made to avoid read-write race conditions during parallel execution.
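The general idea of a small handle with by-value semantics over shared storage can be illustrated in plain C++. This is only a sketch of one well-known technique (copy-on-write over a shared buffer) and is not ArBB's implementation or API; the float_handle class and all of its members are hypothetical names invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle type, NOT ArBB's dense<T>. Copying it copies one
// shared_ptr (constant size); the elements live in shared storage.
class float_handle {
public:
    explicit float_handle(std::size_t n, float fill = 0.0f)
        : data_(std::make_shared<std::vector<float>>(n, fill)) {}

    std::size_t size() const { return data_->size(); }
    float get(std::size_t i) const { return (*data_)[i]; }

    // Detach from other handles before writing, so that every copy
    // behaves "as if" it had been made by value.
    void set(std::size_t i, float v) {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<float>>(*data_);
        }
        (*data_)[i] = v;
    }

    // True while both handles still refer to the same storage.
    bool shares_storage_with(const float_handle& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::vector<float>> data_;
};

// By-value parameters copy only the handles, never the elements.
inline float first_sum(float_handle a, float_handle b) {
    return a.get(0) + b.get(0);
}

// Small self-check of the behaviour described above.
inline void check_handle_semantics() {
    float_handle a(1000000, 1.0f);
    float_handle b = a;                 // constant-time copy of the handle
    assert(b.shares_storage_with(a));
    b.set(0, 5.0f);                     // the first write triggers the real copy
    assert(!b.shares_storage_with(a));
    assert(a.get(0) == 1.0f);
}
```

In this sketch the cost of passing a handle does not depend on the number of elements, and a real copy happens only when a shared buffer is about to be modified.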

There is a very, very small benefit at program capture time (when the computation expressed by this function is captured by the system at runtime and translated into machine language) to using const references here instead of pass-by-value. However, this is a tiny, one-time upfront cost and is constant time. The cost is NOT dependent on the size of the array. Note that ArBB uses its own code generator, not C++'s. The C++ functions in these examples only serve to express the semantics of the computation, not the implementation.

For clarity, we chose to use pass-by-value in our examples. In fact, during the talk we used the fact that the function was defined this way to discuss how this choice does NOT affect performance in ArBB. But if you want to avoid confusing students while also talking about STL containers, you can certainly use const references instead.
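For contrast, a standard container such as std::vector really does copy its elements when passed by value, which is where the potential for confusion comes from. The following minimal sketch uses only the standard library; the function names are hypothetical and chosen for this illustration:

```cpp
#include <cassert>
#include <vector>

// By value: the parameter is a new vector with its own element buffer.
inline bool shares_buffer_by_value(std::vector<float> v, const float* caller_buffer) {
    return v.data() == caller_buffer;
}

// By const reference: the parameter is the caller's vector itself.
inline bool shares_buffer_by_ref(const std::vector<float>& v, const float* caller_buffer) {
    return v.data() == caller_buffer;
}

// Small self-check of the two behaviours.
inline void check_vector_passing() {
    std::vector<float> a(1000, 1.0f);
    assert(!shares_buffer_by_value(a, a.data()));  // the elements were copied
    assert(shares_buffer_by_ref(a, a.data()));     // no copy was made
}
```

For std::vector, the cost of the by-value call grows with the number of elements, so const references are the usual choice for read-only arguments there.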

Your point about the operators is well taken; I will double-check that this example is expressed in the best way. Generally we prefer friend functions, but there may be other constraints in this particular example.

Michael McCool, Principal Engineer, Software Services Group, Intel

blesteralum.mit.edu

Is there a video of the actual tutorial available for public viewing? Are there other published documents that can fill in the gaps left by the PDF slides?