Any sample code about how to use ArBB?

We have a heavy matrix multiply operation on double complex data, but the matrices are small. For example,

[a1, a2, ... a12]T * [a1', ... a12'], where a1 is of double complex type and a1' is the complex conjugate of a1.

We need to do 200 of these 12x1 times 1x12 = 12x12 operations in 6 us each time, and keep doing so for hours.

Is such an operation suitable for ArBB?


Here is a short summary of points to look at when coding for a specific target (performance):

  1. Is this a specific, well-known math function? In this case it is easy to achieve the best performance using a specialized fixed function provided by a high-performance library such as Intel MKL.
  2. Does the application combine the math/operations in a custom (non-separable) way, or does it perform operations that are not packaged by a fixed-function library? In this case Intel ArBB is a very good choice, since operations are fused across a global scope.

Another note on the question above: whether the target execution time (6 us) can be met also depends on the hardware used. Beyond this, ArBB can easily express such an application and provides all the primitives needed for complex number arithmetic. In short, ArBB can use the std::complex<> class template to express complex numbers and their arithmetic naturally.
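To make the arithmetic in the question concrete, here is a minimal sketch of the 12x1 times 1x12 product written with std::complex<double>. Note that this is plain standard C++, not ArBB code: it does not use ArBB containers or ArBB's call mechanism, and the names N, Vec, Mat and outer_conj are made up for illustration only. It just shows the per-element operation (a[i] times the conjugate of a[j]) that an ArBB version would need to express.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Illustrative names only (not part of ArBB or any library).
constexpr std::size_t N = 12;
using cplx = std::complex<double>;
using Vec  = std::array<cplx, N>;
using Mat  = std::array<std::array<cplx, N>, N>;

// Outer product of a column vector with its own conjugate transpose:
// out[i][j] = a[i] * conj(a[j]). The result is a 12x12 Hermitian matrix.
Mat outer_conj(const Vec& a) {
    Mat out{};
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            out[i][j] = a[i] * std::conj(a[j]);
        }
    }
    return out;
}
```

Since the result is Hermitian (out[j][i] is the conjugate of out[i][j]), only the upper triangle strictly has to be computed, which may matter given the tight time budget.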

