Developer Guide and Reference

Introduction to the SIMD Data Layout Templates

SIMD Data Layout Templates (SDLT) is a C++11 template library providing containers that represent arrays of "Plain Old Data" objects (structs whose data members contain no pointers or references and that have no virtual functions) using layouts that enable generation of efficient SIMD (single instruction multiple data) vector code. SDLT uses standard ISO C++11 code. It does not require a special language or compiler to be functional, but takes advantage of performance features (such as OpenMP* SIMD extensions and pragma ivdep) that may not be available in all compilers. It is designed to promote scalable SIMD vector programming. To use the library, specify SIMD loops and data layouts using the explicit vector programming model and SDLT containers, and let the compiler generate efficient SIMD code.
Many of the library interfaces employ generic programming, in which interfaces are defined by requirements on types and not specific types. The C++ Standard Template Library (STL) is an example of generic programming. Generic programming enables SDLT to be flexible yet efficient. The generic interfaces enable you to customize components to your specific needs.
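As a minimal illustration of an interface defined by requirements on types, the sketch below shows a function template that accepts any container providing size() and operator[]. The name average is a hypothetical helper chosen for this example and is not part of SDLT.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of SDLT.
// Requirement on Container: it provides size() and operator[] whose
// result is convertible to float. No specific container type is named.
template <typename Container>
float average(const Container& values) {
    assert(values.size() > 0);  // an average needs at least one element
    float total = 0.0f;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total / static_cast<float>(values.size());
}
```

The same template works for std::vector<float> and std::array<float, N> because both satisfy the stated requirements.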
The net result is that SDLT enables you to specify a preferred SIMD data layout far more conveniently than re-structuring your code completely with a new data structure for effective vectorization, and at the same time can improve performance.

Motivation

C++ programs often represent an algorithm in terms of high-level objects. For many algorithms there is a set of data that the algorithm needs to process. It is common for the data set to be represented as an array of "plain old data" objects. It is also common for developers to represent that array with a container from the C++ Standard Template Library, like std::vector. For example:
    struct Point3s
    {
        float x;
        float y;
        float z;
        // helper methods
    };

    std::vector<Point3s> inputDataSet(count);
    std::vector<Point3s> outputDataSet(count);

    for(int i=0; i < count; ++i) {
        Point3s inputElement = inputDataSet[i];
        Point3s result = // transformation of inputElement that is independent of other iterations
                         // can keep algorithm high level using object helper methods
        outputDataSet[i] = result;
    }
When possible, a compiler may attempt to vectorize the loop above; however, the overhead of loading the "Array of Structures" (AOS) data set into vector registers may outweigh any performance gain from vectorizing. Programs exhibiting the scenario above could be good candidates for an SDLT container with a SIMD-friendly internal memory layout. SDLT containers provide accessor objects to import and export Primitives between the underlying memory layout and the object's original representation. For example:
    SDLT_PRIMITIVE(Point3s, x, y, z)

    sdlt::soa1d_container<Point3s> inputDataSet(count);
    sdlt::soa1d_container<Point3s> outputDataSet(count);

    auto inputData = inputDataSet.const_access();
    auto outputData = outputDataSet.access();

    #pragma forceinline recursive
    #pragma omp simd
    for(int i=0; i < count; ++i) {
        Point3s inputElement = inputData[i];
        Point3s result = // transformation of inputElement that is independent of other iterations
                         // can keep algorithm high level using object helper methods
        outputData[i] = result;
    }
When a local variable inside the loop is imported from or exported to an accessor using that loop's index, the compiler's vectorizer can access the underlying SIMD-friendly data format and, when possible, perform unit-stride loads. If the compiler can prove that nothing outside the loop can access the loop's local object, it can optimize its private representation of the loop object to be "Structure of Arrays" (SOA). In our example, the container's underlying memory layout is also SOA, so unit-stride loads can be generated. The container also allocates aligned memory, and its accessor objects provide the compiler with the correct alignment information so that it can optimize code generation accordingly.
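To make the layout idea concrete, the following is a minimal hand-written sketch of an SOA container for Point3s in plain C++11. It is not SDLT's implementation: the class Point3sSoa and the functions load, store, and scale_all are hypothetical names introduced only for illustration, and unlike SDLT the sketch makes no alignment guarantees and passes no alignment information to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same "plain old data" struct as in the example above.
struct Point3s {
    float x;
    float y;
    float z;
};

// Hypothetical SOA container (illustration only, not SDLT):
// each data member of Point3s is stored in its own contiguous array.
class Point3sSoa {
public:
    explicit Point3sSoa(std::size_t count)
        : xs_(count), ys_(count), zs_(count) {}

    std::size_t size() const { return xs_.size(); }

    // "Import": build a local Point3s from the three member arrays.
    Point3s load(std::size_t i) const {
        assert(i < size());
        return Point3s{xs_[i], ys_[i], zs_[i]};
    }

    // "Export": write a local Point3s back into the three member arrays.
    void store(std::size_t i, const Point3s& p) {
        assert(i < size());
        xs_[i] = p.x;
        ys_[i] = p.y;
        zs_[i] = p.z;
    }

private:
    std::vector<float> xs_;
    std::vector<float> ys_;
    std::vector<float> zs_;
};

// Each iteration is independent of the others and touches xs_[i], ys_[i],
// zs_[i] at consecutive indices, so every member array is read and written
// with unit stride.
void scale_all(const Point3sSoa& in, Point3sSoa& out, float factor) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        Point3s p = in.load(i);   // import into a loop-local object
        p.x *= factor;
        p.y *= factor;
        p.z *= factor;
        out.store(i, p);          // export the result
    }
}
```

The loop body still works with whole Point3s objects, while the storage underneath is three separate float arrays. Whether a given compiler actually vectorizes such a loop depends on the compiler and its options.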

Version Information

This documentation is for SDLT version 2, which extends version 1 by introducing support for n-dimensional containers.
Backwards Compatibility
Public interfaces of version 2 are fully backward compatible with interfaces of version 1.
The backwards compatibility includes: