Parallel Beam Backprojection on Sandy Bridge EP



Tomographic image reconstruction is computationally very demanding. In filtered backprojection as well as in iterative reconstruction schemes, the most time-consuming steps are usually forward projection and backprojection.
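For reference, the standard parallel-beam filtered backprojection of a single slice can be written as below. The notation is ours, not the case study's: $q_\theta$ is the ramp-filtered projection at angle $\theta$, and $N_\theta$ is the number of projection angles over $[0, \pi)$. In a 3D parallel-beam geometry, each z-slice can be reconstructed independently with the same formula.

```latex
f(x, y) = \int_0^{\pi} q_\theta\!\left(x\cos\theta + y\sin\theta\right)\,d\theta
\;\approx\; \frac{\pi}{N_\theta}\sum_{k=0}^{N_\theta - 1}
q_{\theta_k}\!\left(x\cos\theta_k + y\sin\theta_k\right),
\qquad \theta_k = \frac{k\pi}{N_\theta}
```

Each term requires one interpolated detector lookup per pixel and angle. For an $n \times n$ slice, that amounts to roughly $n^2 N_\theta$ lookups with only a few arithmetic operations per lookup, which is why memory access, rather than arithmetic, tends to dominate.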

Here we present performance results for a high-performance 3D parallel-beam backprojection algorithm optimized for the Intel® microarchitecture code-named Sandy Bridge EP.

Compared with a "naïve", straightforward implementation, our optimized algorithm combines two techniques to fully exploit the computational power of Sandy Bridge and thereby reduce reconstruction time. First, it uses Sandy Bridge's enhanced vector capabilities, namely the 256-bit Intel® Advanced Vector Extensions (Intel® AVX) instruction set, to backproject 8 images simultaneously. Second, it uses an optimized memory layout.
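A minimal sketch of how 8 images can share one AVX vector follows. It assumes, and the case study does not state, that the 8 images are 8 z-slices of a parallel-beam volume. Under that assumption the slices share the same ray geometry, so each of the 8 AVX single-precision lanes holds one slice. The interleaved layout, the name `backproject8`, and the precomputed `cos_t`/`sin_t` tables are hypothetical illustrations, not the authors' code. The innermost loop is plain C over 8 contiguous floats, so a compiler targeting AVX (e.g. with `-mavx`) can turn it into 256-bit loads, multiplies, adds and stores. Vectorization is not guaranteed, however, and hand-written intrinsics are the other common option.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* 256-bit AVX register / 32-bit float = 8 slices per vector */

/* Hypothetical sketch: backproject 8 slices at once.
 * Interleaved layout, slice index innermost (an assumption):
 *   proj[(a * n_det + d) * LANES + s]  filtered projection: angle a, bin d, slice s
 *   vol [(y * n + x) * LANES + s]      pixel (x, y) of slice s; caller zeroes it
 * cos_t/sin_t: precomputed cos/sin of each angle; dtheta: angular step (pi / n_angles). */
static void backproject8(const float *proj, const float *cos_t,
                         const float *sin_t, int n_angles, int n_det,
                         float dtheta, float *vol, int n)
{
    assert(n > 0 && n_det >= 2 && n_angles > 0);
    const float c0 = 0.5f * (float)(n - 1);      /* image centre, in pixels  */
    const float t0 = 0.5f * (float)(n_det - 1);  /* detector centre, in bins */
    for (int a = 0; a < n_angles; ++a) {
        const float *pa = proj + (size_t)a * n_det * LANES;
        for (int y = 0; y < n; ++y) {
            for (int x = 0; x < n; ++x) {
                /* Detector position of the parallel ray through pixel (x, y). */
                const float t = ((float)x - c0) * cos_t[a]
                              + ((float)y - c0) * sin_t[a] + t0;
                if (t < 0.0f) continue;          /* ray misses the detector */
                const int d = (int)t;            /* equals floor(t) since t >= 0 */
                if (d + 1 >= n_det) continue;    /* need bins d and d + 1 */
                const float w = t - (float)d;    /* linear interpolation weight */
                const float *p0 = pa + (size_t)d * LANES;
                const float *p1 = p0 + LANES;
                float *v = vol + ((size_t)y * n + x) * LANES;
                /* Same geometry for all 8 slices: 8 contiguous floats per
                 * operand, i.e. one AVX vector width. */
                for (int s = 0; s < LANES; ++s)
                    v[s] += dtheta * ((1.0f - w) * p0[s] + w * p1[s]);
            }
        }
    }
}
```

Because the ray geometry (`t`, `d`, `w`) is computed once per pixel and angle and reused for all 8 slices, the scalar overhead is amortized across the vector lanes.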

Backprojection in CT imaging is a bandwidth-limited problem, so choosing a memory layout that makes optimal use of the caches is essential to fully exploit the computational power of a given system.
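The case study does not describe its memory layout. As one plausible illustration, the hypothetical helper below converts planar (slice-major) projection data into a slice-interleaved layout. In the planar layout, the 8 values one vector operation needs (same angle, same detector bin, 8 slices) lie `n_angles * n_det` floats apart and touch 8 different cache lines. After interleaving, they occupy one contiguous 32-byte block, which is half of a 64-byte cache line when the buffer is 32-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* one 256-bit AVX vector of 32-bit floats */

/* Hypothetical helper: convert planar (slice-major) data
 *   planar[(s * n_angles + a) * n_det + d]
 * into the slice-interleaved layout
 *   inter[(a * n_det + d) * LANES + s]
 * inter must hold n_angles * n_det * LANES floats. */
static void interleave8(const float *planar, int n_slices, int n_angles,
                        int n_det, float *inter)
{
    assert(n_slices > 0 && n_slices <= LANES);
    for (int a = 0; a < n_angles; ++a)
        for (int d = 0; d < n_det; ++d)
            for (int s = 0; s < LANES; ++s) {
                /* Pad unused lanes with zeros when fewer than 8 slices remain. */
                float v = 0.0f;
                if (s < n_slices)
                    v = planar[((size_t)s * n_angles + a) * n_det + d];
                inter[((size_t)a * n_det + d) * LANES + s] = v;
            }
}
```

The conversion costs one pass over the projection data. Backprojection, by contrast, reads each projection value for many pixels, so the one-time reordering is usually cheap by comparison.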

Results show that a cache-optimized memory layout increases backprojection performance by about 300% compared with the same backprojection performed with a non-optimized memory layout.

For more complete information about compiler optimizations, see our Optimization Notice.