Parallelizing with a nested regular and dynamically sized data structure


I am very interested in ArBB because it is, to my knowledge, the only industrial product that provides an abstraction over the memory architecture of the computer.
I am working on a very complex linear algebra solver (used to solve the neutron transport equation in nuclear reactor cores).
The algorithm was optimized long ago for vector machines and has since been completely reimplemented for multicore superscalar processors.
In the current implementation, data locality has been optimized, but the code does not use the SSE/AVX units.

To study the usability of a tool such as ArBB for our case, I have extracted the most computationally intensive part of our solver.

Here is a short description of the problem:

We want to solve a linear algebra problem Ax=b where:
x and b are vectors.
A is a block-diagonal matrix.
All blocks are banded, symmetric and have the same size.
As A is block-diagonal, the Ax=b problem can be decomposed into a collection of smaller independent problems, one for each block of the matrix A.

As these blocks all have the same size, the sequence of operations required to solve the different problems is exactly the same.
Hence, the solve can be vectorized across problems (considering a "vector" of problems).
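To make the "vector of problems" idea concrete, here is a minimal sketch. It is not the solver from the attached source. It assumes, purely for illustration, that each block is symmetric tridiagonal (half-bandwidth 1) and can be solved without pivoting, and it uses the Thomas algorithm. The names Batch and solve_batch and the storage layout are hypothetical. The point is only that the problem index p is the innermost, contiguous index, so every inner loop applies the same operation to all blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: entry (row i, problem p) is stored at [i * nprob + p],
// so the problem index is contiguous in memory.
struct Batch {
    std::size_t n;             // size of each block (assumed >= 1)
    std::size_t nprob;         // number of independent blocks
    std::vector<double> diag;  // n * nprob, main diagonal of each block
    std::vector<double> off;   // (n - 1) * nprob, first off-diagonal (symmetric)
    std::vector<double> rhs;   // n * nprob, holds b on entry and x on exit
};

// Solve all tridiagonal systems of the batch at once (Thomas algorithm,
// no pivoting). Assumption: every block is symmetric positive definite or
// diagonally dominant, so no pivot is zero.
void solve_batch(Batch& s) {
    const std::size_t n = s.n, np = s.nprob;
    assert(n >= 1);
    assert(s.diag.size() == n * np && s.rhs.size() == n * np);
    assert(s.off.size() == (n - 1) * np);

    std::vector<double> d(s.diag);  // working copy of the diagonal

    // Forward elimination: identical operations for every problem p.
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t p = 0; p < np; ++p) {  // vectorizable loop
            const double e = s.off[(i - 1) * np + p];
            const double m = e / d[(i - 1) * np + p];
            d[i * np + p] -= m * e;
            s.rhs[i * np + p] -= m * s.rhs[(i - 1) * np + p];
        }
    }

    // Back substitution, last row first.
    for (std::size_t p = 0; p < np; ++p)
        s.rhs[(n - 1) * np + p] /= d[(n - 1) * np + p];
    for (std::size_t i = n - 1; i-- > 0;) {
        for (std::size_t p = 0; p < np; ++p) {  // vectorizable loop
            s.rhs[i * np + p] =
                (s.rhs[i * np + p] - s.off[i * np + p] * s.rhs[(i + 1) * np + p])
                / d[i * np + p];
        }
    }
}
```

The outer loops over rows carry the data dependencies of the factorization. The inner loops over p have no dependencies between iterations, and this is the only dimension along which parallelism is available.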

The code is written in C++ and we use classes to represent the matrices and vectors.
Internally, the matrices and vectors can be represented as 2D dense arrays.

However, I can't figure out how to express our problem with ArBB: the parallelism is only available over one dimension of these arrays.

The attached source is very short (approx. 200 lines, including results verification and time measurements). It can be compiled with any Linux C++ compiler.

I have written an SSE and an AVX version using intrinsics. Performance is very good (a speedup of about 5x for SSE and 8x for AVX), but the code is ugly and not maintainable. Indeed, the data structures have to be completely adapted in order to pack the data, and they contain padding.
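As an illustration of the kind of packing and padding involved, here is a minimal sketch. It is not the SSE/AVX code mentioned above, which is not shown in this post. The helper names round_up and pack, the source layout (one contiguous row per problem) and the choice of fill value are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Round count up to the next multiple of width (width must be > 0).
constexpr std::size_t round_up(std::size_t count, std::size_t width) {
    return (count + width - 1) / width * width;
}

// Repack data stored problem by problem (problem p, entry i at in[p * n + i])
// into a padded, problem-contiguous array (entry i, problem p at
// out[i * stride + p]), where stride is nprob rounded up to the SIMD width.
// Padding lanes are set to `fill`, for example 1.0 so that divisions
// performed on unused lanes stay harmless.
std::vector<double> pack(const std::vector<double>& in, std::size_t n,
                         std::size_t nprob, std::size_t width, double fill) {
    const std::size_t stride = round_up(nprob, width);
    std::vector<double> out(n * stride, fill);
    for (std::size_t p = 0; p < nprob; ++p)
        for (std::size_t i = 0; i < n; ++i)
            out[i * stride + p] = in[p * n + i];
    return out;
}
```

With width set to 4, for instance (four doubles per AVX register), three problems of two entries each are stored in two groups of four values, the last lane of each group being padding.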

Hence, I'd like to use ArBB for this problem.

I would be very grateful for any suggestions.

Thank you for reading this far!

Wilfried K.

Attachment: Base.cxx (text/x-c++src, 7.07 KB)


Thanks for being interested in ArBB. The problem you presented is surely interesting. Please allow us some time to think about it. We'll get back to you as soon as we can.

Of course!

Thank you for your help.

If it helps, I can also post the code for the SSE and AVX versions.
