# PDE Stencil and efficiency of the treament of boundary conditions.

## PDE Stencil and efficiency of the treament of boundary conditions.

Hi

Solving PDE, the stencil possibility in arbb is useful. My question is about the vectorization/parallelization of the boundary conditions where the stencil doesn't apply.

For example i use the stencil in that way :

// first the stencil

// ********************

void stencilMultiply( const arbb::f64& MatrixSubD0, const arbb::f64& MatrixSubD1, const arbb::f64& MatrixSubD2,const arbb::f64& MatrixD0, const arbb::f64& MatrixD1, const arbb::f64& MatrixD2,const arbb::f64& MatrixAbovD0, const arbb::f64& MatrixAbovD1, const arbb::f64& MatrixAbovD2,arbb::f64& vec_in_out)

{

arbb::f64 LB = neighbor(vec_in_out,-1,-1);

arbb::f64 B = neighbor(vec_in_out,0,-1);

arbb::f64 RB = neighbor(vec_in_out,1,-1);

arbb::f64 L = neighbor(vec_in_out,-1,0);

arbb::f64 C = vec_in_out;

arbb::f64 R = neighbor(vec_in_out,1,0);

arbb::f64 LT= neighbor(vec_in_out,-1,1);

arbb::f64 T = neighbor(vec_in_out,0,1);

arbb::f64 RT = neighbor(vec_in_out,1,1);

vec_in_out = MatrixSubD0*LB +MatrixSubD1*B + MatrixSubD2*RB+

MatrixD0* L + MatrixD1* C + MatrixD2* R + MatrixAbovD0*LT +

MatrixAbovD1* T + MatrixAbovD2*RT;

}

/// Then Matrix vector product

////************************************

void multiplyRESTRICT( const arbb::dense & vec_in, arbb::dense & vec_out,

const arbb::dense & MatrixSubD0, const arbb::dense & MatrixSubD1,

const arbb::dense & MatrixSubD2, const arbb::dense & MatrixD0,

const arbb::dense & MatrixD1, const arbb::dense & MatrixD2,

const arbb::dense & MatrixAbovD0, const arbb::dense & MatrixAbovD1,

const arbb::dense & MatrixAbovD2 )

{

const arbb::usize nx = vec_in.num_rows();

const arbb::usize ny = vec_in.num_cols();

// inside points

vec_out = vec_in ;

arbb::map(stencilMultiply)(MatrixSubD0, MatrixSubD1, MatrixSubD2, MatrixD0, MatrixD1, MatrixD2,

MatrixAbovD0, MatrixAbovD1, MatrixAbovD2, vec_out);

// boundary

{

vec_out(0,0) = MatrixD1(0,0)*vec_in(0,0)+ MatrixD2(0,0)*vec_in(1,0) + MatrixAbovD1(0,0)*vec_in(0,1) + MatrixAbovD2(0,0)*vec_in(1,1) ;

_for(arbb::usize i= 1 , i < nx-1 , ++i)

{

vec_out(i,0) = MatrixD0(i,0)*vec_in(i-1,0) + MatrixD1(i,0)*vec_in(i,0) + MatrixD2(i,0)*vec_in(i+1,0) +

MatrixAbovD0(i,0)*vec_in(i-1,1)+ MatrixAbovD1(i,0)*vec_in(i,1) + MatrixAbovD2(i,0)*vec_in(i+1,1) ;

}

_end_for;

...........

I decided to avoid section to get all the interiors of all the 2D vectors (to avoid to many copies). So i apply the stencil to all points. Then i correct the boundary points.

The treatment of boundaries implies nest on the boundary points. My question is how to achieve good efficiency of the boundary treatment. For example the last nest in the code.

Thank you very much

3 post / 0 new
Per informazioni complete sulle ottimizzazioni del compilatore, consultare l'Avviso sull'ottimizzazione

I'd suggest you replace the _for loop with vector operations. The _for loop runs sequentially but vector operations can be parallelized. For small matrix sizes the difference might be negligible, but vector operations should lead to better performance for large sizes. Please see the code snippet below. Another advantage of using vector operations is that you do not need to deal with the special case of vec_out(0, 0). The shift() operation fills in default values (for f64 it is 0) for out-of-bound references. So the special case is taken care of automatically. As a general rule of thumb, the fewer control flows and special cases you have in your code, the better the ArBB runtime can optimize the code.

```dense MD0 = MatrixD0.row(0);
dense MD1 = MatrixD1.row(0);
dense MD2 = MatrixD2.row(0);

dense vi0 = vec_in.row(0);
dense vi1 = vec_in.row(1);

dense vo = MD0*shift(vi0, -1) +
MD1*vi0 +
MD2*shift(vi0, 1) +