re-using functions from other functions

Posted by jasno

I wanted to blend two arrays into an output array that contains a weighted
proportion of each input. The function below does this and works.

void multScaleVV(arbb::dense<arbb::f32>& result,
                 arbb::dense<arbb::f32> in1, arbb::dense<arbb::f32> in2, arbb::f32 factor) {

    // Weighted blend: (1 - factor) of in1 plus factor of in2.
    arbb::f32 inv = 1.0 - factor;
    result = (in1 * inv) + (in2 * factor);

    return;
}
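
For reference, the arithmetic of this blend can be sketched in plain C++ without ArBB. This is only an illustrative model: `blend_1d` is a hypothetical name, `std::vector<float>` stands in for the ArBB container, and the loop is serial, so it shows what each output element should be rather than how ArBB evaluates the expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model of the element-wise blend (not ArBB code).
// blend_1d is a hypothetical helper name used only for illustration.
std::vector<float> blend_1d(const std::vector<float>& in1,
                            const std::vector<float>& in2,
                            float factor) {
    assert(in1.size() == in2.size());  // both inputs must have the same length
    const float inv = 1.0f - factor;
    std::vector<float> result(in1.size());
    for (std::size_t i = 0; i < in1.size(); ++i) {
        // Each output element mixes the two inputs by the given proportion.
        result[i] = in1[i] * inv + in2[i] * factor;
    }
    return result;
}
```

With a factor of 0.25, the inputs {4, 8} and {8, 16} blend to {5, 10}.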

Then I wanted to combine a 2D array with a 1D array in the same way, one row at a time
(the number of columns in the 2D array is the same as the size of the 1D array).
I thought I could re-use the function above by writing something like this:

void multScale(arbb::dense<arbb::f32, 2>& result,
               arbb::dense<arbb::f32> in1, arbb::dense<arbb::f32, 2> in2, arbb::f32 factor) {

    arbb::i32 i;
    _for (i = 0, i < value(in2.num_rows()), i++) {
        call(multScaleVV)(result.row(value(i)), in1, in2.row(value(i)), factor);
    } _end_for

    return;
}

But this will not compile. I eventually tracked it down to result.row(value(i)) not returning
something that can bind to the non-const dense reference parameter (it is not modifiable).
So then I tried the version below:

void multScale(arbb::dense<arbb::f32, 2>& result,
               arbb::dense<arbb::f32> in1, arbb::dense<arbb::f32, 2> in2, arbb::f32 factor) {

    arbb::dense<arbb::f32> tmpRes(in2.num_cols());
    arbb::i32 i;
    _for (i = 0, i < value(in2.num_rows()), i++) {
        call(multScaleVV)(tmpRes, in1, in2.row(value(i)), factor);
        result.row(value(i)) = tmpRes;
    } _end_for

    return;
}

This won't compile either; the error says you can't use the result of locally allocated memory (or something similar).
Is there a straightforward solution that I am overlooking?

--
jason

Posted by Zhang Z (Intel)

Jason,

First of all, reusing the 1D function to compute results for 2D arrays isn't a good idea. The _for loop is a serial loop: all iterations run sequentially, so you do not get the maximum parallelism that is available in this case. The core technique in writing ArBB code is to express as much parallelism as possible. So in this case, the best and simplest way is to define another function for 2D arrays:

void multScaleVV2D(arbb::dense<arbb::f32, 2>& result,
                arbb::dense<arbb::f32, 2> in1, arbb::dense<arbb::f32, 2> in2, arbb::f32 factor) {

    arbb::f32 inv = 1.0 - factor;
    result = (in1 * inv) + (in2 * factor);

    return;
}

This function is almost the same as the 1D version. You simply change the dense type from 1D to 2D. If you do not want to maintain duplicated code for 1D and 2D containers, then you can write a function template, with the dimensionality as a template parameter:

template <std::size_t D>
void multScaleVV(arbb::dense<arbb::f32, D>& result,
                arbb::dense<arbb::f32, D> in1, arbb::dense<arbb::f32, D> in2, arbb::f32 factor) {

    arbb::f32 inv = 1.0 - factor;
    result = (in1 * inv) + (in2 * factor);

    return;
}

Then, to operate on 1D arrays, you can write

call(multScaleVV<1>)(...);

Similarly, to operate on 2D arrays:

call(multScaleVV<2>)(...);

Now, back to the compilation errors. Your code does not compile because "result.row(i)" returns a read-only copy of row i. You cannot write to it. Even if you could, you would be writing to a copy of the row instead of the row itself. When you do need to modify the rows of a 2D container individually, you should use the arbb::replace_row() function. For example:

dense<f32, 2> result;
dense<f32> newVals;
usize i;

......

result = replace_row(result, i, newVals);

This replaces the i-th row in "result" with "newVals". As I noted in the beginning, however, you actually do not need to do this in your particular case.
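
The difference between writing to a copy of a row and replacing a row can be modelled in plain C++. This is a hedged sketch, not ArBB code: `row_copy` and `replace_row_model` are hypothetical stand-ins that mimic the behaviour described above (a row accessor that hands back a copy, and a replace operation that returns a new container), with nested `std::vector<float>` standing in for the 2D dense container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<float>;
using Matrix = std::vector<Row>;

// Hypothetical stand-in for a read-only row accessor: the caller gets a copy,
// so changes made to the returned row never reach the matrix.
Row row_copy(const Matrix& m, std::size_t i) {
    assert(i < m.size());
    return m[i];
}

// Hypothetical stand-in for a replace-row operation: takes the matrix by value
// and returns a new matrix in which row i holds new_vals.
Matrix replace_row_model(Matrix m, std::size_t i, const Row& new_vals) {
    assert(i < m.size());
    assert(new_vals.size() == m[i].size());  // row widths must match
    m[i] = new_vals;
    return m;
}
```

In this model, modifying the result of `row_copy(m, 1)` leaves `m` untouched, whereas `m = replace_row_model(m, 1, newRow)` is the assignment pattern that actually updates the row.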

Zhang

Posted by jasno

Hi Zhang,

Thanks for your helpful comments. I had overlooked that _for is serial; I had assumed it
was some magic that allowed for parallelization, rather than a macro that allows the use of ArBB scalars.

As to your examples, I understood how to create a templated version of my function (for multiplying 1D
arrays) to allow multiplying of N-dimensional arrays. What I was actually trying to do was multiply a 1D array
and a 2D array to give a 2D array, where the size of the 1D array equals the width of the 2D array. Each
row of the 2D array is multiplied by the 1D array to give the corresponding
row of the output 2D array. To do this I was attempting to peel off one row at a time from my input
2D array, do the multiplication using the original function, and assign the result to the appropriate
row of the output 2D array. Obviously I was wrong to expect the _for to let the compiler
parallelise the function calls just because each iteration of the loop is entirely independent
(loop unrolling?).

I guess my only options are to do this serially, or to create a temporary 2D array from my 1D array
using repeat_row, do the multiplication to create my 2D result, and then discard the temporary 2D array?
I guess it's a trade-off between the cost of generating the 2D array so that the whole computation can run in parallel
and the time taken to process one row at a time (with each row computed in parallel).

--
jason

Posted by Zhang Z (Intel)

Jason,

I think using a temporary 2D array should lead to better performance than serially multiplying a bunch of 1D arrays. arbb::repeat_row() should not be a performance bottleneck in this case. If it is then we have a performance bug. It will be interesting to see what kind of overall performance you're getting by using arbb::repeat_row(). Please keep us updated.
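
The shape of the temporary-array approach can be sketched in plain C++. This is an illustrative model under stated assumptions, not ArBB code: `repeat_row_model`, `blend_2d`, and `mult_scale_model` are hypothetical names, nested `std::vector<float>` stands in for the dense containers, and the loops here are serial. In ArBB, the element-wise step would be a single whole-array expression that the runtime is free to parallelize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<float>;
using Matrix = std::vector<Row>;

// Model of row repetition: stack n_rows copies of `row` into a 2D array.
Matrix repeat_row_model(const Row& row, std::size_t n_rows) {
    return Matrix(n_rows, row);
}

// Element-wise blend of two matrices of identical shape.
Matrix blend_2d(const Matrix& a, const Matrix& b, float factor) {
    assert(a.size() == b.size());
    const float inv = 1.0f - factor;
    Matrix result(a.size());
    for (std::size_t r = 0; r < a.size(); ++r) {
        assert(a[r].size() == b[r].size());
        result[r].resize(a[r].size());
        for (std::size_t c = 0; c < a[r].size(); ++c) {
            result[r][c] = a[r][c] * inv + b[r][c] * factor;
        }
    }
    return result;
}

// 1D-by-2D blend: expand the 1D input to the 2D shape, then blend once.
Matrix mult_scale_model(const Row& in1, const Matrix& in2, float factor) {
    return blend_2d(repeat_row_model(in1, in2.size()), in2, factor);
}
```

The temporary costs one extra array of the same size as the 2D input, but the blend is then expressed once over the whole array instead of once per row.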

Thank you very much.

Zhang

Posted by jasno

Hi Zhang,

Creating a temporary 2D array is orders of magnitude faster than the other method.

--
jason
