Two-dimensional dense averaging.

I have written a function that sums the elements of each column and returns a container with the per-column averages. However, it takes slightly more time than the non-optimized "linear" version.

Is there any way to optimize or parallelize this function?

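void Average (const dense<i8, 2> &a, dense<i16> &res)
{
	_if (res.length() == a.num_cols()) {
		// Average each column in turn, widening i8 to i16 before summing
		_for (usize i = 0, i < a.num_cols(), i++) {
			res[i] = add_reduce((dense<i16>)a.col(i)) / (i16)a.num_rows();
		} _end_for
	} _end_if
}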

The _for loop is a sequential loop. It is not parallelized. You should apply add_reduce to all columns of the 2D dense at the same time, like this:

dense sum = add_reduce(a, /*level=*/1);
res = sum / a.num_rows();

See the API documentation for the usage of add_reduce.


The OP's code also included a cast from i8 to i16 (presumably to avoid saturation of the sum).
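To illustrate why the accumulator type matters, here is a minimal sketch in plain standard C++ rather than ArBB. The function name column_averages and the row-major std::vector layout are assumptions made for the example, not anything from the ArBB API. It sums every column in one pass over the rows and uses 32-bit accumulators, because a 16-bit sum of 8-bit values can itself overflow after a few hundred rows of extreme values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of ArBB): per-column averages of a
// row-major rows x cols matrix of 8-bit values.
std::vector<std::int16_t> column_averages(const std::vector<std::int8_t>& a,
                                          std::size_t rows, std::size_t cols)
{
    assert(rows > 0 && a.size() == rows * cols);

    // Widened accumulators: one 32-bit running sum per column.
    std::vector<std::int32_t> sum(cols, 0);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)  // contiguous walk along each row
            sum[c] += a[r * cols + c];

    // Integer division truncates toward zero; an average of 8-bit
    // values always fits in 16 bits.
    std::vector<std::int16_t> avg(cols);
    for (std::size_t c = 0; c < cols; ++c)
        avg[c] = static_cast<std::int16_t>(sum[c] / static_cast<std::int32_t>(rows));
    return avg;
}
```

For a 2 x 3 matrix {1, 2, 3, 5, 6, 7}, calling column_averages(m, 2, 3) gives the averages 3, 4 and 5.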


Hello everybody,

Speaking about optimization, if your 2-D data set is big enough, I would try the following:

- Transpose the data. Use, for example, a Diagonal Transpose, since it works in place and doesn't need any additional memory;

- Then apply your processing to rows, which will reduce the number of cache misses;

- Finally, transpose the data set back if you need it for some other processing.
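The three steps above can be sketched in plain standard C++. This is only an illustration under stated assumptions: the matrix is square and stored row-major in a std::vector, the helper names transpose_in_place and row_averages are made up for the example, and "Diagonal Transpose" is taken to mean swapping elements across the main diagonal.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: in-place transpose of a square n x n row-major
// matrix by swapping each element above the diagonal with its mirror.
void transpose_in_place(std::vector<float>& m, std::size_t n)
{
    assert(m.size() == n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = r + 1; c < n; ++c)
            std::swap(m[r * n + c], m[c * n + r]);
}

// Hypothetical helper: average of each row, reading memory contiguously.
std::vector<float> row_averages(const std::vector<float>& m, std::size_t n)
{
    assert(n > 0 && m.size() == n * n);
    std::vector<float> avg(n);
    for (std::size_t r = 0; r < n; ++r) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < n; ++c)
            sum += m[r * n + c];
        avg[r] = sum / static_cast<float>(n);
    }
    return avg;
}
```

Calling transpose_in_place, then row_averages, then transpose_in_place again yields the column averages of the original matrix and leaves the data as it was. Whether this pays off depends on the matrix size, since the simple transpose shown here also strides through memory; a blocked transpose would be the usual next refinement.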

Best regards,
