Two-dimensional dense averaging.

I have written a function that sums the elements in each column and returns a container with the average value of each column. However, it takes slightly longer than the non-optimized "linear" version.

Is there any way to optimize or parallelize this function?

void Average (const dense &a, dense &res)
{
	_if (res.length()==a.num_cols())
	{
		_for (usize i=0, i)a.col(i))/(i16)a.num_rows();
		} _end_for
	} _end_if
}
Zhang Z (Intel)

The _for loop is a sequential loop. It is not parallelized. You should apply add_reduce to all columns of the 2D dense at the same time, like this:

dense sum = add_reduce(a, /*level=*/1);
res = sum / a.num_rows();

See the API documentation for the usage of add_reduce.
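For readers without the library at hand, here is a minimal sketch in plain C++ (not ArBB) of what a whole-container column reduction followed by a division computes. The function name `column_averages` and the row-major layout are assumptions made for illustration; the sketch shows only the arithmetic and does not model how ArBB would parallelize it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column averages of a rows x cols matrix stored row-major in a flat vector.
// Hypothetical helper for illustration; plain C++, not the ArBB API.
std::vector<double> column_averages(const std::vector<double>& data,
                                    std::size_t rows, std::size_t cols) {
    assert(rows > 0 && data.size() == rows * cols);
    std::vector<double> sums(cols, 0.0);
    // One pass over the whole matrix: each row adds into every column sum,
    // so no column is reduced separately from the others.
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sums[c] += data[r * cols + c];
    // Turn the sums into averages.
    for (double& s : sums)
        s /= static_cast<double>(rows);
    return sums;
}
```

Calling `column_averages(m, 2, 3)` on a 2 x 3 matrix returns three values, one per column.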

jimdempseyatthecove

Zhang,

The OP was also including a cast from i8 to i16 (presumably to avoid saturation of the sum).

Jim

www.quickthreadprogramming.com
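To illustrate why the wider accumulator matters, here is a small sketch in plain C++ (not ArBB): four 8-bit samples of 100 sum to 400, which does not fit in 8 bits, but fits comfortably in 16. The function name `average_i8` and the 256-sample limit are assumptions chosen for this example; a longer column would need an even wider accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average of 8-bit samples, summed in a 16-bit accumulator.
// Hypothetical helper for illustration; plain C++, not the ArBB API.
std::int8_t average_i8(const std::vector<std::int8_t>& column) {
    assert(!column.empty());
    // 256 samples of -128 give -32768, the most a 16-bit sum can hold.
    assert(column.size() <= 256);
    std::int16_t sum = 0;
    for (std::int8_t v : column)
        sum = static_cast<std::int16_t>(sum + v);
    // Integer division truncates toward zero; the result fits in 8 bits
    // because an average never exceeds the largest sample.
    return static_cast<std::int8_t>(sum / static_cast<std::int16_t>(column.size()));
}
```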

Hello everybody,

Speaking of optimization, if your 2-D data set is big enough I would try the following:

- Transpose the data. Use, for example, a Diagonal Transpose, since it works in place and doesn't need any additional memory;

- Then apply your processing to rows, which will reduce the number of cache misses;

- And finally, transpose the data set back if you need it for some other processing.

Best regards,
Sergey
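The three steps above can be sketched in plain C++ (not ArBB) as follows. This assumes a square, row-major matrix and reads "Diagonal Transpose" as swapping elements across the main diagonal; both are assumptions, as are all function names. Whether the two extra transposes pay off depends on the data size and on how often the transposed layout is reused, so it is worth measuring.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place transpose of a square n x n row-major matrix:
// swap each element above the diagonal with its mirror below it.
void transpose_square(std::vector<double>& m, std::size_t n) {
    assert(m.size() == n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = r + 1; c < n; ++c)
            std::swap(m[r * n + c], m[c * n + r]);
}

// Average of each row; every row is one contiguous run of memory.
std::vector<double> row_averages(const std::vector<double>& m, std::size_t n) {
    assert(n > 0 && m.size() == n * n);
    std::vector<double> avg(n, 0.0);
    for (std::size_t r = 0; r < n; ++r) {
        double sum = 0.0;
        for (std::size_t c = 0; c < n; ++c)
            sum += m[r * n + c];
        avg[r] = sum / static_cast<double>(n);
    }
    return avg;
}

// Column averages via transpose, row-wise pass, and transpose back.
std::vector<double> column_averages_via_transpose(std::vector<double>& m,
                                                  std::size_t n) {
    transpose_square(m, n);                        // columns become rows
    std::vector<double> avg = row_averages(m, n);  // contiguous reads
    transpose_square(m, n);                        // restore original layout
    return avg;
}
```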
