Is there any way to speed up accumulating the absolute values of 3D dot products?
int Sum = 0;
for (int i = 0; i < N; i++)
    Sum += abs((INT32)S[i*3]*D[i*3] + (INT32)S[i*3+1]*D[i*3+1] + (INT32)S[i*3+2]*D[i*3+2]);
N is in the range 100 to 200.
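For reference, here is a minimal self-contained sketch of the loop as a function, which could serve as a baseline for checking any faster version. The element type of S and D is not stated above; the sketch assumes interleaved x, y, z components stored as 16-bit integers, and the function name AccumulateAbsDot3 is made up for illustration.

```cpp
#include <cstdint>
#include <cstdlib>

// Scalar baseline: sum of |S_i . D_i| over N interleaved 3D vectors.
// Assumption: S and D are int16_t arrays of length 3*N (x0,y0,z0,x1,...).
// Assumption: values are small enough that neither a single dot product
// nor the running sum overflows 32 bits, as in the original loop.
int AccumulateAbsDot3(const std::int16_t* S, const std::int16_t* D, int N)
{
    int Sum = 0;
    for (int i = 0; i < N; i++) {
        // Widen to 32 bits before multiplying, as the (INT32) casts do.
        const std::int32_t dot =
            static_cast<std::int32_t>(S[i*3])     * D[i*3] +
            static_cast<std::int32_t>(S[i*3 + 1]) * D[i*3 + 1] +
            static_cast<std::int32_t>(S[i*3 + 2]) * D[i*3 + 2];
        Sum += std::abs(dot);
    }
    return Sum;
}
```

Any IPP or intrinsics version should return the same value as this function for the same inputs.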
This loop already runs inside a TBB task, so there is no need to parallelize the accumulation itself.
Would IPP be any help here? Or SIMD intrinsics? Or anything else?
Thank you in advance!