Fast "sum" routine

I need to efficiently compute the element sum of a double-precision vector (a[0] + a[1] + ... + a[n-1]). Is there a routine in MKL for this? Unfortunately, the BLAS ?asum routines compute the sum of the magnitudes.



Intel compiler optimizations do this effectively.

I am using Intel 9.0, so I gather you are suggesting just a simple "for" loop. Any specific optimization directives I should use?
It does seem like multithreading/parallelization would help here as well...


A loop such as
double sum = 0;
for (int i = 0; i < n; ++i) sum += a[i];
(with sum as a local variable declared the same type as a[])
should optimize easily. For example, on Xeon or P4, use options
icc -O -xW
or, for an SSE3 machine -xP.
-O1 may be superior to -O2 for loops of moderate length.
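
For reference, here is a minimal sketch of that loop wrapped in a function so it can be compiled and timed on its own. The name sum_vector and the use of std::size_t for the length are my assumptions, not anything from MKL or the compiler documentation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an MKL or BLAS routine):
// returns a[0] + a[1] + ... + a[n-1], or 0.0 for an empty range.
double sum_vector(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);  // a null pointer is only valid when n == 0
    double sum = 0.0;                // accumulator has the same type as the elements
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Keeping the accumulator a local double, rather than writing through a pointer or a global, is what should let the compiler hold it in a register and vectorize the reduction.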

As a little test, I tried this on a Pentium D with /Qopenmp and OMP_NUM_THREADS=2 and saw 100 percent CPU usage. Very nice...

The build log window did report that the OpenMP-defined loop was parallelized.

double result=0.0;
const double *;
int nEntries=s.rows()*s.cols();
#pragma omp parallel for reduction(+:result)
for (int i=0;i
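
The snippet above was cut off when posted, so here is a self-contained sketch of the same OpenMP reduction pattern. The function name parallel_sum and the pointer name data are placeholders of mine, since the original pointer name did not survive; result and nEntries follow the post.

```cpp
#include <cassert>

// Hypothetical helper illustrating an OpenMP sum reduction.
// 'data' is an assumed name for the pointer to the first element.
double parallel_sum(const double* data, int nEntries) {
    assert(data != nullptr || nEntries <= 0);  // null is only valid for an empty range
    double result = 0.0;

    // Each thread accumulates a private copy of 'result';
    // OpenMP adds the private copies together when the loop ends.
    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < nEntries; ++i) {
        result += data[i];
    }
    return result;
}
```

If OpenMP is not enabled at compile time (for example, no /Qopenmp), the pragma is ignored and the loop simply runs serially. With threads, the additions happen in a different order, so the last bits of the result can differ slightly from the serial sum.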


