I am working with a scientific computing code and would like to speed it up a little if possible. I profiled it with Amplifier, and the most time-consuming (most heavily used) code is this:
    double a = 0.0;
    for (j = 0; j < n; j++)
        a += w[j] * fi[((index[j] + i) << ldf) + k];
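For reference, the loop can be written as a standalone C function. This is only a sketch: the function name `gathered_dot` is hypothetical, and the parameter types (double arrays `w` and `fi`, an int array `index`, and int `n`, `i`, `k`, `ldf`) are assumptions, since the original declarations are not shown.

```c
/* Hypothetical standalone version of the hot loop.
   Assumed types: w and fi are double arrays, index is an int array,
   n, i, k and ldf are ints (the original declarations are not shown).
   restrict promises the compiler that the arrays do not overlap. */
double gathered_dot(const double *restrict w, const double *restrict fi,
                    const int *restrict index, int n, int i, int k, int ldf)
{
    double a = 0.0;
    for (int j = 0; j < n; j++) {
        /* fi is read through index[j], i.e. an indirect (gathered) access:
           consecutive j do not generally touch consecutive fi elements. */
        a += w[j] * fi[((index[j] + i) << ldf) + k];
    }
    return a;
}
```

The indirect access through `index[j]` is the main difference from a plain dot product of two contiguous arrays, and it is worth keeping in mind when judging how well the loop can be vectorized.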
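To me it is essentially a dot product between w and fi, except that the elements of fi are selected indirectly through index[j] rather than read contiguously. I am wondering: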
1. Will the Intel compiler do this automatically? (I mean, will it treat the loop as a vectorized dot product of two arrays?)
2. Is there a way to improve the code? (For example, define another array a1 the same size as w, store all the products in a1 (an unrolled loop?), and do the summation at the end.)
3. Any other suggestions?
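One variant of the idea in question 2 is sketched below. Instead of an extra array a1, it keeps four independent scalar partial sums (`s0` to `s3`) and adds them at the end, which breaks the dependency of every iteration on the single accumulator `a`. This is a hypothetical illustration (`gathered_dot_unroll4` is not from the original code, and the types are the same assumptions as before), not a measured improvement. Keeping the partial sums in registers likely avoids the extra memory writes and second pass that a full a1 array would need.

```c
/* Hypothetical 4-way unrolled version with independent partial sums.
   Same assumed types as the original loop: double w/fi, int index,
   int n, i, k, ldf. */
double gathered_dot_unroll4(const double *restrict w, const double *restrict fi,
                            const int *restrict index, int n, int i, int k, int ldf)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int j = 0;

    /* Main loop: four products per iteration, each into its own sum,
       so the additions do not have to wait on each other. */
    for (; j + 3 < n; j += 4) {
        s0 += w[j]     * fi[((index[j]     + i) << ldf) + k];
        s1 += w[j + 1] * fi[((index[j + 1] + i) << ldf) + k];
        s2 += w[j + 2] * fi[((index[j + 2] + i) << ldf) + k];
        s3 += w[j + 3] * fi[((index[j + 3] + i) << ldf) + k];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; j < n; j++)
        s0 += w[j] * fi[((index[j] + i) << ldf) + k];

    /* Combine the partial sums once at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

Splitting the sum changes the order of the floating-point additions, so the result can differ from the original loop in the last few bits. Whether this helps at all compared with what the compiler already generates would need to be checked with the profiler.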
I am using Parallel Composer 2013 with Visual Studio. Any ideas would be appreciated! :)