Stream benchmark performance (gkli, October 17, 2008 10:56 AM PDT)
I ran McCalpin's STREAM benchmark on a 3.4 GHz Xeon and got 4759 MB/s for Triad. In the original code, the Triad function is inlined and operates directly on the global arrays. I added a similar function that takes the arrays as arguments instead of using the globals directly. The bandwidth for this new Triad function was only 3400 MB/s, roughly 29% lower. Why did I lose so much bandwidth? The opt-report indicated that both functions were inlined.

    #define N 2000000
    static double *a;
    static double *b;
    static double *c;

    int main()
    {
        ...
        times[3][k] = mysecond();
        tuned_STREAM_Triad(a, b, c, scalar);      // original function, inlined: 4759 MB/s
        times[3][k] = mysecond() - times[3][k];

        times[4][k] = mysecond();
        tuned_STREAM_Triad_Arg(a, b, c, scalar);  // new function, inlined: 3400 MB/s
        times[4][k] = mysecond() - times[4][k];

        return 0;
    }

    void tuned_STREAM_Triad(double* aa, double* bb, double* cc, double scalar)
    {
        int j;
        // aa, bb and cc are unused: the loop works on the global arrays a, b, c
    #pragma omp parallel for
        for (j = 0; j < N; j++)
            a[j] = b[j] + scalar * c[j];
    }

    void tuned_STREAM_Triad_Arg(double* restrict aa, double* bb, double* cc, double scalar)
    {
        int j;
        // only aa is restrict-qualified; the loop works on the arguments
    #pragma omp parallel for
    #pragma ivdep
        for (j = 0; j < N; j++)
            aa[j] = bb[j] + scalar * cc[j];
    }

Compiled with: icc -openmp -restrict
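To compare the two figures as per-call times, it helps to undo STREAM's bandwidth arithmetic. STREAM counts Triad traffic as three arrays of N doubles (b and c read, a written) and reports MB as 10^6 bytes. The sketch below works that out forwards and backwards; the helper names triad_bytes, triad_mb_per_s and triad_seconds are hypothetical and not part of the STREAM source.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes STREAM attributes to one Triad call: read b, read c, write a.
   Write-allocate traffic for a is not counted. */
static double triad_bytes(long n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Bandwidth in MB/s, with MB = 1e6 bytes as STREAM reports it. */
static double triad_mb_per_s(long n, double seconds)
{
    assert(seconds > 0.0);
    return 1.0e-6 * triad_bytes(n) / seconds;
}

/* Inverse: the per-call time implied by a reported MB/s figure. */
static double triad_seconds(long n, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return 1.0e-6 * triad_bytes(n) / mb_per_s;
}
```

With N = 2000000 (48,000,000 bytes per call), 4759 MB/s corresponds to about 10.09 ms per Triad and 3400 MB/s to about 14.12 ms, so the argument-passing version spends roughly 4 ms longer per call.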
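A self-contained sketch of the two kernel forms from the post, so they can be compiled outside the STREAM driver and checked for identical results. The names triad_global and triad_args, the length parameter n, the const on the input pointers, and dropping the Intel-specific #pragma ivdep are all assumptions of this sketch, not the poster's code. The OpenMP pragma is ignored unless OpenMP is enabled (for example icc -openmp or gcc -fopenmp).

```c
#include <assert.h>
#include <stddef.h>

/* File-scope pointers standing in for the post's global a, b, c
   (hypothetical names). */
static double *ga, *gb, *gc;

/* Form 1: the loop reads and writes the file-scope arrays directly. */
static void triad_global(double scalar, long n)
{
    long j;
#pragma omp parallel for
    for (j = 0; j < n; j++)
        ga[j] = gb[j] + scalar * gc[j];
}

/* Form 2: the arrays arrive as arguments; as in the post, only the
   destination pointer is restrict-qualified. */
static void triad_args(double *restrict aa, const double *bb,
                       const double *cc, double scalar, long n)
{
    long j;
    assert(aa != NULL && bb != NULL && cc != NULL);
#pragma omp parallel for
    for (j = 0; j < n; j++)
        aa[j] = bb[j] + scalar * cc[j];
}
```

The check only confirms that both forms compute the same values; it says nothing about their relative speed.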