Adding vectors with MIC

Anwar Ludin


Is the following code sample for adding vectors correct? Can I make it even faster using vectorized operations?

void vectorAdd(float* a, float* b, float* r, int size)
{
   #pragma offload target(mic) in(a:length(size)) in(b:length(size)) inout(r:length(size))
   // i is declared in the loop header, so it is already private to each thread
   #pragma omp parallel for shared(a,b,r)
   for (int i = 0; i < size; ++i)
   {
      r[i] = a[i] + b[i];
   }
}

Tim Prince

You probably need to declare the parameters as float * restrict a, float * restrict b, float * restrict r (or use one of the ivdep pragmas) to get auto-vectorization; otherwise the compiler has to assume the arrays may overlap. Alignment would help if you make all the OpenMP chunks a multiple of 32.
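A minimal sketch of the restrict suggestion, written as plain C99 so it compiles on the host as well. The name vector_add is only illustrative, the offload pragma is left out on purpose, and whether the loop actually vectorizes still depends on the compiler and its flags:

```c
/* restrict promises the compiler that a, b and r never overlap,
   which removes the aliasing doubt that can block auto-vectorization.
   vector_add is a hypothetical name used for this sketch only. */
void vector_add(const float *restrict a, const float *restrict b,
                float *restrict r, int size)
{
    /* A compiler built without OpenMP support simply ignores this pragma
       and runs the loop serially. */
    #pragma omp parallel for
    for (int i = 0; i < size; ++i) {
        r[i] = a[i] + b[i];
    }
}
```

The caller is responsible for keeping the restrict promise: passing overlapping arrays (for example, the same pointer as both a and r) is undefined behavior with this signature.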

A single offloaded vector operation like this would spend the majority of its time on data transfer.
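To put a rough number on that, here is a small sketch that counts the bytes moved by the clauses in the question: a and b are copied in, and r is copied both in and out because it is declared inout. The helper names are made up for illustration, and no link bandwidth is assumed:

```c
#include <stddef.h>

/* Bytes crossing the host/coprocessor link for one offloaded call:
   a in, b in, r in and r back out = four array copies.
   (Hypothetical helper, not part of any Intel API.) */
size_t offload_transfer_bytes(size_t size)
{
    return 4 * size * sizeof(float);
}

/* Floating-point additions performed per byte transferred. */
double adds_per_transferred_byte(size_t size)
{
    if (size == 0) {
        return 0.0;
    }
    return (double)size / (double)offload_transfer_bytes(size);
}
```

With 4-byte floats this works out to 16 bytes moved for every single addition. Declaring r as out instead of inout would likely drop one of the four copies, but the ratio stays heavily weighted toward transfer.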
