My code takes a lot of time to execute and returns an incorrect result


Hello, 

I am new to programming with MIC cards. I am trying to run a very simple program, but offloading the data to the MIC card appears to take a long time, and the final output also seems to be incorrect. Can anyone help me figure out my mistake, please?

#include <iostream>
#include <cstdlib>   // malloc, free (replaces the non-standard <malloc.h>)
#include <omp.h>

using namespace std;

int main()
{
    int xx = 100000;
    int yy = 10000;

    // Widen before multiplying so the element count cannot overflow int
    unsigned long long size = (unsigned long long)xx * yy;
    cout << " Simulate Data" << endl;
    cout << "data size " << size * sizeof(int) << endl;

    int* aa = (int*) malloc(sizeof(int) * size);
    for (unsigned long long ii = 0; ii < size; ++ii)
    {
        aa[ii] = 1;
    }

    cout << " start offload " << endl;
    unsigned long long dim = size;
    #pragma offload target(mic:0) \
    in(aa:length(dim))
    {
        #pragma omp parallel for
        for (unsigned long long ii = 0; ii < dim; ++ii)   // start the index at 0
        {
            aa[ii] *= 2;
        }
    }

    cout << " offload end " << endl;
    cout << " Result  " << aa[10] << "  " << aa[1000] << endl;
    free(aa);

    return 0;
}

Thank you

Sincerely, 

AM

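You want to use inout rather than in. The in clause only copies aa to the card, so the doubled values are never copied back to the host:

#pragma offload target(mic:0) \
    inout(aa:length(dim))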

Note that the first offload also carries the overhead of transferring the code to the card and initializing the MIC's OpenMP thread pool. Try timing several consecutive offloads:

for (int I = 0; I < 4; ++I) {
    cout << " start offload " << endl;
    double t0 = omp_get_wtime();
    unsigned long long dim = (unsigned long long)xx * yy;
    #pragma offload target(mic:0) \
    inout(aa:length(dim))
    {
        #pragma omp parallel for
        for (unsigned long long ii = 0; ii < dim; ++ii)
        {
            aa[ii] *= 2;
        }
    }
    double t1 = omp_get_wtime();
    cout << " offload end  " << t1 - t0 << endl;
    cout << " Result  " << aa[10] << "  " << aa[1000] << endl;
} // for
 

Jim Dempsey

www.quickthreadprogramming.com

Thank you very much for your prompt reply, Jim. I really appreciate all your help. My program now runs correctly; however, the offload is still too slow.

Please note that the code within your offloaded section is trivial:

Read (vector), multiply (vector), write (vector)

That is all it is doing (other than a little loop overhead).

Your offload code should perform much more work per byte transferred in order to recover the time spent passing the data into and out of the MIC.

Choose something like a textbook matrix multiply as sample code.

Jim Dempsey