OpenMP 4.0 target offload Report


Hi ..

I am trying to put together comparison statistics for offloading using:

1) Intel compiler-assisted offload vs. 2) the OpenMP 4.0 target construct

My question: how can I get an OpenMP 4.0 offload report (which environment variable do I need to set)? I used OFFLOAD_REPORT=2 with the Intel compiler-directive offload and it worked fine, but I am getting very strange statistics with the OpenMP 4.0 offload (I am using an Intel Xeon Phi as the execution platform).

Here is the code:

COMPILER DIRECTIVE OFFLOAD:

// Start time
        gettimeofday(&start, NULL);

        // Run SAXPY 
        #pragma offload target(mic:0) inout(x) out(y)
        {
                        #pragma omp parallel for default (none) shared(a,x,y)
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }                                                        
        } // end of offload region

        // end time 
        gettimeofday(&end, NULL);

OPENMP 4.0 TARGET OFFLOAD:

// Start time
        gettimeofday(&start, NULL);

        // Run SAXPY
        #pragma omp target data map(to:x)
        {
                #pragma omp target map(tofrom:y)
                {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
                }
        } // end of target data

 


Thanks in advance. (Raju)

 


Change your OpenMP 4.0 code to:

#pragma omp target map(to:x) map(tofrom:y)
        {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
        }
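
For reference, a minimal self-contained sketch of this pattern (hypothetical small array size and initialization values, with gettimeofday timing as in your original post) might look like:

#include <stdio.h>
#include <sys/time.h>

int main()
{
        const int n = 5000;     /* hypothetical size, for illustration only */
        const float a = 10.0f;
        float x[5000], y[5000];
        int i;
        struct timeval start, end;

        for (i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        gettimeofday(&start, NULL);

        /* x is input-only, y is read and written: one combined map suffices */
        #pragma omp target map(to:x) map(tofrom:y)
        {
                #pragma omp parallel for
                for (i = 0; i < n; ++i){
                        y[i] = a*x[i] + y[i];
                }
        }

        gettimeofday(&end, NULL);
        printf("time: %f s, y[0] = %f\n",
               (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec)/1e6,
               y[0]);
        return 0;
}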

Hi Ravi,

Thanks for your reply. I executed the updated code; my findings are below:

1. Compared to the Intel compiler-directive offload, the OpenMP target offload performs much slower.

2. I used OFFLOAD_REPORT=2 to see the report, and it shows:

[Offload] [MIC 0] [File]                    omp_target_SAXPY_only.c
[Offload] [MIC 0] [Line]                    38
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        85.259595(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   8 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        84.532258(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   8 (bytes)

(whereas the total number of bytes offloaded (CPU->MIC) with the compiler-directive offload is 2000000008 bytes, which matches the 500000000-element float array x at 4 bytes per element, plus the same 8 bytes of overhead seen above)

My input arrays are declared as: const int n = 500000000; float x[500000000]; float y[500000000];

 

I am wondering: with OFFLOAD_REPORT=2, am I seeing what is really happening underneath in OpenMP 4.0, or is there something else I can use for this?
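
As a cross-check on the OFFLOAD_REPORT numbers, one alternative is to time the target region on the host with omp_get_wtime() from <omp.h>; a minimal sketch (again with a hypothetical small array size) is:

#include <omp.h>
#include <stdio.h>

int main()
{
        const int n = 5000;     /* hypothetical size, for illustration only */
        const float a = 10.0f;
        float x[5000], y[5000];
        int i;
        double t0, t1;

        for (i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        t0 = omp_get_wtime();
        #pragma omp target map(to:x) map(tofrom:y)
        {
                #pragma omp parallel for
                for (i = 0; i < n; ++i){
                        y[i] = a*x[i] + y[i];
                }
        }
        t1 = omp_get_wtime();

        /* host-side wall time includes the data transfers */
        printf("target region: %f s, y[0] = %f\n", t1 - t0, y[0]);
        return 0;
}

The host-side time measured this way includes the data transfers, so it should be roughly comparable to the [CPU Time] line of the offload report.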

Thanks in advance again!

Can you show how x and y are declared in the code for both the DIRECTIVE OFFLOAD and the OpenMP offload?

Yes.

In both cases I have declared them globally, as below:

const int n = 500000000;
float x[500000000];
float y[500000000];

Thank you,

-Raju

The compiler's interpretation of global variables that were not marked with #pragma omp declare target was wrong; this is fixed in the newer compiler.

Try the following test:

#ifdef OPENMP
#pragma omp declare target
#else
#pragma offload_attribute(push, target(mic))
#endif
const int n = 5000;
float x[5000];
float y[5000];
int a = 10;
#ifdef OPENMP
#pragma omp end declare target
#else
#pragma offload_attribute(pop)
#endif

int main()
{
   int i;
#ifndef OPENMP
#pragma offload target(mic:0) in(x: alloc_if(0) free_if(0)) inout(y: alloc_if(0) free_if(0))
        {
                        #pragma omp parallel for default (none) shared(a,x,y)
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
        } // end of offload region
#else
        #pragma omp target map(always, to:x) map(always, tofrom:y)
                {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
                }
#endif
}

OUTPUT I GET:

bash-4.2$ icc -openmp raju.c
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    20
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.430813(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40008 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.243459(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20008 (bytes)

bash-4.2$ icc -openmp raju.c -DOPENMP
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    29
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.412741(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40004 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.234801(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20004 (bytes)

 

Thank you very much, Ravi. It worked fine. 

Best Regards// 

Raju

Hi Ravi,

I have a few points of confusion:

1. I am using icc (ICC) 15.0.2, and with this version I cannot compile

        #pragma omp target map(always, to:x) map(always, tofrom:y)

It shows:

  error: identifier "to" is undefined
  #pragma omp target map(always, to:x) map(always, tofrom:y)

2. Which compiler version are you using?

Thanks,

- Raju 

Thanks in advance. And one more finding:

1. When I execute the #pragma omp target offload with a larger array size (e.g., more than 50000 elements, up to the millions) [using #pragma omp declare target], it gives me an offload error:

    offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)

I looked into this problem; would you please comment on whether it is a memory alignment issue, or whether I can increase the phi_omp_stack?

I appreciate your comments and help.

// Raju 

I am using the 16.0 compiler. Support for "always" does not exist in 15.0.

You would need to split the pragma into three pragmas:

#pragma omp target update to(x,y)
#pragma omp target
{
        // the parallel for loop goes here, as before
}

#pragma omp target update from(y)
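
Put together, a compilable sketch of this workaround (reusing the declare target globals from the test above; the sizes are again illustrative) might look like:

#pragma omp declare target
const int n = 5000;
float x[5000];
float y[5000];
int a = 10;
#pragma omp end declare target

int main()
{
        int i;

        /* force the host -> device copies of the device-resident globals */
        #pragma omp target update to(x,y)

        #pragma omp target
        {
                #pragma omp parallel for
                for (i = 0; i < n; ++i){
                        y[i] = a*x[i] + y[i];
                }
        }

        /* force the device -> host copy of the result */
        #pragma omp target update from(y)
        return 0;
}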

Regarding signal 11: if you send me the test case, I can investigate the cause of the problem.

Hi Ravi,

With #pragma omp target update, I can run the OpenMP target offload without any issue. It no longer produces signal 11.

Thank you so much for your help.

- Raju

The compiler defect that Ravi described in post #6 has been recorded in our internal tracking system (see id noted below). We will update this thread when the fix for this issue is available in an external release.

(Internal tracking id: DPD200376293)
