Using MIC offload in OpenMP parallel sections

Hi!

I want to sum arrays in parallel on the MIC and the CPU with OpenMP. I split the arrays into two parts: the first part is offloaded to the MIC and summed there, and the second part is summed on the CPU. I wrote 4 tests:

1. Both parts are summed on the CPU. This test is there to make sure the summing code is valid.

2. The MIC part is offloaded to the MIC and summed. The CPU part is summed on the CPU.

3. The MIC part is offloaded to the MIC and summed. The CPU part is summed on the CPU. Both actions run in the same section of an OpenMP parallel sections construct.

4. The MIC part is offloaded to the MIC and summed. The CPU part is summed on the CPU. The two actions run in different sections of an OpenMP parallel sections construct.

Here is the test code.

void test1(double* arrays, long arraySize, long arraysNumber, long micArraysNumber,double *sums, int iterations)
{
    printf("Test 1 - plain summing\n");
    arraysSum(arrays, arraySize, micArraysNumber, sums, iterations);
    arraysSum(&arrays[micArraysNumber*arraySize], arraySize, arraysNumber - micArraysNumber, &sums[micArraysNumber], iterations);
}
void test2(double* arrays, long arraySize, long arraysNumber, long micArraysNumber,double *sums, int iterations)
{
    printf("Test 2 - offload and find sums\n");
    #pragma offload_transfer target(mic) \
        in(arrays[0:arraySize*micArraysNumber]: ALLOC) \
        in(sums[0:micArraysNumber]: ALLOC)

    #pragma offload target(mic) \
        in(arrays: length(0) REUSE) \
        out(sums: length(micArraysNumber) REUSE)
    {
        arraysSum(arrays, arraySize, micArraysNumber, sums, iterations);
    }

    #pragma offload_transfer target(mic) \
        in(arrays[0:arraySize*micArraysNumber]: FREE) \
        in(sums[0:micArraysNumber]: FREE)
    arraysSum(&arrays[micArraysNumber*arraySize], arraySize, arraysNumber - micArraysNumber, &sums[micArraysNumber], iterations);
}
void test3(double* arrays, long arraySize, long arraysNumber, long micArraysNumber,double *sums, int iterations)
{
    printf("Test 3 - offload and find sums inside parallel sections\n");
    #pragma omp parallel sections shared(arrays, sums)
    {
        #pragma omp section
        {
            #pragma offload_transfer target(mic) \
                in(arrays[0:arraySize*micArraysNumber]: ALLOC) \
                in(sums[0:micArraysNumber]: ALLOC)

            #pragma offload target(mic) \
                in(arrays: length(0) REUSE) \
                out(sums: length(micArraysNumber) REUSE)
            {
                arraysSum(arrays, arraySize, micArraysNumber, sums, iterations);
            }
            #pragma offload_transfer target(mic) \
                in(arrays[0:arraySize*micArraysNumber]: FREE) \
                in(sums[0:micArraysNumber]: FREE)
            arraysSum(&arrays[micArraysNumber*arraySize], arraySize, arraysNumber - micArraysNumber, &sums[micArraysNumber], iterations);
        }
    }
}
void test4(double* arrays, long arraySize, long arraysNumber, long micArraysNumber,double *sums, int iterations)
{
    printf("Test 4 - offload data outside parallel sections and find sums inside parallel sections\n");
    #pragma offload_transfer target(mic) \
        in(arrays[0:arraySize*micArraysNumber]: ALLOC) \
        in(sums[0:micArraysNumber]: ALLOC)
    #pragma omp parallel sections shared(arrays, sums)
    {
        #pragma omp section
        {
            #pragma offload target(mic) \
                in(arrays: length(0) REUSE) \
                out(sums: length(micArraysNumber) REUSE)
            {
                arraysSum(arrays, arraySize, micArraysNumber, sums, iterations);
            }
        }
        #pragma omp section
        {
            arraysSum(&arrays[micArraysNumber*arraySize], arraySize, arraysNumber - micArraysNumber, &sums[micArraysNumber], iterations);
        }
    }
    #pragma offload_transfer target(mic) \
        in(arrays[0:arraySize*micArraysNumber]: FREE) \
        in(sums[0:micArraysNumber]: FREE)
}

Tests 1, 2 and 3 work fine. Test 4 fails with "offload error: process on the device 0 was terminated by signal 11". Sometimes test 4 works when it is launched after test 3.

Compile options: icc  -Wall -openmp -fPIC

See the main and utility functions in the attached file.

Does anybody have an idea why this offload error occurs?
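
For readers without the attachment, a minimal sketch of a helper with the same call signature as arraysSum is shown below. It is a guess based on how the tests call it, not the code from parallelsum.cpp: the meaning of iterations (here, the summation is simply repeated) is an assumption, and splitSum is a hypothetical name that mirrors the two calls in test1. For offload with the Intel compiler, the real arraysSum would also have to be built for the coprocessor, for example with __attribute__((target(mic))), which is left out here so the sketch compiles anywhere.

```cpp
#include <cassert>

// Hypothetical reconstruction: `arrays` holds `arraysNumber` contiguous arrays
// of `arraySize` doubles; the sum of array i is written to sums[i].
// Assumption: `iterations` only repeats the work to lengthen the run.
void arraysSum(const double* arrays, long arraySize, long arraysNumber,
               double* sums, int iterations)
{
    for (int it = 0; it < iterations; ++it) {
        for (long i = 0; i < arraysNumber; ++i) {
            const double* row = arrays + i * arraySize;
            double s = 0.0;
            for (long j = 0; j < arraySize; ++j) {
                s += row[j];
            }
            sums[i] = s;
        }
    }
}

// Hypothetical helper mirroring test1: the first micArraysNumber arrays form
// the "MIC part", the remaining ones form the "CPU part".
void splitSum(const double* arrays, long arraySize, long arraysNumber,
              long micArraysNumber, double* sums, int iterations)
{
    assert(micArraysNumber >= 0 && micArraysNumber <= arraysNumber);
    // MIC part: arrays [0, micArraysNumber)
    arraysSum(arrays, arraySize, micArraysNumber, sums, iterations);
    // CPU part: arrays [micArraysNumber, arraysNumber)
    arraysSum(&arrays[micArraysNumber * arraySize], arraySize,
              arraysNumber - micArraysNumber, &sums[micArraysNumber], iterations);
}
```

The point of the sketch is the index arithmetic: the CPU part starts at element micArraysNumber*arraySize of arrays and at element micArraysNumber of sums, so the two calls never write to the same output slot.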

Timofey

Hi Timofey,

I don't see your main program, can you post it?

Regards.

Oops, I forgot to click Start Upload.

Attachment: parallelsum.cpp (text/x-c++src, 5.31 KB)

Your include file is still missing.

I didn't use an include file for this program.

Hi Timofey,

As Tim said, the file "pgdm-be-Phi.h" is missing. But if I just comment out the line that includes the missing file, I can still compile and run your program.

I ran and re-ran it many times, including after resetting the coprocessor, and the program always returned the right result without any error. I don't see the error you reported at all. What MPSS version and Intel compiler version are you using?

You are right - I forgot to remove pgdm-be-Phi.h from the sources. It doesn't matter for this program, so it can be deleted.

icc version information:

Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.0.1.117 Build 20121010

I looked up the MPSS commands here: http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-deve.... The micinfo and mpssinfo commands are absent on our system. The only command available is micctrl, but I didn't find any option that returns the MPSS version.

Could you please advise how to get the MPSS version?

Timofey,

As an experiment, replace your parallel sections with:

#pragma omp parallel shared(arrays, sums)
{
  if(omp_get_thread_num() == 0)
  {
    #pragma offload target(mic) \
        in(arrays: length(0) REUSE) \
        out(sums: length(micArraysNumber) REUSE)
    {
      arraysSum(arrays, arraySize, micArraysNumber, sums, iterations);
    }
  }
  if(omp_get_thread_num() == 1) // ensure only team member 1 executes this
  {
    arraysSum(&arrays[micArraysNumber*arraySize], arraySize, arraysNumber - micArraysNumber, &sums[micArraysNumber], iterations);
  }
}

The intention of the above test is to ensure that the thread making the call to the MIC is the master thread of the parallel region's thread team, which is also the thread that ran the code prior to the parallel region.
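
A portable sketch of the same experiment, with the offload replaced by a plain function call so that it builds with any OpenMP compiler (or with none), is shown below. The names sumRange and splitByThread are hypothetical. One difference from the snippet above is deliberate: if the runtime gives the region only one thread, thread 0 also does the host part, because otherwise the `== 1` branch would be skipped silently.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the real work (offloaded or host-side).
static double sumRange(const double* a, long n)
{
    double s = 0.0;
    for (long i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}

// Thread 0 (the master, i.e. the thread that entered the region) handles the
// first part; thread 1 handles the rest. Reports which thread took the first
// part through firstPartThread and returns the combined sum.
double splitByThread(const double* a, long n, long firstPart, int* firstPartThread)
{
    assert(firstPart >= 0 && firstPart <= n);
    double firstSum = 0.0;
    double secondSum = 0.0;
    int who = -1;

    #pragma omp parallel num_threads(2) shared(firstSum, secondSum, who)
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
        const int team = omp_get_num_threads();
#else
        const int tid = 0;   // built without OpenMP: a single thread does everything
        const int team = 1;
#endif
        if (tid == 0) {
            who = tid;       // in the real test the offload pragma would go here
            firstSum = sumRange(a, firstPart);
        }
        if (tid == 1 || (tid == 0 && team == 1)) {
            secondSum = sumRange(a + firstPart, n - firstPart);
        }
    }   // implicit barrier: both partial sums are complete here

    *firstPartThread = who;
    return firstSum + secondSum;
}
```

Each shared variable is written by exactly one thread, so no extra synchronization is needed beyond the barrier at the end of the parallel region.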

 

Jim Dempsey

Timofey, are you sure you're looking in the correct place? micctrl is installed by default in /usr/sbin/ by the MPSS 2 installer, which you may well have in your PATH. micinfo and mpssinfo can be found in /opt/intel/mic/bin, which might not be in your PATH. If you don't find them there, then there may be a problem with your MPSS installation.

To: jimdempseyatthecove

I ran your experiment and got the same offload error. I tried swapping the thread numbers in the conditions and got the same error. I disabled the CPU code and got the same error. If I run the CPU code only, it works.

To: robert-reed (Intel)

I asked our administrators about MPSS. They said MPSS is disabled on our cluster by default. We then tried to launch my program with MPSS enabled and got the same error as before.

MPSS version is 2.1.5889-14.

Timofey,

I don't have my Xeon Phi yet, so I cannot test this advice. At the Intel Developer Forum last week, someone had a similar issue. The suggestion for diagnosing it (I cannot tell you the exact name, only what to look for) was an offload option that displays verbose messages about what is going on (status, environment variables, errors).

Your problem sounds like it could be twofold: a) wrong environment variable settings in the Xeon Phi, and b) bad programming that exposes itself during offload. As for a), you typically set environment variables in the Xeon Phi by setting a host variable MIC_..., where ... is the name of the environment variable to set in the Xeon Phi (the "MIC_" prefix is stripped off).

Jim Dempsey

Some diagnostics can be reported by setting the OFFLOAD_REPORT environment variable to a value from 1 to 3 - check the documentation for details. This should tell you whether your code is running at all, although signal 11 (a segmentation fault) suggests that it is, and that it then runs into a fatal exception somewhere. We shouldn't get an exception if you try to run offload code when the card is not enabled, unless you force it to always run on a given coprocessor with #pragma offload target(mic:X).
