Runtime error with Offload

HOST--ERROR:myoiOSSetPageAccess: mprotect failed!

 Please increase the maximum of memory map areas
        i.e. echo 256000 > /proc/sys/vm/max_map_count
offload error: process on the device 0 unexpectedly exited with code 1
HOST--ERROR:myoiThreadMutexDestroy1: Fail to destroy a mutex (0x18762b8)! error: 16
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDestroySharedMemory: shmctl failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDestroySharedMemory: shmctl failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDestroySharedMemory: shmctl failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument

I receive this message when I try to offload code using _Cilk_shared.


Hitesh,

Is this the same code you were having trouble with in https://software.intel.com/en-us/forums/topic/507144?

I believe it is, Frances.
Hitesh - Have you tried increasing the max_map_count as suggested in the diagnostics you received and rerunning your app?
This needs to be done on the coprocessor. Perhaps under your configuration you can easily ssh as root to the card and do the following:
[root@-mic0 ~]# cat /proc/sys/vm/max_map_count
65530
[root@-mic0 ~]# echo 256000 > /proc/sys/vm/max_map_count
[root@-mic0 ~]# cat /proc/sys/vm/max_map_count
256000
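If logging into the card interactively is inconvenient, the same check and change can be sketched from the host over ssh. This is only a sketch: `mic0` is an assumed card hostname, and it presumes passwordless root ssh to the card is configured.

```shell
# Run from the host as root; mic0 is an assumed card hostname.
ssh mic0 cat /proc/sys/vm/max_map_count
ssh mic0 'echo 256000 > /proc/sys/vm/max_map_count'
ssh mic0 cat /proc/sys/vm/max_map_count
```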

Based on advice from others, you must increase the max_map_count on both the host and the coprocessor. Changing it only on the coprocessor will not help.
Also, 256000 may be a good start; however, it may not be large enough if you have GBs of data.

Thanks for the update.

I don't have root privileges on the system, but I will ask the system admin to look at it.

I would also like to know why I am receiving this error for such a small code snippet, despite the large memory on the Xeon and the Phi.

My apologies. I had not realized the thread Frances cited contained your code. I had not viewed it and mistook it for your more recent thread (https://software.intel.com/en-us/forums/topic/509714) that I helped with. Let me ask about your smaller reproducer.

I was able to run your program following the advice/help received from others. The details are below.
First, regarding the original failure, I was advised that the portion of the error shown below represents the typical error signature when an app exhausts the virtual shared memory space:

HOST--ERROR:myoiOSSetPageAccess: mprotect failed!
Please increase the maximum of memory map areas
        i.e. echo 256000 > /proc/sys/vm/max_map_count

Next, the default max_map_count is (currently) low on the coprocessor, and it also varies between RHEL versions on the host. The default setting on my RHEL6.2 system was 65530. The MPSS Development team recommends setting this as high as 10000000.
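As a rough back-of-envelope check using the reproducer's dimensions (and assuming each shared allocation consumes at least one map area, with mprotect calls on sub-ranges potentially splitting a mapping into several), the row allocations alone number in the tens of thousands:

```shell
# Back-of-envelope count of shared row allocations in the reproducer.
# Assumption: each _Offload_shared_aligned_malloc() needs at least one
# map area, and mprotect on sub-ranges can split a mapping further.
halfLength=2
nx=$((14000 + halfLength * 2))   # 14004 rows per array
rows=$((3 * nx))                 # rows for a1, b1, and c1 combined
echo "row allocations: $rows vs default max_map_count: 65530"
```

With 42012 allocations, each potentially costing more than one map area, it is plausible that the 65530 default is exhausted well before the arrays are fully allocated.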
The source code in your first post that Frances cited produces multiple compile-time warnings (an example is shown below) about a1, b1, and c1.

$ icpc -openmp test.cpp
test.cpp(64): warning #2707: pointer argument in _Cilk_offload function call is not pointer-to-shared
      _Cilk_offload_to(1) foo(thrd, nx, nz, bx, bz, halfLength,  a1,  b1,  c1);
                                                                 ^

The following code contains corrections from Development to fix those warnings.

#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include "omp.h"
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#include "offload.h"

#define SQUARE(i) ((i)*(i))
#define mina(a,b) (((a)<(b)) ? (a) : (b))

//_Cilk_shared int  foo(int, int, int, int, int, int, float _Cilk_shared  ** a , float _Cilk_shared ** b, float _Cilk_shared ** c );
 _Cilk_shared int  foo(int, int, int, int, int, int, float *_Cilk_shared  * a , float *_Cilk_shared * b, float *_Cilk_shared * c );


float *_Cilk_shared * a1;
float *_Cilk_shared * b1;
float *_Cilk_shared * c1;

int main(int argc, char* argv[])
{
    /*printf("before the code "); */

    int i,j,bx,bz,halfLength;
     int nx,nz,ITER;
     int thrd;

    halfLength=2;
    nx=14000+halfLength*2;
    nz=3600+halfLength*2;

    if(argc < 4)
    {
        printf("Error !!! Give number of threads, bx, and bz. \n");
        exit(-1);
    }

    sscanf(argv[1],"%d",&thrd);
    sscanf(argv[2],"%d",&bx);
    sscanf(argv[3],"%d",&bz);
    /*printf("%s %d \n",__FILE__,__LINE__);*/


    /*-----------------------------Aligned memory allocation-------------------------------------------------------*/
     a1=(float  *_Cilk_shared *)_Offload_shared_aligned_malloc(sizeof(float *)*nx,64);
     b1=(float  *_Cilk_shared *)_Offload_shared_aligned_malloc(sizeof(float*)*nx,64);
     c1=(float  *_Cilk_shared *)_Offload_shared_aligned_malloc(sizeof(float*)*nx,64);

    for( i=0;i<nx;i++)
    {
        a1[i]=(float  _Cilk_shared *)_Offload_shared_aligned_malloc(sizeof (float)*nz,64);
        b1[i]=(float  _Cilk_shared *)_Offload_shared_aligned_malloc(sizeof (float)*nz,64);
        c1[i]=(float  _Cilk_shared *)_Offload_shared_aligned_malloc(sizeof (float)*nz,64);
    }

    #pragma omp parallel for private(i,j)
    for(i=0;i<nx;i++)
    {
        for(j=0;j<nz;j++)
        {
            a1[i][j]=2.0f;
            b1[i][j]=2.0f;
            c1[i][j]=2.0f;
        }
    }

    _Cilk_offload_to(1) foo(thrd, nx, nz, bx, bz, halfLength,  a1,  b1,  c1);

    for(i=0;i<nx;i++){
    _Offload_shared_aligned_free(a1[i]);
    _Offload_shared_aligned_free(b1[i]);
    _Offload_shared_aligned_free(c1[i]);
        }

_Offload_shared_aligned_free(a1);
_Offload_shared_aligned_free(b1);
_Offload_shared_aligned_free(c1);

return 0;
}

/*-----------------------------------------------------------------------------------------------------------------------*/
_Cilk_shared  int foo(int thrd, int nx, int nz, int bx, int bz, int halfLength, float *_Cilk_shared *  a, float * _Cilk_shared  *  b, float * _Cilk_shared * c)
{
 int ii,jj,i,j;

   printf("total thread=%d",thrd);

    __assume_aligned(a,64);
    __assume_aligned(b,64);
    __assume_aligned(c,64);

    omp_set_num_threads(thrd);

    #pragma omp parallel for  private (ii,jj,i,j) schedule(dynamic,1)
    for(ii=2;ii<nx-halfLength*2;ii+=bx)
    {
        for(jj=2;jj<nz-halfLength*2;jj+=bz)
        {
            for(i=ii;i<mina(bx+ii,nx-halfLength);++i)
            {
                #pragma simd
                for(j=jj;j<mina(bz+jj,nz-halfLength);++j)
                {
                    a[i][j] =b[i][j]+c[i][j];
                }
            }
        }
    }

return 0;
}

For your case with dimensions 14000 and 3600, it was necessary to increase the available virtual shared memory by increasing the max_map_count on both the host and coprocessor.
The source code above runs successfully when performing the actions listed below (NOTE: I guessed at the input parameters shown; other values also worked). Steps 1 and 2 are one-time changes that must be repeated after any reboot.
1. On the coprocessor (as root), increase max_map_count
echo 10000000 > /proc/sys/vm/max_map_count
2. On the host (as root), check the current value and increase max_map_count accordingly
cat /proc/sys/vm/max_map_count
echo 10000000 > /proc/sys/vm/max_map_count

3. Compile and run
icpc -openmp test.cpp
./a.out 240 80 80
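Since steps 1 and 2 must be repeated after every reboot, the host-side setting can be made persistent with the usual Linux sysctl mechanism. This is a sketch for a standard RHEL host; the coprocessor's ramfs image may need a different mechanism under MPSS.

```shell
# Host side, as root: apply immediately and persist across reboots.
sysctl -w vm.max_map_count=10000000
echo 'vm.max_map_count = 10000000' >> /etc/sysctl.conf
```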

Without the settings in steps 1 and 2 above, the code suffers the run-time error you noted:

./a.out 240 80 80
HOST--ERROR:myoiOSSetPageAccess: mprotect failed!
Please increase the maximum of memory map areas
        i.e. echo 256000 > /proc/sys/vm/max_map_count
offload error: process on the device 0 unexpectedly exited with code 1
HOST--ERROR:myoiThreadMutexDestroy1: Fail to destroy a mutex (0x10cbb08)! error: 16
HOST--ERROR:myoiOSDetachSharedMemory: shmdt failed: Invalid argument
<…many more removed….>

Finally, the current default value of max_map_count may be increased in a future MPSS release.

Thanks, Kevin.

I was looking for exactly this explanation.
What I understand is that increasing the defined limit will let the code run correctly.

You're welcome.
