copying data from host to native MIC array

Hello,

I am trying to write a very simple program in which I natively allocate some memory on the coprocessor and then copy data from the host into that natively allocated memory, but I keep getting errors. Could anyone kindly advise what is going wrong in my code?

__attribute__ ((target(mic)))
unsigned long long numElems;
    
void 
PerformNativeAllocation(short* ptr, short* temp)
{
    cout << " Perform Native allocation " << endl; 
  
        #pragma offload target(mic:0) \
        nocopy(temp)
        {
            temp = (short*) malloc(numElems*sizeof(short)); 
            //free(temp);
        }
        
        #pragma offload target(mic:0)  \
        in(ptr[0:numElems] :into(temp) alloc_if(0) free_if(0))
        {
            for (unsigned long long ii=0; ii < numElems; ++ii)
            {
                temp[ii]*=2;
            }
            
            free(temp);
        }
}

Thank you

AM

Development's guidance is: "Memory allocated by the user using malloc or some such API cannot participate in the data transfer pragmas. For the pragmas to be usable, the allocation must be done using the pragmas also."

There is an exception: if necessary, you can call malloc and memcpy inside the offloaded code instead of using INTO. This is less efficient, however, because the IN() variable gets its own target-side allocation in addition to the user's target-side malloc. An example demonstrating this appears under "Example of Local Pointer" on the "Effective Use of the Intel Compiler's Offload Features" page.
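For illustration, here is a minimal sketch of that malloc/memcpy route. It is not the example from the page mentioned above; the names deviceBuf and DoubleViaTargetMalloc are hypothetical, and it assumes that a global pointer listed in nocopy() keeps its target-side value from one offload to the next. It needs the Intel compiler with offload support; a compiler that ignores the pragmas simply runs both blocks on the host.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Compile the enclosed declaration for both host and coprocessor.
#pragma offload_attribute(push, target(mic))
// Hypothetical name: target-side pointer that keeps its value across offloads.
static short* deviceBuf = nullptr;
#pragma offload_attribute(pop)

// Hypothetical helper: doubles n elements of ptr using a buffer that the
// offloaded code allocates itself with malloc.
void DoubleViaTargetMalloc(short* ptr, unsigned long long n)
{
    assert(ptr != nullptr && n > 0);

    // in(ptr[0:n]) makes the pragma allocate its own target copy of ptr.
    // The offloaded code then copies that data again into malloc'd memory,
    // which is the extra allocation described above.
    #pragma offload target(mic:0) in(ptr[0:n]) nocopy(deviceBuf)
    {
        deviceBuf = static_cast<short*>(std::malloc(n * sizeof(short)));
        std::memcpy(deviceBuf, ptr, n * sizeof(short));
    }

    // Assumption: nocopy(deviceBuf) leaves the target-side pointer untouched,
    // so the buffer allocated in the previous offload is still reachable.
    #pragma offload target(mic:0) out(ptr[0:n]) nocopy(deviceBuf)
    {
        for (unsigned long long ii = 0; ii < n; ++ii)
        {
            deviceBuf[ii] *= 2;
        }
        // Copy the result into the pragma-managed buffer so out() returns it.
        std::memcpy(ptr, deviceBuf, n * sizeof(short));
        std::free(deviceBuf);
        deviceBuf = nullptr;
    }
}
```

Calling DoubleViaTargetMalloc(data, numElems) on a host array should leave each element doubled. The pragma-managed copy of ptr and the malloc'd deviceBuf both exist on the target during each offload, which is why this route costs more memory than INTO.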

The alternative is to use the pragma allocation and INTO as shown below.

void PerformNativeAllocation(short* ptr, short* temp)
{

     cout << " Perform Native allocation " << endl;

        // allocate temp on target only
        #pragma offload_transfer target(mic:0) nocopy(temp : length(numElems) alloc_if(1) free_if(0))

        // transfer ptr values into temp
        #pragma offload target(mic:0)  \
        in(ptr[0:numElems] :into(temp) alloc_if(0) free_if(0))
        {
            for (unsigned long long ii=0; ii < numElems; ++ii)
            {
                temp[ii]*=2;
            }
        }

        // transfer values out and free target memory
        #pragma offload_transfer target(mic:0) out(temp[0:numElems] : into(ptr)  alloc_if(0) free_if(1))
}

Hello Kevin, 

Thank you very much for your reply and help. I have been allocating memory and transferring data over to MIC using the same approach as suggested by you; however, I was trying to see if that initial memory allocation time using "nocopy" clause can be reduced and it appears that it cannot. Thank you for the heads up though, this really saves a lot of my time. 

Sincerely, 

AM 

Ok. Maybe you have also already tried "hiding" some of the initial allocation cost by making it asynchronous using the signal() clause and then either a subsequent offload_wait pragma, a wait() clause on the INTO transfer, or the _Offload_signaled() API?
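A minimal sketch of the signal()/wait() variant, for anyone reading along. The helper name DoubleWithAsyncAlloc and the host-side fill loop are hypothetical placeholders, and for brevity it transfers the same pointer with inout rather than using INTO. It needs the Intel compiler with offload support; a compiler that ignores the pragmas runs everything on the host.

```cpp
#include <cassert>

// Hypothetical helper: starts the target-side allocation asynchronously,
// does host work in the meantime, then waits before transferring and computing.
void DoubleWithAsyncAlloc(short* data, unsigned long long n)
{
    assert(data != nullptr && n > 0);

    // Start allocating the target buffer without blocking the host.
    // The pointer value of data serves as the signal tag.
    #pragma offload_transfer target(mic:0) \
        nocopy(data : length(n) alloc_if(1) free_if(0)) signal(data)

    // Placeholder host work that overlaps with the allocation:
    // fill the input with 1, 2, 3, ...
    for (unsigned long long ii = 0; ii < n; ++ii)
    {
        data[ii] = static_cast<short>(ii + 1);
    }

    // wait(data) holds this offload until the allocation above has signaled.
    // The buffer is reused (alloc_if(0)) and freed afterwards (free_if(1)).
    #pragma offload target(mic:0) wait(data) \
        inout(data : length(n) alloc_if(0) free_if(1))
    {
        for (unsigned long long ii = 0; ii < n; ++ii)
        {
            data[ii] *= 2;
        }
    }
}
```

This only overlaps the allocation with whatever host work is available; it does not make the allocation itself any faster.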

Yes, I tried that too in one of my double-buffering toy programs, but I still see around 15 to 20 seconds of initial (one-time) allocation (offload) delay. Once the memory is allocated, the transfer is pretty fast. Thank you for the heads up though, Kevin. I really appreciate your input and help.

AVM

Yes, the initial allocation slowness is a known issue. The delay occurs within the card's OS, and hopefully it will continue to decrease over time.

Thank you for your help and reply, Kevin.
