offload transfer (partial)

 Hi 

 Is it possible to offload-transfer only part of an array when the whole array is already allocated on both the host and the Phi?

 For example

  void *p_host = _mm_malloc(size, alignment);
  void *p_phi = 0;

#pragma offload target(mic) in(size) in(alignment) out(p_phi)
  {
      p_phi = _mm_malloc(size, alignment);
  }

 I would now like to transfer part of p_host to p_phi.

Is this possible?

Thanks

Jamil

Have a look at sampleC14.c (under /opt/intel/composer_xe_2013/Samples/en_US/C++/mic_samples/intro_sampleC) for an example of using the alloc and into specifiers. I believe those (at least into) provide the functionality and convenience you are interested in.

   Hi

    I may have to drop back to using SCIF, as it does not appear that I can get the fine-grained functionality that I need using the pragma approach.

   Are there any issues with mixing pragmas and SCIF?

 Jamil

  

Hi Jamil,

I have used partial data transfers in one of my codes. Here is a sample that demonstrates what I did.

#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h> /* declares _mm_malloc and _mm_free */

__attribute__((target(mic))) int *my_array;

int main()
{
    // Allocate the host array
    my_array = (int *)_mm_malloc(sizeof(int) * 20, 4096);
    // Initialize the host array
    for (int i = 0; i < 20; i++)
        my_array[i] = 0;
    // Allocate on the coprocessor and transfer the entire array
    // to the coprocessor; free_if(0) keeps the device copy alive
#pragma offload target(mic:0) \
    in(my_array : length(20) alloc_if(1) free_if(0))
    {
        for (int i = 0; i < 20; i++)
        {
            printf("%d", my_array[i]); fflush(0);
        }
        printf("\n"); fflush(0);
    }
    // Change elements 5..9 on the host
    for (int i = 0; i < 5; i++)
    {
        my_array[i + 5] = 5;
    }
    // Transfer only the modified section [start:length] = [5:5] to the card,
    // reusing the existing device allocation
#pragma offload target(mic:0) \
    in(my_array[5:5] : into(my_array[5:5]) alloc_if(0) free_if(0))
    {
        //printf("\nPARTIAL TRANSFER\n"); fflush(0);
        for (int i = 0; i < 20; i++)
        {
            printf("%d", my_array[i]); fflush(0);
        }
    }
    // Free memory on the coprocessor without copying any data
#pragma offload target(mic:0) \
    nocopy(my_array : length(20) alloc_if(0) free_if(1))
    {
    }
    _mm_free(my_array);
    return 0;
}

I am not sure exactly what you are trying to do, but I hope this helps.

-Sumedh

Quote:

Jamil A. wrote:

I may have to drop back to using SCIF, as it does not appear that I can get the fine-grained functionality that I need using the pragma approach.

Are there any issues with mixing pragmas and SCIF?

 

I do not know; there could be, and I do not believe mixing will be supported. I'll have our developers weigh in. Can you offer more details about what control is lacking with offload for your case?

 

  Hi Sumedh

     Thanks for the example. I will be getting access to a Phi in the next couple of days, so I will post a message with a more detailed example expanding on your code.

 Thanks

 Jamil

 

Regarding mixing pragma/scif, our developer replied:

"Memory allocation/deallocation must be done either using malloc/free, or using the pragmas. It cannot be a mixture of the two.

When memory is allocated on MIC using the pragmas, the alignment on MIC will equal that of the CPU, as long as the CPU alignment does not exceed 64 bytes. CPU data aligned to more than 64 bytes can be matched on MIC with an align modifier.

We have not tested using offload and additional SCIF connections in the same program."
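As a rough illustration of the align modifier mentioned in the reply above, here is a minimal sketch. The helper name offload_alignment_ok and the use of posix_memalign for the host allocation are assumptions made for this example, not something taken from the thread, and passing a runtime variable to align(...) is assumed to be accepted. When built without offload support the pragma is ignored and the block simply runs on the host.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the buffer seen inside the offload
   region is aligned to `alignment` bytes, 0 if it is not, and -1 if the
   host allocation fails. */
static int offload_alignment_ok(size_t count, size_t alignment)
{
    void *raw = NULL;
    int *data = NULL;
    int aligned = 0;

    /* posix_memalign needs a power-of-two alignment. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (posix_memalign(&raw, alignment, count * sizeof(int)) != 0)
        return -1;
    data = (int *)raw;
    for (size_t i = 0; i < count; i++)
        data[i] = (int)i;

    /* align(...) asks for at least this alignment on the coprocessor copy,
       which matters when the host alignment exceeds 64 bytes. */
    #pragma offload target(mic:0) in(count) in(alignment) \
            in(data : length(count) align(alignment)) out(aligned)
    {
        aligned = ((uintptr_t)data % alignment) == 0;
    }

    free(data);
    return aligned;
}
```

Calling offload_alignment_ok(1024, 4096) is one way to check that a 4096-byte host alignment, as used in the sample earlier in the thread, is carried over to the device copy.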

 

Quote:

Kevin Davis (Intel) wrote:

Regarding mixing pragma/scif, our developer replied:

"Memory allocation/deallocation must be done either using malloc/free, or using the pragmas. It cannot be a mixture of the two.

When memory is allocated on MIC using the pragmas, the alignment on MIC will equal that of the CPU, as long as the CPU alignment does not exceed 64 bytes. CPU data aligned to more than 64 bytes can be matched on MIC with an align modifier.

We have not tested using offload and additional SCIF connections in the same program."

Hi Kevin, 

I wanted to know if the flags affecting the offloads will still work when you mix offload with SCIF? 

-Sumedh

 

Hi Sumedh/Kevin

I have got a case to work using the pragma approach.

As the pragma approach does not allow you to pass a pointer by value (i.e. in my case I have a pointer to device memory stored on the host), I am casting my pointers to size_t before passing them as an argument to the pragma call to get around this issue.

For cases like this (i.e., I want to store the device pointer in a host structure), it would be very useful for the pragma approach to allow passing pointers by value (both in and out) without casting.
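The cast workaround described above might look roughly like the following sketch. The device_buffer type and the three helper functions are hypothetical names invented for illustration. The sketch assumes size_t is wide enough to hold a device address and that memory obtained with malloc inside one offload region stays valid in later offloads to the same card. Without offload support the pragmas are ignored and everything runs on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host-side record that remembers where the device buffer is. */
typedef struct {
    size_t device_addr; /* device pointer stored as an integer */
    size_t count;       /* number of ints in the buffer */
} device_buffer;

/* Allocate on the device and bring the address back as a plain integer. */
static device_buffer device_buffer_create(size_t count)
{
    device_buffer buf = { 0, count };
    size_t addr = 0;

    #pragma offload target(mic:0) in(count) out(addr)
    {
        addr = (size_t)malloc(count * sizeof(int));
    }
    buf.device_addr = addr;
    return buf;
}

/* Pass the integer back in and cast it to a pointer on the device side. */
static long device_buffer_fill_and_sum(device_buffer buf, int value)
{
    size_t addr = buf.device_addr;
    size_t count = buf.count;
    long sum = 0;

    assert(addr != 0);
    #pragma offload target(mic:0) in(addr) in(count) in(value) out(sum)
    {
        int *p = (int *)addr;
        sum = 0; /* out() variables are not copied in, so initialize here */
        for (size_t i = 0; i < count; i++) {
            p[i] = value;
            sum += p[i];
        }
    }
    return sum;
}

/* Free with the same mechanism that allocated: malloc/free on the device. */
static void device_buffer_destroy(device_buffer *buf)
{
    size_t addr = buf->device_addr;

    #pragma offload target(mic:0) in(addr)
    {
        free((void *)addr);
    }
    buf->device_addr = 0;
}
```

Keeping allocation and deallocation both on the malloc/free side follows the developer's note earlier in the thread that the two mechanisms should not be mixed for the same memory.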

It would also be useful to be able to override the bitwise-copyable check for structures. I am currently having to memcpy the structure into an unsigned char array, which I pass as an argument to the offload call, and then reconstruct my structure on the device side.
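A minimal sketch of that byte-array workaround is shown below. The job_params structure and run_job function are hypothetical, and the sketch assumes the structure has the same size and layout on the host and on the coprocessor. Any pointer member copied this way still holds a host address, so it must not be dereferenced on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure; the pointer member is assumed to be what makes
   the compiler reject it as not bitwise copyable. */
typedef struct {
    int     id;
    double  scale;
    int    *data; /* host address: do not dereference on the device */
} job_params;

static double run_job(const job_params *host_params)
{
    unsigned char bytes[sizeof(job_params)];
    double result = 0.0;

    assert(host_params != NULL);
    /* Flatten the structure into raw bytes on the host. */
    memcpy(bytes, host_params, sizeof bytes);

    #pragma offload target(mic:0) in(bytes) out(result)
    {
        job_params p;
        /* Rebuild the structure on the device side. */
        memcpy(&p, bytes, sizeof p);
        result = p.id * p.scale;
    }
    return result;
}
```

The fixed-size array is transferred whole by in(bytes), so no length modifier is needed in this sketch.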

Thanks for your help

Jamil
