Offload when result length is not known beforehand

I would like to use offload, but the length of the result is not known until the offload has finished. What is the correct way to handle this? Here is my idea. Will it work? Is there a more elegant way?

Georg

const char *pBufOut=NULL;
std::size_t lOutBufLen;
// compute result on MIC, alloc buffer, but return only length.
#pragma offload target(mic) out(lOutBufLen) nocopy(pBufOut: length(0) alloc_if(0) free_if(0))
{
   // compute length of result
   ...
   pBufOut=malloc(computedSize);
   // copy contents of result to buffer
   ...
   lOutBufLen=computedSize;
}
// create suitable buffer on host side
pBufOut=malloc(lOutBufLen);
// copy result to host side, deallocate memory on MIC, do nothing else...
#pragma offload target(mic) out(pBufOut: length(lOutBufLen) alloc_if(0) free_if(0))
{;}

Hi Georg,

The solution you outlined did not work on my system, probably because the host value of pBufOut is still NULL after the first offload, so the offload runtime library (RTL) cannot look the array up in its table of offloaded arrays. The only way I could make it work was to pass a second array and copy the contents into it on the coprocessor:

#include <stdio.h>
#include <stdlib.h>

int main() {

  size_t bufLen;
  char* pBufMIC = NULL;
  char* pBufHost = NULL;

#pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0))
  {
    bufLen = 60;
    pBufMIC = (char*) malloc(bufLen);
    for (int i = 0; i < bufLen; i++)
      pBufMIC[i] = (char)(48+i%10);
  }

  printf("On the host, bufLen = %zu\n", bufLen);  // %zu is the format specifier for size_t
  pBufHost = (char*) malloc(bufLen);

#pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0)) out(pBufHost : length(bufLen))
  {
    pBufHost[0:bufLen] = pBufMIC[0:bufLen];
  }

  for (int i = 0; i < bufLen; i++) printf("%c", pBufHost[i]);
  printf("\n");
}

$ icpc foo1.cc

$ ./a.out
On the host, bufLen = 60
012345678901234567890123456789012345678901234567890123456789
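
The two-step pattern above can also be wrapped in a helper that hands the caller a correctly sized buffer. What follows is only a minimal sketch, and I have not run it on a coprocessor. The function name fetchOffloadResult is hypothetical, and the fixed length of 60 stands in for a length computed at run time. The pragmas are guarded by the __INTEL_OFFLOAD macro, so a build without offload support simply runs both blocks on the host. memcpy replaces the array-notation copy, and result.data() assumes C++11. Unlike the listing above, the second offload also frees the coprocessor-side buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: runs the two-step offload and returns the result
// in a host-side vector whose size is only known after the first offload.
std::vector<char> fetchOffloadResult() {
  std::size_t bufLen = 0;   // scalar used in the offload region; copied back after it
  char* pBufMIC = NULL;     // only meaningful on the coprocessor

  // Step 1: compute the length and fill a buffer allocated on the coprocessor.
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0))
#endif
  {
    bufLen = 60;  // stand-in for a length computed at run time
    pBufMIC = (char*) malloc(bufLen);
    for (std::size_t i = 0; i < bufLen; i++)
      pBufMIC[i] = (char)(48 + i % 10);
  }

  // The length is now known on the host, so the host buffer can be sized.
  std::vector<char> result(bufLen);
  char* pBufHost = result.data();

  // Step 2: copy into the transferable buffer and release the coprocessor memory.
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0)) out(pBufHost : length(bufLen))
#endif
  {
    memcpy(pBufHost, pBufMIC, bufLen);
    free(pBufMIC);
  }

  return result;
}
```

A caller would then write std::vector<char> data = fetchOffloadResult(); and read the length from data.size(). Both offloads stay in one function, as in the listing above, so the same local pointer is named in both pragmas.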

There is, however, a much more elegant solution in the virtual-shared memory model:

#include <stdlib.h>
#include <iostream>

char* _Cilk_shared pBuf = NULL;
_Cilk_shared size_t bufLen;

_Cilk_shared void MyFunction() {
  bufLen = 60;
  pBuf = (char*) _Offload_shared_malloc(bufLen);
#ifdef __MIC__
  std::cout << "Initialized the array on the coprocessor\n" << std::flush;
#endif
  for (int i = 0; i < bufLen; i++)
    pBuf[i] = (char)(48+i%10);
}

int main() {
  _Cilk_offload MyFunction();
  std::cout << "Back on the host, lOutBufLen = " << bufLen << std::endl;
  for (int i = 0; i < bufLen; i++) std::cout << pBuf[i];
  std::cout << std::endl;
}

$ icpc foo2.cc

$ ./a.out

Initialized the array on the coprocessor
Back on the host, lOutBufLen = 60
012345678901234567890123456789012345678901234567890123456789
