Attempting to offload part of one array on the host to another array on the MIC


Greetings,

I have been trying to write some code to test out an algorithm that can move data from a host buffer to a different buffer on the MIC. For instance, I have a 20 GB buffer residing in host memory. What I would like to do is have another buffer on the MIC that is, say, 4 GB in size where I can offload 2 GB of the main host buffer at a time. The reason for doing 2 GB at a time is so that I can do calculations on the first 2 GB while the asynchronous transfer of the next 2 GB is going on at the same time.

I did get this working for the case where I have two separate 2 GB arrays on the host and on the MIC, although I would like to avoid using more host memory as an intermediate buffer. In that version, I copied data into the intermediate buffer on the host and then scheduled the transfer to the MIC. But since I had two separately named buffers, I also needed if/else logic to control which buffer was in use, which I thought was ugly.

I've looked at using the 'into' modifier in the 'in' clause, since it appears to do what I want. However, rather than 2 GB being transferred to the MIC, the offload report shows 6 GB sent and 8 GB returned. Here's how I'm setting the data up:

[fortran]

complex(8), allocatable :: host_slab_buf(:)

!DIR$ ATTRIBUTES OFFLOAD : mic :: mic_slab_buf
complex(8), allocatable :: mic_slab_buf(:)

!Allocate the buffers.
allocate( host_slab_buf( num_elems ) )
allocate( mic_slab_buf( num_slab_elems ) )

! Allocate the memory on the MIC
!DIR$ OFFLOAD_TRANSFER target(mic:0) nocopy( mic_slab_buf: length(num_slab_elems) alloc_if(.true.) free_if(.false.) )

...

!DIR$ OFFLOAD_TRANSFER target(mic:0) in( host_slab_buf(start_host:end_host): into( mic_slab_buf(start_mic:end_mic) ) &

                                                                     alloc_if(.false.) free_if(.false.) ) signal(sigIN1)

!DIR$ OFFLOAD BEGIN target(mic:0) out(mic_slab_buf(start_mic:end_mic) : into(host_slab_buf(start_host:end_host) ) &

                                                             alloc_if(.false.) free_if(.false.) ) wait(sigIN1) signal(sigOUT1)

! do some work

!DIR$ END OFFLOAD

[/fortran]

The start_host/end_host and start_mic/end_mic variables are initialized to the range of entries I want to copy from the host to the MIC. I have also attached the whole code for reference.
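
For illustration, the four bounds can be derived from the chunk index alone, which would avoid if/else logic over separately named buffers. The sketch below is in C purely to show the index arithmetic. It assumes the MIC buffer holds exactly two chunks and that every chunk has the same length. `chunk_bounds` and `bounds_for_chunk` are hypothetical names that do not appear in the attached code, and the bounds are 1-based and inclusive so they can be read as Fortran array sections.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based, inclusive bounds, written the way a Fortran array section
   such as host_slab_buf(start_host:end_host) expects them. */
typedef struct {
    int64_t start_host, end_host;  /* section of the large host buffer */
    int64_t start_mic, end_mic;    /* section of the smaller MIC buffer */
} chunk_bounds;

/* Hypothetical helper: bounds for chunk k (0-based), assuming the MIC
   buffer holds exactly two chunks of chunk_elems elements each. */
static chunk_bounds bounds_for_chunk(int64_t k, int64_t chunk_elems)
{
    assert(k >= 0 && chunk_elems > 0);
    chunk_bounds b;
    int64_t half = k % 2;  /* 0 -> first half of the MIC buffer, 1 -> second */
    b.start_host = k * chunk_elems + 1;
    b.end_host   = (k + 1) * chunk_elems;
    b.start_mic  = half * chunk_elems + 1;
    b.end_mic    = (half + 1) * chunk_elems;
    return b;
}
```

With chunk_elems = 4, chunks 0, 1 and 2 map to host sections 1:4, 5:8 and 9:12, and to MIC sections 1:4, 5:8 and 1:4. For a 2 GB chunk of 16-byte complex(8) elements (taking 2 GB as 2^31 bytes), chunk_elems would be 134217728.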

I appreciate any help/insight on my issue.

Thank you.

Attachment: test-offload.f90 (7.99 KB)

Hi, 

I have yet to take a closer look at your code. However, I tried the following code, and it seems to transfer the correct amount of data to the coprocessor.

#include <stdio.h>
#include <immintrin.h>   /* _mm_malloc, _mm_free */
#define SIZE (1024*1024*1024)
#define MIC_SIZE (1024*1024*128)
#define ALIGN 4096
int main()
{
    float *arr, *mic_arr;
    arr = (float*)_mm_malloc(sizeof(float)*SIZE, ALIGN);
    mic_arr = (float*)_mm_malloc(sizeof(float)*MIC_SIZE, ALIGN);
    /* Allocate mic_arr on the coprocessor without copying any data. */
#pragma offload_transfer target(mic:0) nocopy(mic_arr : length(MIC_SIZE) alloc_if(1) free_if(0))
    /* Copy the first MIC_SIZE elements of arr into mic_arr. */
#pragma offload target(mic:0) in(arr[0:MIC_SIZE] : into(mic_arr[0:MIC_SIZE]) alloc_if(0) free_if(0)) signal(arr)
    {
    }
    /* Copy mic_arr back into the second MIC_SIZE-element slice of arr. */
#pragma offload target(mic:0) out(mic_arr[0:MIC_SIZE] : into(arr[1024*1024*128:MIC_SIZE]) alloc_if(0) free_if(0)) wait(arr)
    {
    }
    /* Free the coprocessor copy of mic_arr. */
#pragma offload_transfer target(mic:0) nocopy(mic_arr : length(MIC_SIZE) alloc_if(0) free_if(1))
    _mm_free(mic_arr);
    _mm_free(arr);
    return 0;
}

My offload report looks like this: 

Quote:

[Offload] [MIC 0] [File] main.cpp
[Offload] [MIC 0] [Line] 14
[Offload] [MIC 0] [Tag] Tag 0
[Offload] [HOST] [Tag 0] [CPU Time] 4.991898 (seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data] 0 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time] 0.000171 (seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data] 8 (bytes)

[Offload] [MIC 0] [File] main.cpp
[Offload] [MIC 0] [Line] 16
[Offload] [MIC 0] [Tag] Tag 1
[Offload] [MIC 0] [File] main.cpp
[Offload] [MIC 0] [Line] 20
[Offload] [MIC 0] [Tag] Tag 2
[Offload] [HOST] [Tag 1] [CPU Time] 0.393879 (seconds)
[Offload] [MIC 0] [Tag 1] [CPU->MIC Data] 536870920 (bytes)
[Offload] [MIC 0] [Tag 1] [MIC Time] 0.000021 (seconds)
[Offload] [MIC 0] [Tag 1] [MIC->CPU Data] 8 (bytes)

[Offload] [HOST] [Tag 2] [CPU Time] 0.380884 (seconds)
[Offload] [MIC 0] [Tag 2] [CPU->MIC Data] 0 (bytes)
[Offload] [MIC 0] [Tag 2] [MIC Time] 0.000000 (seconds)
[Offload] [MIC 0] [Tag 2] [MIC->CPU Data] 536870912 (bytes)

[Offload] [MIC 0] [File] main.cpp
[Offload] [MIC 0] [Line] 24
[Offload] [MIC 0] [Tag] Tag 3
[Offload] [HOST] [Tag 3] [CPU Time] 0.001629 (seconds)
[Offload] [MIC 0] [Tag 3] [CPU->MIC Data] 16 (bytes)
[Offload] [MIC 0] [Tag 3] [MIC Time] 0.000166 (seconds)
[Offload] [MIC 0] [Tag 3] [MIC->CPU Data] 0 (bytes)
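
The byte counts in the report can be checked against the array sizes in the test program. The sketch below is only that arithmetic: MIC_SIZE floats at 4 bytes each give 536870912 bytes, which equals the MIC->CPU figure for Tag 2, while the CPU->MIC figure for Tag 1 is 8 bytes larger. The report does not say what those 8 bytes are, so the helper only computes the difference. `float_bytes` and `extra_bytes` are hypothetical names used for this check.

```c
#include <assert.h>
#include <stddef.h>

#define MIC_SIZE (1024*1024*128)   /* element count, as in the test program */

/* Bytes occupied by n floats (assumes the usual 4-byte float). */
static size_t float_bytes(size_t n)
{
    return n * sizeof(float);
}

/* Difference between a byte count taken from the offload report and
   the raw payload of n floats. */
static long long extra_bytes(long long reported, size_t n)
{
    assert(reported >= 0);
    return reported - (long long)float_bytes(n);
}
```

Here extra_bytes(536870920, MIC_SIZE) is 8 for the Tag 1 transfer to the coprocessor, and extra_bytes(536870912, MIC_SIZE) is 0 for the Tag 2 transfer back to the host.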

Could you please provide a bare-bones reproducer?
