Memory allocated on MIC creates a copy on main memory?

Hi,

I found that main memory also holds a copy of data allocated on the MIC card when handling large data sets on the card.

For example, with the code below, the program uses 1 GB of host RAM when paused at "Press key to continue 1".
However, after the 1 GB of zeroed data is loaded onto the MIC card, it uses 2 GB of host RAM at "Press key to continue 2". Meanwhile, 1 GB of the MIC card's memory is also occupied.
At "Press key to continue 3", once the memory on the MIC has been freed, the program is back to 1 GB of host RAM.

Is this the expected behavior of the MIC card? Is there any way to avoid this double memory usage?

Thanks,

Hunter

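#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 1000000000

int main(void) {

    char* a = (char*) malloc(SIZE);
    if (a == NULL) return 1;
    memset(a, 0, SIZE);  /* touch every page so the host really commits ~1 GB */

    printf("Press key to continue 1\n"); getchar();

    /* Allocate "a" on the coprocessor, copy the host data in, keep the device buffer. */
    #pragma offload_transfer target(mic:0) \
        in(a:length(SIZE)  alloc_if(1) free_if(0))

    printf("Press key to continue 2\n"); getchar();

    /* No data transfer; only free the device buffer. */
    #pragma offload_transfer target(mic:0) \
        nocopy(a:length(SIZE)  alloc_if(0) free_if(1))

    printf("Press key to continue 3\n"); getchar();

    free(a);
    return 0;
}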

jimdempseyatthecove:

If you do not intend to pass data into the MIC, then do not allocate it on the host. In your example you allocate and zero a memory buffer on the host. This buffer need not reside on the host at all. Simply allocate and free it within the scope of the offload. If the host needs to remember the pointer to the buffer allocated on the MIC, pass the pointer out of the allocating offload and later pass it back into the next offload.

BTW, the free_if(1) frees on MIC not on Host.
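
A minimal sketch of the "allocate on the MIC and pass the pointer back in" idea, under stated assumptions. The helper names (mic_alloc_zeroed, mic_fill_and_sum, mic_free) are hypothetical and not part of any Intel API. The sketch assumes the coprocessor-side process stays alive between offloads, so the device address remains valid, and that uintptr_t, size_t and long have the same width on host and card. The host only ever holds the address as an integer handle, so no host-side array is needed. With a compiler that has no offload support the pragmas are ignored and everything simply runs on the host.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate and zero n bytes on the coprocessor.
   Returns the device-side address as an opaque handle, or 0 on failure. */
uintptr_t mic_alloc_zeroed(size_t n)
{
    uintptr_t handle = 0;
    #pragma offload target(mic:0) in(n) out(handle)
    {
        char *p = (char *) malloc(n);
        if (p != NULL)
            memset(p, 0, n);
        handle = (uintptr_t) p;   /* only the address travels back to the host */
    }
    return handle;
}

/* Hypothetical helper: a later offload that reuses the same device buffer.
   Fills it with `value` and returns the byte sum, or -1 for a null handle. */
long mic_fill_and_sum(uintptr_t handle, size_t n, char value)
{
    long sum = -1;
    if (handle == 0)
        return sum;
    #pragma offload target(mic:0) in(handle, n, value) out(sum)
    {
        char *p = (char *) handle;  /* turn the handle back into a pointer */
        size_t i;
        sum = 0;                    /* out() does not copy the host value in */
        memset(p, value, n);
        for (i = 0; i < n; i++)
            sum += p[i];
    }
    return sum;
}

/* Hypothetical helper: release the device buffer. */
void mic_free(uintptr_t handle)
{
    #pragma offload target(mic:0) in(handle)
    {
        free((char *) handle);
    }
}
```

Typical use would be one call to mic_alloc_zeroed, any number of offloads that take the handle, then mic_free. Only scalars cross the bus, so the runtime has no large host buffer to mirror.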

www.quickthreadprogramming.com

Thanks, Jim. Allocating a temporary buffer on the MIC and passing the pointer into the next offload is a good solution.
However, I need to transfer gigabytes of data both to and from the MIC card.
I wonder whether what I observed is normal: while offload-transferring only 1 GB of data from host to MIC (or from MIC to host), the host uses 2 GB of memory and the MIC uses 1 GB.

Hunter

jimdempseyatthecove:

>>I need to transfer gigabytes' data both to and from MIC Card.

Then you will need buffers on both sides, or you can investigate MYO, which I haven't used but which also appears to consume buffer space on both sides.

Your real question should focus on the "need to transfer". Ask yourself: "Do I really need to transfer all the data all the time?"

Take a modeling program as an example. Do you really need the host to synchronize with the MIC at every integration time interval? Or do you only require synchronization every Nth interval, when you "need" to snapshot the state?

>> host uses 2G memory and MIC uses 1G memory.

(On my first read of #1 I missed the extra 1 GB bump.)

What you don't know is whether the extra 1 GB is virtual memory mapped to the MIC or virtual memory mapped to host RAM and the page file. (Can you find a tool that will tell you where your virtual memory maps?)

Jim Dempsey

www.quickthreadprogramming.com

Thanks. I certainly understand that optimizing what to transfer and when is essential.

To investigate where the virtual memory maps to, I tried the "pmap" command on Linux. It appears to point to RAM.
Meanwhile, free memory on the host drops by 2 GB.

It also seems that while the test program above is running, "micuser" on the MIC card still uses 1 GB of MIC memory even after the 1 GB array "a" has been freed on the MIC, i.e. after the offload_transfer statement that frees "a" on the MIC:
#pragma offload_transfer target(mic:0) \
        nocopy(a:length(SIZE)  alloc_if(0) free_if(1))

I think that when an array is transferred between host and MIC, equal-sized buffers exist on both sides, and the compiler does not handle them properly.
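
As a complement to pmap, the host-side measurements at each "Press key to continue" pause could also be taken from inside the program. The following is only a sketch: resident_kb is a hypothetical helper name, and it assumes a Linux host where /proc/self/status exposes a "VmRSS:" line in kB. It reports host RAM only and says nothing about memory on the MIC card.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resident set size of the calling process in kB,
   read from /proc/self/status (Linux only). Returns -1 if unavailable. */
long resident_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* The line looks like "VmRSS:   123456 kB". */
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```

Printing resident_kb() before and after each offload_transfer would show the jump from roughly 1 GB to 2 GB without an external tool.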

I will escalate this to the compiler developers to assist with a closer inspection of the memory usage and let you know what is found.

(Internal tracking id: DPD200243898)

jimdempseyatthecove:

This is getting into territory where the COI support people could reply.

Jim Dempsey

www.quickthreadprogramming.com

Thank you all. I am looking forward to any update on my problem.

I confirmed with Development that the memory usage observed is how the offload works at present. The offload run-time creates a buffer for "a" on the coprocessor and a corresponding buffer on the CPU instead of using the existing memory location of "a" on the CPU. Then there is a buffer copy from one buffer to the other. Developers are still investigating methods for avoiding the duplication on the CPU in a future release of MPSS and the compiler.

I updated the internal tracking id in my earlier reply and will reply to this thread again regarding any future changes.

Thanks, Kevin. Besides the issue of the additional buffer created on the CPU, I would appreciate it if you also took a look at the MIC memory usage remaining unchanged after free_if(1), as mentioned before:

While the test program above is running, "micuser" on the MIC card still uses 1 GB of physical memory even after the 1 GB array "a" has been freed on the MIC by the offload_transfer statement below. That leaves less memory for local allocations made with malloc inside a pragma offload region.

#pragma offload_transfer target(mic:0) \
        nocopy(a:length(SIZE)  alloc_if(0) free_if(1))

 

My apologies Hunter. I overlooked that aspect. I will investigate and post a reply shortly.

Is there any news? I have the same problem with mkl_mic_free_memory.
