Free memory after offload region - memleak?


I am trying to offload some of my code. The code below runs without any errors, but the program's memory usage increases over time (when NUM2 is big enough, it rises to 32GB+). I believe I am freeing the memory properly, since the same code runs on the host without offloading and without the growth in memory usage, so I cannot explain what is causing this memory leak.


How to compile/install/use third party library


I am using compiler assisted offload in my application, explicitly specifying which computations should be transferred to the Xeon Phi. Things are working correctly, but now I got to the point where I need to use a third party library in the part of the application that is offloaded to the Phi. If it is of any use, the library under consideration is PRIMME (http://www.cs.wm.edu/~andreas/software, code and documentation found at https://github.com/primme/primme).

Best practice for Server/Dev Environments

At long last I am making my CentOS boot stick and copying BIOS versions to another stick. Hopefully the magic smoke will not escape...

1 primary physical box, 3-4 MICs, 2-3 small projects, 4-6 remote users (not terribly concurrent).

Some R, some Pynum, some C++, a bit of this, a dash of that...

XEN -> and then carve up a stack of administrative servers and development vms?

CentOS -> VMware?

Something else...

Unable to compile the code for Xeon Phi

I wrote an MPI program to run on the Xeon Phi in native mode.
When I try to compile the code, I get the following errors:

$ mpicc test.c -o test -mmic

/usr/bin/ld: skipping incompatible /opt/intel//impi/ when searching for -lmpifort
/usr/bin/ld: skipping incompatible /opt/intel//impi/ when searching for -lmpifort
/usr/bin/ld: cannot find -lmpifort
collect2: ld returned 1 exit status

Please help me out.

OFFLOAD_REPORT: Why is CPU Time smaller than MIC Time?


From compiler-assisted offload code, I get the following offload report:

[Offload] [HOST]  [Tag 2] [CPU Time]        5.053540(seconds)
[Offload] [MIC 0] [Tag 2] [CPU->MIC Data]   1080 (bytes)
[Offload] [MIC 0] [Tag 2] [MIC Time]        6.122002(seconds)
[Offload] [MIC 0] [Tag 2] [MIC->CPU Data]   1032 (bytes)

However, I expected the total CPU time on the host to always be greater than the MIC time, since it includes the execution time on the MIC according to:

Aligned Allocators with C++11 on the MIC?

Short version: is this possible?

Long version: I have tried to use aligned allocators, either directly or indirectly, for a couple of different projects using the Xeon Phi, and have yet to compile them successfully. My understanding is that the allocator rules / parameters / syntax changed (again) similar to this issue:
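
As a point of reference for the question, here is a minimal sketch of what a C++11-style aligned allocator can look like. The names `AlignedAllocator`, `is_aligned` and `vector_data_is_aligned` are hypothetical and do not come from the thread, the 64-byte default is an assumption chosen to match the Xeon Phi's 512-bit vector width, and `posix_memalign` is a POSIX call rather than ISO C++. Whether this compiles with a given Intel compiler version for the MIC is not something the sketch establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX, not ISO C++)
#include <vector>

// Hypothetical minimal allocator; Alignment must be a power of two
// and a multiple of sizeof(void*) for posix_memalign to accept it.
template <typename T, std::size_t Alignment = 64>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind: the allocator_traits default cannot rebind a
    // template that has a non-type parameter such as Alignment.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Returns true if p is a multiple of alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Builds a vector of n doubles with the allocator and checks its storage.
inline bool vector_data_is_aligned(std::size_t n) {
    std::vector<double, AlignedAllocator<double>> v(n, 1.0);
    assert(v.size() == n);
    return is_aligned(v.data(), 64);
}
```

With a standard library that goes through `std::allocator_traits`, the allocator drops into a container as `std::vector<double, AlignedAllocator<double>>`; older standard libraries may still expect the full C++03 allocator interface (`pointer`, `construct`, `destroy`, and so on), which this sketch deliberately omits.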

