I am running a simple Merge Sort benchmark on the Xeon Phi. 78% of the total CPU time is consumed by "libiomp5.so".
I tried to reduce the time wasted by the OpenMP runtime library by setting "export KMP_BLOCKTIME=0". Please note that the application is running natively on the MIC. I have also tried "export OMP_WAIT_POLICY=passive". No effect!
Why does this not have any effect on the execution time or the wasted CPU time?
I am implementing an OpenMP offload program that currently runs on a single Phi device.
My compute node has 2 Phi devices named mic0 and mic1.
Could you please let me know how I can extend the following code section to run on both devices simultaneously?
Assume the code sections can run independently of each other, with all data (input and output) independent.
I need to know the code structure and the directives required to do this properly in OpenMP. The compilation method is also required.
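One possible structure, offered only as a minimal sketch and not as the definitive approach: two host threads each issue one OpenMP target region, and the device clause selects which device runs it. The names scale_on_device, pick_device, and run_on_two_devices are hypothetical, and the scaling loop is a stand-in for the real independent work. The sketch assumes a compiler with OpenMP 4.5 target support; when no offload device is visible, it falls back to the host.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: map a wanted device index onto what is available.
   With no offload devices, use the host (initial device) instead. */
static int pick_device(int wanted)
{
#ifdef _OPENMP
    int ndev = omp_get_num_devices();
    return ndev > 0 ? wanted % ndev : omp_get_initial_device();
#else
    (void)wanted;
    return 0;
#endif
}

/* Hypothetical stand-in for one independent code section:
   scale every element of data on the device numbered dev. */
static void scale_on_device(double *data, size_t n, double factor, int dev)
{
    /* Copy data to the device, run the loop there, copy it back. */
    #pragma omp target teams distribute parallel for device(dev) map(tofrom: data[0:n])
    for (size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

/* Run the two independent sections concurrently, one per device.
   Each host thread blocks on its own target region, so the two
   offloads can overlap. */
void run_on_two_devices(double *a, size_t na, double *b, size_t nb)
{
    int dev_a = pick_device(0);  /* e.g. mic0 */
    int dev_b = pick_device(1);  /* e.g. mic1 */

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        scale_on_device(a, na, 2.0, dev_a);

        #pragma omp section
        scale_on_device(b, nb, 3.0, dev_b);
    }
}
```

Calling run_on_two_devices(a, na, b, nb) from the host program doubles a and triples b. The OpenMP flag and the offload options needed to build this depend on the compiler and its version, so check the compiler documentation for the exact command line.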
In certain cases clEnqueueReadBuffer doesn't transfer all the required data when executed on an HD4600. System: Win7 x64, driver version 126.96.36.199.4170, 32-bit application.
It seems that when the destination buffer is page-aligned and the transfer length is not a multiple of 4KB, only a multiple of 4KB is transferred. Sample code:
Hello, I'm running mp_linpack on two nodes (I have figured out how to run it on one node :) ). I changed the path to mp_linpack/bin_intel/intel64 and modified HPL.dat to set P=1, Q=2. Then I created a hosts file, whose content is:
mic01 and mic03 are the two nodes on which I want to run linpack. Then I used the command from MKL_userguide.pdf:
mpirun --prefix 1 -n 2 -hosts Node1,Node2 \
-genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl_offload_intel64
I'm using mpss_gold_update_3-2.1.6720, with a uos version of 188.8.131.52-g5f2543d.
I'm now trying to install VTune Amplifier 2015 on this mpss, so I need sep3_15-k1om-184.108.40.206-g5f2543dsmp.ko and vtsspp-k1om-220.127.116.11-g5f2543dsmp.ko. However, I can find only sep3_15-k1om-18.104.22.168-g5f2543dsmp.ko and vtsspp-k1om-22.214.171.124-gd50f2a5smp.ko, not vtsspp-k1om-126.96.36.199-g5f2543dsmp.ko.
I'm running into a problem where data is not being written to my buffer when the kernels finish. I've tested my kernel in isolation in Eclipse on Ubuntu with an Intel i5 CPU, and it seems to output the correct results. When I move it over to CentOS, I can't get any printf output from the kernel and my output buffers are never written to. Here is an example of my code:
double * coef_elts = (double *) calloc(p * voxels, sizeof(double));
[root@amax ~]# micctrl -s
mic0: booting (mode: linux image: /lib/firmware/mic/uos.img)
mic1: booting (mode: linux image: /lib/firmware/mic/uos.img)
I have two MIC cards, but they get stuck in the booting state and never come online. Whether I reboot the host or restart the mpss service, they stay in this state. Any help, please? Thank you very much!