I have set up a system with Intel Xeon processors and Xeon Phi coprocessors under CentOS 7, kernel 3.10.0-327, Intel Composer XE 2015.2.164. Installation of MPSS 3.6.1 went smoothly. Building the MIC environment using NFS from the host works fine. I am able to compile MPI applications (e.g., https://software.intel.com/en-us/articles/using-the-intel-mpi-library-on-intel-xeon-phi-coprocessor-systems) for the host and the coprocessors (with -mmic).
Hello, my Xeon Phi was working for about a month and suddenly it stopped working. Using the Intel MPSS Debugging Flowchart for Linux, I followed the steps down and then across the bottom, eventually reaching the step of resetting the configuration. Unloading works, cleanconfig gives no message, and then initdefaults gives:
[Warning] mic0: Generating compatibility network config file /opt/intel/mic/filesystem/mic0/etc/sysconfig/network/ifcfg-mic0 for IDB.
[Warning] This may be problematic at best and will be removed in a future release, Check with the IDB release.
I'm experimenting with cache eviction in my code using _mm_clevict(), and have the following observations:
(1) if data are evicted from L1 alone (_MM_HINT_T0), the code is slowed by ~30%.
(2) if data are evicted from L2 alone (_MM_HINT_T1), the code is slowed by ~70%.
(3) if data are evicted from both L1 and L2, the code is still slowed by ~70%.
I'm confused by (3), as I think some additional loads should miss both L1 and L2 compared to (2), which should cause further latency. Any thoughts? Thanks!
I have a global array A that contains all my data, and I may use different parts of A in offload sections. How do I copy only the parts I am interested in (possibly more than one) to the MIC, instead of the entire array? I have an example; how can I make it work? Thanks,
The HPL benchmark performance obtained on a host + 1 MIC card is only 154 GFlops. The host system has 102 GB of memory. The theoretical peak is 1.2 TF + 256 GFlops ≈ 1.4 TF. May I please know how to optimize the HPL performance? I've used offload execution, with the executable xhpl_offload_intel64. When I run the HPL benchmark on the host alone, I am able to achieve 92% of peak performance. I am attaching all the files that I am using. Awaiting your quick reply.
Hi. I want to install the 'locale' package on the Xeon Phi. How do I do this?
It's not like a library, which I could cross-compile and upload to the board; 'locale' involves environment variables and files. How can I install it? Is there a way to add this package to the Linux image of the Xeon Phi on the host?
I would like to clear up confusion regarding the native MPI paradigm on the Intel Xeon Phi. Is it possible to use MPI to create two tasks on two separate Xeon Phi nodes natively? That is, I want to use two coprocessors as two independent, separate nodes without any involvement of the hosts.
I have been experimenting with the following compiler flags / options.
1. fp-model [strict, source, fast=2, etc.]
I am just starting to learn how to program the Intel Phi card. I have recently written my first, very simple program (to copy an array from host to card memory) but am unable to rebuild it successfully.
Development environment details:
- Operating System: Windows Server 2012.
- Compiler: Intel Compiler 2016.
- Visual Studio 2013.
- Programming language: C++.
- Intel Phi card model: 6 GB memory, 1.1 GHz clock speed, 57 cores.
- I have installed the relevant drivers for the Windows OS as per the Intel guide.
Hello all. I am trying to reduce my offload costs by copying only the data that I need.
!original copy
!$omp target update from(my_array)

!this works
!$omp target update from(my_array(1:1000, 1))
!$omp target update from(my_array(1:1000, 2))
!$omp target update from(my_array(1:1000, 3))

!this doesn't
!$omp target update from(my_array(var1:var2, 1))
!$omp target update from(my_array(var1:var2, 2))
!$omp target update from(my_array(var1:var2, 3))
!starts copying gigabytes(????) of data or just seg faults