I am using Intel C++ Compiler 15.0 for Windows, and there is a simple case that fails to compile.
I have four __m128i values (64 bytes in total) whose elements can be zero or non-zero (positive or negative). I want to extract the non-zero values from them.
I looked at _mm_extract_epi8/_mm_extract_epi16, but the signature is int _mm_extract_epi16 (__m128i a, int imm), where imm is the element index, so I have to loop over every index to find the non-zero values.
Any intrinsic functions that avoid the loop would be helpful. Input appreciated.
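One possible direction, as a minimal sketch rather than a definitive answer: compare the whole vector against zero with _mm_cmpeq_epi16, turn the result into a bit mask with _mm_movemask_epi8, and use that mask to skip all-zero vectors and pick out the non-zero lanes. This assumes 16-bit elements and SSE2; the helper name extract_nonzero_epi16 is hypothetical. It does not remove the loop entirely, but it avoids calling _mm_extract_epi16 with an immediate index for each lane.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies the non-zero 16-bit lanes of v into out
// (in lane order) and returns how many were written. out must have room for 8.
static std::size_t extract_nonzero_epi16(__m128i v, std::int16_t* out) {
    // Each lane that equals zero becomes 0xFFFF, every other lane becomes 0.
    const __m128i is_zero = _mm_cmpeq_epi16(v, _mm_setzero_si128());

    // One bit per byte, so every 16-bit lane contributes two identical bits.
    const unsigned zero_mask = static_cast<unsigned>(_mm_movemask_epi8(is_zero));
    const unsigned nonzero_mask = ~zero_mask & 0xFFFFu;

    // Fast path: the whole vector is zero, nothing to extract.
    if (nonzero_mask == 0) return 0;

    // Spill the vector once instead of extracting lane by lane.
    alignas(16) std::int16_t lanes[8];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);

    std::size_t n = 0;
    for (int lane = 0; lane < 8; ++lane) {
        if (nonzero_mask & (1u << (2 * lane))) out[n++] = lanes[lane];
    }
    return n;
}
```

For four vectors, the same helper could be called once per vector, appending into one output buffer. A fully branch-free compaction is usually done with a shuffle lookup table (SSSE3 _mm_shuffle_epi8), which is not shown here.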
I'm experimenting with fast scrubbing / jumping to frames in an H.264 bitstream, using the latest Media SDK to decode the stream. Decoding works fine and I can render the decoded frames in an OpenGL window.
Could someone help me with a software installation issue on Xeon Phi? I'm trying to compile my program on the host in preparation for future offload development. When I don't include any offload functions in my code, my program and several external libraries (Arpack, Petsc, Libmesh, etc.) compile with Intel MPI and MKL and run well on the host. However, as soon as I add any test offload code, long messages appear during compilation and linking, as shown below. An executable is still created, but it does not run.
The (simplified) code:
Architecture: x86_64 (Haswell with 6 cores)
Compiler Version: icc 15.0
Performance degrades when the code snippet below is compiled with auto-vectorization (-O2):
Hi, the systems hosting the MICs have been updated, so the MPSS must be updated as well.
Linux compute-19-17.local 3.19.1-1.el6.elrepo.x86_64 #1 SMP
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
Trying to rebuild the modules fails (it works fine with 2.6-series kernels):
rpmbuild --rebuild mpss-modules-3.5-1.src.rpm
We have reader-writer spin locks from TBB. I would like to know who owns a lock. We do have information about which threads are spinning on a particular object, but where is the thread that holds the lock? How can it be identified?
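If the tooling does not report the holder, one workaround is to record it in the program itself. The following is only a sketch under assumptions: OwnerTrackingLock is a hypothetical wrapper, it tracks the exclusive (writer) holder only, and it works with any mutex type that offers lock() and unlock(). Whether the TBB mutex in your version exposes exactly those members should be checked; std::mutex is used below purely for illustration.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical wrapper: remembers which thread currently holds the lock
// exclusively, so a debugger or a log statement can read owner().
template <class Mutex>
class OwnerTrackingLock {
public:
    void lock() {
        mutex_.lock();
        // Record the holder only after the lock has actually been acquired.
        owner_.store(std::this_thread::get_id());
    }

    void unlock() {
        // Clear the holder before releasing, so owner() never names a
        // thread that has already given the lock up.
        owner_.store(std::thread::id());
        mutex_.unlock();
    }

    // Returns a default-constructed id when nobody holds the lock.
    std::thread::id owner() const { return owner_.load(); }

private:
    Mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

Readers of a reader-writer lock are not covered by this sketch, since several of them can hold the lock at once; tracking them would need a per-thread record instead of a single id.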