Hi :-)

# 3D convolution

I am using MKL's VSL multi-dimensional convolution routines to perform a 3D convolution. As a test, I use a 3D Gaussian as the input function:

f(x,y,z) = exp(- (x-2)**2) exp(-(y+2)**2) exp(- (z**2))

and the convolution kernel as

g(x,y,z) = exp(-(x**2 + y**2 + z**2))

# Extract a row/column of a matrix and a subvector from a vector

Hi! I would like to ask whether there are routines for the following tasks:

1) Extract a row/column of a matrix.

2) If theta is a vector, extract a subvector. For example, extract elements 4 through 8 of theta.

3) Merge two vectors. For example, if a and b are row vectors, I want to create a new array c whose first row is a and whose second row is b.

I know that the above can be done using "for" loops, but I am interested in using library routines, if there are any.

Thank you in advance.
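With row-major storage in C, all three tasks reduce to copies: a row and a subvector are contiguous, and a column is a strided copy. The BLAS level-1 routine `cblas_dcopy(n, x, incx, y, incy)` performs this kind of strided copy. The sketch below uses a hypothetical helper, `strided_copy`, with the same argument meaning so that it compiles without MKL; all function names here are illustrative, not library routines.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n elements from x (stride incx) to y (stride incy).
   Hypothetical stand-in for BLAS ?copy, positive strides only. */
void strided_copy(size_t n, const double *x, size_t incx,
                  double *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = x[i * incx];
}

/* Row i (0-based) of a row-major matrix with `cols` columns:
   `cols` contiguous elements. */
void extract_row(const double *m, size_t cols, size_t i, double *out)
{
    strided_copy(cols, m + i * cols, 1, out, 1);
}

/* Column j (0-based): `rows` elements that lie `cols` apart in memory. */
void extract_col(const double *m, size_t rows, size_t cols, size_t j,
                 double *out)
{
    strided_copy(rows, m + j, cols, out, 1);
}

/* Elements first..last of theta, 1-based and inclusive (e.g. 4..8). */
void extract_sub(const double *theta, size_t first, size_t last, double *out)
{
    assert(first >= 1 && first <= last);
    strided_copy(last - first + 1, theta + (first - 1), 1, out, 1);
}

/* Stack row vectors a and b (length n each) into a 2 x n row-major array c. */
void stack_rows(const double *a, const double *b, size_t n, double *c)
{
    strided_copy(n, a, 1, c, 1);
    strided_copy(n, b, 1, c + n, 1);
}
```

With column-major storage (as in Fortran) the roles swap: a column is contiguous and a row is the strided case.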

# Use ?gbmv with a 3d array.

Let us assume that we have a 3D array A and a matrix B of size 2x5.

`int A[3][2][2]= { {{1,2},{3,4} }, {{5,6},{7,8}}, {{9,10},{11,12}} };`

I want to program the following:

cs =0.0;

for j=1 until 3

Multiply the matrix Aj by column t-j of the matrix B and add the result to cs.

end
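The loop above can be sketched in plain C as follows. Several things here are assumptions: the data is stored as `double` rather than `int` (BLAS routines work on floating-point types), t is a 1-based column index, cs is a vector of length 2, and Aj is stored as `A[j-1]`. The function name is hypothetical. Note that `?gbmv` is intended for band matrices; for small dense blocks like these, the general matrix-vector routine `?gemv` called with beta = 1 is the usual way to accumulate into cs.

```c
#include <assert.h>

enum { NLAG = 3, DIM = 2, NCOL = 5 };

/* cs += sum over j = 1..NLAG of A_j * B(:, t-j).
   t is a 1-based column index (assumption), so every column t-j
   must lie in 1..NCOL. */
void accumulate_lags(double A[NLAG][DIM][DIM], double B[DIM][NCOL],
                     int t, double cs[DIM])
{
    assert(t - NLAG >= 1 && t - 1 <= NCOL);
    for (int j = 1; j <= NLAG; ++j) {
        int col = t - j - 1;                 /* 0-based index of column t-j */
        for (int r = 0; r < DIM; ++r)
            for (int c = 0; c < DIM; ++c)
                cs[r] += A[j - 1][r][c] * B[c][col];
    }
}
```

With the A from the post and a B whose rows are 1..5 and 6..10, calling `accumulate_lags(A, B, 4, cs)` on a zeroed cs yields cs = (140, 194).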

# Issue using c compiler with headers not being recognized

I get the following error, even though fd_set is defined in /usr/include/sys/select.h:

root@benjamin-Lenovo-IdeaPad-Y510P:~/bin/setwidth/src# icc -std=c99 -I /usr/include -I /usr/local/lib/R/include/  -O3 -ipo -xavx -openmp -c setwidth.c
In file included from setwidth.c(4):
/usr/local/lib/R/include/R_ext/eventloop.h(73): error: identifier "fd_set" is undefined
extern InputHandler *getSelectedHandler(InputHandler *handlers, fd_set *mask);
^
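One possible cause, which cannot be confirmed from the log alone: `-std=c99` puts the compiler in strict ISO mode, where glibc hides POSIX declarations unless a feature-test macro is defined, and the R header may not pull in `<sys/select.h>` itself in this configuration. A minimal sketch of a workaround is to request the POSIX API and include the header explicitly at the top of the source file, before the R headers. The helper function below is hypothetical and only demonstrates that `fd_set` is usable after these two lines.

```c
/* Ask for the POSIX.1-2001 API even under -std=c99.
   This must come before the first #include in the file. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <sys/select.h>   /* declares fd_set, FD_ZERO, FD_SET, FD_ISSET */

/* In setwidth.c, the R headers (e.g. R_ext/eventloop.h) would be
   included after this point. */

/* Hypothetical helper: put fd into an empty set and report whether
   probe is a member (1) or not (0). */
int fd_set_contains(int fd, int probe)
{
    fd_set mask;
    FD_ZERO(&mask);
    FD_SET(fd, &mask);
    return FD_ISSET(probe, &mask) ? 1 : 0;
}
```

Compiling with `-std=gnu99` instead of `-std=c99` is another option that may have the same effect, since it does not enable strict ISO mode.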

# Using embree on CentOS 6.4

When I run Embree on CentOS 6.4, I get the error message "libembree_xeonphi.so.2  COI_MISSING_DEPENDENCY".
Is there any other way for me to debug?

# Interesting performance graph

Hi!

I am trying the example from chapter 4 of the "High Performance Programming for Intel Xeon Phi Coprocessors" book (lotsofcores.com). I am running the most optimized version of the program that the authors present.

Executing the program with different numbers of threads and plotting the number of flops gives an interesting result. The Phi I use is a 57-core model (3110, IIRC).

# PARDISO segmentation fault

idbc reported the following after 80% of the LL' factorization:

Program received signal SIGSEGV
mkl_blas_mc_sgem2vu_odd () in /mnt/storage/opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/intel64/libmkl_mc.so

The attachment contains the matrix together with the program and makefile needed to reproduce this fault.

The matrix is stored in 1-based CSR format (3-array variation) and holds the upper triangle of a Hermitian matrix. It is 64000x64000 with about 22 000 000 nonzeros.

The same program worked with smaller sizes; the largest size tested was 17280x17280.
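Before suspecting the solver, it can help to rule out a malformed input. The sketch below checks only the structural properties implied by the description above: 1-based 3-array CSR, upper triangle only, and column indices ascending within each row. The function name and return codes are hypothetical, 32-bit `int` indices are an assumption, and the code does not call PARDISO.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if (ia, ja) describe an n x n upper-triangular pattern in
   1-based 3-array CSR with strictly increasing columns in every row.
   ia has n + 1 entries; ja has ia[n] - 1 entries.
   Nonzero return codes (hypothetical):
     1 = ia[0] is not 1, 2 = row pointers decrease,
     3 = column below the diagonal or out of range,
     4 = columns not strictly increasing within a row. */
int check_csr_upper_1based(int n, const int *ia, const int *ja)
{
    assert(n > 0 && ia != NULL && ja != NULL);
    if (ia[0] != 1)
        return 1;
    for (int i = 0; i < n; ++i) {
        if (ia[i + 1] < ia[i])
            return 2;
        int prev = 0;                       /* smaller than any valid column */
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            int col = ja[k];
            if (col < i + 1 || col > n)
                return 3;
            if (col <= prev)
                return 4;
            prev = col;
        }
    }
    return 0;
}
```

A pattern with about 22 000 000 nonzeros fits comfortably in 32-bit indices, so this check covers the input only; it says nothing about the size of the factor that the solver builds.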

# Compiler internal error: IERROR_MODULE_ID_1204

Hello,

I am hitting an internal compiler error IERROR_MODULE_ID_1204 with icl 14.0.2 on Windows (64-bit compiles) when using -Qstd=c++11 and optimization (-O1 or above) in a few places in a large application. I have had a hard time boiling it down to a trivial example, and I can't yet eliminate a dependency on some external code, but to start the conversation, here is the demo code:

# Use DAPL provider ofa-v2-mlx4_0-1 without an HCA

The DAPL provider ofa-v2-mlx4_0-1 gives much better short message latency for communication between Xeon Phi coprocessors within a single system than ofa-v2-scif0 (2-8 microseconds versus 15 microseconds).

Is it because ofa-v2-mlx4_0-1 uses the HCA? If it is not using the HCA, is it possible to enable this provider without installing an HCA into a stand-alone system with Xeon Phi coprocessors?