Parallel Computing

Cholesky with pdpotrf()

I am performing a Cholesky factorization with pdpotrf(). I read the whole matrix on the master node and then distribute it. Every node then handles a submatrix and calls pdpotrf(). Finally, I send the submatrices back to the master node and compose the solution.

I am amazed by that. How does it do it? I mean, what algorithm does it implement? I suspect it uses block partitioning and that the nodes communicate with each other (not much, I hope, but I would really like to know).
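For anyone picturing the "block partitioning" part: ScaLAPACK routines such as pdpotrf() expect the matrix in a 2D block-cyclic layout over a process grid. The sketch below only illustrates that index mapping; it does not call ScaLAPACK. The function names `owner` and `local_index` are made up for this example, and it assumes 0-based indices with the first block on process (0, 0).

```python
def owner(i, j, mb, nb, nprow, npcol):
    """Return the (process row, process column) that owns global entry (i, j).

    Assumptions for this sketch: 0-based indices, block size mb x nb,
    process grid nprow x npcol, first block stored on process (0, 0).
    """
    return (i // mb) % nprow, (j // nb) % npcol


def local_index(i, nb, nprocs):
    """Return the 0-based local index of global index i on its owning process.

    Works along one dimension: nb is the block size and nprocs the number
    of processes in that dimension of the grid.
    """
    full_cycles = i // (nb * nprocs)   # complete rounds over all processes
    return full_cycles * nb + i % nb   # offset inside the current block


if __name__ == "__main__":
    # Print the owning process of every entry of an 8 x 8 matrix
    # distributed with 2 x 2 blocks over a 2 x 2 process grid.
    for i in range(8):
        print(" ".join("%d%d" % owner(i, j, 2, 2, 2, 2) for j in range(8)))
```

Because blocks are dealt out cyclically, every process holds pieces from all over the matrix, which is why the factorization needs communication between processes at each block step.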

Broken dgeqp3 in Version 11.2 (Update 3) (Linux)

Hi all,

This was working in December 2014 when I last ran my code against MKL, but after upgrading to 11.2u3 I'm getting a response of -9 from the info parameter when calling dgeqp3... which is *really* weird because that indicates that the info parameter itself is wrong (being parameter 9).

My code works against reference LAPACK[1] and ATLAS / OpenBLAS so I'm inclined to suggest that a regression has appeared.
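To spell out the reasoning about -9: by LAPACK convention, info = -i means the i-th argument had an illegal value. The sketch below decodes that for the Fortran-style argument order of DGEQP3 (M, N, A, LDA, JPVT, TAU, WORK, LWORK, INFO). The helper `describe_info` is hypothetical, and the numbering will differ if the call goes through a wrapper with a different argument list.

```python
# Argument order of the Fortran interface of DGEQP3 in reference LAPACK.
DGEQP3_ARGS = ("M", "N", "A", "LDA", "JPVT", "TAU", "WORK", "LWORK", "INFO")


def describe_info(info, args=DGEQP3_ARGS):
    """Turn a LAPACK info value into a readable message (illustrative helper)."""
    if info == 0:
        return "successful exit"
    if info < 0:
        position = -info
        if position > len(args):
            return "argument %d is outside the argument list" % position
        return "argument %d (%s) had an illegal value" % (position, args[position - 1])
    return "routine-specific code %d" % info


if __name__ == "__main__":
    print(describe_info(-9))
    print(describe_info(-8))
```

With this argument order, position 9 is INFO itself, which is what makes the reported value so odd.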

Understanding CPU Time and Instructions Retired

Hi All,

The following is a snapshot from VTune on my Haswell processor. I don't understand why the CPU time and the number of instructions retired for the highlighted code (vpbroadcastq) are so much greater than for the other instructions in the same basic block. I thought the number of retired instructions should not differ much within a basic block, even though there might be cache misses or TLB misses. Can someone explain some possible reasons for this? Thanks.

Preconditioner dcsrilu0 has returned the ERROR code -106

Hi all, please give me some suggestions about this error: Preconditioner dcsrilu0 has returned the ERROR code -106.

I tried to use ILU + GMRES to solve Ax=b.

Here, A is like

 2 -1
-1  2 -1
   -1  2 -1

and so on. I have attached code. Thanks.
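For reference, here is a minimal sketch of how that tridiagonal matrix looks in three-array CSR form with one-based indices and column indices ascending within each row, which is the layout dcsrilu0 is documented to expect (worth double-checking against the MKL manual for your version). The function name `tridiag_csr_one_based` is made up for this example, and the sketch does not call MKL.

```python
def tridiag_csr_one_based(n):
    """Build CSR arrays (a, ia, ja) for the n x n matrix tridiag(-1, 2, -1).

    Indices are one-based and the column indices of each row are emitted
    in ascending order.
    """
    a, ia, ja = [], [1], []
    for row in range(1, n + 1):
        # Visit the sub-diagonal, diagonal and super-diagonal in column order.
        for col, val in ((row - 1, -1.0), (row, 2.0), (row + 1, -1.0)):
            if 1 <= col <= n:
                a.append(val)
                ja.append(col)
        ia.append(len(a) + 1)  # one-based start of the next row
    return a, ia, ja


if __name__ == "__main__":
    a, ia, ja = tridiag_csr_one_based(4)
    print("a  =", a)
    print("ia =", ia)
    print("ja =", ja)
```

Comparing these arrays with the ones passed to the preconditioner is a quick way to spot zero-based indices or unsorted columns.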


Problem with ZCGESV

The problem I am facing is that the ZCGESV function crashes when the matrix size is 46497 or more. When the matrix size is, for example, 46202, everything works fine.

From what I can see in the LAPACK sources at http://www.netlib.org/lapack/explore-html/d5/d4a/zcgesv_8f_source.html, there can be an integer overflow in the variable ptsx for large n, as ptsx is declared as INTEGER.
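A quick arithmetic check of that suspicion, assuming the reference source computes PTSX = PTSA + N*N with PTSA = 1 and that INTEGER is a 32-bit signed type (the default in LP64 builds). The helper names below are made up for this sketch, and the wraparound shown is only the typical two's-complement behaviour, not something Fortran guarantees.

```python
import math

INT32_MAX = 2**31 - 1


def ptsx_index(n):
    """PTSX as assumed here: PTSA + N*N with PTSA = 1."""
    return 1 + n * n


def overflows_int32(n):
    """True if PTSX no longer fits in a 32-bit signed INTEGER."""
    return ptsx_index(n) > INT32_MAX


def wrap_int32(x):
    """Value x would take after typical two's-complement wraparound."""
    return (x + 2**31) % 2**32 - 2**31


def largest_safe_n():
    """Largest n for which 1 + n*n still fits in a 32-bit signed INTEGER."""
    return math.isqrt(INT32_MAX - 1)


if __name__ == "__main__":
    for n in (46202, 46497):
        print(n, overflows_int32(n), wrap_int32(ptsx_index(n)))
    print("largest safe n:", largest_safe_n())
```

Under these assumptions the limit falls between the two sizes mentioned above: 46202 still fits, while 46497 does not.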

Force Xeon-level precision on Xeon Phi or vice versa

Hi all,

I have been running a program in which the precision of doubles matters a lot.

However, for some strange reason, the Xeon Phi seems to round off a few bits (at around the 10^-8 level), and this seems to cause instabilities in my model. A small round-off error grows as the model iterates over time steps, and the model fails to converge.

Here are some sample differences in error.

Xeon phi value

Small typo in Intel® 64 and IA-32 Architectures Software Developer’s Manual


It seems that there is a small typo in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (Order Number: 253665-054US, April 2015), page 3-149 (cmpss instruction):

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 64-bit memory location.

It should be 32-bit memory location.




Deadlock with MPI_Win_fence going from Intel MPI to

We encountered a problem when migrating a code from Intel MPI to The code in question is a complex simulation that first reads global input state from disk into several parts in memory and then accesses this memory in a hard-to-predict fashion to create a new decomposition. We use active target RMA for this (on machines that support it, such as BG/Q, we also use passive target), since a rank might need data from the part held by another rank to form its halo.
