Recent posts
https://software.intel.com/en-us/recent/545515
mkl_dcsrmm is throwing 'integer division by zero' after upgrading to 11.0 update 5
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/538552
<p>After updating the installed MKL version (to work around another bug we discovered), previously working code has started throwing integer-division-by-zero errors once sparse matrices exceed a certain nonzero count. (Again using the parallel 64-bit MKL.)</p>
<blockquote><p>Unhandled exception at 0x000007FEDEC0DB3D (mkl_avx.dll) in TestSparseMultiply.exe: 0xC0000094: Integer division by zero.</p>
</blockquote>
<p>Callstack:</p>
<blockquote><p> mkl_avx.dll!000007fedec0db3d() Unknown<br />
mkl_intel_thread.dll!000007fee0549de3() Unknown</p>
<p> mkl_intel_thread.dll!000007fee02d5037() Unknown</p>
</blockquote>
<p> </p>
<p>Repro case:</p>
<pre class="brush:cpp;">#include <mkl.h>
#include <tchar.h>
#include <cstdlib> // rand, srand
#include <cstring> // memset

int _tmain(int argc, _TCHAR* argv[])
{
    const size_t sparseMatrixWidth = 6045696;
    const size_t sparseMatrixHeight = 200;
    const size_t sparseMatrixUsage = 1000000 / 2; // <= this should crash
    // const size_t sparseMatrixUsage = 1000000 / 3; // <= this still works
    double *sparseMatrixData = new double[sparseMatrixUsage * sparseMatrixHeight];
    ::memset(sparseMatrixData, 0, sizeof(double) * sparseMatrixUsage * sparseMatrixHeight);
    MKL_INT *colIndices = new MKL_INT[sparseMatrixUsage * sparseMatrixHeight];
    ::memset(colIndices, 0, sizeof(MKL_INT) * sparseMatrixUsage * sparseMatrixHeight);
    MKL_INT *rowOffsets = new MKL_INT[sparseMatrixHeight + 1];
    ::memset(rowOffsets, 0, sizeof(MKL_INT) * (sparseMatrixHeight + 1));
    ::srand(42);
    rowOffsets[0] = 0 + 1; // 1 based
    for (unsigned long y = 0; y < sparseMatrixHeight; ++y)
    {
        MKL_INT lastIndex = 0;
        for (unsigned long x = 0; x < sparseMatrixUsage; ++x)
        {
            // Jump forward a random amount, ensuring that even if we hit the
            // maximum jump every time we do not exceed the matrix limits
            int jump = rand() % (sparseMatrixWidth / sparseMatrixUsage - 1);
            lastIndex = lastIndex + 1 + jump; // Ensure we jump forward at least 1 column
            sparseMatrixData[x + y * sparseMatrixUsage] = 1.0;
            colIndices[x + y * sparseMatrixUsage] = lastIndex + 1; // 1 based
        }
        rowOffsets[y + 1] = sparseMatrixUsage * (y + 1) + 1; // 1 based
    }
    // Double-check: verify sparse matrix properties
    for (unsigned long y = 0; y < sparseMatrixHeight; ++y)
    {
        // Since we are using one-based matrices, nothing is allowed to be zero
        if (rowOffsets[y] == 0)
            return -1;
        // Row data must be ordered in memory
        if (rowOffsets[y + 1] < rowOffsets[y])
            return -1;
        // If rows are not empty ...
        if (rowOffsets[y] != rowOffsets[y + 1])
        {
            // ... make sure column indices are one-based and do not exceed the matrix size
            for (unsigned long i = rowOffsets[y]; i < rowOffsets[y + 1]; ++i)
            {
                if (colIndices[i - 1] <= 0)
                    return -1;
                if (colIndices[i - 1] >= sparseMatrixWidth)
                    return -1;
            }
            // ... make sure column indices are in ascending order
            for (unsigned long i = rowOffsets[y]; i < rowOffsets[y + 1] - 1; ++i)
            {
                if (colIndices[i] <= colIndices[i - 1])
                    return -1;
            }
        }
    }
    // Calculate the matrix row averages by multiplying with a vector of ones
    // and scaling by the inverse vector length
    const size_t rightVectorLength = sparseMatrixWidth;
    double *vectorData = new double[rightVectorLength];
    for (unsigned long x = 0; x < rightVectorLength; ++x)
    {
        vectorData[x] = 1.0;
    }
    // Store the result in this vector
    const size_t resultVectorLength = sparseMatrixHeight;
    double *resultData = new double[resultVectorLength];
    char matdescra[6] = {'g', 'l', 'n', 'f', 'x', 'x'};
    /* https://software.intel.com/sites/products/documentation/doclib/iss/2013/mkl/mklman/GUID-34C8DB79-0139-46E0-8B53-99F3BEE7B2D4.htm#TBL2-6
       G: general / D: diagonal
       L/U: lower/upper triangular (ignored with G)
       N: non-unit diagonal (ignored with G)
       C: zero-based indexing / F: one-based indexing
    */
    MKL_INT strideA = static_cast<MKL_INT>(rightVectorLength);
    char transposeAtxt = 'N';
    MKL_INT numRowsA = static_cast<MKL_INT>(sparseMatrixHeight),
            numDestCols = static_cast<MKL_INT>(1),
            numColsA = static_cast<MKL_INT>(sparseMatrixWidth);
    double tempAlpha = 1.0 / sparseMatrixWidth;
    double tempBeta = 0.0;
    MKL_INT resultStride = static_cast<MKL_INT>(resultVectorLength);
    ::mkl_dcsrmm( &transposeAtxt,
                  &numRowsA,
                  &numDestCols,
                  &numColsA,
                  &tempAlpha,
                  matdescra,
                  sparseMatrixData,
                  colIndices,
                  rowOffsets,
                  rowOffsets + 1,
                  vectorData, &strideA,
                  &tempBeta,
                  resultData, &resultStride );
    return 0;
}
</pre><p> </p>
Mon, 12 Jan 15 06:53:22 -0800 | Henrik A. | 538552
Parameters for ?stemr
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/538220
<p>I have a problem where I need to calculate a certain number of eigenvectors and a different number of eigenvalues. Instead of calling dsyevr twice, I plan on calling dsytrd -> dstemr * 2 -> dormtr (or alternatively ?stebz / ?stein).</p>
<p>However, I have noticed unexpected behavior regarding the eigenvector parameter (using 11.0 update 5, from C).</p>
<p>If I use jobz = 'N', the call to dstemr sets the first element of the eigenvector array to 0.0 even though it only calculates eigenvalues, and it crashes if no array is provided at all, although the documentation states that this parameter is not referenced in this case. Is it sufficient to pass a single double as a dummy parameter, or does the function write further values into this array? Also, while the documentation states that ldz only needs to be >= 1 in this case, the parameter validation fails unless it is >= N.</p>
<p> </p>
<p>Secondly, If I use jobz = 'V', the documentation states</p>
<p>"<em>Array z(ldz, *), the second dimension of z must be at least max(1, m).</em></p>
<p><em>If jobz = 'V', and info = 0, then the first m columns of z contain the orthonormal eigenvectors of the matrix T corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w(i). </em>".</p>
<p>However, when calling from C, ldz is the number of columns in the array and should therefore equal the number of requested eigenvalues, yet the parameter validation requires ldz >= N. This means that to calculate the first 10 eigenvalues of a 1000 x 1000 matrix I would still need to allocate the full-size matrix. Am I missing something? Is this just due to LAPACK_ROW_MAJOR?</p>
<p> </p>
Fri, 09 Jan 15 02:53:01 -0800 | Henrik A. | 538220
Dense * Sparse matrix calculations, is there an easier way
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/534557
<p>I am porting code (C, so row major) that uses ?gemm, ?syrk and ?syr2k calls from dense matrices to sparse matrices, and would like to know if there is a simpler way of calculating the various matrix products than the following:</p>
<p> </p>
<p><strong>?syrk</strong>: use <strong>?csrmultd</strong>. Since this routine only allows one-based indexing, I assume the resulting dense matrix is column major, but I would like confirmation.</p>
<p> </p>
<p><strong>?syr2k</strong>: use two <strong>?gemm</strong> calls here instead.</p>
<p> </p>
<p><strong>?gemm</strong>: This case gets fairly complicated, and it would be extremely nice if someone could tell me whether there are methods/options I am overlooking that would simplify it. A and D are the dense result and multiplicand matrices; S['] is a sparse matrix which may be transposed.</p>
<p>A = S['] * D + A<br />
- Use mkl_dcsrmm directly (using zero based indexing)</p>
<p>A = S['] * D' + A<br />
Either<br />
- Transpose D -> Dt<br />
=> A = S['] * Dt + A<br />
- use mkl_dcsrmm (using zero based indexing)<br />
Or<br />
- Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (D' -> Dt)<br />
- Calculate temp. matrix Tt = S['] * Dt via mkl_dcsrmm<br />
- T' (row major) = Tt (col. major)<br />
- calculate A = T' + A;</p>
<p>
A = D * S['] + A<br />
- Transpose equation:<br />
-> A' = (D * S['])' + A' = S[!'] * D' + A'<br />
Either:<br />
- Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (A' -> At, D' -> Dt)<br />
=> At = S[!'] * Dt + At<br />
-> Use mkl_dcsrmm<br />
Or:<br />
- Transpose A' => At, D' => Dt<br />
=> At = S[!'] * Dt + At<br />
- Use mkl_dcsrmm (using zero based indexing)<br />
- Transpose At</p>
<p>
A = D' * S['] + A<br />
-> Transpose equation:<br />
-> A' = (D' * S['])' + A' = S[!'] * D + A'<br />
Either:<br />
- Transpose A' => At<br />
=> At = S[!'] * D + At<br />
-> Use mkl_dcsrmm (using zero based indexing)<br />
-> Transpose At => A'<br />
Or:<br />
-> Calculate temp. matrix T = S['] * D via mkl_dcsrmm<br />
-> Calculate A = T' + A</p>
<p>
In theory I could always store my sparse matrix as one-based and, whenever I need to treat it as zero-based, add dummy rows/columns to the dense matrices to absorb the extra row/column created during multiplication, so that converting between the two indexing schemes costs no time at all.</p>
<p> </p>
<p>One final question: Could someone confirm that it is possible to use <strong>mkl_?omatadd</strong> to calculate A = A + B without using a temporary matrix? The documentation doesn't state whether the input and output memory may overlap when no transposition is performed.</p>
Tue, 28 Oct 14 11:04:00 -0700 | Henrik A. | 534557
Indexing an array of size between 2^31 and 2^32-1 with LP64?
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/472433
<p>I am working with spblas. My matrix dimension is about 300k and nnz is between 2^31 and 2^32-1. To keep memory consumption as small as possible, I would like to use 32-bit unsigned integers to index my elements. Is that possible with LP64 by defining MKL_INT as uint32_t? I tried it, but my program crashed with a segmentation fault when calling mkl_scsrmv.</p>
Sun, 08 Sep 13 23:50:40 -0700 | Yu S. | 472433
Can I pass a subset of a matrix into another function in MKL?
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/422620
<p>I am trying to optimize a lot of matrix calculations in MKL that require me to allocate large blocks of memory with something like:</p>
<p>double* test_matrix = (double*)mkl_malloc(n * sizeof(double), 64).</p>
<p>Recently, I have been finding a lot of memory allocation errors that are popping up - which are hard to replicate and even harder to debug. I am worried that there is some internal header data that MKL puts into the heap that I am not accounting for using my current method.</p>
<p><strong>Is there an "official" way of passing a subset of an MKL matrix into another function?</strong> Passing a copy would definitely increase my overhead too much. I am currently passing a pointer to the matrix subset like this:</p>
<p>double* a = (double*)mkl_malloc(4 * 4 * sizeof(double), 64);<br />double* b = (double*)mkl_malloc(4 * 4 * sizeof(double), 64); <br />double* c = (double*)mkl_malloc(2 * 2 * sizeof(double), 64);</p>
<p>... fill in values for a and b ... </p>
<p>cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2, 1, &a[2], 4, &b[2], 4, 0, c, 2); <br />cout << "Result is: " << c[0] << c[1] << c[2] << c[3] << endl;</p>
Tue, 20 Aug 13 09:47:47 -0700 | Po | 422620
Best function for in-place matrix addition (w. stride)
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/393488
<p>I often need to calculate the sum of a set of matrices or submatrices of a dataset. Unfortunately the two matrices do not always have the same stride when I am selectively using a subset of a large dataset, which means I have to resort to calculating the sum by hand. (Alternatively, I could call vdAdd or similar once per row; I am not sure how much overhead this implies when calling vdAdd 500 or 1000 times for a 500x500 matrix.)</p>
<p>I am aware of the mkl_?omatadd function, but the documentation states that the input and output arrays must not overlap, which means I would need an extra temporary matrix. While I would assume that A = A + m * B works in place when no transposition is involved, unless this is guaranteed for all future versions I cannot rely on that approach.</p>
<p>Are there any other functions which could be used for this calculation I have missed?</p>
Thu, 06 Jun 13 04:15:32 -0700 | Henrik A. | 393488
Example projects (VS 2010) for custom redist DLL are broken
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/277152
<p>The example projects provided with MKL for creating a customized minimal redist DLL have the paths to the included .lib files hardcoded instead of using $(MKLROOT). As a result, once the projects are copied out of the MKL install directory they fail to build with the error "LNK1131: no library file specified".</p>
<p>Removing the .lib file from the project and instead adding it in the project properties, as well as adding $(MKLROOT)\lib to the library search paths, fixes the problem.</p>
Thu, 02 Aug 12 07:55:35 -0700 | Henrik A. | 277152
Sparse matrix dense matrix multiplication
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/277217
<p>Dear all,</p>
<p>For my Krylov subspace basis build-up I am doing some sparse matrix - dense matrix multiplications using mkl_dcsrmm, which works fine. However, I am keeping my Krylov vectors in a</p>
<p>std::vector</p>
<p>and conceptually these vectors are the columns of my dense matrix. The interface above treats all matrices as row major; is there a way to change this? Since I keep my basis in an array of arrays, it should somehow be interpreted column-wise. Is there a workaround, or must both matrices use the same ordering, so that the sparse matrix cannot be row major while the dense matrix is column major? It would be far more efficient and easier to use if there were a workaround for this.</p>
<p>For instance, in the simple code below I would like b to be interpreted column-wise rather than row-wise while a stays row-wise.</p>
<pre class="brush:cpp;">#include <mkl.h>
#include <iostream>

int main()
{
    // 3x3 diagonal sparse matrix in zero-based CSR
    double a[] = {1.e0, 2.e0, 3.e0};
    int ia[] = {0, 1, 2, 3};
    int ja[] = {0, 1, 2};
    // 3x2 dense multiplicand and 3x2 result
    double b[] = {1.e0, 4.e0, 2.e0, 5.e0, 3.e0, 6.e0};
    double c[] = {0, 0, 0, 0, 0, 0};
    MKL_INT m = 3;
    MKL_INT k = 3;
    MKL_INT n = 2;
    double alpha = 1.0;
    double beta = 0.0;
    //
    char matdescra[6];
    matdescra[0] = 'g'; // general matrix
    matdescra[1] = 'l'; // ignored for 'g'
    matdescra[2] = 'u'; // ignored for 'g'
    matdescra[3] = 'c'; // zero-based indexing
    char transa = 'N';
    mkl_dcsrmm(&transa,
               &m, &n, &k,
               &alpha,
               matdescra,
               a, ja, ia,
               ia + 1, b,
               &n, &beta, c, &n);
    //
    for (int z = 0; z < 6; z++)
        std::cout << c[z] << std::endl;
    return 0;
}
</pre>
Fri, 27 Jul 12 04:20:51 -0700 | utab | 277217
Out of space in MKL
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/277562
<p>What happens if MKL cannot allocate space for its internal buffers? Does it crash? Does it fall back to a simple version that does not require any buffers? Is there any way to know whether MKL ran out of buffer space? Is this documented anywhere?</p>
<p>Erling</p>
Fri, 29 Jun 12 04:55:50 -0700 | erling_andersen | 277562
Alternative to dgemm(A', A, B) which only calculates upper/lower triangular matrix? (CBLAS)
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/277756
<p>I need to calculate a matrix crossproduct of the form B = A' * A. This results in a symmetric matrix B, so it should be possible to have the multiplication calculate only the upper or lower triangle of B and mirror it to fill the second half, saving up to 50% of the calculation time. However, I cannot find a method/option which does this.</p>
<p>I have tried implementing this calculation manually, by multiplying row vectors/blocks of A' by A and storing the results in the corresponding blocks of B. However, depending on the block size, the overhead of the multiple calls can even lead to a decrease in performance (very small blocks), or to a gain of less than 50%.</p>
<p>Alternatively, what would the optimal block size be to reduce the overhead of multiple calls and of spinning up threads? Is any information available on how the algorithm partitions the data across threads internally?</p>
Fri, 15 Jun 12 06:34:06 -0700 | Henrik A. | 277756