Can I pass a subset of a matrix into another function in MKL?

Picture of user Po

I am trying to optimize a lot of matrix calculations in MKL that require me to allocate large blocks of memory using something like:

double* test_matrix = (double*)mkl_malloc(n * sizeof(double), 64);

Recently, I have been running into a lot of memory allocation errors, which are hard to replicate and even harder to debug. I am worried that MKL puts some internal header data into the heap that my current method does not account for.

Is there an "official" way of passing a subset of an MKL matrix into another function? Passing a copy would definitely increase my overhead too much. I am currently passing a pointer into the matrix to select the subset, like this:

double* a = (double*)mkl_malloc(4 * 4 * sizeof(double), 64);
double* b = (double*)mkl_malloc(4 * 4 * sizeof(double), 64);
double* c = (double*)mkl_malloc(2 * 2 * sizeof(double), 64);

... fill in values for a and b ...

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2, 1, &a[2], 4, &b[2], 4, 0, c, 2);
cout << "Result is: " << c[0] << " " << c[1] << " " << c[2] << " " << c[3] << endl;

Picture of user Henrik Arlinghaus

I've been using dgemm just fine when passing in subsections of matrices. Just make sure the stride (the leading dimension argument) matches the real size of the full matrix, and that the starting offset plus the submatrix height stays within the total size.
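To illustrate the stride and clipping rule described above, here is a minimal sketch in plain C++ that does not call MKL. ref_gemm is a hypothetical stand-in that follows the row-major, no-transpose addressing convention of cblas_dgemm (pointer to the first element of the block plus a leading dimension), and submatrix_fits is a hypothetical helper for the bounds check. It assumes the 4 x 4 row-major layout from the original question.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a row-major, no-transpose dgemm:
// C (m x n) = A (m x k) * B (k x n). lda, ldb and ldc are the row lengths
// of the FULL arrays that the blocks A, B and C live in.
void ref_gemm(int m, int n, int k,
              const double* A, int lda,
              const double* B, int ldb,
              double* C, int ldc) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                sum += A[i * lda + p] * B[p * ldb + j];
            }
            C[i * ldc + j] = sum;
        }
    }
}

// Hypothetical helper: true if a sub_rows x sub_cols block starting at
// (row0, col0) lies entirely inside a full_rows x full_cols matrix.
bool submatrix_fits(int full_rows, int full_cols,
                    int row0, int col0, int sub_rows, int sub_cols) {
    return row0 >= 0 && col0 >= 0 && sub_rows >= 0 && sub_cols >= 0 &&
           row0 + sub_rows <= full_rows && col0 + sub_cols <= full_cols;
}

// Multiplies the top-right 2x2 blocks of two 4x4 matrices, the blocks that
// &a[2] and &b[2] select when the leading dimension is 4.
std::vector<double> multiply_top_right_blocks() {
    std::vector<double> a(16), b(16), c(4, 0.0);
    for (int i = 0; i < 16; ++i) {
        a[i] = i;    // rows: 0 1 2 3 / 4 5 6 7 / ...
        b[i] = 1.0;
    }
    // Check the block before handing out a pointer into the matrix.
    assert(submatrix_fits(4, 4, 0, 2, 2, 2));
    ref_gemm(2, 2, 2, &a[0 * 4 + 2], 4, &b[0 * 4 + 2], 4, c.data(), 2);
    return c;
}
```

With these inputs the selected block of a is {2, 3; 6, 7} and the block of b is all ones, so multiply_top_right_blocks() returns {5, 5, 13, 13}. The output c is a standalone 2 x 2 array, which is why its leading dimension is 2 rather than 4.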

Picture of user Po

Thanks for your quick response, Henrik. Are there any special operations I would need to perform when deallocating memory for the matrix being passed to an internal function?

Also, are there any issues with passing a pointer to a result matrix into a function for the function to use as output? For example, this is my usual way of doing things (it creates some memory errors as complexity increases, and I am not sure of the precise source of the bugs):

double* a = (double*)mkl_malloc(2 * 2 * sizeof(double), 64);
double* b = (double*)mkl_malloc(2 * 2 * sizeof(double), 64); 
double* c = (double*)mkl_malloc(8 * 8 * sizeof(double), 64);

for (int i = 0; i < 4; i++) {
  a[i] = 1.0; // could be any value
  b[i] = 1.0;
}

someOperation(a, b, &c[5]);

mkl_free_buffers();
mkl_free(a);
mkl_free(b);
mkl_free(c); 

Picture of user mecej4

Po, there are no truly two-dimensional arrays in any of the code you have shown. Although you can certainly use a pointer to a sufficiently large block of memory as a matrix, it is your responsibility to make sure that the code properly maps the conceptual matrix to a one-dimensional array. Therefore, your questions cannot be answered yet. For example, what do you expect the "matrix" c to contain after the call to someOperation()? How was c allocated, and how do you intend to access it in subsequent code? What does someOperation() expect as arguments, how does it declare the formal arguments, and how are the arrays used inside the function?
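One way to make that mapping explicit is to pass the output block together with its shape and leading dimension. The following is only a sketch under stated assumptions: MatrixView and some_operation are hypothetical names, the thread does not say what someOperation() computes (an element-wise sum is assumed here purely for illustration), and c is assumed to be an 8 x 8 row-major matrix, so that &c[5] means row 0, column 5.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of a rows x cols block inside a larger row-major array.
struct MatrixView {
    double* data;  // pointer to the first element of the block
    int rows;
    int cols;
    int ld;        // row length of the full array (leading dimension)

    double& at(int i, int j) const {
        return data[static_cast<std::size_t>(i) * ld + j];
    }
};

// Hypothetical someOperation: writes the element-wise sum of two dense
// 2x2 matrices into the block described by out. The real operation is
// not given in the thread.
void some_operation(const double* a, const double* b, MatrixView out) {
    assert(out.rows == 2 && out.cols == 2);
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            out.at(i, j) = a[i * 2 + j] + b[i * 2 + j];
        }
    }
}

// Fills a 2x2 block of an 8x8 matrix, starting at row 0, column 5.
std::vector<double> fill_block_in_8x8() {
    std::vector<double> a(4, 1.0), b(4, 1.0), c(64, 0.0);
    // Row 0, column 5 of the 8x8 matrix is element 0 * 8 + 5.
    MatrixView out{&c[0 * 8 + 5], 2, 2, 8};
    some_operation(a.data(), b.data(), out);
    return c;
}
```

Under these assumptions the function touches only elements 5, 6, 13 and 14 of c. If the callee instead treated its output as a dense 2 x 2 array, it would write elements 5 through 8 and overwrite the wrong entries, which is the kind of mismatch the questions above are meant to uncover.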

Picture of user Zhang Z (Intel)

To improve the performance of Intel MKL, the memory allocator uses per-thread memory pools where buffers may be collected for fast reuse. The mkl_free_buffers() function can be used to free unused memory. You should call mkl_free_buffers() after the last call to Intel MKL functions. In large applications, if you suspect that memory may run short, you may call this function earlier, but expect a possible drop in performance because buffers have to be reallocated for subsequent calls to Intel MKL functions.

If this does not solve your memory allocation problems, you can also try setting the MKL_DISABLE_FAST_MM environment variable to 1 or calling the mkl_disable_fast_mm() function. This stops MKL from using memory pools for fast buffer allocation and deallocation. Be aware that this change may negatively impact the performance of some Intel MKL functions, especially for small problem sizes.

Picture of user Sergey Kostrov

>>...Recently, I have been finding a lot of memory allocation errors that are popping up - which are hard to replicate and
>>even harder to debug...

Please provide a complete reproducer for all of these "memory" errors. I don't see any memory-related errors in my own code that uses the cblas_dgemm function.
