Using Intel® MKL in your C# program

Introduction 

Some users have asked how to call and link Intel® Math Kernel Library (Intel® MKL) functions from their C# programs. While the standard way of interfacing with third-party software libraries from C# is well documented, some of the steps specific to Intel MKL may still be confusing. The attached sample packages are intended to show Intel MKL users how to navigate the whole process. These examples show how to create a custom dynamic link library (custom DLL) from the Intel MKL static libraries, declare the needed functions in C# source, and call them through that custom DLL.

Examples are provided for calling Intel MKL from five domains:
1. dgemm.cs - BLAS (CBLAS)
2. dgeev.cs - LAPACK
3. pardiso.cs - PARDISO (Parallel Direct Sparse Solver interface)
4. dfti_d1.cs - DFTI (the FFT interface)
5. vddiv.cs - VML vector math library

Let's take the function "dgeev" as an example. This function is a typical routine in the Linear Algebra PACKage (LAPACK). It computes the eigenvalues and the left and right eigenvectors of a general matrix. LAPACK routines have Fortran interfaces, which means that scalars are passed by reference and only Fortran-style (column-major) matrices are allowed. Many LAPACK routines use work arrays, and some array arguments may not be referenced under certain conditions. The C# interface is as follows:

[SuppressUnmanagedCodeSecurity]
[DllImport("mkl.dll", CallingConvention=CallingConvention.Cdecl)]
static extern void dgeev(ref char jobvl, ref char jobvr,
    ref int n, [In, Out] double[] a, ref int lda,
    [Out] double[] wr, [Out] double[] wi,
    [Out] double[] vl, ref int ldvl, [Out] double[] vr, ref int ldvr,
    [In, Out] double[] work, ref int lwork, ref int info);

In this LAPACK routine, "vr" is not referenced if jobvr = 'N', and "vl" is not referenced if jobvl = 'N'. The user can pass "null" for unreferenced arguments. This does not matter for automatic pinning, but it may require additional null checks if the arrays are instead passed as pointers with the fixed keyword inside an unsafe block.
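The following sketch shows one way to call this declaration; the matrix values and the two-pass workspace query (lwork = -1) are illustrative assumptions rather than part of the shipped examples. Since jobvl = 'N', null is passed for vl:

char jobvl = 'N', jobvr = 'V';
int n = 3, lda = n, ldvl = 1, ldvr = n, info = 0, lwork = -1;
double[] a  = { 1, 2, 3,  4, 5, 6,  7, 8, 10 };   /* column-major 3x3 matrix */
double[] wr = new double[n], wi = new double[n];
double[] vr = new double[ldvr * n];
double[] work = new double[1];
/* Workspace query: the optimal lwork is returned in work[0] */
dgeev(ref jobvl, ref jobvr, ref n, a, ref lda, wr, wi,
      null, ref ldvl, vr, ref ldvr, work, ref lwork, ref info);
lwork = (int)work[0];
work = new double[lwork];
/* Actual computation; vl is null because jobvl = 'N' */
dgeev(ref jobvl, ref jobvr, ref n, a, ref lda, wr, wi,
      null, ref ldvl, vr, ref ldvr, work, ref lwork, ref info);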

Building the Examples with Custom DLL

Example files: Intel_MKL_C#_Examples.zip

Follow these steps to build the example programs:

  • Unzip the contents of the attached zip file
  • Open a Microsoft Visual Studio command prompt, or add the Microsoft .NET Framework directory to the PATH environment variable in another command prompt
  • Run the build script (makefile) using nmake
    Example:

          >"C:\Program Files (x86)\Intel_sw_development_tools\compilers_and_libraries\windows\mkl\bin\mklvars.bat" intel64
          >nmake intel64 

  • The makefile provides further explanation of the parameters in comments

This will create the custom DLL mkl.dll, which includes the functions used by the examples, as well as an executable for each of the example programs.

Building the Examples with Intel® MKL Single Dynamic Library (mkl_rt.dll)

The Single Dynamic Library was added in Intel MKL 10.3 to simplify linkage, including linkage from C#, so it is no longer necessary to build a custom DLL. The library, named mkl_rt.dll, can be called directly from C# code. Below is an updated version of the examples; follow the same steps and run the nmake command.

Example:

          >"C:\Program Files (x86)\Intel_sw_development_tools\compilers_and_libraries\windows\mkl\bin\mklvars.bat" intel64
          >nmake intel64

Example files: Intel_MKL_C#_Examples_02.zip
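
The only change compared with the custom-DLL examples is the library name in the DllImport attribute. Below is a minimal sketch (the class name MklRt is arbitrary) of importing cblas_dgemm directly from mkl_rt.dll; the declaration mirrors the cblas_dgemm prototype, with the CBLAS enumerations passed as plain integers (e.g. CblasRowMajor = 101, CblasNoTrans = 111):

using System.Runtime.InteropServices;
using System.Security;

[SuppressUnmanagedCodeSecurity]
internal static class MklRt
{
    /* cblas_dgemm imported directly from the Single Dynamic Library */
    [DllImport("mkl_rt.dll", CallingConvention = CallingConvention.Cdecl,
        ExactSpelling = true)]
    internal static extern void cblas_dgemm(
        int Order, int TransA, int TransB,
        int M, int N, int K,
        double alpha, double[] A, int lda, double[] B, int ldb,
        double beta, [In, Out] double[] C, int ldc);
}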

Additional Examples with Complex Data Types

The examples below show how to call Intel MKL functions on complex data types from three domains:
1. cscal.cs - BLAS (CBLAS)
2. cgemm.cs - BLAS
3. LAPACKE_zgesv.cs - C interface LAPACK function

There is an informative discussion on the Intel MKL forum; please read the thread. It covers several issues:

1) The System.Numerics Complex type

If you use the System.Numerics Complex type, it corresponds to the "z" (double-precision complex) type in MKL functions: two 64-bit double components, 16 bytes in total.

For example, use LAPACKE_zgesv, not LAPACKE_cgesv.
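
A quick way to confirm this layout is to check the marshaled size of the type; a small illustrative check:

using System;
using System.Numerics;
using System.Runtime.InteropServices;

/* Complex is a blittable struct of two doubles, so its native size is 16 bytes,
   matching MKL's double-precision complex ("z") type. */
Console.WriteLine(Marshal.SizeOf(typeof(Complex)));   // expected output: 16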

2) Using large arrays (> 2 GB)

2.1) .NET has allowed allocating arrays larger than 2 GB since .NET 4.5 by setting gcAllowVeryLargeObjects = true in the application configuration file, as sketched below.
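
A minimal app.config sketch enabling this setting (a real project's configuration file will typically contain additional elements):

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>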

2.2) When passing a large array to native code, the default marshaling does not handle arrays larger than 2 GB. One workaround is to pin the array and pass a pointer to its first element to the native function with the fixed keyword:

using System.Numerics;
using System.Runtime.InteropServices;
using System.Security;

[SuppressUnmanagedCodeSecurity]
internal sealed unsafe class CNative
{
    private CNative() {}

    /** native LAPACKE_zgesv declaration */
    [DllImport("mkl_rt.dll", CallingConvention = CallingConvention.Cdecl,
        ExactSpelling = true, SetLastError = false)]
    internal static extern int LAPACKE_zgesv(
        int matrix_layout, int n, int nrhs,
        Complex* A, int lda, int[] ipiv, Complex* B, int ldb);
}

public unsafe sealed class LAPACK
{
    /** LAPACKE_zgesv wrapper: pins A and B and passes raw pointers */
    public static int zgesv(int matrix_layout, int n, int nrhs,
        Complex[] A, int lda, int[] ipiv,
        Complex[] B, int ldb)
    {
        fixed (Complex* pA = &A[0])
        fixed (Complex* pB = &B[0])
            return CNative.LAPACKE_zgesv(matrix_layout, n, nrhs, pA, lda,
                ipiv, pB, ldb);
    }
}
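
For illustration, a hypothetical call to the wrapper above that solves a 2x2 complex system A*x = b; the constant 101 is LAPACK_ROW_MAJOR as defined in lapacke.h, and the matrix values are arbitrary:

Complex[] A = { new Complex(1, 1), new Complex(2, 0),
                new Complex(0, 3), new Complex(4, -1) };
Complex[] b = { new Complex(1, 0), new Complex(0, 1) };
int[] ipiv = new int[2];
/* Row-major layout: lda >= n and ldb >= nrhs */
int info = LAPACK.zgesv(101, 2, 1, A, 2, ipiv, b, 1);
/* On success (info == 0) the solution overwrites b */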

Examples with Pinned Memory

As part of the .NET framework, C# can be thought of as “managed” code: memory is automatically managed by the .NET garbage collector. This is in contrast to “unmanaged” code such as C and C++, where memory is explicitly controlled by function calls in the code itself. While the basic idea of automatic garbage collection is easy to understand, some users may not realize that the .NET garbage collector can also move objects around in order to keep memory defragmented. This somewhat unexpected behavior can lead to undesirable consequences when accessing managed memory from unmanaged code. Most of the time, we do not have to worry about such details, as C# will automatically pin objects in memory when they are passed as arguments to unmanaged code. However, several MKL procedures utilize a “handle” to store internal data for later use, and in some cases the handle may store a pointer to a user-supplied data array. In such cases, it is necessary to take special precautions to ensure that the user-supplied input is stored in a fixed location in memory.

The easiest way to avoid conflicts between the garbage collector and unmanaged code is to explicitly pin data in memory so that internal pointers always point to fixed objects. C# allows us to do this with the GCHandle class. For example, in the Direct Sparse Solver (DSS) interface, the routine dss_define_structure takes user-supplied data arrays describing the structure of a sparse matrix and stores pointers to them in the DSS handle. These arrays are needed during later stages of the DSS interface and are accessed through the DSS handle. Therefore, we must pin the user-supplied data before calling the DSS routines:

int[] rowIndex = new int[nRows];
int[] columns = new int[nCols];
/* Pin user-supplied data in managed memory */
GCHandle rowIndex_pin = GCHandle.Alloc(rowIndex, GCHandleType.Pinned);
GCHandle columns_pin = GCHandle.Alloc(columns, GCHandleType.Pinned);
/* Call the DSS routines */
// (see full example for details)
/* Release the memory pins */
rowIndex_pin.Free();
columns_pin.Free();

In the above code snippet, rowIndex_pin and columns_pin are instances of the GCHandle class that pin the data arrays rowIndex and columns in memory, thus providing the internal DSS handle with a fixed memory reference. The attached file Intel_MKL_C#_Examples_Pinned_Memory.zip contains examples of memory pinning for

  1. dss_sym.cs (the Direct Sparse Solvers interface)
  2. djacobi_rci.cs (Jacobian matrix calculation with reverse communication interface)
  3. ex_nlsqp_bc.cs (nonlinear least squares problem with linear bound constraints interface)

Follow the build steps for “Building the Examples with Custom DLL” to run these examples.

For more complete information about compiler optimizations, see our Optimization Notice.

Comments

Hi,
I am using the eigenvalue/eigenvector decomposition DSYEV in MKL for a university project, and I need the results to be reproducible.
I wrote a little wrapper (following the examples) to call DSYEV from C#:

[code=c-sharp]
using System;
using System.Runtime.InteropServices;
using System.Security;
using mkl;

namespace Front_toolbox_v2
{
    class Solve
    {
        public double[,] decompo(double[,] A)
        {
            int lwork = -1;
            int n = A.GetLength(0);
            int lda = n;

            double[] A_bis = new double[n * n];
            double[] w = new double[n];
            /* Copy A into a column-major one-dimensional array */
            for (int i = 0; i < n; i++)
            {
                int cte = i * n;
                for (int j = 0; j < n; j++)
                {
                    A_bis[cte + j] = A[j, i];
                }
            }

            /* Workspace query (lwork = -1), then the actual computation */
            lwork = -1;
            double[] work_1 = new double[1];
            int info = LAPACK_mkl.dsyev('V', 'U', n, A_bis, lda, w, work_1, lwork);
            lwork = (int)work_1[0];
            double[] work = new double[lwork];
            info = LAPACK_mkl.dsyev('V', 'U', n, A_bis, lda, w, work, lwork);

            /* First column holds the eigenvalues, remaining columns the eigenvectors */
            double[,] res2 = new double[n, n + 1];
            for (int i = 0; i < n; i++)
            {
                res2[i, 0] = w[i];
                for (int j = 0; j < n; j++)
                {
                    res2[i, j + 1] = A_bis[i + j * n];
                }
            }
            GC.Collect(0);
            return res2;
        }
    }
}

namespace mkl
{
    public sealed class LAPACK_mkl
    {
        private LAPACK_mkl() { }

        public static int dsyev(char jobz, char uplo, int N, double[] A, int LDA, double[] w, double[] work, int lwork)
        {
            LAPACKNative.kmp_set_warnings_off();
            int num = Environment.ProcessorCount;
            LAPACKNative.omp_set_num_threads(ref num);

            int info = -1;
            LAPACKNative.dsyev(ref jobz, ref uplo, ref N, A, ref LDA, w, work, ref lwork, ref info);
            return info;
        }
    }

    /** LAPACK native declarations */
    [SuppressUnmanagedCodeSecurity]
    internal sealed class LAPACKNative
    {
        [DllImport("libiomp5md", EntryPoint = "omp_set_num_threads")]
        internal static extern void omp_set_num_threads(ref int num);

        [DllImport("libiomp5md", EntryPoint = "kmp_set_warnings_off")]
        internal static extern void kmp_set_warnings_off();

        [DllImport("mkl", EntryPoint = "DSYEV", CallingConvention = CallingConvention.Cdecl), SuppressUnmanagedCodeSecurity]
        internal static extern void dsyev(ref char jobz, ref char uplo, ref int n, double[] A, ref int lda, double[] w, double[] work, ref int lwork, ref int info);

        private LAPACKNative()
        {
            kmp_set_warnings_off();
            int num = Environment.ProcessorCount;
            omp_set_num_threads(ref num);
        }
    }
}
[/code]

I read something about memory alignment and 16-byte boundaries... does this apply here?
Thank you for any answer or comment on my code.


I'm attempting to run the examples on my machine (Windows 7, VS 2010). I eventually need to integrate a good FFT library into my product and this one was recommended, but I can't get the examples to run. I've set up nmake, added the paths of the MKL redistributables to my main PATH environment variable, etc.

The makefile execution fails:

C:\projects\MKL Examples>nmake ia32 mklredist="C:\Program Files (x86)\Intel\ComposerXE-2011\redist"

Microsoft (R) Program Maintenance Utility Version 10.00.30319.01
Copyright (C) Microsoft Corporation. All rights reserved.

Add path of the MKL redistributable to the path environment variable
set path=%MKLREDIST%\ia32\mkl;%MKLREDIST%\ia32\compiler;%path%
Build and run examples
nmake /a dgemm.exe dgeev.exe dfti_d1.exe pardiso.exe vddiv.exe

Microsoft (R) Program Maintenance Utility Version 10.00.30319.01
Copyright (C) Microsoft Corporation. All rights reserved.

Compile dgemm.cs
csc .\dgemm.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

Run dgemm example
dgemm.exe
MKL cblas_dgemm example

alpha=1
beta=-1
Matrix A
1 2 3
4 5 6
Matrix B
0 1 0 1
1 0 0 1
1 0 1 0
Initial C
5 1 3 3
11 4 6 9
MKL FATAL ERROR: Cannot load mkl_intel_thread.dll
NMAKE : fatal error U1077: '.\dgemm.exe' : return code '0x1'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\nmake.EXE"' : return code '0x2'
Stop.

I am totally unable to add references to the MKL DLLs in VS2010. The standard "Add Reference" dialog simply ignores my attempt.

How does one use this library with a C# project?

Janene


I downloaded MKL for evaluation, and I also downloaded the examples above.
The source code is pretty clear, but I do NOT manage to properly set up the links from the C# project to the MKL DLLs.

Using the Visual Studio command for project links, I can find the DLLs, but setting up the links is not accepted.

What is the nature of the DLLs? COM or .NET?
Is there anything else I have to do to set up the links?

The makefile was not helpful at all. I have a Windows 7 64-bit computer, and I cannot find nmake on my machine...



This works only for this version:
nmake ia32 MKLROOT="C:ProgramFileIntelCompile11.1.51mkl"
and I needed to copy libiomp5md.dll & libiomp5md.lib to the folder IntelCompiler11.151mklia32lib.



I have a C# program running on the Mono/Linux platform. Does the Intel MKL Linux version allow me to integrate MKL into the Mono implementation as well? If so, are the instructions pretty similar?

Thanks.


If I have a C# program running on Mono/Linux, can I use the MKL Linux version with my program, given that this solution relies on a DLL?


Hi, Jo.
I've checked on an 8-core Xeon - it is OK with MKL_NUM_THREADS=2, 4, or (the default) 8.
I used the makefile attached to the examples to build the custom DLL and ran the example with N=9000, K=8000, M=6000.
Could you please provide more details on how you build the DLL? And a small test case, if possible.
Thanks,
Vladimir


Hi MKL gurus,

We have successfully linked a C# program to MKL thanks to your examples, but a problem remains: it seems that MKL can't spawn multiple threads when called from our managed environment. When calling cblas_dgemm from a freshly built DLL, the linkage is done successfully (pinning, ...) and the matrix-matrix multiplication is processed, but on an Intel Xeon 5130 we reach only 25% CPU usage, even after setting some environment variables (MKL_DYNAMIC=FALSE, MKL_NUM_THREADS=x), which seem to have no impact on this issue...

Of course we were expecting some overhead from the C#-to-C transition, but such behavior seems strange. Is it expected, or have we gone wrong somewhere? Does C# constrain the use of unmanaged code to only one thread?

Best Regards.

