I'm trying to use the MKL as a calculation library underneath some threaded C# code and am having problems on 32-bit machines.
Using MKL 10.2.5.035, I've built a custom DLL as described in the knowledge base article. If I generate a 64-bit DLL and use that in a 64-bit application then I have no problem. If I switch to 32-bit I do.
If I leave the "threading=" argument off the makefile command line (as I do for 64-bit), then when the application tries to call into the MKL I get the message
"OMP: Error #134: Cannot set thread affinity mask.
OMP: System error #87: The parameter is incorrect."
If instead I set "threading=single" and use mkl_sequential.dll then I get an access violation with a call stack of
ChildEBP RetAddr Args to Child
08dcf8f8 7710272c 087407d8 087407d8 7709b76a ntdll!RtlpBreakPointHeap+0x23 (FPO: [Non-Fpo])
08dcf914 77140b37 093e0000 087407d8 7709b76a ntdll!RtlpValidateHeapEntry+0x16d (FPO: [Non-Fpo])
08dcf95c 770fa967 093e0000 50000063 087407e0 ntdll!RtlDebugFreeHeap+0x9a (FPO: [Non-Fpo])
08dcfa50 770a32f2 087407d8 087407e0 00000003 ntdll!RtlpFreeHeap+0x5d (FPO: [Non-Fpo])
08dcfa70 766314d1 093e0000 00000000 087407e0 ntdll!RtlFreeHeap+0x142 (FPO: [Non-Fpo])
08dcfa84 0f4d1316 093e0000 00000000 087407e0 KERNEL32!HeapFree+0x14 (FPO: [Non-Fpo])
08dcfae8 0f4d1cfb 0f4d0000 08dcfb14 770a97c0 mkl!_vmlFreeThreadLocalData+0x22
08dcfaf4 770a97c0 0f4d0000 00000003 00000000 mkl!_DllMainCRTStartup+0x1e (FPO: [Non-Fpo]) (CONV: stdcall) [f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\crtdll.c @ 476]
08dcfb14 770c20bb 0f4d1cdd 0f4d0000 00000003 ntdll!LdrpCallInitRoutine+0x14
08dcfbb8 770c22a2 00000000 00000000 08dcfbd4 ntdll!LdrShutdownThread+0xe6 (FPO: [Non-Fpo])
08dcfbc8 7663367e 00000000 08dcfc14 770a9d72 ntdll!RtlExitUserThread+0x2a (FPO: [Non-Fpo])
08dcfbd4 770a9d72 005ac470 60c777a3 00000000 KERNEL32!BaseThreadInitThunk+0x15 (FPO: [Non-Fpo])
08dcfc14 770a9d45 69f459c0 005ac470 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
08dcfc2c 00000000 69f459c0 005ac470 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
This seems to occur about 10 seconds after the calculation we are doing completes (which is when the last call to the MKL returns).
The calls to the MKL are being made inside an operation parallelised by the .NET 4 Task Parallel Library. The MKL calls are done via P/Invoke with a typical declaration of
[DllImport(MKL_DLL, CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)]
public static extern void cblas_dgemv(CBLAS_ORDER order, CBLAS_TRANSPOSE TransA, int M, int N, double alpha, double[] A, int lda, double[] X, int incX, double beta, double[] Y, int incY);
One observation that seems reasonably solid is that if the calculation is run single-threaded first and subsequently run multithreaded, the crash doesn't occur. If it is run multithreaded the first time, the crash occurs pretty reliably.
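To make the ordering concrete, here is a minimal sketch of the two run orders. It is not our real code: Gemv is a hypothetical managed stand-in for the P/Invoke'd cblas_dgemv (so the snippet runs without the MKL), and RunOne, RunAll and warmUpFirst are names made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class WarmUpSketch
{
    // Hypothetical managed stand-in for the P/Invoke'd cblas_dgemv
    // (row-major, no transpose): y = alpha * A * x + beta * y.
    public static void Gemv(int m, int n, double alpha, double[] a,
                            double[] x, double beta, double[] y)
    {
        for (int i = 0; i < m; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += a[i * n + j] * x[j];
            y[i] = alpha * sum + beta * y[i];
        }
    }

    // One independent unit of work; every call owns its own buffers,
    // so nothing is shared between tasks.
    public static double RunOne(int seed)
    {
        const int m = 2, n = 2;
        double[] a = { 1, 2, 3, 4 };
        double[] x = { seed, 1 };
        double[] y = new double[m];
        Gemv(m, n, 1.0, a, x, 0.0, y);
        return y[0] + y[1];
    }

    // warmUpFirst == true  : one call on the calling thread, then parallel.
    // warmUpFirst == false : parallel straight away (the order that crashes for us).
    public static double[] RunAll(int count, bool warmUpFirst)
    {
        var results = new double[count];
        if (warmUpFirst)
            RunOne(0);
        Parallel.For(0, count, i => results[i] = RunOne(i));
        return results;
    }

    public static void Main()
    {
        double[] r = RunAll(4, warmUpFirst: true);
        Console.WriteLine(string.Join(", ", r));
    }
}
```

In the real application the only difference between the two orders is whether that first single-threaded call happens before the parallel loop starts.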
I've tried turning on the Microsoft managed debugging assistants to force garbage collections as we cross the P/Invoke boundary, but these seem to force the calculation to become single-threaded, so the crash stops (although the slowdown is so dramatic that I haven't been able to run enough samples to be sure it's gone rather than just become rare).
Am I doing something obviously wrong? Any ideas for where to look to track down the problem?
Thanks in advance