Issue introduced in MKL 11.0 Update 4 (64-bit Linux only)

vasci_

After installing MKL 11.0 Update 4 over MKL 11.0 Update 2 on Linux, our QA process crashes with SIGSEGV at:

#0  0x00002aaab745874a in mkl_serv_malloc ()
#1  0x00002aaab7f6bbcc in mkl_blas_mc3_dgemm_get_bufs ()
#2  0x00002aaab6ae8a99 in mkl_blas_mc3_xdgemm_par ()
#3  0x00002aaab4c2cf74 in mkl_blas_xdgemm_par ()
#4  0x00002aaab4b81ecb in mkl_blas_dgemm_2d_bsrc ()
#5  0x00002aaab4b7b489 in gemm_host ()
#6  0x00002aaabb92b4f3 in L_kmp_invoke_pass_parms ()
    from /opt/intel/composer_xe_2013.4.183/compiler/lib/intel64/libiomp5.so

It is 100% reproducible in certain cases.

Reverting to MKL 11.0 Update 2 solves the issue.

It seems to happen after many iterations, with many computation threads created and destroyed.

Note that we are running multiple (Boost) threads that call MKL, and we call MKL_Thread_Free_Buffers when each thread completes.
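For anyone trying to build a standalone test, here is a minimal sketch of the workload pattern described above: many short-lived threads, each doing one matrix multiply and then freeing its per-thread buffers. It is an assumption-laden illustration, not a confirmed reproducer of the Update 4 crash. It uses std::thread instead of Boost threads, and the names naive_dgemm, free_thread_buffers and stress are hypothetical stand-ins so that it compiles without MKL. With MKL linked, naive_dgemm would be replaced by cblas_dgemm and free_thread_buffers by mkl_thread_free_buffers (the C name behind MKL_Thread_Free_Buffers).

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for cblas_dgemm: naive row-major C = A * B for n x n matrices.
void naive_dgemm(std::size_t n, const std::vector<double>& a,
                 const std::vector<double>& b, std::vector<double>& c) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Hypothetical stand-in for mkl_thread_free_buffers(); with MKL linked, that call
// releases the calling thread's MKL-internal buffers. Here it does nothing.
void free_thread_buffers() {}

// Runs `rounds` rounds; each round creates `threads_per_round` threads that each
// multiply two n x n matrices, check the result, free buffers, and exit.
// Returns the number of products that came out correct.
std::size_t stress(std::size_t rounds, std::size_t threads_per_round, std::size_t n) {
    std::size_t ok = 0;
    for (std::size_t r = 0; r < rounds; ++r) {
        std::vector<int> good(threads_per_round, 0);  // one slot per thread, no sharing
        std::vector<std::thread> pool;
        for (std::size_t t = 0; t < threads_per_round; ++t) {
            pool.emplace_back([&good, t, n] {
                std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
                naive_dgemm(n, a, b, c);
                // Every entry of A * B equals n * 1.0 * 2.0.
                good[t] = (c.front() == 2.0 * n && c.back() == 2.0 * n) ? 1 : 0;
                free_thread_buffers();  // per-thread cleanup at thread completion
            });
        }
        for (auto& th : pool) th.join();  // threads are destroyed every round
        for (int g : good) ok += static_cast<std::size_t>(g);
    }
    return ok;
}
```

For example, stress(10, 4, 8) creates and joins 40 threads in total and should return 40. Recreating threads each round, rather than reusing a pool, mirrors the "many threads created/destroyed" pattern from the report.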

Gennady Fedorov (Intel)

Andrew, how can we reproduce the issue?

vasci_

The only way to reproduce it is for Intel to have a copy of our software and an evaluation license from us. I will pursue this through Premier Support.

Gennady Fedorov (Intel)

OK. We will take up this issue as soon as you submit it there.

vasci_

OK, I created a ticket and explained that to reproduce the issue Intel will have to download a 400 MB installer and a license file, but I have had no response to that yet.

No doubt this will be a painful process for everyone, but I cannot use MKL 11.0 Update 4 until this is resolved.

vasci_

Premier Support issue #697704

Tim Prince

I hope you put some of the missing details in your issue submission.

I don't see any clues as to which checklists you have followed; there are several good ones, including

http://software.intel.com/en-us/articles/determining-root-cause-of-sigse...

I can't even guess whether you explored simple remedies such as increasing stack (both global and thread stack) or using heap options.
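As an illustration of the thread-stack remedy, here is a minimal sketch of giving a thread you create yourself a larger stack via the POSIX pthread_attr_setstacksize call. The names worker and run_with_stack and the 12 MiB / 16 MiB figures are hypothetical. This only covers application-created threads; the stack size of the OpenMP worker threads that threaded MKL runs on is controlled through the OMP_STACKSIZE environment variable (or KMP_STACKSIZE with Intel's runtime), and the main thread's stack through ulimit -s.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical worker that needs about 12 MiB of stack, more than the
// common 8 MiB default thread stack on glibc-based Linux systems.
static void* worker(void* arg) {
    volatile char big[12 * 1024 * 1024];
    big[0] = 1;                  // touch the lowest byte of the array
    big[sizeof(big) - 1] = 1;    // and the highest byte
    *static_cast<int*>(arg) = big[0] + big[sizeof(big) - 1];
    return nullptr;
}

// Runs fn(arg) on a new POSIX thread with a stack of stack_bytes and waits
// for it to finish. Returns 0 on success or the pthread error code
// (e.g. EINVAL if stack_bytes is below PTHREAD_STACK_MIN).
int run_with_stack(void* (*fn)(void*), void* arg, std::size_t stack_bytes) {
    assert(fn != nullptr);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, fn, arg);
        if (rc == 0) rc = pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

For example, run_with_stack(worker, &out, 16u * 1024 * 1024) should return 0 and leave out equal to 2. If a crash disappears when the stack is enlarged like this, stack exhaustion rather than a library bug becomes the likelier cause.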

Sergey Kostrov

>>...Seems to happen after many iterations...

Do you get the SIGSEGV error after all threads have released memory and completed (been destroyed)? Or in the middle, or at the end, of processing?

This is what MSDN says about that very obsolete signal-error processing constant:
...
SIGSEGV
Illegal storage access. The default action terminates the calling program.
...

vasci_

Not sure what you mean by "obsolete". On Linux, signals such as SIGSEGV are a fundamental part of the OS. A segmentation violation is raised when a process accesses an illegal address, for example by dereferencing a NULL pointer.

vasci_

Quote:

TimP (Intel) wrote:

I hope you put some of the missing details in your issue submission.

I don't see any clues as to which checklists you have followed; there are several good ones, including

http://software.intel.com/en-us/articles/determining-root-cause-of-sigse...

I can't even guess whether you explored simple remedies such as increasing stack (both global and thread stack) or using heap options.

The details are that MKL 11.0 Update 2 passes 300-400 QA tests without failure, while MKL 11.0 Update 4 reproducibly fails 6+ of those tests with a segmentation violation inside MKL. I have supplied Premier Support with a reproducible example and will update this thread with the results.

vasci_

Currently I am having to give the Premier Support person a tutorial in GDB.

But here's a clue for anyone at Intel who cares about this issue.

Does this look like a race condition in MKL?

Thread 1 is crashing with a segmentation violation in:

#11 0x00002aaab75d40da in mkl_serv_malloc ()
   from /opt/intel/composer_xe_2013.4.183/mkl/lib/intel64/libmkl_core.so
#12 0x00002b93a4980aec in mkl_blas_mc3_dgemm_get_bufs ()

At the same time, Thread 2 is in:

#0  0x00002aaab75dfe00 in mkl_blas_dgemm_set_blks_size ()
#1  0x00002aaab66135d9 in gemm_host ()

Shane Story (Intel)

Hi Andrew, we definitely care and the local MKL team is now looking into the issue. We will report back once we have more information. -Shane

vasci_

I just installed MKL 11.0 Update 5 and the problem has gone away. It looks like someone found and fixed the issue.

vasci_

To close the loop on this issue: Intel Premier Support confirmed there was an issue in Update 4 and that it was fixed in Update 5. Thanks, guys!

Gennady Fedorov (Intel)

You are always welcome; we are glad to help :)
