MKL FFT crashes when multi-threaded and for non-power-of-2 sizes

BUG:
MKL FFT crashes (segmentation fault) for certain FFT sizes (for example 2496, when using complex numbers).

The crash is observed with cpp_studio_xe_2013_update1_intel64.tgz,
both when compiled with icc and when compiled with gcc.
The crash is not observed when compiled with icc and -mkl=sequential.

I am running it on an Intel® Xeon® Processor E5-2670 (8 cores per CPU).

    for (unsigned nrOfSamples = 1; nrOfSamples < 10000; ++nrOfSamples)
    {
        std::cout << "nrOfSamples " << nrOfSamples << std::endl;
        fflush(NULL);

        MKL_LONG status;
        DFTI_DESCRIPTOR_HANDLE _fft;

        // Create the MKL FFT descriptor
        status = DftiCreateDescriptor(&_fft, DFTI_SINGLE, DFTI_COMPLEX,1, nrOfSamples);
        checkStatus(status);

        // The FFT is now fully specified
        status = DftiCommitDescriptor(_fft);
        checkStatus(status);

        // allocate buffer (oversized on purpose, to be sure the in-place FFT does not go beyond the allocated memory)
        std::complex<float> *x = new std::complex<float>[nrOfSamples*100];

        // Calculate forward FFT
        status = DftiComputeForward(_fft, x);
        checkStatus(status);

        // cleanup
        delete[] x;
        status = DftiFreeDescriptor(&_fft);
        checkStatus(status);
    }

-------------------------------------------------------------------

installed : cpp_studio_xe_2013_update1_intel64.tgz
OS : opensuse 12.2
-------------------------------------------------------------------
ICC compiler: crash observed

icc link options : -L$(MKLROOT)/lib/intel64 -lmkl_rt -lpthread -lm
compile options -mkl=parallel : crash (Signal name : SIGSEGV, Signal meaning : Segmentation fault)

Note : compile options -mkl=sequential : no crash observed

-------------------------------------------------------------------
GCC compiler 4.7.1: crash also observed
-------------------------------------------------------------------

Yes, this example crashes. We will check more carefully what is going wrong with this code.

What we have discovered: the problem is caused by the AVX code path. As a temporary workaround, please try to turn off the AVX branch by setting, for example, MKL_CBWR=SSE4_2.
I checked this approach on Win7 and it works on my side.
--Gennady

Gennady,

Thanks for the quick response.
Setting MKL_CBWR=SSE4_2 worked.

Now I could run more tests, and the next example crashes for DFTI_COMPLEX_COMPLEX (not for DFTI_COMPLEX_REAL).
The crash typically happens at nrOfTransforms 3, nrOfSamples 2658:

    for (unsigned nrOfTransforms = 1; nrOfTransforms <= 5; ++nrOfTransforms)
    {
        for (unsigned nrOfSamples = 1; nrOfSamples <= 10000; ++nrOfSamples)
        {
            std::cout << "Test 3c, Forward FFT Real-2-complex out-of-place nrOfTransforms " << nrOfTransforms << ", nrOfSamples " << nrOfSamples << std::endl;

            MKL_LONG status;
            DFTI_DESCRIPTOR_HANDLE _fft;

            // allocate buffers (oversized on purpose, to be sure the FFT does not go beyond the allocated memory)
            float *x_in = new float[nrOfSamples*nrOfTransforms*10];
            std::complex<float> *x_out = new std::complex<float>[nrOfSamples*nrOfTransforms*10];

            status = DftiCreateDescriptor(&_fft, DFTI_SINGLE, DFTI_REAL, 1, nrOfSamples);
            checkStatus(status);

            status = DftiSetValue(_fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
            checkStatus(status);

            // Specify the number of transforms
            status = DftiSetValue(_fft, DFTI_NUMBER_OF_TRANSFORMS, nrOfTransforms);
            checkStatus(status);

            //status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL);
            status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
            checkStatus(status);

            // The FFT is now fully specified
            status = DftiCommitDescriptor(_fft);
            checkStatus(status);

            // Calculate forward FFT
            status = DftiComputeForward(_fft, x_in, x_out);
            checkStatus(status);

            // cleanup (note: "delete[] x_in, x_out;" would free only x_in)
            delete[] x_in;
            delete[] x_out;
            status = DftiFreeDescriptor(&_fft);
            checkStatus(status);
        }
    }

Dmitry Baksheev (Intel)

To specify how the multiple input and output vectors are laid out, you should do something like this before committing the descriptor:
DftiSetValue(_fft, DFTI_INPUT_DISTANCE, nrOfSamples);
DftiSetValue(_fft, DFTI_OUTPUT_DISTANCE, nrOfSamples/2+1);

This would tell the compute function that
1) real input element n of vector k is located in x_in[ n + nrOfSamples*k] (here n=0...nrOfSamples-1)
2) complex output element n of vector k is located in x_out[ n + (nrOfSamples/2+1)*k] (here n=0...nrOfSamples/2)

Thanks
Dima
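
The layout rules described in this post can be sketched as a few index helpers. This is only a minimal sketch and not MKL code: the function names (outputDistance, inputIndex, outputIndex, inputElements, outputElements) are hypothetical and introduced here for illustration; only the formulas come from the post. It assumes a batch of 1-D real-to-complex transforms stored back to back with DFTI_INPUT_DISTANCE = nrOfSamples and DFTI_OUTPUT_DISTANCE = nrOfSamples/2+1.

```cpp
#include <cstddef>

// Hypothetical helpers (not part of MKL) that mirror the layout above.

// Distance between consecutive output vectors, in complex elements.
inline std::size_t outputDistance(std::size_t nrOfSamples) {
    return nrOfSamples / 2 + 1;
}

// Position in x_in of real input element n of vector k (n = 0...nrOfSamples-1).
inline std::size_t inputIndex(std::size_t n, std::size_t k, std::size_t nrOfSamples) {
    return n + nrOfSamples * k;
}

// Position in x_out of complex output element n of vector k (n = 0...nrOfSamples/2).
inline std::size_t outputIndex(std::size_t n, std::size_t k, std::size_t nrOfSamples) {
    return n + outputDistance(nrOfSamples) * k;
}

// Smallest number of floats x_in must hold for the whole batch.
inline std::size_t inputElements(std::size_t nrOfSamples, std::size_t nrOfTransforms) {
    return nrOfSamples * nrOfTransforms;
}

// Smallest number of complex elements x_out must hold for the whole batch.
inline std::size_t outputElements(std::size_t nrOfSamples, std::size_t nrOfTransforms) {
    return outputDistance(nrOfSamples) * nrOfTransforms;
}
```

For the failing case reported in this thread (nrOfTransforms 3, nrOfSamples 2658), these formulas give an output distance of 1330, at least 7974 floats for x_in and at least 3990 complex elements for x_out. The test program allocates nrOfSamples*nrOfTransforms*10 elements for each buffer, which is well above both.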

Dima,

You are correct that one should specify the input/output distance.

Nonetheless, the example code still crashes at the same position...

Dirk-Jan

Dmitry Baksheev (Intel)

Dirk-Jan,
I have reproduced the problem, and I can suggest nothing but sequential FFT.
In MKL 11.0.1 there is a DFTI_THREAD_LIMIT configuration setting, which should be set to 1 before calling DftiCommitDescriptor.
Thanks
Dima

Any idea when a fix is planned, and for which version?

Dirk-Jan

Dirk-Jan, please check the example with the latest 11.0 update 5. I don't see the problem now.