MKL FFT crashes when multi-threaded and for non-power-of-2 sizes

BUG:
MKL FFT crashes (segmentation fault) for certain FFT sizes (for example 2496) when using complex numbers.

The crash is observed with cpp_studio_xe_2013_update1_intel64.tgz,
when compiled with icc and with gcc.
The crash is not observed when compiled with icc and -mkl=sequential.

I am running it on an Intel® Xeon® Processor E5-2670 (8 cores per CPU).

    for (unsigned nrOfSamples = 1; nrOfSamples < 10000; ++nrOfSamples)
    {
        std::cout << "nrOfSamples " << nrOfSamples << std::endl;
        fflush(NULL);

        MKL_LONG status;
        DFTI_DESCRIPTOR_HANDLE _fft;

        // Create the MKL FFT descriptor
        status = DftiCreateDescriptor(&_fft, DFTI_SINGLE, DFTI_COMPLEX,1, nrOfSamples);
        checkStatus(status);

        // The FFT is now fully specified
        status = DftiCommitDescriptor(_fft);
        checkStatus(status);

        // Allocate buffer (oversized on purpose, to be sure the in-place FFT does not go beyond the allocated memory)
        std::complex<float> *x = new std::complex<float>[nrOfSamples*100];

        // Calculate forward FFT
        status = DftiComputeForward(_fft, x);
        checkStatus(status);

        // cleanup
        delete[] x;
        status = DftiFreeDescriptor(&_fft);
        checkStatus(status);
    }

-------------------------------------------------------------------

installed : cpp_studio_xe_2013_update1_intel64.tgz
OS : opensuse 12.2
-------------------------------------------------------------------
ICC compiler: crash observed

icc link options : -L$(MKLROOT)/lib/intel64 -lmkl_rt -lpthread -lm
compile options -mkl=parallel : crash (Signal name : SIGSEGV, Signal meaning : Segmentation fault)

Note : compile options -mkl=sequential : no crash observed

-------------------------------------------------------------------
GCC compiler 4.7.1: crash also observed
-------------------------------------------------------------------

Yes, this example crashes. We will check more carefully what is going wrong with this code.

What we have discovered: the problem is caused by the AVX code path. As a temporary workaround, please try turning off the AVX branch by setting, for example, MKL_CBWR=SSE4_2.
I checked this approach on Win7 and it works on my side.
--Gennady

Gennady,

Thanks for the quick response.
Setting SSE4_2 worked.

Now I could run more tests, and the next example crashes for DFTI_COMPLEX_COMPLEX (not for DFTI_COMPLEX_REAL).
The crash typically happens at nrOfTransforms 3, nrOfSamples 2658:

for (unsigned nrOfTransforms = 1; nrOfTransforms <= 5; ++nrOfTransforms)
{
    for (unsigned nrOfSamples = 1; nrOfSamples <= 10000; ++nrOfSamples)
    {
        std::cout << "Test 3c, Forward FFT Real-2-complex out-of-place nrOfTransforms " << nrOfTransforms << ", nrOfSamples " << nrOfSamples << std::endl;

        MKL_LONG status;
        DFTI_DESCRIPTOR_HANDLE _fft;

        // Allocate buffers (oversized on purpose, to be sure the FFT does not go beyond the allocated memory)
        float *x_in = new float[nrOfSamples*nrOfTransforms*10];
        std::complex<float> *x_out = new std::complex<float>[nrOfSamples*nrOfTransforms*10];

        status = DftiCreateDescriptor(&_fft, DFTI_SINGLE, DFTI_REAL, 1, nrOfSamples);
        checkStatus(status);

        status = DftiSetValue(_fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        checkStatus(status);

        // Specify the number of transforms
        status = DftiSetValue(_fft, DFTI_NUMBER_OF_TRANSFORMS, nrOfTransforms);
        checkStatus(status);

        //status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL);
        status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
        checkStatus(status);

        // The FFT is now fully specified
        status = DftiCommitDescriptor(_fft);
        checkStatus(status);

        // Calculate forward FFT
        status = DftiComputeForward(_fft, x_in, x_out);
        checkStatus(status);

        // Cleanup (each array needs its own delete[])
        delete[] x_in;
        delete[] x_out;
        status = DftiFreeDescriptor(&_fft);
        checkStatus(status);
    }
}

To specify how the multiple input and output vectors are laid out, you should do something like this before committing the descriptor:
DftiSetValue(_fft, DFTI_INPUT_DISTANCE, nrOfSamples);
DftiSetValue(_fft, DFTI_OUTPUT_DISTANCE, nrOfSamples/2+1);

This would tell the compute function that
1) real input element n of vector k is located in x_in[ n + nrOfSamples*k] (here n=0...nrOfSamples-1)
2) complex output element n of vector k is located in x_out[ n + (nrOfSamples/2+1)*k] (here n=0...nrOfSamples/2)

Thanks
Dima

Dima,

You are correct that one should specify the input/output distance.

Nonetheless, the example code still crashes at the same position...

Dirk-Jan

Dirk-Jan,
I have reproduced the problem and I can suggest nothing but sequential FFT.
In MKL 11.0.1 there is a DFTI_THREAD_LIMIT configuration setting, which should be set to 1 before calling DftiCommitDescriptor.
Thanks
Dima

Any idea when a fix is planned, and for which version?

Dirk-Jan

Dirk-Jan, please check the example with the latest 11.0 update 5. I don't see the problem now.
