We are curious if we are using DFTI_NUMBER_OF_USER_THREADS correctly.
We use the MKL FFT library in our application. The application is thread-rich, but we don't use OpenMP; we simply create all the POSIX (system-level) threads ourselves. We want to share the MKL DFTI descriptors among all these threads. A descriptor, if we assume the model of FFTW or any other FFT library, typically computes a twiddle table based on the length of the FFT. This knowledge is "encapsulated" inside the descriptor.
Our hope is that sharing descriptors among threads will reduce memory use (i.e., the threads share the twiddle tables). This seemed to work with MKL until recently: at one point we do a very large FFT (16Meg) and everything fails. After some looking through the forums here, we believe that setting DFTI_NUMBER_OF_USER_THREADS to some reasonable value like 16 (it was 1 before) is the right thing to do, but we're not sure. Setting it to 16 seems to fix the problem, but we wanted to verify: given the scenario described above (we create our own threads and want to share the descriptors among several unrelated threads), is this correct?
Now, our applications tend to be very FFT-heavy: some threads on the front-end use an FFT, the main processing uses threads in a work-crew/map-reduce paradigm, and the back-end processing uses FFTs. In other words, all sorts of threads from all over the application can be sharing the descriptors, and there is no known a priori limit on how many. We don't have any insight into how setting DFTI_NUMBER_OF_USER_THREADS to 16 allows multiple threads to reuse a descriptor (in FFTW, there's no notion of this). Does each thread "register" with the descriptor? Is there thread-local data associated with the descriptor? Once a thread has used the descriptor, can only that thread reuse it in that way? Or can we keep reusing the descriptor in multiple threads? (I.e., with DFTI_NUMBER_OF_USER_THREADS set to 16, can some 16 threads use it, then another 16 threads, then a different 16 threads, or do the same threads have to reuse it?)
If anyone knows how DFTI_NUMBER_OF_USER_THREADS works with the descriptor, it would be very helpful. We think this fixes our problem, but we'd like to know whether we have the right solution: once a thread has used a "sharing" slot, can no other thread use it?
Thanks in advance. I am happy to supply some code showing how we use it. I also want to thank the Intel Forums for helping us find the DFTI_NUMBER_OF_USER_THREADS in the first place!