Unpack result of Intel® MKL FFT to align with Matlab

By Jing Xu, Jonghak Kim, Published: 11/29/2017, Last Updated: 05/22/2019



In this article, how to unpack the result of Intel® Math Kernel Library (Intel® MKL) Fast Fourier Transform (FFT) routines will be introduced. This unpacked result will be compared to that of Matlab*.


Introduction of Fourier Transform

Fourier transform changes a signal from time-domain into frequency-domain by decomposing the signal of time into the frequencies that make it up. It is a complex-valued based function of frequency. The absolute Norm 2 value represents the amount of that frequency in the original signal, while the complex argument is the phase offset of the basic sine in that frequency. Figure [img:ft_intro] illustrates Fourier transform and its result.

Fourier Transform

Curve shown in the front-left panel is the one-dimention signal that needed to be analyzed. According to the theory, all signals can be decomposed into a combination of basic sine signals using the following formula:

Thus, the original curve is decomposed into a set of basic sine signals, shown in the figure as sine signals. Each sine signal is in its specific frequency. These sine signals are arranged according to their frequencies in ascending order. Projecting these sine signals to the front-right panel yields the spectrum of these component sine signals. The horizontal axis shows component frequencies, the vertical axis represents peaks in the frequency domain at each individual component frequency.


Fourier transform has several important properties. Understanding them helps to make use of Intel® MKL functions.

  • Complex domain
  1. transform yields results made from complex values. They can be used to calculate the amplitude and phase of the corresponding component sine signal.
  • Conjugate
  1. in mathematics refers to complex conjugate of a complex number, who is the number with an equal real part and imaginary part equal in magnitude but opposite in sign. Results of Fourier transform holds the conjugate property. Nearly half of the results are conjugate numbers of another half. Thus, to save storage in memory when computing with Intel®MKL, only almost half of the results of the Fourier transform are stored.

Introduction of Fast Fourier Transform

A fast Fourier transform (FFT) is an algorithm that samples a signal over a period of time (or space) and divides it into its frequency components. For more detailed information, please refer to https://en.wikipedia.org/wiki/Fast_Fourier_transform.


Fast Fourier Transform with Intel® Math Kernel Library

As FFT are used heavily in the technical and image processing field to perform various types of computational tasks, FFT is already a part of Intel® MKL. They are known as Discrete Fourier transform (DFT) routines. Intel® MKL also provide an interface named as DFTI in its manual document. These routines are highly optimized for Intel architecture-based platforms, i.e. IA-32-based platforms, Intel® 64-based platforms, and XEON Phi processors.

The DFT routines in Intel® MKL provide up to 7-dimentional transform, and are optimized to speed up calculation by making effective use of caches and processor architectures. DFT routines also provide a number of options that could affect calculation. They had been summarized into a table which can be found in FFT Functions. Users can define precision, dimension, scale factor, number of threads, and so on by setting proper values according to the table.



Steps to Use Intel® Math Kernel Library FFT routines

FFT (Fast Fourier Transform) functions are for single-processor or shared-memory systems. Calling of FFT functions in Intel® MKL involves 5 step.

  1. Calling DftiCreateDescriptor to allocate a new descriptor for FFT calculation.
  2. If options/switches that should be set for the calculation, DftiSetValue should be called.
  3. Calling DftiCommitDescriptor to confirm settings and make descriptor ready for the calculation. Any changes to the descriptor won’t take effect until DftiCommitDescriptor is called.
  4. The calculation itself is completed by calling DftiComputeForward/DftiComputeBackward. If configuration parameters keep the same, these function calls can be run as many times as needed. Configuration parameters can be changed by calling DftiSetValue, followed by DftiCommitDescriptor to make changes take effect.
  5. Once calculation(s) is/are done, DftiFreeDescriptor should be called to deallocate the descriptor. Memories that are occupied by the descriptor will be returned to the operating system internally.


Packed format

Result of the forward transform of real data is a conjugate-even sequence. Due to the symmetry property, only a part of the complex-valued sequence is stored in memory. The DFTI_PACKED_FORMAT configuration parameter defines how the data is packed. Possible values of DFTI_PACKED_FORMAT depend on the values of the DFTI_CONJUGATE_E
VEN_STORAGE configuration parameter.

DFTI_CONJUGATE_EVEN_STORAGE defines how input and output data are stored during FFT calculation. It can be any of these values: DFTI_COMPLEX_REAL (default) and DFTI_COMPLEX_COMPLEX. In both cases, the conjugate-even symmetry of the data enables storing only about a half of the whole mathematical result, so that one part of it can be directly referenced in the memory while the other part can be reconstructed depending on the selected storage configuration.

_REAL, DFTI_PACKED_FORMAT can be DFTI_CCS_FORMAT, DFTI_PACK_FORMAT, or DFTI_PERM_FORMAT. These formats support 1d- and 2d-transforms only. CCE format will be described in detail in this article. More detailed information of CCS, PACK and PERM format can be found at https://software.Intel®.com/en-us/mkl-developer-reference-c-dfti-packed-format.

How to unpack CCE format

The storage format of complex pairs defers according to matrix dimensionality and size, which will be described below. In the following section, d indicates real numbers, v indicates complex numbers.

One-dimensional transform

As Fourier transform theory indicates, the first element of the Fourier transform result is the DC value. Thus, elements from the 2nd are conjugated for unpacking.


Two-dimensional transform (even size)

For two-dimensional data with even size, unpacking result data would be shown as the following.


Two-dimensional transform (odd size)

For two-dimensional data with odd size, unpacking result data would be shown as the following.


Sample code

You can find example codes here, Intel® MKL FFT example codes.


#include <mkl.h>
#include "mkl_dfti.h"
#include <stdio.h>
#include <math.h>

#define PI 3.1415926

int main(int argc, char** argv)
    int i = 0;
    const int LEN = 10;
    double dx = 1.0 / LEN;
    double *x_real = (double*)mkl_malloc(LEN*sizeof(double), 64);
    MKL_Complex16 *x_cmplx = (MKL_Complex16*)mkl_malloc(LEN*sizeof(MKL_Complex16), 64);
    DFTI_DESCRIPTOR_HANDLE my_desc_handle;
    MKL_LONG status;

    printf("Dat: ");
    for(i = 0; i < LEN; i++)
        x_real[i] = 2 + cos(2*PI*5*dx*i+30.0/180*PI) + 5*cos(2*PI*3*dx*i-120.0/180*PI) + 10*
        printf("%.2f\t", x_real[i]);
    status = DftiCreateDescriptor(&my_desc_handle, DFTI_DOUBLE, DFTI_REAL, 1, (MKL_LONG)LEN);
    status = DftiSetValue(my_desc_handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    status = DftiSetValue(my_desc_handle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    status = DftiCommitDescriptor(my_desc_handle);
    status = DftiComputeForward(my_desc_handle, x_real, x_cmplx);
    status = DftiFreeDescriptor(&my_desc_handle);
    for (i = LEN/2+1; i < LEN; i++)
        x_cmplx[i].real = x_cmplx[LEN-i].real;
        x_cmplx[i].imag = (-1) * x_cmplx[LEN-i].imag;
    printf("REL: ");
    for (i = 0; i < LEN; i++)
        printf("%.2f\t", x_cmplx[i].real);
    printf("IMG: ");
    for (i = 0; i < LEN; i++)
        printf("%.2f\t", x_cmplx[i].imag);
    printf("AMP: ");
    for (i = 0; i < LEN; i++)
        float len = sqrt(x_cmplx[i].real*x_cmplx[i].real+x_cmplx[i].imag*x_cmplx[i].imag);
        printf("%.2f\t", len);
    printf("ANG: ");
    for (i = 0; i < LEN; i++)
        float len = atan2(x_cmplx[i].imag, x_cmplx[i].real)*180/PI;
        printf("%.2f\t", len);

    return 0;

Result of the sample code is:
























































Compareing with Matlab* result

The following is the result calculated by Matlab*. By comparing these two results, we can find that these two results are the same.

ans =

  Columns 1 through 5
  20.0000 + 0.0000i  -0.0000 - 0.0000i   0.0000 - 0.0000i  12.5000 -64.9519i  -0.0000 + 0.0000i

  Columns 6 through 10
   8.6602 + 0.0000i  -0.0000 - 0.0000i  12.5000 +64.9519i   0.0000 + 0.0000i  -0.0000 + 0.0000i


In this article, how to unpack result of Intel® MKL Fast Fourier Transform routines in CCE format is introduced. For other formats like CCS, PACKED, users can refer to Intel® MKL developer reference for more detailed information.

Also, this unpacked result is compared to that of Matlab*, and confirmed that both Intel® MKL and Matlab* yield the same results.




Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804