Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 11/07/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

DFTI_COMPLEX_STORAGE, DFTI_REAL_STORAGE, DFTI_CONJUGATE_EVEN_STORAGE

Depending on the value of the DFTI_FORWARD_DOMAIN configuration parameter, the implementation of FFT supports several storage schemes for input and output data (see document [3] for the rationale behind the definition of the storage schemes). The data elements are placed within contiguous memory blocks, defined with generalized strides (see DFTI_INPUT_STRIDES, DFTI_OUTPUT_STRIDES). For multiple transforms, all sets of data should be located within the same memory block, and the data sets should be placed at the same distance from each other (see DFTI_NUMBER_OF TRANSFORMS and DFTI_INPUT DISTANCE, DFTI_OUTPUT_DISTANCE).

TIP:

Avoid setting up multidimensional arrays with lists of pointers to one-dimensional arrays. Instead use a one-dimensional array with the explicit indexing to access the data elements.

FFT Examples demonstrate the usage of storage formats.

DFTI_COMPLEX_STORAGE: storage schemes for a complex domain

For the DFTI_COMPLEX forward domain, both input and output sequences belong to a complex domain. In this case, the configuration parameter DFTI_COMPLEX_STORAGE can have one of the two values: DFTI_COMPLEX_COMPLEX (default) or DFTI_REAL_REAL.

NOTE:

In the Intel® oneAPI Math Kernel Library (oneMKL)FFT interface, storage schemes for a forward complex domain and the respective backward complex domain are the same.

With DFTI_COMPLEX_COMPLEX storage, complex-valued data sequences are referenced by a single complex parameter (array) AZ so that a complex-valued element zk1, k2, ..., kd of the m-th d-dimensional sequence is located at AZ[m*distance + stride0 + k1*stride1 + k2*stride2+ ... kd*strided] as a structure consisting of the real and imaginary parts.

This code illustrates the use of the DFTI_COMPLEX_COMPLEX storage:

complex *AZ = malloc( N1*N2*N3*M * sizeof(AZ[0]) );
MKL_LONG ios[4], iodist; // input/output strides and distance
...
// on input:  Z{k1,k2,k3,m}
// = AZ[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
status = DftiComputeForward( desc, AZ );
// on output: Z{k1,k2,k3,m}
// = AZ[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]

With the DFTI_REAL_REAL storage, complex-valued data sequences are referenced by two real parameters AR and AI so that a complex-valued element zk1, k2, ..., kd of the m-th sequence is computed as AR[m*distance + stride0 + k1*stride1 + k2*stride2+ ... kd*strided] + √(-1) * AI[m*distance + stride0 + k1*stride1 + k2*stride2+ ... kd*strided].

This code illustrates the use of the DFTI_REAL_REAL storage:

float *AR = malloc( N1*N2*N3*M * sizeof(AR[0]) );
float *AI = malloc( N1*N2*N3*M * sizeof(AI[0]) );
MKL_LONG ios[4], iodist; // input/output strides and distance
...
// on input:  Z{k1,…,kd,m}
// =   AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
// + I*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
status = DftiComputeForward( desc, AR, AI );
// on output: Z{k1,…,kd,m}  
// =   AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
// + I*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]

DFTI_REAL_STORAGE: storage schemes for a real domain

The Intel® oneAPI Math Kernel Library (oneMKL) FFT interface supports only one configuration value for this storage scheme: DFTI_REAL_REAL. With the DFTI_REAL_REAL storage, real-valued data sequences in a real domain are referenced by one real parameter AR so that real-valued element of the m-th sequence is located as AR[m*distance + stride0 + k1*stride1 + k2*stride2+ ... kd*strided].

DFTI_CONJUGATE_EVEN_STORAGE: storage scheme for a conjugate-even domain

The Intel® oneAPI Math Kernel Library (oneMKL) FFT interface supports two configuration values for this parameter: DFTI_COMPLEX_COMPLEX (default) and DFTI_COMPLEX_REAL (for 1D problems only). The conjugate-even symmetry of the data enables storing only about a half of the whole mathematical result, so that one part of it can be directly referenced in the memory while the other part can be reconstructed depending on the selected storage configuration.

With the DFTI_COMPLEX_COMPLEX storage, the complex-valued data sequences in the conjugate-even domain are referenced by one complex parameter AZ so that a complex-valued element zk1, k2, ..., kd of the m-th sequence can be referenced or reconstructed as described below.

Consider a d-dimensional real-to-complex transform:



Because the input sequence R is real-valued, the mathematical result Z has conjugate-even symmetry:

zk1, k2, ..., kd = conjugate (zN1-k1, N2-k2, ..., Nd-kd),

where index arithmetic is performed modulo the length of the respective dimension. Obviously, the first element of the result is real-valued:

z0, 0, ..., 0 = conjugate (z0, 0, ..., 0 ).

For dimensions with even lengths, some of the other elements are real-valued too. For example, if Ns is even,

z0, 0, ..., Ns /2, 0, ..., 0 = conjugate (z0, 0, ...,Ns /2, 0, ..., 0 ).

With the conjugate-even symmetry, approximately a half of the result suffices to fully reconstruct it. For an arbitrary dimension h, it suffices to store elements zk1, ...,kh, ..., kd for the following indices:

  • kh = 0, ..., ⌊Nh /2⌋
  • ki = 0, …, Ni -1, where i = 1, …, d and ih

The symmetry property enables reconstructing the remaining elements: for kh = ⌊Nh /2⌋ + 1, ... , Nh- 1. In the Intel® oneAPI Math Kernel Library (oneMKL) FFT interface, the halved dimension is the last dimension.

The following code illustrates usage of the DFTI_COMPLEX_COMPLEX storage for a conjugate-even domain:

float *AR = malloc( N1*N2*M * sizeof(AR[0]) );
complex *AZ = malloc( N1*(N2/2+1)*M * sizeof(AZ[0]) );
MKL_LONG is[3], os[3], idist, odist; // input and output strides and distance
...
// on input:  R{k1,k2,m}  
// = AR[is[0] + k1*is[1] + k2*is[2] + m*idist]
status = DftiComputeForward( desc, R, C );
// on output:
// for k2=0…N2/2:      Z{k1,k2,m} = AZ[os[0]+k1*os[1]+k2*os[2]+m*odist]
// for k2=N2/2+1…N2-1: Z{k1,k2,m} = conj(AZ[os[0]+(N1-k1)%N1*os[1]
//                                              +(N2-k2)%N2*os[2]+m*odist]) 

For the backward transform, the input and output parameters and layouts exchange roles: set the strides describing the layout in the backward/forward domain as input/output strides, respectively. For example:

...
status = DftiSetValue( desc, DFTI_INPUT_STRIDES,  fwd_domain_strides );
status = DftiSetValue( desc, DFTI_OUTPUT_STRIDES, bwd_domain_strides );
status = DftiCommitDescriptor( desc );
status = DftiComputeForward( desc, ... );
...
status = DftiSetValue( desc, DFTI_INPUT_STRIDES,  bwd_domain_strides );
status = DftiSetValue( desc, DFTI_OUTPUT_STRIDES, fwd_domain_strides );
status = DftiCommitDescriptor( desc );
status = DftiComputeBackward( desc, ... );
IMPORTANT:

For in-place transforms, ensure the first element of the input data has the same location as the first element of the output data for each dimension.

See Also