Developer Reference

  • 0.10
  • 10/21/2020
  • Public Content
Contents

DFTI_COMPLEX_STORAGE, DFTI_REAL_STORAGE, DFTI_CONJUGATE_EVEN_STORAGE

Depending on the value of the
DFTI_FORWARD_DOMAIN
configuration parameter, the implementation of FFT supports several storage schemes for input and output data (see document [3] for the rationale behind the definition of the storage schemes). The data elements are placed within contiguous memory blocks, defined with generalized strides (see
DFTI_INPUT_STRIDES, DFTI_OUTPUT_STRIDES
). For multiple transforms, all sets of data should be located within the same memory block, and the data sets should be placed at the same
distance
from each other (see
DFTI_NUMBER_OF TRANSFORMS
and
DFTI_INPUT DISTANCE, DFTI_OUTPUT_DISTANCE
).
Avoid
setting up multidimensional arrays with lists of pointers to one-dimensional arrays. Instead use a one-dimensional array with the explicit indexing to access the data elements.
FFT Examples demonstrate the usage of storage formats.

DFTI_COMPLEX_STORAGE: storage schemes for a complex domain

For the
DFTI_COMPLEX
forward domain, both input and output sequences belong to a complex domain. In this case, the configuration parameter
DFTI_COMPLEX_STORAGE
can have one of the two values:
DFTI_COMPLEX_COMPLEX
(default) or
DFTI_REAL_REAL
.
In the
Intel® oneAPI Math Kernel Library
FFT interface, storage schemes for a forward complex domain and the respective backward complex domain are the same.
With
DFTI_COMPLEX_COMPLEX
storage, complex-valued data sequences are referenced by a single complex parameter (array)
AZ
so that a complex-valued element
z
k
1
,
k
2
, ...,
k
d
of the
m
-th
d
-dimensional sequence is located at
AZ[m*distance + stride
0
+ k
1
*stride
1
+ k
2
*stride
2
+ ... k
d
*stride
d
]
as a structure consisting of the real and imaginary parts.
This code illustrates the use of the
DFTI_COMPLEX_COMPLEX
storage:
complex *AZ = malloc( N1*N2*N3*M * sizeof(AZ[0]) ); MKL_LONG ios[4], iodist; // input/output strides and distance ... // on input: Z{k1,k2,k3,m} // = AZ[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ] status = DftiComputeForward( desc, AZ ); // on output: Z{k1,k2,k3,m} // = AZ[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
With the
DFTI_REAL_REAL
storage, complex-valued data sequences are referenced by two real parameters
AR
and
AI
so that a complex-valued element
z
k
1
,
k
2
, ...,
k
d
of the
m
-th sequence is computed as
AR[m*distance + stride
0
+ k
1
*stride
1
+ k
2
*stride
2
+ ... k
d
*stride
d
]
+ √(-1)
* AI[m*distance + stride
0
+ k
1
*stride
1
+ k
2
*stride
2
+ ... k
d
*stride
d
]
.
This code illustrates the use of the
DFTI_REAL_REAL
storage:
float *AR = malloc( N1*N2*N3*M * sizeof(AR[0]) ); float *AI = malloc( N1*N2*N3*M * sizeof(AI[0]) ); MKL_LONG ios[4], iodist; // input/output strides and distance ... // on input: Z{k1,…,kd,m} // = AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ] // + I*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ] status = DftiComputeForward( desc, AR, AI ); // on output: Z{k1,…,kd,m} // = AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ] // + I*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]

DFTI_REAL_STORAGE: storage schemes for a real domain

The
Intel® oneAPI Math Kernel Library
FFT interface supports only one configuration value for this storage scheme:
DFTI_REAL_REAL
. With the
DFTI_REAL_REAL
storage, real-valued data sequences in a real domain are referenced by one real parameter
AR
so that real-valued element of the
m
-th sequence is located as
AR[m*distance + stride
0
+ k
1
*stride
1
+ k
2
*stride
2
+ ... k
d
*stride
d
]
.

DFTI_CONJUGATE_EVEN_STORAGE: storage scheme for a conjugate-even domain

The
Intel® oneAPI Math Kernel Library
FFT interface supports two configuration values for this parameter:
DFTI_COMPLEX_REAL
(default) and
DFTI_COMPLEX_COMPLEX
. In both cases, the conjugate-even symmetry of the data enables storing only about a half of the whole mathematical result, so that one part of it can be directly referenced in the memory while the other part can be reconstructed depending on the selected storage configuration.
With the
DFTI_COMPLEX_REAL
storage, the complex-valued data sequences in the conjugate-even domain can be reconstructed as described in section
DFTI_PACKED_FORMAT
.
Although
DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_REAL
is the default setting for the
DFTI_REAL
forward domain, avoid using this storage scheme. This storage scheme is deprecated for one-dimensional transforms and the default will change in a future release. For two-dimensional transforms, this storage scheme is deprecated, support will be removed, and the default will change in a future release.
With the
DFTI_COMPLEX_COMPLEX
storage, the complex-valued data sequences in the conjugate-even domain are referenced by one complex parameter
AZ
so that a complex-valued element
z
k
1
,
k
2
, ...,
k
d
of the
m
-th sequence can be referenced or reconstructed as described below.
Use the
DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_COMPLEX
configuration setting, which will become the default in future. This setting is supported for all transform ranks, provides a uniform pattern for reconstructing the entire conjugate-even sequence from the part of it that is actually stored, and is compatible with data layouts supported by other FFT libraries, such as FFTW. This storage scheme disregards the setting of
DFTI_PACKED_FORMAT
.
Consider a d-dimensional real-to-complex transform
Because the input sequence
R
is real-valued, the mathematical result
Z
has conjugate-even symmetry:
z
k
1
,
k
2
, ...,
k
d
=
conjugate
(
z
N
1
-
k
1
,
N
2
-
k
2
, ...,
N
d
-
k
d
),
where index arithmetic is performed modulo the length of the respective dimension. Obviously, the first element of the result is real-valued:
z
0, 0, ..., 0
=
conjugate
(
z
0, 0, ..., 0
).
For dimensions with even lengths, some of the other elements are real-valued too. For example, if
N
s
is even,
z
0, 0, ...,
N
s
/2, 0, ..., 0
=
conjugate
(
z
0, 0, ...,
N
s
/2, 0, ..., 0
).
With the conjugate-even symmetry, approximately a half of the result suffices to fully reconstruct it. For an arbitrary dimension
h
, it suffices to store elements
z
k
1
, ...,
k
h
, ...,
k
d
for the following indices:
  • k
    h
    = 0, ..., [
    N
    h
    /2]
  • k
    i
    = 0, …
    , N
    i
    -1, where
    i
    = 1, …
    , d
    and
    i
    h
The symmetry property enables reconstructing the remaining elements: for
k
h
= [
N
h
/2] + 1, ... ,
N
h
- 1. In the
Intel® oneAPI Math Kernel Library
FFT interface, the halved dimension is
the last dimension
.
The following code illustrates usage of the
DFTI_COMPLEX_COMPLEX
storage for a conjugate-even domain:
float *AR = malloc( N1*N2*M * sizeof(AR[0]) ); complex *AZ = malloc( N1*(N2/2+1)*M * sizeof(AZ[0]) ); MKL_LONG is[3], os[3], idist, odist; // input and output strides and distance ... // on input: R{k1,k2,m} // = AR[is[0] + k1*is[1] + k2*is[2] + m*idist] status = DftiComputeForward( desc, R, C ); // on output: // for k2=0…N2/2: Z{k1,k2,m} = AZ[os[0]+k1*os[1]+k2*os[2]+m*odist] // for k2=N2/2+1…N2-1: Z{k1,k2,m} = conj(AZ[os[0]+(N1-k1)%N1*os[1] // +(N2-k2)%N2*os[2]+m*odist])
For the backward transform, the input and output parameters and layouts exchange roles: set the strides describing the layout in the backward/forward domain as input/output strides, respectively. For example:
... status = DftiSetValue( desc, DFTI_INPUT_STRIDES, fwd_domain_strides ); status = DftiSetValue( desc, DFTI_OUTPUT_STRIDES, bwd_domain_strides ); status = DftiCommitDescriptor( desc ); status = DftiComputeForward( desc, ... ); ... status = DftiSetValue( desc, DFTI_INPUT_STRIDES, bwd_domain_strides ); status = DftiSetValue( desc, DFTI_OUTPUT_STRIDES, fwd_domain_strides ); status = DftiCommitDescriptor( desc ); status = DftiComputeBackward( desc, ... );
For in-place transforms, ensure the first element of the input data has the same address as the first element of the output data for each dimension.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804