Implementation of Fast Fourier Transform for Image Processing in DirectX* 10

By Raghupathi R Muthyalampalli,

Published:09/01/2011   Last Updated:09/01/2011

Updated DirectX* 11 Sample Available!!!

This code sample has been updated for DirectX 11* and is available at Fast Fourier Transform for Image Processing in DirectX* 11. If you need a DirectX 10 version of this code sample, please download the sample on this page.

Download DirectX 10 Sample Code [ZIP | 6.36MB]


The Discrete Fourier Transform (DFT) is a specific form of Fourier analysis to convert one function (often in the time or spatial domain) into another (frequency domain). DFT is widely employed in signal processing and related fields to analyze frequencies contained in a sample signal, to solve partial differential equations, and to preform other operations such as convolutions. Fast Fourier Transform (FFT) is an efficient implementation of DFT and is used, apart from other fields, in digital image processing. Fast Fourier Transform is applied to convert an image from the image (spatial) domain to the frequency domain. Applying filters to images in frequency domain is computationally faster than to do the same in the image domain.

This document will not go into the theory of FFT but will address the implementation of the algorithm in converting a 2D image to the frequency domain and back to the image domain (Inverse FFT). Once the image is transformed into the frequency domain, filters can be applied to the image by convolutions. FFT turns the complicated convolution operations into simple multiplications. An inverse transform is then applied in the frequency domain to get the result of the convolution.

The sample application is developed in DirectX 10 and demonstrates the forward and inverse transforms of the image to the frequency domain and back. Applying the filters is pretty straight forward once the transform takes place, and hence is not discussed here.

The application uses ping-pong textures, a common technique used in many GPGPU (General-purpose computing on graphics processing units) applications. Ping-pong textures involve a pair of texture surfaces that a shader uses both as input as well as output data. The shader program uses one texture as input to do some computation and writes the output to the second texture. Subsequent iterations will swap the input and output textures (thus the input from previous iteration becomes the output in the current iteration and so on).

The next sections give a brief description of FFT and its advantages and application in image processing, implementation of the algorithm in DirectX10 and conclusions.

Fast Fourier Transform

Fourier Transform decomposes an image into its real and imaginary components which is a representation of the image in the frequency domain. If the input signal is an image then the number of frequencies in the frequency domain is equal to the number of pixels in the image or spatial domain. The inverse transform re-transforms the frequencies to the image in the spatial domain. The FFT and its inverse of a 2D image are given by the following equations:

Where f(m,n) is the pixel at coordinates (m, n), F(x,y) is the value of the image in the frequency domain corresponding to the coordinates x and y, M and N are the dimensions of the image.

As can be seen from the equation, a naïve implementation of this algorithm is very expensive. But the beauty of FFT is that it is separable, namely, the 2D transform can be done as 2 1D transforms as shown below (shown only in the horizontal direction) - one in the horizontal direction followed by the other in the vertical direction on the result of the horizontal transform. The end result is equivalent to performing the 2D transform in the frequency space.

The FFT that's implemented in the application here requires that the dimensions of the image are a power of two. Another interesting property of the FFT is that the transform of N points can be rewritten as the sum of two N/2 transforms (divide and conquer). This is important because some of the computations can be reused thus eliminating expensive operations.

The output of the Fourier Transform is a complex number and has a much greater range than the image in the spatial domain. Therefore to accurately represent these values, they are stored as floats. Furthermore, the dynamic range of the Fourier coefficients are too large to be displayed on the screen, and hence, these values are scaled (usually by dividing by Height*Width of the image) to bring them within the range of values that can be displayed [3].

The next section will go into the implementation details of the FFT algorithm and its inverse in a GPGPU application.

Implementation details

Since the FFT is a divide and conquer algorithm, the various steps can be implemented in multiple passes in a shader by rendering the result of each pass to a texture. These steps are called butterflies, because of the shape of the data-flow diagram. The FFT algorithm recursively breaks down a DFT of composite size N = n1n2 into n1 smaller transforms of size n2. These smaller DFTs are then combined with size-n1 butterflies, which themselves are DFTs of size n1 (performed n2 times on corresponding outputs of the sub-transforms) pre-multiplied by roots of unity (known as twiddle factors) [5].

The application employs four 2D textures - two for the ping-pong operations, one for the source data (for each pass) and the one for holding the indices and weights for performing the butterfly steps. The textures used for ping-ponging are marked as either a RenderTarget texture or a source depending on whether it is used as destination or source in the current pass. This enables the shader to use the output of the previous pass as input in the current pass. Following are the steps in implementing the FFT algorithm:

  • compute indices and weights for performing the butterfly operations
  • compute log2(Width) horizontal butterflies
  • compute log2(Height) vertical butterflies

If the height and width of the image are equal, then only one texture can be used for the butterfly values, for both the horizontal and vertical passes. Also note that in the vertical butterfly pass the input is the result of the horizontal butterfly pass. In each butterfly pass the current pixel is combined with another pixel using a complex multiply and add and written to the current location. In other words if the current pixel is a then,

a = a + wb

Where w is a complex number representing the weight and b is some other pixel. Each texel of the butterfly texture contains the locations of a, b and the value of w and passed to the shader by the application. Note that for simplicity, only gray scale images are considered in the current implementation. However, extending the algorithm to multiple color channels is straight forward and will only require more textures for additional channels.

Given below is the shader program that computes the butterflies in the horizontal direction.

float4 HButterflyPS( SCR_VS_OUTPUT In ) : SV_Target


float2 coord = In.Tex;

// get the scrambled indices, that give the location for a and 

// b for computing a+wb or a-wb  

float4 IndicesAndWeights = 


float2(coord.x, ButterflyPassNumber));

float2 Indices = IndicesAndWeights.rg;

float2 Weights =;

float2 a1, b1;

a1 = SourceImgTexture.Sample( g_samLinear, float2(Indices.x, 

coord.y) ).ra;

b1 = SourceImgTexture.Sample( g_samLinear, float2(Indices.y, 

coord.y) ).ra;

// perform a+w*b or a-w*b

// w*b (this is complex multiplication)

float2 res;

// w*b => (wr + i*wi)(br + i*bi) => [(wr*br-


res.r = Weights.r*b1.r - Weights.g*b1.g;

res.g = Weights.g*b1.r + Weights.r*b1.g;

a1 = a1 + res;

return a1.xxxy;


The shader looks-up the butterfly texture for indices of the two pixels to be combined and the weight and computes the result by performing the complex math a+wb or a-wb. To make matters simple, the sign is encoded as part of the weight and hence only the addition is performed in the shader program all the times. The result is returned with the real value in the first three components and the imaginary value in the fourth component.

To transform the image from frequency domain to the spatial domain the exact same operations are performed but on the frequencies. Since the frequencies are not in the range to be displayed on the screen they are scaled by a factor of 1/(Width*Height) as shown below.

float4 ScalePS( SCR_VS_OUTPUT In ) : SV_Target


float4 Image = SourceImgTexture.Sample( g_samLinear, In.Tex );

Image *= ScaleFactor;

return Image;



Where ScaleFactor is set by the application to 1/(Width*Height). Finally the result is displayed in the normal mode by concentrating the lowest frequencies towards the center. The below figures show a gray scale image and the result after applying FFT on it.

Figure 1: The original image displayed in gray scale

Figure 2: The normal mode of the image after applying the FFT

Future work and Conclusions

This DirectX 10 application not only implements the FFT and its inverse but is also intended to serve as a framework for implementing image processing algorithms. A CTexture class is implemented for handling 1D and 2D textures operations. The CTexture class supports various texture formats and can be easily extended to support 3D textures. Future work will focus on making the framework a plug-in architecture allowing developers to write image processing filters and "plug-in" their algorithms into the framework.

This document detailed the implementation of FFT and its inverse for transforming a 2D image from the spatial domain to the frequency domain and back. The advantage of representing an image in the frequency space is that performing some operations on the frequencies is much more efficient than doing the same in the image space. Many of the convolutions are just multiplications in the frequency domain (the computational cost in the image space is O(N2) vs. O(Nlog(N)) in the Fourier space for N points). This enables efficient implementations of very large convolutions in image processing and other algorithms/operations in many fields.

References and Related Articles

[2] [Cooley65] Cooley, J. W. and Tukey, O. W. "An Algorithm for the Machine Calculation of Complex Fourier Series." Math. Comput. 19, 297-301, 1965.
[3] Jason L. Mitchell, Marwan Y. Ansari and Evan Hart "Advanced Image Processing with DirectX® 9 Pixel Shaders" - From ShaderX2 - Shader Programming Tips and Tricks with DirectX 9, 2003
[4] Thilaka Sumanaweera, Donald Liu "Medical Image Reconstruction with the FFT" - GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation

About the Author

Raghu Muthyalampalli is a Sr. Software Engineer working on client enabling in the Software Solutions Group that enables client platforms through software optimizations. Prior to working at Intel he had a short stint at eBay working as a Software Engineer. He earned his Master's degree in Computer Science from the University of Illinois at Chicago. His email is


Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804