Convert regular FFT code to cilk

Convert regular FFT code to cilk


The attached code is an FFT implementation. It should work on any input vector length

I converted each for loop into a clik_for loop.

When I did it in line "for (i = 1; i < n; i += 2)" the results of the FFT were wrong.

There is also a probem with the while loops.

I tried to convert: "while (n > mmax)" to "cilk_for (;n>mmax;)" but I got a syntax error: missing control variable declaration and initialization.

What is the right way to convert a regular FFT code to a clik code ? 



Downloadtext/x-csrc fft.c5.32 KB
6 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

I don't see any cilk_for in the attachment.  Did you hope for "clik_for" to be so interpreted?  How could it compile successfully?

As you saw, cilk_for is strict about requiring all the fields to be filled in in such a way that the loop is countable, and none of the parameters are modified elsewhere in the loop.  It's a parallelization construct, so need not accept anything which is not parallelizable.

Are you certain that you want nested parallelism as your first step?

One of your loops has a race --- each iteration of the following loop modifies the same j.

 j = 1;    
cilk_for (int i = 1; i < n; i += 2) {
     if (j > i) {    
         tempr = data[j];     
         data[j] = data[i];    
         data[i] = tempr;     
         tempr = data[j+1];
         data[j+1] = data[i+1];
         data[i+1] = tempr; }
         m = n >> 1;
         while (m >= 2 && j > m) {     
            j -= m;    
            m >>= 1; 
         j += m;    

There could be other similar errors in your program --- I didn't check the rest of the program too carefully.
Using Cilk screen to race-detect your program may help eliminate some of these errors.

More generally, I would also question whether implementing your own FFT is actually the most efficient way to solve your problem.   I would think that using an existing optimized library would likely be more efficient in terms of both performance and development effort.



The code has races on m, j,wr, wi, tempr, tempi.  The example demonstrates a common difficulty with parallelizing serial code: the serial code makes heavy use of induction variables, that is variables whose value is incrementally updated on each iteration based on the variable's value in the previous iteration.  For example, wr/wi are being computed based on their values in the previous iteration of the m-loop.  You'll need to change the calculation to compute their value's from scratch.  Typically production-strength FFTs precompute a lookup table for these values (often called "twiddle factors").  Likewise the calculations of m and j need to be done in a way that does not depend on their previous values.

Some general advice: always declare variables in the innermost scope possible.  For example, tempr should be declared inside the two blocks that it is used.  That will eliminate the race on tempr.  The race on tempi can be eliminated in a similar way.

Dear Members,

I will search a simpler C code and try to "cilk" it.



Remember that if you haven't checked for races, they are almost certainly lurking in your code. The Cilk Tools include the Cilkscreen Race Detector and the Cilkview Scalability Analyzer. They're available as a free download from

- Barry

Leave a Comment

Please sign in to add a comment. Not a member? Join today