Would someone happen to have a working example of an FFT convolution using IPP? I'm trying to setup a basic image compare, so far the results aren't great.

# Working FFT convolution example?

## Working FFT convolution example?

For more complete information about compiler optimizations, see our Optimization Notice.

>>Would someone happen to have a working example of an FFT convolution using IPP? I'm trying to setup a basic

>>image compare, so far the results aren't great...

IPP documents in Pdf-format have a great set of different working (!) examples and please take a look. Also, could you provide additional details what was done on your side before you come up to the conclusion that '...results aren't great...'. I'd like to see your test case, please.

I have seen this in my own implementation where I do image filtering using ippConvolve, and I see that the computation speed is not increasing when I do increase the kernel size (keeping the image size constant). This is what you would except if it was using DFT/FFT internally instead of a standard convolution operation.

>>...As far as I know, ippConvolve already internally use FFT/DFT, when the image size is larger than X...

What do you mean X? You're Not specific ( completely fuzzy ) even if you're claiming '...I do image filtering using ippConvolve...'.

Thanks guys for your help.

**Quote:**

Sergey Kostrovwrote:IPP documents in Pdf-format have a great set of different working (!) examples and please take a look. Also, could you provide additional details what was done on your side before you come up to the conclusion that '...results aren't great...'. I'd like to see your test case, please.

I've gone through the FFT example I did find which were a pretty big help, but they were just simple forward/inverse FFT examples, not quite what I'm doing. If anything I'm just looking for something a little more fleshed out so I can get some idea as to what I may be doing wrong.

**Quote:**

Thomas Jensenwrote:

I have seen this in my own implementation where I do image filtering using ippConvolve, and I see that the computation speed is not increasing when I do increase the kernel size (keeping the image size constant). This is what you would except if it was using DFT/FFT internally instead of a standard convolution operation.

Where is this ippConvolve function? I looked through the documentation pdf's and example source code and I haven't found anything like this.

>>...Where is this ippConvolve function?..

Please take a look at **ipps.h** header file for a set of functions with **ippsConv** prefix. For example,

...

/* /////////////////////////////////////////////////////////////////////////////

// **Convolution functions**

///////////////////////////////////////////////////////////////////////////// */

/* /////////////////////////////////////////////////////////////////////////////

// Name: ippsConv

// Purpose: Linear Convolution of 1D signals

// Parameters:

// pSrc1 pointer to the first source vector

// pSrc2 pointer to the second source vector

// lenSrc1 length of the first source vector

// lenSrc2 length of the second source vector

// pDst pointer to the destination vector

// Returns: IppStatus

// ippStsNullPtrErr pointer(s) to the data is NULL

// ippStsSizeErr length of the vectors is less or equal zero

// ippStsMemAllocErr no memory for internal buffers

// ippStsNoErr otherwise

// Notes:

// Length of the destination data vector is lenSrc1+lenSrc2-1.

// The input signal are exchangeable because of

// commutative convolution property.

// Some other values may be returned by FFT transform functions

*/

IPPAPI(IppStatus, **ippsConv_32f**, ( const Ipp32f* pSrc1, int lenSrc1,

const Ipp32f* pSrc2, int lenSrc2, Ipp32f* pDst))

IPPAPI(IppStatus, **ippsConv_16s_Sfs**, ( const Ipp16s* pSrc1, int lenSrc1,

const Ipp16s* pSrc2, int lenSrc2, Ipp16s* pDst, int scaleFactor))

IPPAPI( IppStatus, **ippsConv_64f**,( const Ipp64f* pSrc1, int lenSrc1,

const Ipp64f* pSrc2, int lenSrc2, Ipp64f* pDst))

...

>>...As far as I know, **ippConvolve** already internally use FFT/DFT...

Thomas,

Could you explain where **ippConvolve** name comes from?

I just checked several IPP versions from 7.1 down to 3.0 and IPL ( Image Processing Library ). I see that term **Conv** is used all the time and term **Convolution** never used in names for functions.

Thanks in advance.

**Quote:**

Sergey Kostrovwrote:>>...Where is this ippConvolve function?..

Please take a look at

ipps.hheader file for a set of functions withippsConvprefix. For example,

...

/* /////////////////////////////////////////////////////////////////////////////

//Convolution functions

///////////////////////////////////////////////////////////////////////////// */

/* /////////////////////////////////////////////////////////////////////////////

// Name: ippsConv

// Purpose: Linear Convolution of 1D signals

// Parameters:

// pSrc1 pointer to the first source vector

// pSrc2 pointer to the second source vector

// lenSrc1 length of the first source vector

// lenSrc2 length of the second source vector

// pDst pointer to the destination vector

// Returns: IppStatus

// ippStsNullPtrErr pointer(s) to the data is NULL

// ippStsSizeErr length of the vectors is less or equal zero

// ippStsMemAllocErr no memory for internal buffers

// ippStsNoErr otherwise

// Notes:

// Length of the destination data vector is lenSrc1+lenSrc2-1.

// The input signal are exchangeable because of

// commutative convolution property.

// Some other values may be returned by FFT transform functions

*/IPPAPI(IppStatus,

ippsConv_32f, ( const Ipp32f* pSrc1, int lenSrc1,

const Ipp32f* pSrc2, int lenSrc2, Ipp32f* pDst))

IPPAPI(IppStatus,ippsConv_16s_Sfs, ( const Ipp16s* pSrc1, int lenSrc1,

const Ipp16s* pSrc2, int lenSrc2, Ipp16s* pDst, int scaleFactor))

IPPAPI( IppStatus,ippsConv_64f,( const Ipp64f* pSrc1, int lenSrc1,

const Ipp64f* pSrc2, int lenSrc2, Ipp64f* pDst))

...

Alright, I'll take a look. Thanks.

>>Would someone happen to have a working example of an FFT convolution using IPP?...

Theo, Let me know if you are interested in these two test cases. Here are short descriptions:

...

// Sub-Test 13 - **Problem with ippsFIR_Direct_32f and ippsConv_64f functions**

...

// Sub-Test 14 - **Performance Evaluation of ippsFIR_Direct_32f and ippsConv_64f functions**

...

If Yes, I'll provide sources **As Is** and you will need to do all the rest clean ups.

**Quote:**

Sergey Kostrovwrote:>>Would someone happen to have a working example of an FFT convolution using IPP?...

Theo, Let me know if you are interested in these two test cases. Here are short descriptions:

...

// Sub-Test 13 -Problem with ippsFIR_Direct_32f and ippsConv_64f functions

...

// Sub-Test 14 -Performance Evaluation of ippsFIR_Direct_32f and ippsConv_64f functions

...

If Yes, I'll provide sourcesAs Isand you will need to do all the rest clean ups.

Sure! It may be helpful. Thanks.

// Sub-Test 13 - **Problem with ippsFIR_Direct_32f and ippsConv_64f functions**

{

/*

CrtPrintf( RTU("Sub-Test 13\n") );

// Test-Case 1

{

const RTint N = 7;

const RTint M = 11;

Ipp32f pSrc[N];

Ipp32f pDstC[N+M-1];

Ipp32f pDstF[N+M-1];

Ipp32f pTaps[M];

Ipp32f pDlyLine[2*M];

RTint iDlyLineIndex;

RTint iTapsPos;

RTint i;

RTint j;

RTint k;

for( i = 0; i < N; i++ ) // Initialization

pSrc[i]= ( RTfloat )i;

for( j = 0; j < ( N+M-1 ); j++ )

{

pDstC[j] = 0.0f;

pDstF[j] = 0.0f;

}

for( j = 0; j < M; j++ )

pTaps[j] = 0.0f;

for( j = 0; j < ( 2*M ); j++ )

pDlyLine[j] = 0.0f;

iTapsPos = 0;

CrtPrintf( RTU("Taps Position: %ld\n\n"), iTapsPos );

pTaps[ iTapsPos ] = 1.0f; // pTaps[ (M/2)-1 ] = 1.0L;

iDlyLineIndex = 0; // iDlyLineIndex = ( M-1 );

// Processing with [ ippsConv_32f ]

CrtPrintf( RTU("Calling IPP [ ippsConv_32f ] function\n") );

st = ::**ippsConv_32f**( pSrc, N, pTaps, M, pDstC );

if( st == ippStsNoErr )

CrtPrintf( RTU("Successful: Status = %ld\n"), ( RTint )st );

else

CrtPrintf( RTU("Error : Status = %ld\n"), ( RTint )st );

// [ ippsFIR_Direct_32f ] [ ippsFIR_Direct_64f ] Directly filters a block of samples through a single-rate

// FIR filter

// const Ipp32f* pSrc const Ipp64f* pSrc - pointer to the input array

// Ipp32f* pDst Ipp64f* pDst - pointer to the output array

// int numIters int numIters - number of samples in the input array

// const Ipp32f* pTaps const Ipp64f* pTaps - pointer to the array containing the taps values,

// the number of elements in the array is tapsLen

// int tapsLen int tapsLen - number of elements in the array containing the taps values

// Ipp32f* pDlyLine Ipp64f* pDlyLine - pointer to the array containing the delay line values

// the number of elements in the array is 2*tapsLen

// int* pDlyLineIndex int* pDlyLineIndex - pointer to the current delay line index

// Processing with [ ippsFIR_Direct_32f ]

CrtPrintf( RTU("Calling IPP [ ippsFIR_Direct_32f ] function\n") );

st = ::**ippsFIR_Direct_32f**( pSrc, pDstF, N+iTapsPos, pTaps, M, pDlyLine, &iDlyLineIndex );

if( st == ippStsNoErr )

CrtPrintf( RTU("Successful: Status = %ld\n"), ( RTint )st );

else

CrtPrintf( RTU("Error : Status = %ld\n"), ( RTint )st );

CrtPrintf( RTU("\n") ); // Displaying results

CrtPrintf( RTU("Displaying Convolution and FIR results:\n") );

for( k = 0; k < ( N+M-1 ); k++ )

{

CrtPrintf( RTU("\tIndex %4ld [ CONV = %f FIR = %f ]\t\tDIFF = %.16f\n"),

( RTint )k, pDstC[k], pDstF[k], ( pDstC[k] - pDstF[k] ) );

}

RTbool bOk = RTfalse;

CrtPrintf( RTU("\n") );

CrtPrintf( RTU("Comparison of Convolution and FIR results:\n") );

for( k = 0; k < ( N+M-1 ); k++ )

{

// Regular Verification

// if( pDstC[k] != pDstF[k] )

// {

// bOk = RTtrue;

// CrtPrintf( RTU("\tIndex %4ld [ CONV = %f Not Equal FIR = %f ]\t\tDIFF = %.24f\n"),

// ( int )k, pDstC[k], pDstF[k], ( pDstC[k] - pDstF[k] ) );

// }

// Verification if a difference is greater than an Eplsilon for

// the Single-Precision Floating-Point type

if( CrtFabs( pDstC[k] - pDstF[k] ) > IPP_EPS23 )

{

bOk = RTtrue;

CrtPrintf( RTU("\tIndex %4ld [ CONV = %f Not Equal FIR = %f ]\t\tDIFF = %.24f\n"),

( RTint )k, pDstC[k], pDstF[k], ( pDstC[k] - pDstF[k] ) );

}

}

if( bOk == RTfalse )

CrtPrintf( RTU("\tConvolution and FIR results are Identical\n") );

}

*/

}

## Attachments:

Attachment | Size |
---|---|

Download subtest13.cpp | 3.72 KB |

>>...// Sub-Test 13 - **Problem with ippsFIR_Direct_32f and ippsConv_64f functions**

History is as follows: Some time in 2012 a problem with results returned by these IPP functions ( results did not match ) was detected and that is why investigation was done.

Note: Please try to verify if everything is correct with your current version of IPP.

Sorry that I was unclear, I was writing from memory. With ippConvolve I meant ippiConvFull...

With X, I dont know, since X is determined by IPP internally, and I'm just a user. However, I sure IPP does use fourier internally in ippiConvFull/Valid, when kernel size is larger than X.

I did a quick search for more information about this, but couldn't find it...

Test: Using IPL (very old IPP), I was using image sharpening using convolution of the image and a smaller kernel with a sharpening setup. My sharpening was configurable with three parameters. One parameter affected the kernel size. When interactively sharpening an image, and modifying that one parameter, I see that the convolution speed being directly propertional to the kernel size. Larger kernel, longer processing time. This is expected for a simple implementation of a convolution function.

Later, in IPP 6, I reimplemented my sharpening filter using ippiConvFull. I did implement threading, so the new implementation was faster anyway, but I also noticed that the processing time was no longer dependent on the kernel size. Small kernel = 0.1s, large kernel = 0.1s.

Later, I posted a question about this in the IPP forum, and I learned from an Intel person that IPP internally use fourier transforms to implemenent convolution.

>>...I was writing from memory. With ippConvolve I meant ippiConvFull...

Thanks for clarification, Thomas.

>>...Later, in IPP 6, I reimplemented my sharpening filter using ippiConvFull. I did implement threading, so the new

>>implementation was faster anyway, but I also noticed that the processing time was no longer dependent on the kernel

>>size. Small kernel = 0.1s, large kernel = 0.1s...

Theo, please take into account that internal threading of IPP functions won't be used in future versions of IPP. That was discussed many times already on the forum.

Great, thanks for the sample code and help you guys.

By the way, just looking through ippi.h and I found what Thomas mentioned about ippiConvFull using FFT when kernel size > X which states:

"If the resulting IppiSize > CRITERION, then convolution is done using 2D FFT."

>>..."If the resulting IppiSize > CRITERION, then convolution is done using 2D FFT."...

I found two references to a word **CRITERION** and I would be glad to know its value as well.

**PS:** Even 11-year old **ippi.h** from IPP v3.0 has these two references! Just verified.

Hi All,

these criterions are rather complex - I think it's clear that criterion should depend on data type, on kernel size, on image size, on CPU architecture; for "valid" case there also should be the second criterion (stop FFT use) - when output size is rather small because kernel and image have near the same sizes. in current IPP for SSE2 code criterion is the next: #define _CONVFULL_32F_START_FFTUSE 63 * 47. In the new API that will be introduced in IPP 8.0 all Conv/Corr functions have got one more parameter - algType - that can be "direct", "FFT" and "auto" - so anyone will have more flexibility to choose the most suitable one.

Regards, Igor

>>...these criterions are rather complex...

It doesn't matter how complex they are. Our point is that we would like to know some numbers, or we want to see some examples, and all that stuff have to be clearly explained in IPP documentation. Any fuzzy statements do not help in real cases.

>>...I think it's clear that criterion should depend on data type, on kernel size, on image size, on CPU architecture; for "valid" case there

>>also should be the second criterion (stop FFT use)...

This is how it looks like in **ippi.h** header file:

...

// If the resulting **IppiSize > CRITERION**, then convolution is done

// using 2D FFT.

...

In overall, I see that it depends just one one parameter **IppiSize** and it doesn't depend on anything else unless that comment in **ippi.h** header is wrong.

Sergey, you are "eating letters" - ok, we'll correct this comment to something look like "if input parameters satisfy some complex conditions - convolution is done with 2D FFT". Or you would like to see the next criterion description with specific numbers for all supported architectures:

if(((( dstSize.width <= CROSSCORRVALID_FFT_CRITERION_OUTSIZE )||\

( tplRoiSize.width <= CROSSCORRVALID_FFT_CRITERION_TPLSIZE )||\

( srcRoiSize.width <= CROSSCORRVALID_FFT_CRITERION_SRCSIZE )))&&\

( dstSize.width <= MAX_DIRECT_DIMENSION_SIZE )&&\

( tplRoiSize.width * tplRoiSize.height < CROSSCORRVALID_MAX_DIRECT_TPL_SIZE )){\

return owniCrossCorrValid_NormLevel_##fType( pSrc, srcStep, srcRoiSize, pTpl,\

tplStep, tplRoiSize, pDst, dstStep );\

All IPP functions, that have low level optimized code, have a lot of different branches for different cases - for example the most simple function - ippsCopy_8u - has 47 code branches based on internal criterions. All such criterions are library internals and our IP.

Regards, Igor

Igor,

IDZ forums read by thousands of developers around the world. So, please don't use all these Russian-like expressions, slangs, idioms, etc. This is Not the first time when you do this. Thanks in advance.

>>Sergey, you are "eating letters"...

I quoted a statement which is at least 10-year-old. Then, all these pieces of codes are **useless** until we see values.

CROSSCORRVALID_FFT_CRITERION_OUTSIZE = ?

CROSSCORRVALID_FFT_CRITERION_TPLSIZE = ?

CROSSCORRVALID_FFT_CRITERION_SRCSIZE = ?

MAX_DIRECT_DIMENSION_SIZE = ?

CROSSCORRVALID_MAX_DIRECT_TPL_SIZE = ?

or, have clear explanation(s).

Igor, If values for these constantans are "classified" than why did you post the piece of code? We are asking for some number. For example, like:

**...If image size is greater than 4MB, or dimensions are greater than 1024x1024 ( float ), then convolution is done using 2D FFT**

This is what everybody would be glad to know.