Convolution and Correlation Usage Examples
This section demonstrates how you can use the Intel® oneAPI Math Kernel Library routines to perform some common convolution and correlation operations for both single-threaded and multithreaded calculations. The following two sample functions, scond1 and sconf1, simulate the convolution and correlation functions SCOND and SCONF found in the IBM ESSL* library. The functions assume single-threaded calculations and can be used with C or C++ compilers.
Function scond1 for Single-Threaded Calculations
#include "mkl_vsl.h"
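int scond1(
    float h[], int inch,
    float x[], int incx,
    float y[], int incy,
    int nh, int nx, int iy0, int ny)
{
    int status;
    VSLConvTaskPtr task;
    /* create a task for direct (non-FFT) convolution */
    status = vslsConvNewTask1D(&task,VSL_CONV_MODE_DIRECT,nh,nx,ny);
    if (status != VSL_STATUS_OK)
        return status;
    /* store ny elements of the result, starting at index iy0 */
    vslConvSetStart(task, &iy0);
    status = vslsConvExec1D(task, h,inch, x,incx, y,incy);
    vslConvDeleteTask(&task);
    return status;
}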
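To sanity-check what a function like scond1 returns, it can help to compare against a plain loop. The following is a minimal sketch, not an Intel® oneAPI Math Kernel Library routine: ref_conv1d is a hypothetical helper that evaluates the textbook linear convolution sum directly, and it assumes positive strides and a zero-based start index iy0.

```c
#include <assert.h>
#include <stddef.h>

/* Reference direct convolution (illustrative helper, not a library routine).
 * Writes y[k*incy] = sum over j of h[j*inch] * x[(iy0 + k - j)*incx]
 * for k = 0..ny-1, skipping terms whose x index is outside 0..nx-1.
 * Assumption: all strides are positive and iy0 is zero-based. */
void ref_conv1d(const float h[], int inch,
                const float x[], int incx,
                float y[], int incy,
                int nh, int nx, int iy0, int ny)
{
    assert(inch > 0 && incx > 0 && incy > 0);
    for (int k = 0; k < ny; k++) {
        float acc = 0.0f;
        for (int j = 0; j < nh; j++) {
            int xi = iy0 + k - j;   /* index into x for this term */
            if (xi >= 0 && xi < nx)
                acc += h[(size_t)j * inch] * x[(size_t)xi * incx];
        }
        y[(size_t)k * incy] = acc;
    }
}
```

For h = {1, 2, 3}, x = {1, 1, 1, 1}, iy0 = 1, and ny = 3, the helper stores {3, 6, 6}. A call to scond1 with the same arguments should agree up to rounding, provided the start index is interpreted the same way.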
Function sconf1 for Single-Threaded Calculations
#include "mkl_vsl.h"
int sconf1(
    int init,
    float h[], int inc1h,
    float x[], int inc1x, int inc2x,
    float y[], int inc1y, int inc2y,
    int nh, int nx, int m, int iy0, int ny,
    void* aux1, int naux1, void* aux2, int naux2)
{
    int status = 0; /* 0 means "OK"; returned unchanged if m<=0 */
    /* assume that aux1!=0 and naux1 is big enough */
    VSLConvTaskPtr* task = (VSLConvTaskPtr*)aux1;
    if (init != 0)
        /* initialization: the task is kept in aux1 for the later call */
        status = vslsConvNewTaskX1D(task,VSL_CONV_MODE_FFT,
            nh,nx,ny, h,inc1h);
    if (init == 0) {
        /* calculations: */
        int i;
        vslConvSetStart(*task, &iy0);
        for (i=0; i<m; i++) {
            float* xi = &x[inc2x * i];
            float* yi = &y[inc2y * i];
            /* task is implicitly committed at i==0 */
            status = vslsConvExecX1D(*task, xi, inc1x, yi, inc1y);
        }
        /* release the task only after the calculations; deleting it
           right after initialization would leave nothing to execute */
        vslConvDeleteTask(task);
    }
    return status;
}
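The expressions &x[inc2x * i] and &y[inc2y * i] in sconf1 imply a two-level layout: inc2x steps between sequences and inc1x steps between elements within a sequence. The sketch below spells that addressing out. Both helpers (seq_elem and batch_extent) are hypothetical names introduced for illustration only, and they assume positive strides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of any library. */

/* Address of element k of sequence i: sequences start inc2 apart,
 * and consecutive elements of one sequence are inc1 apart. */
float *seq_elem(float x[], int inc1, int inc2, int i, int k)
{
    assert(inc1 > 0 && inc2 > 0 && i >= 0 && k >= 0);
    return &x[(size_t)inc2 * (size_t)i + (size_t)inc1 * (size_t)k];
}

/* Smallest buffer length, in floats, that holds m sequences of
 * n elements each under the same addressing rule. */
size_t batch_extent(int n, int m, int inc1, int inc2)
{
    assert(n > 0 && m > 0 && inc1 > 0 && inc2 > 0);
    return (size_t)(m - 1) * (size_t)inc2
         + (size_t)(n - 1) * (size_t)inc1 + 1;
}
```

With densely packed data (inc1 = 1, inc2 = n), m sequences of n elements occupy exactly m*n floats. Larger strides leave gaps that the caller must still allocate.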
Using Multiple Threads
For functions such as sconf1 described in the previous example, parallel calculations may be preferable to cycling. If m>1, you can use multiple threads for invoking the task execution against different data sequences. For such cases, use task copy routines to create m copies of the task object before the calculations stage and then run these copies with different threads. Ensure that you make all necessary parameter adjustments for the task (using Task Editors) before copying it.
The sample code in this case may look as follows:
if (init == 0) {
    int i, status, ss[M];
    VSLConvTaskPtr tasks[M];
    /* assume that M is big enough */
    . . .
    vslConvSetStart(*task, &iy0);
    . . .
    for (i=0; i<m; i++)
        /* implicit commitment at i==0 */
        vslConvCopyTask(&tasks[i],*task);
    . . .
Then, m threads may be started to execute different copies of the task:
    . . .
    float* xi = &x[inc2x * i];
    float* yi = &y[inc2y * i];
    ss[i]=vslsConvExecX1D(tasks[i], xi,inc1x, yi,inc1y);
    . . .
And finally, after all threads have finished the
calculations, overall status should be collected from all task objects. The
following code signals the first error found, if any:
    . . .
    for (i=0; i<m; i++) {
        status = ss[i];
        if (status != 0) /* 0 means "OK" */
            break;
    };
    return status;
}; /* end if init==0 */
Execution routines modify the task internal state
(fields of the task structure). Such modifications may conflict with each other
if different threads work with the same task object simultaneously. That is why
different threads must use different copies of the task.
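The copy-per-thread pattern can be tried without the library. The following sketch is an illustration under stated assumptions, not the library's implementation: toy_task, toy_copy, toy_exec, and toy_run_batch are hypothetical stand-ins for a task object, a task copy routine, an execution routine, and the driver loop. OpenMP is assumed as the threading mechanism; if OpenMP is not enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <string.h>

#define NH 3                 /* kernel length (illustrative) */
#define NX 4                 /* input length per sequence (illustrative) */
#define NY (NH + NX - 1)     /* full result length */
#define MAX_M 8              /* upper bound on the number of sequences */

/* Hypothetical stand-in for a task object: read-only parameters plus
 * scratch state that every execution overwrites. */
typedef struct {
    float h[NH];
    float scratch[NY];       /* mutable internal state */
} toy_task;

/* Stand-in for a task copy routine: a plain structure copy. */
void toy_copy(toy_task *dst, const toy_task *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Stand-in for an execution routine; returns 0 on success. */
int toy_exec(toy_task *t, const float x[], float y[])
{
    if (t == NULL || x == NULL || y == NULL)
        return -1;
    memset(t->scratch, 0, sizeof t->scratch);   /* modifies task state */
    for (int j = 0; j < NH; j++)
        for (int k = 0; k < NX; k++)
            t->scratch[j + k] += t->h[j] * x[k];
    memcpy(y, t->scratch, sizeof t->scratch);
    return 0;
}

/* Executes m sequences, each with its own private task copy, and
 * returns the first nonzero status found (0 means "OK"). */
int toy_run_batch(const toy_task *task, const float x[], float y[], int m)
{
    toy_task tasks[MAX_M];
    int ss[MAX_M];
    int i;
    assert(m >= 0 && m <= MAX_M);
    for (i = 0; i < m; i++)
        toy_copy(&tasks[i], task);   /* copy before any thread starts */
    #pragma omp parallel for
    for (i = 0; i < m; i++)
        ss[i] = toy_exec(&tasks[i], &x[NX * i], &y[NY * i]);
    for (i = 0; i < m; i++)
        if (ss[i] != 0)
            return ss[i];
    return 0;
}
```

If every iteration called toy_exec on the same toy_task, concurrent writes to scratch could corrupt each other's results. Giving each iteration its own copy removes that shared mutable state, which is the same reason the text requires separate task copies.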