Convolution and Correlation Data Allocation

This section explains the relation between:
  • mathematical finite functions u, v, w introduced in Mathematical Notation and Definitions;
  • multi-dimensional input and output data vectors representing the functions u, v, w;
  • arrays u, v, w used to store the input and output data vectors in computer memory.
The convolution and correlation routine parameters that determine the allocation of input and output data are the following:
  • Data arrays x, y, z
  • Shape arrays xshape, yshape, zshape
  • Strides within arrays xstride, ystride, zstride
  • Parameters start, decimation

Finite Functions and Data Vectors

The finite functions u(p), v(q), and w(r) introduced above are represented as multi-dimensional vectors of input and output data:
  inputu(i_1,...,i_dims) for u(p_1,...,p_N)
  inputv(j_1,...,j_dims) for v(q_1,...,q_N)
  output(k_1,...,k_dims) for w(r_1,...,r_N).
Parameter dims represents the number of dimensions and is equal to N.
The parameters xshape, yshape, and zshape define the shapes of input/output vectors:
  inputu(i_1,...,i_dims) is defined if 1 ≤ i_n ≤ xshape(n) for every n=1,...,dims
  inputv(j_1,...,j_dims) is defined if 1 ≤ j_n ≤ yshape(n) for every n=1,...,dims
  output(k_1,...,k_dims) is defined if 1 ≤ k_n ≤ zshape(n) for every n=1,...,dims.
The relation between the input vectors and the functions u and v is defined by the following formulas:
  inputu(i_1,...,i_dims) = u(p_1,...,p_N), where p_n = P_n^min + (i_n - 1) for every n
  inputv(j_1,...,j_dims) = v(q_1,...,q_N), where q_n = Q_n^min + (j_n - 1) for every n.
The relation between the output vector and the function w(r) is similar (but only in the case when parameters start and decimation are not defined):
  output(k_1,...,k_dims) = w(r_1,...,r_N), where r_n = R_n^min + (k_n - 1) for every n.
If the parameter start is defined, it must belong to the interval R_n^min ≤ start(n) ≤ R_n^max. If defined, the start parameter replaces R^min in the formula:
  output(k_1,...,k_dims) = w(r_1,...,r_N), where r_n = start(n) + (k_n - 1)
If the parameter decimation is defined, it changes the relation according to the following formula:
  output(k_1,...,k_dims) = w(r_1,...,r_N), where r_n = R_n^min + (k_n - 1)*decimation(n)
If both parameters start and decimation are defined, the formula is as follows:
  output(k_1,...,k_dims) = w(r_1,...,r_N), where r_n = start(n) + (k_n - 1)*decimation(n)
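The four cases above reduce to one rule: start replaces R^min when it is defined, and decimation scales the step when it is defined. The following Python sketch illustrates that rule only. The function name output_to_function_index and its argument names are hypothetical and are not part of the convolution and correlation interface.

```python
def output_to_function_index(k, r_min, start=None, decimation=None):
    """Map a 1-based output index k = (k_1,...,k_dims) to the argument r of w(r).

    r_min holds R_n^min for every dimension. start and decimation are optional
    and follow the formulas in the text; None means "not defined".
    """
    dims = len(k)
    base = r_min if start is None else start            # start replaces R^min
    step = [1] * dims if decimation is None else decimation
    return tuple(base[n] + (k[n] - 1) * step[n] for n in range(dims))


# Neither parameter defined: r_n = R_n^min + (k_n - 1)
print(output_to_function_index((1, 1), r_min=(0, 0)))
# Both defined: r_n = start(n) + (k_n - 1)*decimation(n)
print(output_to_function_index((3, 2), r_min=(0, 0), start=(1, 1), decimation=(2, 3)))
```

The second call evaluates 1 + 2*2 and 1 + 1*3, so it prints (5, 4).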
The convolution and correlation software checks the values of zshape, start, and decimation during task commitment. If r_n exceeds R_n^max for some k_n, n=1,...,dims, an error is raised.
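A caller can reproduce this range check before committing a task. The sketch below is an illustration under stated assumptions, not the library's own validation code: the function name check_output_range is hypothetical, decimation values are assumed to be positive, and a Python ValueError stands in for whatever error the software actually reports. Because r_n grows with k_n under that assumption, it is enough to test the largest index k_n = zshape(n).

```python
def check_output_range(zshape, r_min, r_max, start=None, decimation=None):
    """Raise ValueError if some output element would refer to r_n > R_n^max.

    Assumes positive decimation, so the largest r_n occurs at k_n = zshape(n).
    """
    dims = len(zshape)
    base = list(r_min) if start is None else list(start)
    step = [1] * dims if decimation is None else list(decimation)
    for n in range(dims):
        # start(n) must lie in the interval [R_n^min, R_n^max]
        if not r_min[n] <= base[n] <= r_max[n]:
            raise ValueError(f"start out of range in dimension {n + 1}")
        # r_n for the last output element along dimension n
        last = base[n] + (zshape[n] - 1) * step[n]
        if last > r_max[n]:
            raise ValueError(f"r exceeds R_max in dimension {n + 1}")


# Five outputs with decimation 2 over r = 0..9 reach r = 8, which is allowed.
check_output_range(zshape=(5,), r_min=(0,), r_max=(9,), decimation=(2,))
```

Requesting six outputs with the same settings would reach r = 10 and raise the error.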

Allocation of Data Vectors

Both parameter arrays x and y contain input data vectors in memory, while array z is intended for storing the output data vector. To access the memory, the convolution and correlation software uses only pointers to these arrays and ignores the array shapes.
For parameters x, y, and z, you can provide one-dimensional arrays, provided that the actual length of these arrays is sufficient to store the data vectors.
The allocation of the input and output data inside the arrays x, y, and z is described below assuming that the arrays are one-dimensional. Given multi-dimensional indices i, j, k ∈ Z^N, one-dimensional indices e, f, g ∈ Z are defined such that:
  inputu(i_1,...,i_dims) is allocated at x(e)
  inputv(j_1,...,j_dims) is allocated at y(f)
  output(k_1,...,k_dims) is allocated at z(g).
The indices e, f, and g are defined as follows:
  e = 1 + Σ xstride(n)·dx(n) (the sum is over all n=1,...,dims)
  f = 1 + Σ ystride(n)·dy(n) (the sum is over all n=1,...,dims)
  g = 1 + Σ zstride(n)·dz(n) (the sum is over all n=1,...,dims)
The distances dx(n), dy(n), and dz(n) depend on the sign of the stride:
  dx(n) = i_n - 1 if xstride(n)>0, or dx(n) = i_n - xshape(n) if xstride(n)<0
  dy(n) = j_n - 1 if ystride(n)>0, or dy(n) = j_n - yshape(n) if ystride(n)<0
  dz(n) = k_n - 1 if zstride(n)>0, or dz(n) = k_n - zshape(n) if zstride(n)<0
The definitions of indices e, f, and g assume that the indices for arrays x, y, and z start from one:
  x(e) is defined for e=1,...,length(x)
  y(f) is defined for f=1,...,length(y)
  z(g) is defined for g=1,...,length(z)
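The definitions of e, f, and g share one form, so a single helper can compute all three. The Python sketch below follows the formulas above with 1-based indices. The name linear_index is hypothetical, and rejecting a zero stride is an assumption made here because the text defines the distances only for positive and negative strides.

```python
def linear_index(idx, shape, stride):
    """Return the 1-based position in a one-dimensional array for a
    1-based multi-index idx, given the shape and stride arrays."""
    pos = 1
    for i_n, shape_n, stride_n in zip(idx, shape, stride):
        if stride_n > 0:
            d = i_n - 1            # distance from the first element
        elif stride_n < 0:
            d = i_n - shape_n      # non-positive distance from the last element
        else:
            raise ValueError("stride must be nonzero (assumption of this sketch)")
        pos += stride_n * d
    return pos


# Dense column-major 2x3 data: element (2, 3) is the last of six elements.
print(linear_index((2, 3), shape=(2, 3), stride=(1, 2)))
# Negative stride reverses the order: the first element lands at the far end.
print(linear_index((1,), shape=(4,), stride=(-2,)))
```

The first call prints 6 and the second prints 7, since 1 + (-2)*(1 - 4) = 7.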
Below is a detailed explanation of how elements of the multi-dimensional output vector are stored in the array z for one-dimensional and two-dimensional cases.
One-dimensional case. If dims=1, then zshape is the number of output values to be stored in the array z. The actual length of array z may be greater than zshape elements.
If zstride>1, output values are stored with the stride: output(1) is stored to z(1), output(2) is stored to z(1+zstride), and so on. Hence, the actual length of z must be at least 1+zstride*(zshape-1) elements.
If zstride<0, it still defines the stride between elements of array z. However, the order of the used elements is the opposite. For the k-th output value, output(k) is stored in z(1+|zstride|*(zshape-k)), where |zstride| is the absolute value of zstride. The actual length of the array z must be at least 1+|zstride|*(zshape-1) elements.
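As a worked illustration of the one-dimensional case, the sketch below lists the positions in z used for each output value and the minimum array length. The helper names positions_1d and min_length_1d are hypothetical, and a zero stride is rejected as an assumption of the sketch.

```python
def positions_1d(zshape, zstride):
    """1-based positions in z of output(1), ..., output(zshape)."""
    if zstride == 0:
        raise ValueError("zstride must be nonzero (assumption of this sketch)")
    s = abs(zstride)
    if zstride > 0:
        return [1 + s * (k - 1) for k in range(1, zshape + 1)]
    # Negative stride: same spacing, opposite order
    return [1 + s * (zshape - k) for k in range(1, zshape + 1)]


def min_length_1d(zshape, zstride):
    """Smallest length of z that can hold all output values."""
    return 1 + abs(zstride) * (zshape - 1)


print(positions_1d(4, 3))    # [1, 4, 7, 10]
print(positions_1d(4, -3))   # [10, 7, 4, 1]
print(min_length_1d(4, -3))  # 10
```

Both signs of the stride touch the same set of positions, which is why the minimum length depends only on |zstride|.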
Two-dimensional case. If dims=2, the output data is a two-dimensional matrix. The value zstride(1) defines the stride inside matrix columns, that is, the stride between output(k_1, k_2) and output(k_1+1, k_2) for every pair of indices k_1, k_2. On the other hand, zstride(2) defines the stride between columns, that is, the stride between output(k_1, k_2) and output(k_1, k_2+1).
If zstride(2) is greater than zshape(1), this causes sparse allocation of columns. If the value of zstride(2) is smaller than zshape(1), this may result in the transposition of the output matrix. For example, if zshape = (2,3), you can define zstride = (3,1) to allocate output values as a transposed matrix of the shape 3x2.
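The transposition example can be verified by computing the position g of every output element and then reading z as a dense column-major 3x2 matrix. The Python sketch below does this for zshape = (2,3) and zstride = (3,1). The variable names are illustrative, and the column-major interpretation of z is an assumption that matches the column-oriented description above.

```python
zshape = (2, 3)
zstride = (3, 1)   # both strides positive, so dz(n) = k_n - 1

layout = {}
for k1 in range(1, zshape[0] + 1):
    for k2 in range(1, zshape[1] + 1):
        g = 1 + zstride[0] * (k1 - 1) + zstride[1] * (k2 - 1)
        # Row and column of position g in a dense column-major 3x2 matrix
        row, col = (g - 1) % 3 + 1, (g - 1) // 3 + 1
        layout[(k1, k2)] = (g, row, col)

for k, (g, row, col) in sorted(layout.items()):
    print(f"output{k} -> z({g}) = element ({row},{col}) of the 3x2 matrix")
```

Every output(k_1, k_2) lands at element (k_2, k_1) of the 3x2 matrix, which is the transposed layout.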
Whether or not zstride implies this kind of transformation, you need to ensure that different elements output(k_1,...,k_dims) are stored in different locations z(g).
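For small shapes, this requirement can be checked by brute force: compute g for every output index and look for repeats. The sketch below is one possible check, not something the library provides. The function name has_overlap is hypothetical, and the enumeration cost grows with the product of the zshape values.

```python
from itertools import product


def has_overlap(zshape, zstride):
    """Return True if two different output elements map to the same z(g)."""
    seen = set()
    for k in product(*(range(1, s + 1) for s in zshape)):
        g = 1
        for k_n, shape_n, stride_n in zip(k, zshape, zstride):
            # Distance dz(n) as defined for positive and negative strides
            d = k_n - 1 if stride_n > 0 else k_n - shape_n
            g += stride_n * d
        if g in seen:
            return True
        seen.add(g)
    return False


print(has_overlap((2, 3), (3, 1)))  # transposed layout: False
print(has_overlap((2, 3), (1, 1)))  # output(1,2) and output(2,1) collide: True
```

With zstride = (1,1), both output(1,2) and output(2,1) map to z(2), so the layout is invalid.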
