Loop Vectorization 01

Royi

Hello,

I have a vectorization problem.
I have an array of structs, pDst, whose elements have 3 fields named 'red', 'green' and 'blue'.
The field type might be 'Char', 'Short' or 'Float'. This layout is given and cannot be altered.
We have another array, pSrc, which represents an RGB image: an array of 3 pointers, each of which points to one layer of the image. Each layer is an IPP plane-oriented image (that is, each plane is allocated independently with 'ippiMalloc_32f_C1'): http://software.intel.com/sites/products/documentation/hpc/ipp/ippi/ippi_ch3/functn_Malloc.html

We would like to copy it as described in the following code:

for(int y = 0; y < imageHeight; ++y)
{
    for(int x = 0; x < imageWidth; ++x)
    {
        pDst[x + y * pDstRowStep].red   = pSrc[0][x + y * pSrcRowStep];
        pDst[x + y * pDstRowStep].green = pSrc[1][x + y * pSrcRowStep];
        pDst[x + y * pDstRowStep].blue  = pSrc[2][x + y * pSrcRowStep];
    }
}

Yet in this form the compiler can't vectorize the code.
At first it reports: "loop was not vectorized: existence of vector dependence."
When I add #pragma ivdep to help the compiler (since there is no dependence), I get a different message: "loop was not vectorized: dereference too complex."

Does anyone have an idea how to enable vectorization?

I have Intel Compiler 13.0.

Thanks.

Jennifer J. (Intel)

Check out the pragma simd feature.

The online doc is here: http://software.intel.com/sites/products/documentation/studio/composer/e...

Please read this article first - http://software.intel.com/en-us/articles/requirements-for-vectorizing-lo...
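
As a rough illustration of the suggestion above, here is a minimal sketch of the loop from the question with the pragma applied to the inner loop. The names `RgbPixel` and `planar_to_rgb_simd` are hypothetical, the element type is assumed to be float, and the row steps are assumed to be counted in elements, as the original loop uses them. The pragma is guarded so that compilers other than the Intel compiler simply ignore it. Whether the vectorized loop is actually faster is not something this sketch can promise.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved pixel with the three fields named in the question.
   "RgbPixel" is a hypothetical name; float is one of the possible types. */
typedef struct {
    float red, green, blue;
} RgbPixel;

/* Copy three planar layers into an interleaved image.
   Row steps are counted in elements, as in the loop from the question. */
void planar_to_rgb_simd(RgbPixel *restrict pDst, size_t pDstRowStep,
                        const float *const pSrc[3], size_t pSrcRowStep,
                        int imageWidth, int imageHeight)
{
    assert(pDst != NULL && pSrc != NULL);

    for (int y = 0; y < imageHeight; ++y) {
        /* Assumption: only the Intel compiler understands this pragma,
           so it is guarded to keep other compilers quiet. */
#ifdef __INTEL_COMPILER
#pragma simd
#endif
        for (int x = 0; x < imageWidth; ++x) {
            size_t d = (size_t)y * pDstRowStep + (size_t)x;
            size_t s = (size_t)y * pSrcRowStep + (size_t)x;
            pDst[d].red   = pSrc[0][s];
            pDst[d].green = pSrc[1][s];
            pDst[d].blue  = pSrc[2][s];
        }
    }
}
```

Because the pragma asks the compiler to vectorize regardless of its own analysis, it is the programmer's job to make sure pDst and the three source planes really do not overlap.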

Jennifer

Royi

Hi Jennifer,
You referred me to an article that explains how to enforce vectorization, whereas I want to understand why the loop isn't vectorized automatically.

Maybe there is a different way I should arrange the data to get the needed vectorization.

Thank You.

mark-sabahi (Intel)

The compiler does not vectorize the code because its cost model judges vectorization to be inefficient, mostly because of the Array of Structures (AoS) access, which requires generating gather/scatter instructions that are slow relative to linear (unit-stride) access instructions:

struct X {
    float red, green, blue;
};

struct X *restrict pDst;
float *restrict pSrc[3];

void foo(int imageHeight, int imageWidth, int pDstRowStep, int pSrcRowStep){
    int x, y;
    for (y = 0; y < imageHeight; y++){
        for (x = 0; x < imageWidth; x++){
            pDst[x + y * pDstRowStep].red = pSrc[0][x + y * pSrcRowStep];
            pDst[x + y * pDstRowStep].green = pSrc[1][x + y * pSrcRowStep];
            pDst[x + y * pDstRowStep].blue = pSrc[2][x + y * pSrcRowStep];
        }
    }
}

$ icc -vec-report2 -c -restrict vec10.cpp -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.0.0.079 Build 20120731
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

vec10.cpp(11): (col. 4) remark: loop was not vectorized: vectorization possible but seems inefficient.
vec10.cpp(10): (col. 3) remark: loop was not vectorized: not inner loop.

Section 5.3 about SoA vs AoS at http://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-w... might give some helpful information about this.
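
To make the SoA vs AoS contrast concrete, here is a minimal sketch of what the copy would look like if the destination were stored as a Structure of Arrays, one plane per channel. The original question says the destination layout is fixed, so this is only to illustrate the difference in access pattern, not a drop-in fix. `RgbPlanes` and `copy_planes_soa` are hypothetical names, and row steps are assumed to be counted in elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SoA destination: one contiguous plane per channel
   (index 0 = red, 1 = green, 2 = blue) sharing a single row step. */
typedef struct {
    float *plane[3];
    size_t row_step;   /* in elements */
} RgbPlanes;

/* Copy three source planes into three destination planes.
   Every inner loop reads and writes consecutive floats (unit stride),
   so no strided or gather/scatter access is needed. */
void copy_planes_soa(const RgbPlanes *dst,
                     const float *const src[3], size_t src_row_step,
                     int width, int height)
{
    assert(dst != NULL && src != NULL);

    for (int c = 0; c < 3; ++c) {
        for (int y = 0; y < height; ++y) {
            /* Assumption: source and destination planes do not overlap. */
            float *restrict d = dst->plane[c] + (size_t)y * dst->row_step;
            const float *restrict s = src[c] + (size_t)y * src_row_step;
            for (int x = 0; x < width; ++x) {
                d[x] = s[x];
            }
        }
    }
}
```

In the AoS version each channel is written with a stride of three floats, while here both sides of the assignment advance by one element per iteration.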

You can bypass the compiler's cost-benefit analysis and have it vectorize the loop by using the -vec-threshold0 option, but the code may run slowly for the default vectorization target, which is SSE2:

$ icc -vec-report2 -c -restrict vec10.cpp -V -vec-threshold0
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.0.0.079 Build 20120731

vec10.cpp(11): (col. 4) remark: LOOP WAS VECTORIZED.
vec10.cpp(10): (col. 3) remark: loop was not vectorized: not inner loop.

The compiler can vectorize the code for AVX without the -vec-threshold0 option, but I am not sure whether it will give much speedup compared to the non-vectorized version:

$ icc -vec-report2 -c -restrict vec10.cpp -V -xAVX
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.0.0.079 Build 20120731

vec10.cpp(11): (col. 4) remark: LOOP WAS VECTORIZED.
vec10.cpp(10): (col. 3) remark: loop was not vectorized: not inner loop.
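
Another rearrangement that may be worth trying while keeping the AoS destination is to hoist the per-row address arithmetic out of the inner loop, so that the inner loop indexes four local restrict-qualified pointers by x only. This is a sketch under assumptions: `RgbPixel` and `planar_to_rgb_rows` are hypothetical names, the element type is float, and row steps are in elements. Whether this changes the vectorizer's decision has not been verified here; the -vec-report2 output shown above is the way to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for the interleaved pixel struct from the question. */
typedef struct {
    float red, green, blue;
} RgbPixel;

/* Same copy as the loop in the question, but with the row offsets
   computed once per row instead of once per pixel. */
void planar_to_rgb_rows(RgbPixel *pDst, size_t pDstRowStep,
                        const float *const pSrc[3], size_t pSrcRowStep,
                        int imageWidth, int imageHeight)
{
    assert(pDst != NULL && pSrc != NULL);

    for (int y = 0; y < imageHeight; ++y) {
        /* Assumption: the destination and the three planes never overlap,
           which is what restrict promises to the compiler. */
        RgbPixel *restrict dstRow = pDst + (size_t)y * pDstRowStep;
        const float *restrict r = pSrc[0] + (size_t)y * pSrcRowStep;
        const float *restrict g = pSrc[1] + (size_t)y * pSrcRowStep;
        const float *restrict b = pSrc[2] + (size_t)y * pSrcRowStep;

        for (int x = 0; x < imageWidth; ++x) {
            dstRow[x].red   = r[x];
            dstRow[x].green = g[x];
            dstRow[x].blue  = b[x];
        }
    }
}
```

The stores to dstRow are still strided by three floats, so the compiler's cost model may reach the same "seems inefficient" conclusion as before.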
