Can icpc vectorize my problem?

Hi,

I'm trying to implement a class that behaves like a Fortran array (starting with 1-D). I created an "Array1d" class that stores three things: the size, the offset, and a pointer to the data.

I've attached the three files needed to compile it. Even with every option I could find to help the compiler (-O3 -xSSSE3 -restrict), one of my loops does not vectorize (line 30 in main.cpp), and I don't understand why.

I defined operator() so that I can write:

array(i)=something, where i can be positive or negative (depending on the offset).

The operator is implemented like this (o1 is the offset):

T& restrict operator()(const int64_t i){return ptr[i-o1];}
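To make the index mapping concrete, here is a minimal self-contained sketch of such a class. It is not the poster's array_1d.h (whose contents aren't shown here): the name OffsetArray1d, the two-bound resize(lo, hi) in place of the Range class, and the std::vector storage are all assumptions for illustration, and the non-standard restrict qualifier is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Fortran-style 1-D array (not the poster's array_1d.h):
// valid indices run from lo to hi inclusive, and index i is stored at
// ptr[i - o1], where the offset o1 is the lower bound lo.
template <typename T>
class OffsetArray1d {
public:
    OffsetArray1d() = default;
    // Copying would leave ptr pointing into the source's storage, so forbid it.
    OffsetArray1d(const OffsetArray1d&) = delete;
    OffsetArray1d& operator=(const OffsetArray1d&) = delete;

    void resize(std::int64_t lo, std::int64_t hi) {
        data_.assign(static_cast<std::size_t>(hi - lo + 1), T());
        ptr = data_.data();
        o1 = lo;
    }
    std::int64_t min() const { return o1; }
    std::int64_t max() const { return o1 + static_cast<std::int64_t>(data_.size()) - 1; }

    // Same index mapping as the operator() in the post.
    T& operator()(std::int64_t i) { return ptr[i - o1]; }
    const T& operator()(std::int64_t i) const { return ptr[i - o1]; }

    T* ptr = nullptr;      // first element of the storage
    std::int64_t o1 = 0;   // offset: the index that maps to ptr[0]

private:
    std::vector<T> data_;  // owns the storage
};
```

With a.resize(-20, 10), a(-20) is ptr[0] and a(10) is ptr[30].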

If I do:

for(i=min;i<=max;++i)
    array(i)=3.14f;

this loop does not vectorize, and I don't understand why. I expected inlining to turn it into this loop:

for(i=min;i<=max;++i)
    array.ptr[i-array.o1]=3.14f;

and that loop does vectorize.

So why can't icpc vectorize the first loop, when it can vectorize the explicitly inlined version?

Curiously, when I look at the assembly code (generated with -S), there doesn't seem to be any dependency problem.

I'm using Intel C++ compiler 11.1.064 on Ubuntu x86_64. Version 11.1.069 seems to give the same result.

Thank you in advance.

Attachments: array_1d.h (2.16 KB), main.cpp (859 bytes), range.h (628 bytes), makefile (649 bytes)

Hi, Raphael

Thank you for raising this issue. I verified that this loop can be vectorized on Windows but fails on Linux. I have entered this in our problem tracking system and will let you know when I have an update.

Thank you.

Thank you, Yolanda, for your quick answer.

Can you tell me which version vectorized it for you on Windows? A colleague tried on Windows before I created this thread, and he couldn't get it to vectorize.

I hope that in the future the fix will also vectorize more complicated expressions, on 2-D, 3-D, ... arrays.

Try adding the -ansi-alias option. That seems to work for the given test case.

$ icpc -restrict -vec-report2 -V main.cpp

Intel C++ Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: l_cproc_p_11.1.069

Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

Edison Design Group C/C++ Front End, version 3.10.1 (Feb 3 2010 19:19:06)

Copyright 1988-2007 Edison Design Group, Inc.

main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.

main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.

main.cpp(30): (col. 5) remark: loop was not vectorized: existence of vector dependence.

GNU ld version 2.17.50.0.6-5.el5 20061020

$ icpc -restrict -vec-report2 -V -ansi-alias main.cpp

Intel C++ Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: l_cproc_p_11.1.069

Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

Edison Design Group C/C++ Front End, version 3.10.1 (Feb 3 2010 19:19:06)

Copyright 1988-2007 Edison Design Group, Inc.

main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.

main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.

main.cpp(30): (col. 5) remark: LOOP WAS VECTORIZED.

GNU ld version 2.17.50.0.6-5.el5 20061020
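A likely explanation for the difference between the two reports (an interpretation, not confirmed in this thread): without -ansi-alias the compiler has to assume that the float store through ptr might overwrite the object's own ptr or o1 members, so it reloads them after every store and reports a dependence. Copying the members into locals before the loop removes that possibility independently of the aliasing option. The sketch uses a hypothetical stand-in struct Arr and a made-up function name fill_hoisted; whether a particular icpc version then vectorizes it has not been checked here.

```cpp
#include <cstdint>

// Hypothetical stand-in for Array1d (only the members used below).
struct Arr {
    float* ptr;
    std::int64_t o1;
    float& operator()(std::int64_t i) { return ptr[i - o1]; }
};

// Copy the members into locals before the loop. A store through p cannot
// change the locals p and off (their addresses are never taken), so the
// compiler does not have to reload them after every store.
void fill_hoisted(Arr& a, std::int64_t min, std::int64_t max, float value) {
    float* const p = a.ptr;
    const std::int64_t off = a.o1;
    for (std::int64_t i = min; i <= max; ++i)
        p[i - off] = value;
}
```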

In this case the loop does vectorize. But, as I said earlier, I hope the compiler will be able to vectorize more complicated expressions.

Using -ansi-alias has two problems. First, it is dangerous for other parts of the code that don't follow the rules. From the icpc man page:

"If your program adheres to these rules, then this option allows the compiler to optimize more aggressively. If it doesn't adhere to these rules, then it can cause the compiler to generate incorrect code."
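As an illustration of what "not adhering to these rules" typically means: reading an object through a pointer to an unrelated type. The sketch keeps the violating form as a comment and shows the conforming memcpy alternative; the function name float_bits is made up, and it assumes a 32-bit IEEE-754 float (true on x86).

```cpp
#include <cstdint>
#include <cstring>

// Reading a float through a uint32_t pointer, e.g.
//     std::uint32_t bits = *reinterpret_cast<const std::uint32_t*>(&f);
// breaks the aliasing rules: with -ansi-alias (gcc: -fstrict-aliasing) the
// compiler may assume the two types never refer to the same memory.
//
// Conforming alternative: copy the object representation with memcpy.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, float_bits(1.0f) is 0x3F800000.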

Second, if I try something slightly more complicated (but not much):

for(i=min;i<=max;++i)
    array_copy(i)=array(i);

Unfortunately, the compiler cannot vectorize this loop, even with -ansi-alias.

Worse, the following loop does not vectorize either:

for(i=min;i<=max;++i)
    array_copy.ptr[i-array_copy.o1]=array.ptr[i-array.o1];

It seems that the restrict qualifier is not being "propagated" properly.

I know it is harder for a C/C++ compiler to vectorize than for a Fortran compiler (because of the aliasing rules). But I thought the restrict qualifier would be enough to avoid these problems and deliver good performance (especially for scientific computing).
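One workaround when restrict on class members doesn't seem to reach the loop is to state the no-overlap promise on local pointers right at the loop. This is a sketch with hypothetical names (Arr, copy_range). __restrict is a compiler extension rather than standard C++; gcc and clang accept it, as do icc and MSVC. It is a promise by the programmer, so the two ranges must really not overlap. Whether the compiler then vectorizes the loop, or replaces it with a memcpy-style call, is up to the compiler.

```cpp
#include <cstdint>

// Hypothetical stand-in for Array1d.
struct Arr {
    float* ptr;
    std::int64_t o1;
    float& operator()(std::int64_t i) { return ptr[i - o1]; }
};

// Load the members into local restrict-qualified pointers and local offsets
// before the loop, so the no-overlap promise applies directly to the
// pointers used inside it.
void copy_range(Arr& dst, const Arr& src, std::int64_t min, std::int64_t max) {
    float* __restrict d = dst.ptr;
    const float* __restrict s = src.ptr;
    const std::int64_t od = dst.o1;
    const std::int64_t os = src.o1;
    for (std::int64_t i = min; i <= max; ++i)
        d[i - od] = s[i - os];
}
```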

Hi, Raphael

I tested with the latest Intel C++ Compiler for Windows, version 11.1.054; 11.1.048 also works. See:

C:\develop\bug\25731>icl /c /Qrestrict /Qvec-report2 main.cpp

Intel C++ Compiler Professional for applications running on IA-32, Version 11.1 Build 20100203 Package ID: w_cproc_p_11.1.060

Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

main.cpp

C:\develop\bug\25731\main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.

C:\develop\bug\25731\main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.

C:\develop\bug\25731\main.cpp(30): (col. 5) remark: LOOP WAS VECTORIZED.

To compile on Windows, I added one more header file providing "stdint.h". My build files are attached.

Attachment: arrayvec.zip (38.79 KB)

Here's the result for Intel 64:

C:\develop\bug\25731>icl /c /Qrestrict /Qvec-report2 main.cpp

Intel C++ Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: w_cproc_p_11.1.060

Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

main.cpp

C:\develop\bug\25731\main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.

C:\develop\bug\25731\main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.

C:\develop\bug\25731\main.cpp(30): (col. 5) remark: LOOP WAS VECTORIZED.

Thank you Yolanda.

Are you able to vectorize a more complicated expression (on Windows)? Something like:

for(i=min;i<=max;++i)
    array_copy(i)=array(i);

I'm not able to. Thanks for your help.

Hi, Raphael Lencrerot

No, I cannot vectorize this on Windows.
Thanks for raising the problem. I'll investigate and get back to you later with an update.

Thank you.

In general, you should compile the code with the -S option and inspect the assembly carefully. The thread below shows that the vectorization report sometimes says a loop was vectorized even though the vectorized version never ends up on an executed code path: a non-vectorized ("dumb") version is placed on the executed path, and the vectorized loop is left dangling and never called.

http://software.intel.com/en-us/forums/showthread.php?t=70820

-Jeff

Hi, Raphael

This loop can also be vectorized by the latest Intel C++ Compiler for Windows, 11.1.060.

I simplified your main program to:

#include "array_1d.h"

int main()
{
    int64_t i,min,max;
    Array1d array;
    Array1d array_copy;

    array.resize(Range(-20,10));
    // array_copy needs storage too before array_copy(i) is written
    array_copy.resize(Range(-20,10));
    min=array.range1().min();
    max=array.range1().max();

    for(i=min;i<=max;++i)
       array_copy(i)=array(i);

    return 0;
}

Command output:

C:\develop\bug\25731>icl main.cpp /Qvec-report1 /Qrestrict /QxSSSE3 /S

Intel C++ Compiler Professional for applications running on IA-32, Version 11.1 Build 20100203 Package ID: w_cproc_p_11.1.060

Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

main.cpp

C:\develop\bug\25731\main.cpp(16): (col. 2) remark: LOOP WAS VECTORIZED.

Grep from the assembler file:

.B1.16:                         ; Preds .B1.15 .B1.16
        movaps    xmm0, XMMWORD PTR [edi+eax*4]                 ;17.18
        movaps    xmm1, XMMWORD PTR [16+edi+eax*4]              ;17.18
        movaps    XMMWORD PTR [-80+ebx+eax*4], xmm0             ;8.20
        movaps    XMMWORD PTR [-64+ebx+eax*4], xmm1             ;8.20

Thank you.

Thank you Yolanda and Jeff.
In the end, the Intel compiler works much, much better than I thought (with -ansi-alias).
The following lines:

for(i=min;i<=max;++i)
    array_copy(i)=array(i);

are interpreted as a memcpy. That is why the compiler does not print a vectorization message: it calls _intel_fast_memcpy instead. Using "-S -fcode-asm" I could see much more information, and I could then try something like:

for(i=min;i<=max;++i)
    array_copy(i)=2*array(i)+1;

The generated code is indeed vectorized.
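The memcpy substitution described above can be illustrated with plain functions: an element-by-element copy over a contiguous index range does exactly what one block copy does, while a loop that computes on each element cannot be turned into a copy. The struct Arr and the three function names are hypothetical; this shows why the idiom is recognizable, not how icpc actually detects it.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Array1d.
struct Arr { float* ptr; std::int64_t o1; };

// The element-by-element copy from the post...
void copy_loop(Arr& dst, const Arr& src, std::int64_t min, std::int64_t max) {
    for (std::int64_t i = min; i <= max; ++i)
        dst.ptr[i - dst.o1] = src.ptr[i - src.o1];
}

// ...does the same as one block copy of (max - min + 1) contiguous floats,
// which is why a compiler may replace the loop with a memcpy-like call.
// Requires min <= max and non-overlapping ranges.
void copy_block(Arr& dst, const Arr& src, std::int64_t min, std::int64_t max) {
    std::memcpy(dst.ptr + (min - dst.o1), src.ptr + (min - src.o1),
                static_cast<std::size_t>(max - min + 1) * sizeof(float));
}

// This loop does arithmetic on every element, so it cannot become a plain
// copy; it has to be vectorized as a loop.
void scale_shift(Arr& dst, const Arr& src, std::int64_t min, std::int64_t max) {
    for (std::int64_t i = min; i <= max; ++i)
        dst.ptr[i - dst.o1] = 2 * src.ptr[i - src.o1] + 1;
}
```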

So I will try increasingly complicated expressions.
My ultimate goal is to express a finite-difference scheme and see whether we can get roughly the same performance as equivalent Fortran code.

Ultimate goal (example finite-difference scheme):

for(k=zmin;k<=zmax;++k)
{
    for(j=ymin;j<=ymax;++j)
    {
        for(i=xmin;i<=xmax;++i)
        {
            lapx=coefx(-1)*u(i-1,j,k)+coefx(0)*u(i,j,k)+coefx(1)*u(i+1,j,k);
            lapy=coefy(-1)*u(i,j-1,k)+coefy(0)*u(i,j,k)+coefy(1)*u(i,j+1,k);
            lapz=coefz(-1)*u(i,j,k-1)+coefz(0)*u(i,j,k)+coefz(1)*u(i,j,k+1);

            u_update(i,j,k)=-beta*u_update(i,j,k)+alpha(i,j,k)*(lapx+lapy+lapz);
        }
    }
}
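As a possible starting point for the Fortran comparison, here is a self-contained sketch of the scheme above. Everything beyond the loop body is an assumption: a hypothetical Array3d with per-dimension lower bounds stored column-major (i varies fastest, as in Fortran, so the innermost i loop walks memory with unit stride), a Coef3 helper for coefx/coefy/coefz indexed -1, 0, +1, and a Range3 for the loop bounds. u must be defined one cell beyond the interior range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 3-D array with Fortran-style bounds [xlo,xhi] x [ylo,yhi] x [zlo,zhi],
// stored column-major (i varies fastest).
class Array3d {
public:
    Array3d(std::int64_t xlo, std::int64_t xhi, std::int64_t ylo, std::int64_t yhi,
            std::int64_t zlo, std::int64_t zhi, float init = 0.0f)
        : xlo_(xlo), ylo_(ylo), zlo_(zlo), nx_(xhi - xlo + 1), ny_(yhi - ylo + 1),
          data_(static_cast<std::size_t>(nx_ * ny_ * (zhi - zlo + 1)), init) {}

    float& operator()(std::int64_t i, std::int64_t j, std::int64_t k) { return data_[index(i, j, k)]; }
    const float& operator()(std::int64_t i, std::int64_t j, std::int64_t k) const { return data_[index(i, j, k)]; }

private:
    std::size_t index(std::int64_t i, std::int64_t j, std::int64_t k) const {
        return static_cast<std::size_t>((i - xlo_) + nx_ * ((j - ylo_) + ny_ * (k - zlo_)));
    }
    std::int64_t xlo_, ylo_, zlo_, nx_, ny_;
    std::vector<float> data_;
};

// Three stencil coefficients indexed -1, 0, +1 (hypothetical helper).
struct Coef3 {
    float c[3];
    float operator()(int d) const { return c[d + 1]; }
};

// Interior index ranges (hypothetical helper).
struct Range3 { std::int64_t xmin, xmax, ymin, ymax, zmin, zmax; };

// One sweep of the scheme from the post.
void update(Array3d& u_update, const Array3d& u, const Array3d& alpha,
            const Coef3& coefx, const Coef3& coefy, const Coef3& coefz,
            float beta, const Range3& r) {
    for (std::int64_t k = r.zmin; k <= r.zmax; ++k)
        for (std::int64_t j = r.ymin; j <= r.ymax; ++j)
            for (std::int64_t i = r.xmin; i <= r.xmax; ++i) {  // innermost: unit stride
                const float lapx = coefx(-1) * u(i - 1, j, k) + coefx(0) * u(i, j, k) + coefx(1) * u(i + 1, j, k);
                const float lapy = coefy(-1) * u(i, j - 1, k) + coefy(0) * u(i, j, k) + coefy(1) * u(i, j + 1, k);
                const float lapz = coefz(-1) * u(i, j, k - 1) + coefz(0) * u(i, j, k) + coefz(1) * u(i, j, k + 1);
                u_update(i, j, k) = -beta * u_update(i, j, k) + alpha(i, j, k) * (lapx + lapy + lapz);
            }
}
```

With coefficients (1, -2, 1) and u(i,j,k) = i*i, the x term is 2 and the y and z terms vanish, which makes a quick sanity check.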

One last question: I tried to find out exactly what -ansi-alias does, but it isn't clear to me. The documentation says:

This option tells the compiler to assume that the program adheres to ISO C Standard aliasability rules.

Does that mean that ONLY pointers qualified with the restrict keyword are affected by the aliasing rules?

-ansi-alias tells the compiler to assume that your program doesn't violate the standard type-based aliasing rules. It's the same as gcc's -fstrict-aliasing, which gcc enables by default at -O2 and above. It means you don't need the restrict keyword when no two pointers in the parameter list have compatible types, and it also permits optimization of operations on arrays of arrays (a[][]).
Presumably, the reason for not setting it as a default is that one major Windows compiler doesn't perform optimizations based on the aliasing rules.
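A small sketch of the "no two pointers of compatible type" point, with hypothetical function names. In the first function the pointers have different types, so under the aliasing rules a store to dst[i] cannot change *n; in the second, all three pointers are float*, so the rules alone say nothing about overlap.

```cpp
#include <cstdint>

// dst and n point to different types (float vs int64_t). Under the standard
// aliasing rules a store to dst[i] cannot modify *n, so with -ansi-alias
// (gcc: -fstrict-aliasing) the compiler may keep *n in a register across the
// loop, with no restrict needed.
void fill_count(float* dst, const std::int64_t* n, float value) {
    for (std::int64_t i = 0; i < *n; ++i)
        dst[i] = value;
}

// dst, a and b all have type float*, so the aliasing rules allow them to
// overlap. Here the compiler needs restrict (or a run-time overlap check)
// before it can vectorize on the assumption that they don't.
void add_arrays(float* dst, const float* a, const float* b, std::int64_t n) {
    for (std::int64_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```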

OK, thank you for the explanation.
On Linux, people who do numerical computing with the Intel compilers generally need this kind of optimization. Maybe you could consider enabling it by default on Linux?

Another suggestion: when vec-report is enabled, could the compiler tell the user that a loop has been converted into a call to intel_fast_memcpy? The Fortran compiler could do the same.

I try to remember to set -ansi-alias -prec-div -prec-sqrt in icpc.cfg and icc.cfg each time I update the compiler installation. My memory isn't always up to the task.

I suppose the fast_memcpy notifications would be appropriate in opt-report. The next question is: what would you do with that information? If your data moves aren't big enough to benefit from the substitution, I fear it could be awkward to override.
