Erroneous code optimization found on 64-bit ICL 13.1.x

The compiler issue can be reproduced with the following code:

/* main.c */
#include <stdio.h>

/* With LEN defined as 1 the result is correct; with 2 or larger the result is wrong unless the macro "ICL_WORKAROUND" is defined in LreciprtL.c */
#define LEN 2
/* Function to calculate x^(-0.5) */
int
LreciprtL(int x);

static int
bench_reciprt(void)
{
    int Lsrc[LEN];

    int i;

    for (i = 0; i < LEN; i++) Lsrc[i] = (int) (0.760045 * 2147483648.0 + 0.5);


    for (i = 0; i < LEN; i++) printf("in[%d]: %lf\n", i, (double) Lsrc[i] / 2147483648.0);
    printf("-------------------\n");
    for (i = 0; i < LEN; i++) printf("out[%d]: %lf\n", i, (double) LreciprtL(Lsrc[i]) / 2147483648.0);

    return 0;
}


int main()
{
    return bench_reciprt();
}

 

/* LreciprtL.c */
#include "int_math.h"
/* Uncomment this to enable the workaround so that the function gives the right answer, e.g. 0.760045^(-0.5) / 2 = 0.573522 (the division by 2 scales the result below 1.0) */
//#define ICL_WORKAROUND
static const int L05 = 1073741824;
/* Calculate x^(-0.5) for 0.25 < x < 1, result in 2Q30 (down scaled by 2)*/
int
LreciprtL(int x)
{
    const int PLUSONE2Q30 = L05;

    const int  a0 = (const int) (-3.4982 / 4 * 2147483648.0 + 0.5);
    const short  a1 = (const short) ( 1.8077 / 4 * 32768.0 + 0.5);
    const int iy0 = (const int) ( 2.7260 / 4 * 2147483648.0 + 0.5);
#ifdef ICL_WORKAROUND
    int i;
#endif
    int a  = LmacLLS(a0, x, a1);
    int iy = LmacLLS(iy0, x, S_L(a));

    iy = LshlLU(iy, 1); 


#ifdef ICL_WORKAROUND
    for (i = 0; i < 3; i++)
    {
        a =  LmpyLL(x, iy) ; 
        a =  LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1)) ; 
        iy = LmacLLL(iy, a, iy) ; 
    }
#else
    a =  LmpyLL(x, iy) ; 
    a =  LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1)) ; 
    iy = LmacLLL(iy, a, iy) ; 

    

    a =  LmpyLL(x, iy) ; 
    a =  LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1)) ; 
    iy = LmacLLL(iy, a, iy) ; 

    

    a =  LmpyLL(x, iy) ; 
    a =  LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1)) ; 
    iy = LmacLLL(iy, a, iy) ; 
#endif

    return iy ;
}

 

 

/* int_math.h */
/* Define basic math operations */
#define _asl32(a, s) ((a) * (1 << (unsigned)(s)))

static __forceinline int L_A (int a)
{
    return a + a;
}

static __forceinline short S_L (int a)
{
    return (short) (a >> 16);
}

static __forceinline int LshlLU (int a, unsigned s)
{
    return (int) _asl32(a, s);
}
static __forceinline int LsubLL (int a, int b)
{
    return a - b;
}

static __forceinline int AmpyLL (int a, int c)
{
    return (int)(((long long)a * c) >> 32);
}

static __forceinline int LmpyLL (int a, int c)
{
    return L_A(AmpyLL(a, c));
}


static __forceinline int AmpyLS (int a, short c)
{
    return (int)(((long long)a * c) >> 16);
}

static __forceinline int LmacLLS (int a, int x, short y)
{
    return a + L_A(AmpyLS(x, y));
}

static __forceinline int LmacLLL (int a, int x, int y)
{
    return a + LmpyLL(x, y);
}

 

The problem was found with icl 13.1.x and MSVS 2010 or 2012 on a 64-bit Windows 7 machine. The compiler is set to build Intel 64 targets, and multi-file interprocedural optimization is on (/Qipo).

Steps to reproduce the issue:

1. Unzip the attached project.

2. Open ConsoleApplication1.sln with VS2012 and build the Release configuration.

3. Run x64\Release>ConsoleApplication1.exe

The result is:

in[0]: 0.760045
in[1]: 0.760045
-------------------
out[0]: -0.319917
out[1]: -0.319917

This is clearly wrong: x^(-0.5) must be positive.
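As a sanity check on the expected value, here is a minimal double-precision sketch of the same algorithm: the polynomial seed followed by three Newton-Raphson steps, with the Q31 scaling removed. The function name rsqrt_half_model is made up for illustration and is not part of the attached project; the coefficients are the a0, a1 and iy0 constants from LreciprtL.c before their division by 4.

```c
/* Double-precision model of LreciprtL(); illustration only. */
static double rsqrt_half_model(double x)
{
    /* Seed y0 ~ x^(-0.5), same polynomial as the fixed-point code:
     * y0 = 2.7260 - 3.4982 * x + 1.8077 * x^2 */
    double y = 2.7260 + x * (-3.4982 + 1.8077 * x);
    int i;

    for (i = 0; i < 3; i++) {
        /* Newton step for 1/sqrt(x): y <- y + y * (1 - x * y * y) / 2 */
        y += y * (1.0 - x * y * y) * 0.5;
    }

    /* The fixed-point routine returns the result scaled down by 2. */
    return 0.5 * y;
}
```

Calling rsqrt_half_model(0.760045) should give a small positive value close to the 0.573522 quoted in LreciprtL.c, so a negative output cannot come from the algorithm itself.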

 

Ways to mitigate the issue:

1. Define ICL_WORKAROUND in LreciprtL.c.

2. Set LEN to 1 in main.c.

3. Turn off multi-file optimization using the IDE settings (set Interprocedural Optimization to Single-file, /Qip).

4. Use #pragma optimize("", off) and #pragma optimize("", on) to turn off optimization around the function LreciprtL() in LreciprtL.c.

Any one of the four mitigations above gives the right answer:

in[0]: 0.760045
in[1]: 0.760045
-------------------
out[0]: 0.573522
out[1]: 0.573522
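For readers without the attachment, mitigations 1 and 4 can be sketched in one compact, portable port of the routine. This is an illustration under assumptions, not the code from the zip file: the names mpy_q31 and reciprt_q31 are invented here, the int_math.h helpers are written out inline, and the pragma is guarded by _MSC_VER on the assumption that icl on Windows defines it. Either mitigation alone was reported to be sufficient; they are combined only to keep the sketch short.

```c
#include <stdint.h>

/* Q31 multiply built as in int_math.h: take the high 32 bits, then double.
 * Right-shifting a negative value is implementation-defined in C; like the
 * original code, this assumes an arithmetic shift. */
static int mpy_q31(int a, int b)
{
    int hi = (int)(((int64_t)a * b) >> 32);
    return hi + hi;
}

#if defined(_MSC_VER)
#pragma optimize("", off)   /* mitigation 4: optimization off from here */
#endif
int reciprt_q31(int x)
{
    const int   one_2q30 = 1073741824;   /* 1.0 in 2Q30 */
    const int   a0  = (int)(-3.4982 / 4 * 2147483648.0 + 0.5);
    const short a1  = (short)(1.8077 / 4 * 32768.0 + 0.5);
    const int   iy0 = (int)(2.7260 / 4 * 2147483648.0 + 0.5);
    int a, iy, i;

    /* Polynomial seed, then the left shift by one bit. */
    a  = a0 + 2 * (int)(((int64_t)x * a1) >> 16);
    iy = iy0 + 2 * (int)(((int64_t)x * (short)(a >> 16)) >> 16);
    iy = iy * 2;

    for (i = 0; i < 3; i++) {            /* mitigation 1: rolled loop */
        a  = mpy_q31(x, iy);
        a  = one_2q30 - 2 * mpy_q31(a, iy);
        iy = iy + mpy_q31(a, iy);
    }
    return iy;
}
#if defined(_MSC_VER)
#pragma optimize("", on)    /* restore the command-line settings */
#endif
```

Dividing reciprt_q31((int)(0.760045 * 2147483648.0 + 0.5)) by 2147483648.0 should reproduce a value close to the 0.573522 shown above.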

 

Attachment: ConsoleApplication1_0.zip (application/zip, 666.46 KB)

Hi,

Apparently the /Qipo optimization, which enables inlining, is causing the issue.

The /ob0 option also fixes the issue.

Thanks,

Reddy

Sorry, the above fix (/ob0) was a wrong observation. In any case, the issue can also be reproduced with the 15.0 compiler.

I will raise this issue with the development team and keep you updated on the status.

Thanks,

Reddy

 


Thanks Reddy for your quick reply.

So if /ob0 doesn't fix the issue, does that mean it is caused not by inlining but by something else?

Another question: since the issue can be reproduced with 15.0, does that mean that all versions from icl 13.0 to 15.0 have it?

 

Thanks,

Eugene

Hi Eugene,

Yes, the issue is in a different optimization phase of /Qipo and, as mentioned, it is a regression that also shows up with other compiler versions.

The same has been reported to the compiler development team.

Thanks,

Reddy

 

Hi,

In the calls below,

    int iy = LmacLLS(iy0, x, S_L(a));

    iy = LshlLU(iy, 1);

the result overflows the legal limit for a signed integer, 2^31.

So the workaround is to use the switch "-Qstrict-overflow-", with which the compiler will be careful not to optimize in a way that creates temporary values that may overflow.

However, the investigation continues into detecting this kind of case automatically and avoiding the optimization without reducing the performance benefits.

Thanks,

Reddy

 

Hi Reddy,

Thanks for your investigation, but I can't find -Qstrict-overflow or anything similar in the ICL 13.1 manual, and I'm using MSVS. Is it a new feature that is not supported in 13.x?

Thanks,

Eugene

 


Can anyone at Intel answer the question above?

Thanks,
Richard

Could you answer Eugene's question? It is blocking us at present.

Thanks,
Richard


Adding "/Qstrict-overflow-" under Configuration Properties --> C/C++ --> Command Line --> Additional Options fixes this.

However, I could not find any documentation for this option in the icl User and Reference Guide.

Could any Intel staff provide more information about this option?

Thanks,

Eugene

Hi Eugene,

I am also unable to find documentation for the "/Qstrict-overflow-" option.

But this option is similar to the gcc option -fno-strict-overflow on Linux, which is also a workaround for your issue.

You can find documentation for the -fno-strict-overflow option at the link below:

https://gcc.gnu.org/gcc-4.2/changes.html

It basically disables -fstrict-overflow, which is turned on by default at -O2.

Let me check with the development team why it is not documented.

Thanks,

Reddy

Hi Reddy,

Thanks for your explanation. I'm wondering whether the optimization controlled by this option assumes it can transform the code so that the left shift of iy, i.e.,

iy = LshlLU(iy, 1); 

can be saved by left-shifting the constants a0, a1 and iy0 beforehand?
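The rewrite asked about above can be checked numerically. The sketch below evaluates the seed in 64-bit arithmetic, so nothing can wrap, and then tests which values fit in a 32-bit int. The macro and function names are invented for this illustration, and that the optimizer forms exactly these pre-shifted constants is an assumption. For the test input 0.760045, the shift as written in the source stays in range, while pre-shifted constants such as 2 * a0 and 2 * iy0 do not fit in an int.

```c
#include <limits.h>
#include <stdint.h>

/* Constants from LreciprtL.c, widened to 64 bits (names are illustrative). */
#define A0_Q31  ((int64_t)(-3.4982 / 4 * 2147483648.0 + 0.5))
#define A1_Q15  ((int64_t)(1.8077 / 4 * 32768.0 + 0.5))
#define IY0_Q31 ((int64_t)(2.7260 / 4 * 2147483648.0 + 0.5))

/* 1 if v is representable as int, 0 otherwise. */
static int fits_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Value of iy after the second LmacLLS() call and before LshlLU(iy, 1).
 * Assumes an arithmetic right shift for negative values, as the
 * original code does. */
static int64_t seed_before_shift(int x)
{
    int64_t a = A0_Q31 + 2 * (((int64_t)x * A1_Q15) >> 16);
    int64_t s = a >> 16;                       /* S_L(a) */
    return IY0_Q31 + 2 * (((int64_t)x * s) >> 16);
}
```

With x = (int)(0.760045 * 2147483648.0 + 0.5), fits_int(2 * seed_before_shift(x)) holds, whereas fits_int(2 * IY0_Q31) and fits_int(2 * A0_Q31) do not.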

 

Also, this option should only affect the optimization of fixed-point code; floating-point algorithms should not be affected, right?

 

Thanks,

Eugene


Hi Eugene,
This option does not affect floating point. It only affects integer operations.
The default is for the compiler to assume that it is safe to make integer transformations without causing signed overflow.
In this case, it is assuming the distributive property:
a * (b + c) = a * b + a * c

Your comment is correct: the transformation is there to allow more constants to be used.

With -fno-strict-overflow or -Qstrict-overflow-, the compiler will be safe and assume all integer operations can overflow.

For floating point, there is -fp-model precise. It assumes that reordering expressions may cause precision errors.

The -Qstrict-overflow- option was actually added for GCC compatibility, and was put in the MS version of the compiler to match the feature set.
That is why there is no documentation; we assumed that most people using it would be GCC users.
GCC is probably the best source for documentation:

https://gcc.gnu.org/wiki/FAQ

The GCC option is -fno-strict-overflow.

 

Thanks,
Reddy

Hi Reddy,

Thanks a lot for your explanation. 

I guess the assumption that applying the distributive property won't introduce overflow is rather risky. At least in audio signal processing, where in

a * (b + c) = a * b + a * c

b and c can be Q31 numbers, i.e. they can be very close to +/- 2^31, there is often no guarantee that a transformation like this won't overflow an intermediate value.
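The concern can be made concrete with a small range check. This is only a sketch: the helper names are invented, and the intermediates are widened to 64 bits so that the check itself never overflows.

```c
#include <limits.h>
#include <stdint.h>

/* 1 if v is representable as int, 0 otherwise. */
static int fits_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* 1 if every intermediate of a * (b + c) fits in int. */
static int factored_fits(int a, int b, int c)
{
    int64_t sum = (int64_t)b + c;
    /* The product is only formed once the sum is known to fit. */
    return fits_int(sum) && fits_int((int64_t)a * sum);
}

/* 1 if every intermediate of a * b + a * c fits in int. */
static int distributed_fits(int a, int b, int c)
{
    int64_t ab = (int64_t)a * b;
    int64_t ac = (int64_t)a * c;
    return fits_int(ab) && fits_int(ac) && fits_int(ab + ac);
}
```

For a = 2, b = 2000000000 and c = -1999999999, the factored form only ever holds the values 1 and 2, so factored_fits() returns 1, while the distributed form needs a * b = 4000000000 and distributed_fits() returns 0.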

Do you think it would be safer if this optimization were not enabled by default at -O2?

 

Thanks,

Eugene

 


Hi Eugene,

The engineering team says that this kind of transformation doesn't overflow, but if you face any issues please let us know.

Thanks,

Reddy

 

I don't see how you can argue that the replacement

a * (b + c) => a * b + a * c

is safe for signed arithmetic.

Since the floating point analogy was introduced in the thread above, it may be worth pointing out that the Fortran standard for floating point expressions (since 1966) disallows this replacement, but specifically encourages replacement the other way:

a * b + a * c => a * (b + c)

gfortran makes such replacements, while gcc does not.  So I'd be surprised if gcc would distribute signed integer arithmetic, or that a bugzilla would not have been filed if it did.

Also specifically provided by standard Fortran and C is the option to set parentheses to block such a replacement:

(a * b) + (a * c)

must not be associated, nor is a fused multiply-add permitted.  Intel compilers violate such rules when /fp:fast is set, and that is the default. So it seems icl takes similar chances on signed integer arithmetic, but with a different option to control it.
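A small floating point case shows why the two forms are not interchangeable. The sketch assumes IEEE 754 double arithmetic with no fast-math style options; the function names are invented, and the volatile temporaries are there only to force each intermediate to be rounded to double.

```c
/* a * (b + c), with the sum rounded to double before the multiply. */
static double fp_factored(double a, double b, double c)
{
    volatile double sum = b + c;
    return a * sum;
}

/* a * b + a * c, with each product rounded to double before the add. */
static double fp_distributed(double a, double b, double c)
{
    volatile double ab = a * b;
    volatile double ac = a * c;
    return ab + ac;
}

/* NaN is the only value that compares unequal to itself. */
static int is_nan(double v)
{
    return v != v;
}
```

With a = 1e308, b = 10.0 and c = -10.0, fp_factored() returns 0.0 because b + c is exactly zero, while fp_distributed() overflows both products to +infinity and -infinity, whose sum is NaN.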

Intel Fortran provides options such as -standard-semantics so that users can comply with the standard without having to know a collection of options such as -fp:source.  Customers don't use the option much, in part because there were unexpected performance implications (improved upon in the 15.0 release).  Making standard-compliant observance of parentheses available by command line option has been proposed for Intel C++ but seems to have been turned down each time.  At one time it was stated that no Intel C++ customer should want standard compliance in this respect unless they were willing to discard all other optimizations which might violate the standard.
