Wrong code or wrong code-optimization?

Hi,

in contrast to Debug mode, the following code snippet breaks in Release mode.
The compilers I used before (Microsoft's compiler integrated in Visual Studio, and various versions of gcc on Linux machines) neither complain nor break the code.

I use Visual Studio 2010 (Professional, OS is Windows 7) with mostly the default settings for the "Release" and "Debug" modes. The target platform is x64.
Visual Studio -> About ... -> Intel Composer gives me:

Intel® C++ Composer XE 2013 Update 1 Integration for Microsoft* Visual Studio* 2010, Version 13.0.1194.2010

But this problem also occurs on Linux.

Visual Studio -> Project properties -> C/C++ -> "Command Line" gives me (that is mostly the Visual Studio default):
Release:

/Zi /nologo /W3 /O2 /Oi /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"x64\Release\intel_test.pch" /Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Qvec-report2 /Qopt-report:3

Debug:
 /Zi /nologo /W3 /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /RTC1 /GS /fp:precise /Zc:wchar_t /Zc:forScope /Fp"x64\Debug\intel_test.pch" /Fa"x64\Debug\" /Fo"x64\Debug\" /Fd"x64\Debug\vc100.pdb" 

The compiler output in Release mode (see the /Qvec-report and /Qopt-report compiler options) says:


	1> <;-1:-1;IPO UNREFERENCED VAR REMOVING;raiseError;0>
	1> UNREFERENCED VAR REMOVAL:ROUTINE SYMTAB (raiseError):
	1>
	1> REMOVED VAR rij_helper.933.0_V$d
	1> REMOVED PACK rij_helper.933.0
	1>
	1> <;-1:-1;IPO UNREFERENCED VAR REMOVING;;0>
	1> UNREF VAR REMOVAL ROUTINE-SYMTAB (raiseError):VARS(1),PACKS (1)

But this looks wrong to me!
I DO reference this array, but through a pointer (see the initialization "RealVector rij = { rij_helper, 3 };" and the accesses through getValueOfRealVector() in the loop of the code below).
Is it illegal to refer to an array this way?


	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/*
	  tolerance to check whether two atoms have the same coordinates (needed for the error check only):
	*/
	#define TOL_SAME_COORDINATES 1.e-8

	typedef struct {
	    /* a pointer to the first item of this vector. */
	    double* firstItem;
	    /* the number of items this vector contains. */
	    int numberOfItems;
	} RealVector;

	/* this function allocates an array of "numberOfItems" double values.
	    the pointer to the memory region and the number of values are stored in the given "RealVector" structure.
	    All entries in the newly allocated memory are set to zero. */
	void allocateRealVectorInitWithZero(RealVector *theVectorToAllocate, const int numberOfItems) {
	    theVectorToAllocate->firstItem = (double*)malloc( numberOfItems * sizeof(double) );
	    if (theVectorToAllocate->firstItem == NULL) {
	        fprintf(stderr, "out of memory\n");
	        exit(EXIT_FAILURE);
	    }
	    theVectorToAllocate->numberOfItems = numberOfItems;
	    memset(theVectorToAllocate->firstItem, 0, theVectorToAllocate->numberOfItems * sizeof(double));
	}

	/*
	    "accessor" function to the "positionOfValue"-th value of the given RealVector struct.
	    Notice: a pointer to the position is returned.
	        through this pointer the calling function is able to change the value
	        inside of the RealVector */
	double* getValueOfRealVector(RealVector theVector, const int positionOfValue) {
	    // checks could be placed here when compiling in DEBUG mode... removed to keep the example small.
	    return theVector.firstItem + positionOfValue;
	}

	/* simple dummy function to demonstrate the problem. */
	int raiseError(RealVector input_dof_np1) {
	    int        k;
	    double       qSum = 0.;
	    double       tol_rel, characteristic_length = 0.;
	    /*
	        at this point the size of the array is known at compile time.
	        No need to allocate heap memory (e.g. via malloc or via the own helper function above, allocateRealVector..)
	        (Initialized this way only to avoid an uninitialized memory access in the first output loop.)
	    */
	    double rij_helper[3] = { 0, 0, 0 };
	    /*
	        A "RealVector" structure is needed, e.g. to be able to pass these values (the array) into other functions.
	        This is not shown in this example, but required in the original project.
	        Here the double array rij_helper is implicitly converted into a double pointer.
	    */
	    RealVector rij = { rij_helper, 3 };

	    /*
	        #######################
	        just some output to keep track of the values
	    */
	    printf("Initial values:\n qSum=%+e characteristic_length=%+e\n", qSum, characteristic_length);
	    for (k=0; k < 3; k++) {
	        printf("k=%2d   RIJ(k)=%+e = RJ(k)=%+e + RI(k)=%+e;\n", k,
	            (*getValueOfRealVector( rij, k )),
	            (*getValueOfRealVector(  input_dof_np1, 1 * 3 + k)),
	            (*getValueOfRealVector(  input_dof_np1, 0 * 3 + k)));
	    }

	    /*
	        #######################
	        the problematic loop - the problematic part of the code is inside here!
	    */
	    for (k=0; k < 3; k++) {
	        /*
	            Here is the problem: using "(*getValueOfRealVector( rij, k ))" gives a wrong result (for qSum),
	            but when using "(*(rij.firstItem + k))" it is correct.
	            But, according to the contents of "getValueOfRealVector", this should be the same!
	        */
	        (*getValueOfRealVector( rij, k )) =         //(*(rij.firstItem + k)) =  //==> this instead and it would work...?!
	              (*getValueOfRealVector(  input_dof_np1, 1 * 3 + k))
	            - (*getValueOfRealVector(  input_dof_np1, 0 * 3 + k));
	        qSum +=
	              (*getValueOfRealVector( rij, k ))
	            * (*getValueOfRealVector( rij, k ));
	        characteristic_length +=
	              (*getValueOfRealVector(  input_dof_np1, 0 * 3 + k))
	            * (*getValueOfRealVector(  input_dof_np1, 0 * 3 + k))
	            + (*getValueOfRealVector(  input_dof_np1, 1 * 3 + k))
	            * (*getValueOfRealVector(  input_dof_np1, 1 * 3 + k));
	    }

	    /*
	        #######################
	        just some output to keep track of the values
	    */
	    printf("final values:\n qSum=%+e characteristic_length=%+e\n", qSum, characteristic_length);
	    for (k=0; k < 3; k++) {
	        printf("k=%2d   RIJ(k)=%+e = RJ(k)=%+e + RI(k)=%+e;\n", k,
	            (*getValueOfRealVector( rij, k )),
	            (*getValueOfRealVector(  input_dof_np1, 1 * 3 + k)),
	            (*getValueOfRealVector(  input_dof_np1, 0 * 3 + k)));
	    }

	    /*
	        #######################
	        further work with the calculated values:
	            at this point we see the impact of the miscalculated qSum variable.
	            "tol_.." stands for tolerance - this is taken from the original code to demonstrate the further usage of the variables.
	            Please do not wonder about the variable names.
	    */
	    tol_rel = TOL_SAME_COORDINATES * characteristic_length;
	    if (qSum <= tol_rel) {
	        printf("qSum=%e <= tol_rel=%e -- TOL_SAME_COORDINATES= %+e  characteristic_length=%+e\n", qSum, tol_rel, TOL_SAME_COORDINATES, characteristic_length);
	        return -1;
	    }
	    return 1;
	}

	int main(int argc, char* argv[]) {
	    RealVector theInput;
	    int res;
	    int c;
	    //allocate memory, values will be initialized with 0
	    allocateRealVectorInitWithZero(&theInput, 2*3);
	    //set some specific values:
	    *getValueOfRealVector(theInput, 1) = -1.39000000000000010000e-001;
	    //call the function with the problematic code inside:
	    res = raiseError(theInput);
	    if(res != 1) {
	        printf("function is NOT correct!\n");
	        printf("Please press 'Enter' to continue.\n");
	        do {
	            c = getchar();
	        } while(c != '\n' && c != EOF);   // stop at end of line or end of input
	    } else {
	        printf("function is correct!\n");
	    }
	    printf("function is finished\n");
	    //free the allocated memory (inside allocateRealVector..) to avoid a memory leak
	    free(theInput.firstItem);
	    return res;
	}

	

"Footnote":
I am definitely no C expert, and I am completely new to the Intel compiler/Composer.
The original code/project is written in plain C and is rather old and big.
So none of this was designed by me from scratch; it is integrated into a bigger project I have to work with.
That is why this problem occurs in multiple places. Simply rewriting the code in C++ is NOT an option.
Also, the "RealVector" struct and the accessor/getter function(s) are given and cannot easily be changed in general.
Nevertheless, all kinds of suggestions are welcome.


I'd like to provide updated sources and a compilation log for reference. Please take a look at the attached files.

Attachments:
test5.cpp (7.64 KB)
test5.log (24.94 KB)

I reproduced the compilation issue with Intel C++ Compiler XE 12.1.7.371 [ IA-32 ] ( Update 7 ). However, the outputs of both configurations are identical:

** DEBUG configuration OUTPUT **

Initial values:
qSum=+0.000000e+000 characteristic_length=+0.000000e+000
k= 0 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
k= 1 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=-1.390000e-001;
k= 2 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
final values:
qSum=+1.932100e-002 characteristic_length=+1.932100e-002
k= 0 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
k= 1 RIJ(k)=+1.390000e-001 = RJ(k)=+0.000000e+000 + RI(k)=-1.390000e-001;
k= 2 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
function is correct!
function is finished

** RELEASE configuration OUTPUT **

Initial values:
qSum=+0.000000e+000 characteristic_length=+0.000000e+000
k= 0 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
k= 1 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=-1.390000e-001;
k= 2 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
final values:
qSum=+1.932100e-002 characteristic_length=+1.932100e-002
k= 0 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
k= 1 RIJ(k)=+1.390000e-001 = RJ(k)=+0.000000e+000 + RI(k)=-1.390000e-001;
k= 2 RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
function is correct!
function is finished

My question is: do you get different output in the Release configuration?

Thanks for your answer.
Yes, I get different output in the Release configuration!

My output in x64 Release mode:

Initial values:
 qSum=+0.000000e+000 characteristic_length=+0.000000e+000
k= 0   RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
k= 1   RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=-1.390000e-001;
k= 2   RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
final values:
 qSum=+0.000000e+000 characteristic_length=+1.932100e-002
k= 0   RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
k= 1   RIJ(k)=+1.390000e-001 = RJ(k)=+0.000000e+000 + RI(k)=-1.390000e-001;
k= 2   RIJ(k)=+0.000000e+000 = RJ(k)=+0.000000e+000 + RI(k)=+0.000000e+000;
qSum=0.000000e+000 <= tol_rel=1.932100e-010 -- TOL_SAME_COORDINATES= +1.000000e-008  characteristic_length=+1.932100e-002
function is NOT correct!
Please press 'Enter' to continue.

My output in Debug mode is exactly the one you got.

To double-check that I am using the default settings of Visual Studio and Intel Composer, I created a completely new solution with a new project in Visual Studio.
The only additional steps were:
* switched to/enabled "x64" as the target platform.
* right-clicked on the project and chose "Use Intel Composer".

Because I am familiar neither with generating compilation logs nor with compiling from the command line (on Windows), I attached the whole newly generated solution folder.

Hopefully that helps to reproduce the problem.

Attachments:
intel-test-vector.zip (2.5 MB)

I reproduced the issue on a Linux machine.

[myname@icluster intel_test]$ uname -a
Linux icluster 2.6.32-358.6.1.el6.x86_64 #1 SMP Tue Apr 23 16:15:13 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[myname@icluster intel_test]$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.0.1.117 Build 20121010
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

## Debug - works

[myname@icluster intel_test]$ icc -O0 -DDEBUG -o test_Debug test.c
[myname@icluster intel_test]$ ./test_Debug
Initial values:
 qSum=+0.000000e+00 characteristic_length=+0.000000e+00
k= 0   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
k= 1   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=-1.390000e-01;
k= 2   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
final values:
 qSum=+1.932100e-02 characteristic_length=+1.932100e-02
k= 0   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
k= 1   RIJ(k)=+1.390000e-01 = RJ(k)=+0.000000e+00 + RI(k)=-1.390000e-01;
k= 2   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
function is correct!
function is finished

## Release - does NOT work

[myname@icluster intel_test]$ icc -O3 -DNDEBUG -o test_Rel test.c
[myname@icluster intel_test]$ ./test_Rel
Initial values:
 qSum=+0.000000e+00 characteristic_length=+0.000000e+00
k= 0   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
k= 1   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=-1.390000e-01;
k= 2   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
final values:
 qSum=+0.000000e+00 characteristic_length=+1.932100e-02
k= 0   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
k= 1   RIJ(k)=+1.390000e-01 = RJ(k)=+0.000000e+00 + RI(k)=-1.390000e-01;
k= 2   RIJ(k)=+0.000000e+00 = RJ(k)=+0.000000e+00 + RI(k)=+0.000000e+00;
qSum=0.000000e+00 <= tol_rel=1.932100e-10 -- TOL_SAME_COORDINATES= +1.000000e-08  characteristic_length=+1.932100e-02
function is NOT correct!
Please press 'Enter' to continue.
function is finished

The file test.c contains the code from my initial post, but with the newlines in the output prints fixed ('\n' had been corrupted to 'n').
I attached this code to this post, too.

Attachments:
test.c (5.89 KB)

I will verify the original Test5.cpp you've posted (just for consistency with my previous verification) on 64-bit Windows 7 Professional with Intel C++ Compiler version 13 Update 2. Unfortunately, I won't be able to verify it with Update 1.

As you can see, the 32-bit build works properly. I'll let you know the results later.

Hi, is there any news?

Currently I am only using a workaround:

I defined "getValueOfRealVector" as a macro instead of a function; then it works properly.

Hi,

I was just curious about that - and I can reproduce the problem (C++ Compiler XE 13.1.2.190 [IA-32]). However, I believe this is not the compiler's fault but it originates from how the function getValueOfRealVector() is being used:

The compiler is free to reorder any instructions to generate optimized code as long as it can assume that this does not change the meaning and results of the algorithm. In your code, the function hides the actual data access, so the compiler's static code analysis cannot see the dependencies between the reads and writes, which arise from reading from and writing to the same data multiple times. Thus the compiler has two choices: either assume a possible dependency and forgo the optimization, or assume no dependency and optimize as well as it can. The first is safe but maybe not optimal; the second is the opposite. Moreover, it seems quite hard to me (as a compiler) to infer a dependency just by analyzing the function calls, because I cannot know what happens to the function's arguments inside the function. Of course, the compiler could also try to disassemble the function and analyze it, but this would maybe be too deep and complicated a dependency analysis to implement in the compiler in a general manner.

Composer v12 obviously prefers the safer way or does less aggressive optimization (or knows better how to disassemble functions), and Composer v13 prefers better or another optimization which implies some instruction reordering which leads to incorrect results. By replacing the function with direct data access (or a macro), the compiler can clearly see the dependency because it is not anymore hidden by a function call and optimizes correctly. In this case, I would assume the compiler changes the order of reads, writes, and function calls in the calculation loop in an unlucky way - and unfortunately cannot know that it does. Actually, when I place "#pragma novector" before the calculation loop, the function works for me in release mode.

In debug mode, this problem does not show up because all optimization is disabled.

Greetings, Toni

I would like to suggest submitting the test case to Premier Support. This is not the first time that the Intel compiler removes needed things in a somewhat overzealous attempt at optimization.

-- Regards, Igor Levicki

Toni Z:

RealVector rij = { rij_helper, 3 };

This line is definitely using rij_helper.

rij_helper is not const, so the compiler cannot hoist the literal value (i.e. point to the const initializer for rij_helper).

What the compiler did was wrong. To assume it was correct is to assume it is OK to have a program that maybe works.

Hopefully the included code will produce the same error for the Intel support staff.

Jim Dempsey

www.quickthreadprogramming.com
