Problem with a variable put on the stack when it should not be

Hi everyone

I ran into a problem on a project. Since the real code is very complicated, I wrote a very simple test program. It may make no sense on its own (and may look trivial), but it shows the problem I have with the Intel C++ compiler optimizer.

#include <stdio.h>
#include <xmmintrin.h>

inline  void  DoLine(__m128 & InOutput, float const * __restrict Input, 
				   __m128 const & SomeConstant, int iColumn, int iCount)
{
	while (iCount > 0)
	{
		InOutput = 
			_mm_add_ps(InOutput, 
				_mm_mul_ps(
					_mm_load_ps(&Input[iColumn]), 
					SomeConstant));

		iCount  -= 4;
		iColumn += 4;
	}
}

void __declspec(noinline) __declspec(noalias) SomeFunction(__m128 & res)
{
	res = _mm_mul_ps(res, res);
}

__m128 TestFunction(void * pool)
{
	__m128 res = _mm_load_ps((float const *)pool);
	
	SomeFunction(res);
	DoLine(res, (float const *)pool, _mm_set1_ps(2.f), 0, 512);

	return res;
}


int main()
{
	 void * pool = _mm_malloc(1024*16, 16);

	__m128 res = TestFunction(pool);

	_MM_ALIGN16 float result[4];
	_mm_store_ps(result, res);
	printf("Result is: %f %f %f %f\n", result[0], result[1], result[2], result[3]);

	_mm_free(pool);
	return 0;
}

I use the default compiler options (/O2), and what I'm interested in is the assembly code generated for the DoLine function.

In TestFunction, if I comment out the call to "SomeFunction", the main loop looks like this:

main+34h:
00401034  movaps      xmm2,xmmword ptr [esi+edx*4] 
00401038  mulps       xmm2,xmm0 
0040103B  addps       xmm1,xmm2 
0040103E  add         edx,4 
00401041  add         eax,0FFFFFFFCh 
00401044  test        eax,eax 
00401046  jg          main+34h (401034h) 
00401048  movaps      xmmword ptr [result],xmm1 

Which is what I expect:

* The constant "SomeConstant" is kept in a register "xmm0"

* The accumulator "InOutput" is kept in a register "xmm1" until the end of the loop, where it is put back on the stack (actually it's directly put in "result" in main, which is even better).

But if, for some reason, my accumulator is passed to a non-inlined function ("SomeFunction") before that loop, the generated code becomes:

main+47h:
00401047  movaps      xmm2,xmmword ptr [ebx+edx*4] 
0040104B  mulps       xmm2,xmm1 
0040104E  addps       xmm0,xmm2 
00401051  movaps      xmmword ptr [esp+40h],xmm0 
00401056  add         edx,4 
00401059  add         eax,0FFFFFFFCh 
0040105C  test        eax,eax 
0040105E  jg          main+47h (401047h) 
00401060  movaps      xmmword ptr [result],xmm0 

As you can see, on every iteration, before jumping back to the top of the loop, the compiler stores xmm0 back to the stack at [esp+40h]. I guess it does this because it is afraid that my "Input" pointer aliases my accumulator.

Neither the __declspec(noalias) attribute nor the restrict keyword on "Input" helps the compiler figure out that it does not have to store xmm0 back to [esp+40h].
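
One workaround that may be worth trying is to accumulate into a local variable whose address is never taken, and to write it back to the reference only once after the loop. This is a minimal sketch and has not been verified against the Intel compiler. The names DoLineLocal, acc and k are hypothetical and are not part of the program above. The idea is that a local that never escapes cannot be aliased by "Input", so a compiler is generally free to keep it in a register.

```cpp
#include <xmmintrin.h>

// Hypothetical variant of DoLine: same arithmetic, but the running sum
// lives in a local (acc) instead of being updated through the reference.
inline void DoLineLocal(__m128 & InOutput, float const * Input,
                        __m128 const & SomeConstant, int iColumn, int iCount)
{
	__m128 acc = InOutput;          // single read of the caller's variable
	__m128 const k = SomeConstant;  // local copy of the constant as well

	while (iCount > 0)
	{
		// Input must be 16-byte aligned, as in the original code.
		acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(&Input[iColumn]), k));

		iCount  -= 4;
		iColumn += 4;
	}

	InOutput = acc;                 // single write-back after the loop
}
```

Note that this changes behavior if "Input" really does overlap the accumulator, since the loop no longer sees intermediate stores. In the test program it does not, so the result should be the same.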

If anyone can help.

Thanks in advance,

Best Regards

Edit: I tried compiling with /Oa (assume no aliasing in the whole program) and I get the same generated code.
