Hello Sir,

I have a precision issue with the code below. If I do the calculation for the same input (-0.8 multiplied by 2^24) on a calculator, I get -13421772.8.

With the compiled code, however, I get -13421773.0, and this difference is significant for us.

The variable in which I observe this value is ‘tmp’.

Please help us in resolving this.

Thanks in advance.

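#include <xmmintrin.h>  // _MM_GET_ROUNDING_MODE, _MM_SET_ROUNDING_MODE
#include <smmintrin.h>  // _mm_insert_ps (SSE4.1)

void convert(__m128 &vrz /*inout*/, int art)
{
    // Save the current rounding mode and switch to round-toward-zero.
    unsigned int _rounding_mode;
    if(1)
    {
        _rounding_mode = _MM_GET_ROUNDING_MODE();
        _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
    }

    __m128 tmp, scale_vr;
    // scale = 2^(31 - art), i.e. 2^24 for art = 7
    const float scale = (float)((unsigned int)1<<(31-(art)));
    scale_vr = _mm_set1_ps(scale);
    tmp = _mm_mul_ps(vrz, scale_vr);
    // Convert tmp to 32-bit integers and copy element 1 of the result into element 1 of vrz.
    vrz = _mm_insert_ps(vrz, _mm_castsi128_ps(_mm_cvtps_epi32(tmp)), ((1)<<6) | ((1)<<4));

    // Restore the rounding mode that was active on entry.
    if(1)
    {
        _MM_SET_ROUNDING_MODE(_rounding_mode);
    }
}

int main()
{
    float a = (float) -0.8;
    __m128 vrz;
    vrz = _mm_set1_ps(a);
    convert(vrz, 7);
    return 0;
}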
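
To help narrow this down, here is a scalar sketch of the same multiply without any intrinsics. It assumes IEEE 754 single and double precision. The helper names `scaled_float`, `scaled_double`, and `float_spacing` are hypothetical and are not part of the code above; they only compare the single-precision product with a double-precision one and report how far apart neighbouring float values are at this magnitude.

```cpp
#include <cmath>

// Hypothetical helper: scalar version of the multiply in convert(),
// carried out entirely in single precision.
float scaled_float(float value, int art) {
    const float scale = static_cast<float>(1u << (31 - art));  // 2^(31 - art)
    return value * scale;  // scaling by a power of two adds no rounding error here
}

// Hypothetical helper: the same product with the input kept in double precision.
double scaled_double(double value, int art) {
    const double scale = static_cast<double>(1u << (31 - art));
    return value * scale;
}

// Hypothetical helper: distance from |x| to the next larger representable float.
float float_spacing(float x) {
    const float ax = std::fabs(x);
    return std::nextafter(ax, INFINITY) - ax;
}
```

Calling `scaled_float(-0.8f, 7)` gives -13421773.0, while `scaled_double(-0.8, 7)` gives approximately -13421772.8, and `float_spacing(13421773.0f)` is 1.0. Under these assumptions, neighbouring float values between 2^23 and 2^24 are 1.0 apart, so -13421772.8 cannot be held in a float at all, and `(float) -0.8` is already slightly below -0.8 before the multiply. This may explain the difference independently of the rounding mode, which only affects the later conversion to integer.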

Thanks,

Eswar Reddy K