precision issues

Hello Sir,
I have a precision issue with the code below. If I do the calculation for the same input on my calculator I get -13421772.8, whereas with the compiler I get -13421773.0, and this is a considerable difference for us.
The variable used for the above observation is ‘tmp’.
Please help us resolve this.
Thanks in advance.

#include <smmintrin.h>  // SSE4.1 header, needed for _mm_insert_ps

void convert(__m128 &vrz /*inout*/, int art)
{
    // Save the current rounding mode and switch to truncation (round toward zero).
    unsigned int _rounding_mode = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);

    __m128 tmp, scale_vr;
    const float scale = (float)((unsigned int)1 << (31 - art));
    scale_vr = _mm_set1_ps(scale);
    tmp = _mm_mul_ps(vrz, scale_vr);
    // Convert the scaled values to int32 and write element 1 of the result back into element 1 of vrz.
    vrz = _mm_insert_ps(vrz, _mm_castsi128_ps(_mm_cvtps_epi32(tmp)), (1 << 6) | (1 << 4));

    // Restore the original rounding mode.
    _MM_SET_ROUNDING_MODE(_rounding_mode);
}

int main()
{
    float a = -0.8f;
    __m128 vrz = _mm_set1_ps(a);
    convert(vrz, 7);
    return 0;
}

Thanks,
Eswar Reddy K

I'll take a look at the issue. Could you provide some additional technical details, like:

- OS version? 32-bit or 64-bit?
- Compiler version and a complete set of command line options?

64-bit OS, Visual Studio 2010 (32-bit mode).
I am running from VS 2010 in debug mode; these are my compiler options:
Disabled
AVX2
Default
false
Precise
Default

Sorry, display problem... below are my compiler options:

WarningLevel: Level3
Optimization: Disabled
UseProcessorExtensions: AVX2
BasicRuntimeChecks: Default
AdditionalOptions: /fp:precise
FlushDenormalResultsToZero: false
FloatingPointModel: Precise
FloatingPointExpressionEvaluation: Default

Eswar,

At issue here may be:

float a = (float)-0.8;

Here a is not converted using the same rounding mode (round down). As a quick test, compile as a Debug build. After a has been set, open a Memory window and examine "&a". View it as unsigned 1-byte integers. You should see "205 204 76 191". Subtract 1 from the 205 to undo the round-up. Had that byte been zero, 0-1 would produce 255 with a borrow propagating to the next byte (i.e. subtract 1 from the next byte as well). There are cases where the exponent would also need to be adjusted, but this is not necessary for this experiment.

Once the value of a has been adjusted, continue and check the result.

For a formal fix, you will have to be careful as to how you preset your parameters that contain fractional values that cannot be precisely represented in binary. 0.1 is one such fraction as is 0.8.
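
For what it's worth, here is a minimal sketch of the same byte-inspection experiment done in code instead of the Memory window (my own code, not from Jim's post; it assumes a little-endian x86 target):

#include <cstdio>
#include <cstring>

int main()
{
    float a = -0.8f;

    // View &a as unsigned 1-byte integers: on x86 this prints 205 204 76 191.
    unsigned char bytes[4];
    std::memcpy(bytes, &a, sizeof(a));
    std::printf("%u %u %u %u\n",
                (unsigned)bytes[0], (unsigned)bytes[1],
                (unsigned)bytes[2], (unsigned)bytes[3]);

    // Subtract 1 from the low byte (the 205) to undo the round-up,
    // then copy the bytes back into a float and print it.
    bytes[0] -= 1;
    float adjusted;
    std::memcpy(&adjusted, bytes, sizeof(adjusted));
    std::printf("a = %.9g, adjusted = %.9g\n", a, adjusted);
    return 0;
}

The adjusted value is the next representable float toward zero (about -0.79999995), which is what a round-toward-zero conversion of -0.8 would have produced.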

Jim Dempsey

www.quickthreadprogramming.com

Sorry for the delay with my investigation.

Hi Eswar & Jim,

I just completed tests and reproduced the problem in several configurations: Debug and Release, 32-bit and 64-bit, with Intel C++ compilers (versions 12.x and 13.x) and Microsoft C++ compilers (VS 2005 and VS 2008), with rounding and without rounding, and with the Floating Point Model set to Precise (/fp:precise), Fast (/fp:fast), or Strict (/fp:strict).

In essence, it doesn't matter which configuration (or settings) is selected: _mm_mul_ps (actually, the MULPS instruction) rounds the results (!). I've created my own test case and debugged it. Here are some details:

Note: 16777216 = 2^24

Correct Result ( True ): 16777216 * 0.8 = 13421772.8 - everything is correct / _mm_mul_ps is Not used

Incorrect Result: 16777216 * 0.8 = 13421773.0 - something is wrong / _mm_mul_ps is used / rounding is done by MULPS instruction

I will spend some additional time on this during the week; however, I would consider a workaround, since I really do not expect that Intel will release a microcode patch for the MULPS instruction unless we understand what is wrong.

...
[ Debug ]

Test-Case 1 ( 16777216 * -0.8 )
Expected Values : -13421772.800000 -13421772.800000 -13421772.800000 -13421772.800000
Calculated Values: -13421773.000000 -13421773.000000 -13421773.000000 -13421773.000000

Test-Case 2 ( 16777216 * 0.8 )
Expected Values : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: 13421773.000000 13421773.000000 13421773.000000 13421773.000000
...
[ Release ]

Test-Case 1 ( 16777216 * -0.8 )
Expected Values : -13421772.800000 -13421772.800000 -13421772.800000 -13421772.800000
Calculated Values: -13421773.000000 -13421773.000000 -13421773.000000 -13421773.000000

Test-Case 2 ( 16777216 * 0.8 )
Expected Values : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: 13421773.000000 13421773.000000 13421773.000000 13421773.000000
...

Note: the intrinsic function _mm_mul_ps is used for the Calculated Values.
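
Here is a minimal sketch of a comparable comparison (my own code, not Sergey's actual test case): it computes 16777216 * 0.8 once with _mm_mul_ps and once in double precision.

#include <xmmintrin.h>
#include <cstdio>

int main()
{
    __m128 v = _mm_set1_ps(16777216.0f);
    __m128 s = _mm_set1_ps(0.8f);       // 0.8f is already rounded to the nearest float, 13421773 / 16777216
    float  r[4];
    _mm_storeu_ps(r, _mm_mul_ps(v, s)); // the product is exactly 13421773.0f in every lane

    double expected = 16777216.0 * 0.8; // double precision keeps the .8 part
    std::printf("Expected  : %f\n", expected);                        // 13421772.800000
    std::printf("Calculated: %f %f %f %f\n", r[0], r[1], r[2], r[3]); // 13421773.000000 ...
    return 0;
}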

Thanks Sergey & Jim !

I have observed the same behaviour irrespective of the configuration.

Eswar, your results are Absolutely correct and there is Nothing wrong.

Also, I've done another set of tests, and here are the results without rounding issues:

...
Test-Case 5
Expected Values : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: -13421772.800000 -13421772.800000 -13421772.800000 -13421772.800000

Test-Case 6
Expected Values : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: 13421772.800000 13421772.800000 13421772.800000 13421772.800000
...

Sergey Kostrov,

The results look OK for test cases 5 & 6.

Can you please provide the compiler options, and any other settings, used for test cases 5 & 6?

Thanks,

Eswar Reddy K

Best Reply

The issue you've experienced is Not related to any C++ compiler or command line options, etc. It is related to the limitations of Single-Precision arithmetic. In order to improve the precision of your calculations, a change to Double-Precision arithmetic needs to be made.

Try these simple tests:

16777216.0f + 1.0f = 16777216.0f - !!! - It is Not 16777217.0 due to the limitations of Single-Precision arithmetic
16777216.0f + 2.0f = 16777218.0f
16777216.0f + 3.0f = 16777220.0f - !!! - It is Not 16777219.0 due to the limitations of Single-Precision arithmetic
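
These additions can be checked with a small sketch (my own code; it just prints the three sums above). Above 2^24, consecutive integers are no longer representable in single precision, so each result snaps to the nearest representable float:

#include <cstdio>

int main()
{
    std::printf("%.1f\n", 16777216.0f + 1.0f);   // 16777216.0 - 16777217 is not representable
    std::printf("%.1f\n", 16777216.0f + 2.0f);   // 16777218.0
    std::printf("%.1f\n", 16777216.0f + 3.0f);   // 16777220.0 - the exact sum 16777219 ties and rounds to the even neighbour
    return 0;
}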

Actually, the rounding is probably done by a micro-operation control signal (MULPS is decoded into the corresponding uop). It is interesting what triggers the rounding mode (some control bit being set when MULPS is decoded) in the SIMD FPU.

Thank you!

>>...Actually, the rounding is probably done by a micro-operation control signal (MULPS is decoded into the corresponding uop). It is interesting
>>what triggers the rounding mode (some control bit being set when MULPS is decoded) in the SIMD FPU...

Take into account that there are only 24 bits to hold the mantissa, and that is not enough to represent 13421772.8 exactly.

The IEEE 754 standard describes all of this; take a look at it. The most accurate single-precision representation of 13421772.8 is 13421773.0.

In a binary form both numbers look like:

13421772.8 = 13421773.0 = 0x4B4CCCCD = 0 10010110 10011001100110011001101

Note 1: the 1st digit is the Sign (0 is for positive), followed by the Exponent, followed by the Mantissa.
Note 2: use a debugger to verify it.
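
The same check can also be done without a debugger with a minimal sketch (my own code): copy the float's bits into a 32-bit integer and print them in hex.

#include <cstdio>
#include <cstring>
#include <cstdint>

int main()
{
    float f = 13421772.8f;                           // the literal is rounded to the nearest float, 13421773.0f
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    std::printf("0x%08X  %.1f\n", (unsigned)bits, f); // 0x4B4CCCCD  13421773.0
    return 0;
}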

Eswar,

Since you will need to do some processing using Double-Precision arithmetic, take a look at a collection of very useful threads related to that subject:

Forum topic: Support of 'long double' floating point data type on Intel CPUs ( A collection of threads )
Web-link: http://software.intel.com/en-us/node/375459
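
As a final illustration, here is a small sketch (my own, not from the linked thread) contrasting the float and double paths for the original example (a = -0.8, art = 7, i.e. a scale of 2^24). With double precision and truncation toward zero, the scaled result lands on -13421772; with float, -0.8 has already been rounded up in magnitude, so the result is -13421773:

#include <cstdio>

int main()
{
    const double scale = (double)(1u << 24);   // 31 - art with art = 7

    float  af = -0.8f;   // nearest float is about -0.800000011920929
    double ad = -0.8;    // nearest double is accurate to ~16 digits

    int rf = (int)(af * (float)scale);   // float multiply, then truncation toward zero
    int rd = (int)(ad * scale);          // double multiply, then truncation toward zero

    std::printf("float  path: %d\n", rf);   // -13421773
    std::printf("double path: %d\n", rd);   // -13421772
    return 0;
}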

Thanks for the insight.
