Mixing of Floating-Point Types ( MFPT ) when performing calculations. Does it improve accuracy?

Sergey Kostrov

Background

Some developers mix Floating-Point Types ( let's call it MFPT ) when performing calculations in order to improve the accuracy of results.

For example,

- There is a data set of Single-Precision Floating-Point values ( that is, 'float' / 24-bit precision ) on which some calculations have to be done

- Some intermediate results are saved in variables ( accumulators ) declared as Double-Precision Floating-Point type ( that is, 'double' / 53-bit precision )

Here is the question: does it improve the accuracy of results?

That depends on many factors, and it is simply impossible to take all of them into account and answer with a simple Yes or No. However, when MFPT was applied to a very simple algorithm, it really did improve accuracy.

Here are the results of my evaluation of a classic matrix multiplication algorithm:

Matrix A
0101.0 0201.0 0301.0 0401.0 0501.0 0601.0 0701.0 0801.0
0901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

Matrix B
0101.0 0201.0 0301.0 0401.0 0501.0 0601.0 0701.0 0801.0
0901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

Matrix C = Matrix A * Matrix B ( 8x8 - 'float' type )

[ Example of Correct Results ]

Matrix C
013826808.0 014187608.0 014548408.0 014909208.0 015270008.0 015630808.0 015991608.0 016352408.0
032393208.0 033394008.0 034394808.0 035395608.0 036396408.0 037397208.0 038398008.0 039398808.0
050959608.0 052600408.0 054241208.0 055882008.0 057522808.0 059163608.0 060804408.0 062445208.0
069526008.0 071806808.0 074087608.0 076368408.0 078649208.0 080930008.0 083210808.0 085491608.0
088092408.0 091013208.0 093934008.0 096854808.0 099775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827608.0 142028408.0 146229208.0 150430008.0 154630808.0
143791608.0 148632408.0 153473208.0 158314008.0 163154808.0 167995608.0 172836408.0 177677208.0

Note: There are no incorrect values in Matrix C

As you can see, all values in Matrix C are correct, that is, the last two digits of every value in
Matrix C are '...08'.

[ Example of Incorrect Results due to Rounding ]
[ All variables are Single-Precision FP type ( float ) / MFPT Not used ]

Matrix C
013826808.0 014187608.0 014548408.0 014909208.0 015270008.0 015630808.0 015991608.0 016352408.0
032393208.0 033394008.0 034394808.0 035395608.0 036396408.0 037397208.0 038398008.0 039398808.0
050959604.0 052600404.0 054241204.0 055882004.0 057522804.0 059163604.0 060804404.0 062445204.0
069526008.0 071806808.0 074087608.0 076368408.0 078649208.0 080930008.0 083210808.0 085491608.0
088092408.0 091013208.0 093934008.0 096854808.0 099775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827616.0 142028400.0 146229216.0 150430000.0 154630816.0
143791600.0 148632416.0 153473200.0 158314016.0 163154800.0 167995616.0 172836416.0 177677200.0

Note: There are 21 incorrect values in Matrix C

[ Example of Incorrect Results due to Rounding ]
[ All variables are Single-Precision FP type ( float ) except for the accumulated-sum variable, which is Double-Precision FP type ( double ) / MFPT used ]

Matrix C
013826808.0 014187608.0 014548408.0 014909208.0 015270008.0 015630808.0 015991608.0 016352408.0
032393208.0 033394008.0 034394808.0 035395608.0 036396408.0 037397208.0 038398008.0 039398808.0
050959608.0 052600408.0 054241208.0 055882008.0 057522808.0 059163608.0 060804408.0 062445208.0
069526008.0 071806808.0 074087608.0 076368408.0 078649208.0 080930008.0 083210808.0 085491608.0
088092408.0 091013208.0 093934008.0 096854808.0 099775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827616.0 142028416.0 146229216.0 150430016.0 154630816.0
143791616.0 148632416.0 153473216.0 158314016.0 163154816.0 167995616.0 172836416.0 177677216.0

Note: There are 13 incorrect values in Matrix C

Conclusion

As you can see, applying MFPT improved accuracy: 13 incorrect values vs. 21 incorrect values in Matrix C.

When MFPT is applied to an algorithm, a developer could get more accurate results; however, additional tests always have to be done to confirm that the results are actually more accurate.

iliyapolak

Hi Sergey

Very interesting results. I would like to add that testing values which cannot be represented exactly in binary floating point could also be insightful. It would also be interesting to test the impact of catastrophic cancellation on the accuracy of floating-point addition of values with different signs, or subtraction of very close values. Finally, I would like to recommend a great book about the accuracy of floating-point calculations: "Real computing made real".

Sergey Kostrov

>>... I would like to recommend you a great book about the accuracy of floating point calculation "Real computing made real"...

Thanks. There are so many good books around and, as usual, there is not enough time to read all of them.

I admit that the results are very interesting and, honestly, I didn't expect to get any improvement in the accuracy of calculations. The source code will be posted some time later.

iliyapolak

>>>Thanks. There are so many good books around and, as usual, there is not enough time to read all of them.>>>

Completely agree with you.

That book has very interesting examples of mathematical calculations where floating-point inaccuracy can be "catastrophic" for the results. If you want, I could bring a few examples from the book (they would need to be coded). One of the examples is the convergence failure of the sine Taylor expansion. During my tests I was able to obtain convergence up to a radian value of 8.

Tim Prince

When using extra precision accumulation of dot products, it's usual to promote the multiplication as well:

http://www.netlib.org/blas/sdsdot.f

Although I sometimes try to push validation tests into x87 64-bit precision mode (where the extra precision multiplication is "free"), Kahan sum:

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

works reliably (but even slower) if the compiler is set to a standards-compliant mode. It's also a test of a compiler's parenthesis-eliding modes; if a compiler elides parentheses consistently, the result drops back to the same as a plain sum, rather than being destroyed.

Sergey Kostrov

[ Test-Case 1 - without MFPT ]
...
RTuint uiControlWordx87 = CrtControl87( _RTFPU_CW_DEFAULT, _RTFPU_MCW_PC );

// Matrix A - 8x8 - 'float' type
RTfloat fA[8][8] =
{
101.0, 201.0, 301.0, 401.0, 501.0, 601.0, 701.0, 801.0,
901.0, 1001.0, 1101.0, 1201.0, 1301.0, 1401.0, 1501.0, 1601.0,
1701.0, 1801.0, 1901.0, 2001.0, 2101.0, 2201.0, 2301.0, 2401.0,
2501.0, 2601.0, 2701.0, 2801.0, 2901.0, 3001.0, 3101.0, 3201.0,
3301.0, 3401.0, 3501.0, 3601.0, 3701.0, 3801.0, 3901.0, 4001.0,
4101.0, 4201.0, 4301.0, 4401.0, 4501.0, 4601.0, 4701.0, 4801.0,
4901.0, 5001.0, 5101.0, 5201.0, 5301.0, 5401.0, 5501.0, 5601.0,
5701.0, 5801.0, 5901.0, 6001.0, 6101.0, 6201.0, 6301.0, 6401.0
};

// Matrix B - 8x8 - 'float' type
RTfloat fB[8][8] =
{
101.0, 201.0, 301.0, 401.0, 501.0, 601.0, 701.0, 801.0,
901.0, 1001.0, 1101.0, 1201.0, 1301.0, 1401.0, 1501.0, 1601.0,
1701.0, 1801.0, 1901.0, 2001.0, 2101.0, 2201.0, 2301.0, 2401.0,
2501.0, 2601.0, 2701.0, 2801.0, 2901.0, 3001.0, 3101.0, 3201.0,
3301.0, 3401.0, 3501.0, 3601.0, 3701.0, 3801.0, 3901.0, 4001.0,
4101.0, 4201.0, 4301.0, 4401.0, 4501.0, 4601.0, 4701.0, 4801.0,
4901.0, 5001.0, 5101.0, 5201.0, 5301.0, 5401.0, 5501.0, 5601.0,
5701.0, 5801.0, 5901.0, 6001.0, 6101.0, 6201.0, 6301.0, 6401.0
};

// Matrix C - 8x8 - 'float' type
RTfloat fC[8][8] =
{
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
};

// All variables are Single-Precision type ( float )
for( RTint i = 0; i < 8; i++ )
{
for( RTint j = 0; j < 8; j++ )
{
fC[i][j] = 0.0f;
for( RTint k = 0; k < 8; k++ )
{
fC[i][j] += ( fA[i][k] * fB[k][j] );
}
}
}
...

Sergey Kostrov

[ Test-Case 2 - with MFPT ]
...
RTuint uiControlWordx87 = CrtControl87( _RTFPU_CW_DEFAULT, _RTFPU_MCW_PC );

// Matrix A - 8x8 - 'float' type
RTfloat fA[8][8] =
{
101.0, 201.0, 301.0, 401.0, 501.0, 601.0, 701.0, 801.0,
901.0, 1001.0, 1101.0, 1201.0, 1301.0, 1401.0, 1501.0, 1601.0,
1701.0, 1801.0, 1901.0, 2001.0, 2101.0, 2201.0, 2301.0, 2401.0,
2501.0, 2601.0, 2701.0, 2801.0, 2901.0, 3001.0, 3101.0, 3201.0,
3301.0, 3401.0, 3501.0, 3601.0, 3701.0, 3801.0, 3901.0, 4001.0,
4101.0, 4201.0, 4301.0, 4401.0, 4501.0, 4601.0, 4701.0, 4801.0,
4901.0, 5001.0, 5101.0, 5201.0, 5301.0, 5401.0, 5501.0, 5601.0,
5701.0, 5801.0, 5901.0, 6001.0, 6101.0, 6201.0, 6301.0, 6401.0
};

// Matrix B - 8x8 - 'float' type
RTfloat fB[8][8] =
{
101.0, 201.0, 301.0, 401.0, 501.0, 601.0, 701.0, 801.0,
901.0, 1001.0, 1101.0, 1201.0, 1301.0, 1401.0, 1501.0, 1601.0,
1701.0, 1801.0, 1901.0, 2001.0, 2101.0, 2201.0, 2301.0, 2401.0,
2501.0, 2601.0, 2701.0, 2801.0, 2901.0, 3001.0, 3101.0, 3201.0,
3301.0, 3401.0, 3501.0, 3601.0, 3701.0, 3801.0, 3901.0, 4001.0,
4101.0, 4201.0, 4301.0, 4401.0, 4501.0, 4601.0, 4701.0, 4801.0,
4901.0, 5001.0, 5101.0, 5201.0, 5301.0, 5401.0, 5501.0, 5601.0,
5701.0, 5801.0, 5901.0, 6001.0, 6101.0, 6201.0, 6301.0, 6401.0
};

// Matrix C - 8x8 - 'float' type
RTfloat fC[8][8] =
{
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
};

// All variables are Single-Precision type ( float ) except for a Double-Precision type ( double )
// variable dSum for accumulated sum
for( RTint i = 0; i < 8; i++ )
{
for( RTint j = 0; j < 8; j++ )
{
fC[i][j] = 0.0f;
RTdouble dSum = 0.0L;
for( RTint k = 0; k < 8; k++ )
{
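// Note: fA[i][k] * fB[k][j] is a 'float' product. Unless the FPU keeps it in higher precision
// ( e.g. x87 with 53-bit precision control ), products above 2^24, such as 6401 * 6401, are
// rounded to 24 bits; ( RTdouble )fA[i][k] * ( RTdouble )fB[k][j] would be exact in 'double'.
dSum += ( RTdouble )( fA[i][k] * fB[k][j] );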
}
fC[i][j] = ( RTfloat )dSum;
}
}
...

iliyapolak

@Sergey

Will you be interested in test case of arbitrary precision arithmetics which is based on Java Big Decimal library?

Sergey Kostrov

>>...Will you be interested in test case of arbitrary precision arithmetics which is based on Java Big Decimal library?

Unfortunately no, because sometimes I don't have time to test my own code.

Sergey Kostrov

>>...Will you be interested in test case of arbitrary precision arithmetics which is based on Java Big Decimal library?

Iliya,

If you continue asking me similar questions ( and you've asked me at least three or four times already in the past ) I will be forced to report it to IDZ management.

Sorry, but you've "crossed the line" already.

Best regards,
Sergey

iliyapolak

Quote:

Sergey Kostrov wrote:

>>...Will you be interested in test case of arbitrary precision arithmetics which is based on Java Big Decimal library?

Iliya,

If you continue asking me similar questions ( and you've asked me at least three or four times already in the past ) I will be forced to report it to IDZ management.

Sorry, but you've "crossed the line" already.

Best regards,
Sergey

I simply wanted to create a different thread solely for the purpose of comparing the accuracy of arbitrary-precision arithmetic vs. single- and double-precision floating-point arithmetic. My intention was to post my test cases and to get input from the users.

Sorry if you have misunderstood my post. I did not want "to force" anyone to test my code.

Sergey Kostrov

A message to a moderator of the Intel Forum(s)

Could you delete the last 5 posts, including this one, since they are not related to the subject of the thread? The first post to be deleted was made on Wed, 01/23/2013 - 09:45 by the user iliyapolak.

Thank you in advance.
