Concerns on using AVX double floating point instructions for integer data

Hi all,

As you might know, AVX does not provide 256-bit instructions for integer types; those are planned to arrive with AVX2. I have code written with AVX intrinsics, which basically uses the _mm256_*_pd() variants that operate on double-precision floating-point values (the instructions I use are min, max, shuffle, blend, load, loadu, etc.). However, my data is actually integers, which I load by casting integer pointers to double pointers, i.e. __m256d reg = _mm256_loadu_pd((double*)intPtr). Functionality-wise the code seems to do what I expect, i.e. it sorts the data. However, as I haven't tested with all sorts of different data, I'm concerned whether the output will always be correct. What corner cases should I be concerned with? Would the comparisons always be correct or will there be some integer values where the AVX floating point comparison would not work?

Thanks for comments and suggestions

From IEEE Std 754-2008, section 5.11:

Four mutually exclusive relations are possible: less than, equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.

Thus, comparisons involving integers whose bit pattern matches that of a floating-point NaN would be problematic.
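
To make that concrete, here is a minimal sketch in portable C of which 64-bit patterns decode as NaN, assuming the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits). The function name int64_bits_are_nan is made up for illustration and is not from any library. One consequence worth noting: every integer from -1 down to -(2^52 - 1) has all exponent bits set and a non-zero fraction, so small negative 64-bit integers are NaN patterns when their bits are read as a double.

```c
#include <stdint.h>

/* Returns 1 when the 64 bits of v, read as an IEEE 754 binary64 value,
   encode a NaN: exponent field (bits 62..52) all ones, fraction non-zero.
   Hypothetical helper, for illustration only. */
int int64_bits_are_nan(int64_t v)
{
    uint64_t bits = (uint64_t)v;
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t fraction = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    return exponent == 0x7FFu && fraction != 0;
}
```

For example, int64_bits_are_nan(-1) reports a NaN pattern, while int64_bits_are_nan(10) does not.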

Of course you can do int-to-double cast in order to use AVX, however...

>>...Would the comparisons always be correct or will there be some integer values where the AVX floating point comparison
>>would not work?

I would be very careful because your processing will depend on the limitations of the IEEE 754 Standard and, as recommended in many sources, a comparison with an epsilon could be added ( expect a performance impact ). If your tests are deterministic ( no random data ), the accuracy of the processing could be verified by running it once on integers and once on doubles, saving both outputs, and comparing them.

There are single- and double-precision binary format viewers on the web, and you could use them to verify what some integer values look like after conversion to the double type.

>>...Thus, comparisons involving integers whose bit pattern matches that of a floating-point NaN would be problematic...

That looks interesting. Could you give us at least one example where some integer value is converted to a double-precision NaN value?

>>...Thus, comparisons involving integers whose bit pattern matches that of a floating-point NaN would be problematic...

I'm very surprised when Intel engineers make some statements without any real verification(s) ( sometimes very simple ), like:

[ Test-case ]
...
int iIsNan = 0;

double dValue = -1.0;
double dValueLn = 0.0L;
unsigned __int64 iValue = 0U;
printf( "dValue = %f\n", dValue );
printf( "dValueLn = %f\n", dValueLn );
printf( "iValue = %I64d\n", iValue );

dValueLn = CrtLog( dValue );
printf( "dValueLn = %f\n", dValueLn );
iValue = ( __int64 )dValueLn;
printf( "iValue = %I64d\n", iValue );
iIsNan = _isnan( dValueLn );
if( iIsNan == 0 )
printf( "dValueLn is Not NaN\n" );
else
printf( "dValueLn is NaN\n" );
dValue = ( double )iValue;
printf( "dValue = %f\n", dValue );

iValue = 9223372036854775800i64;
dValue = 0.0L;
printf( "iValue = %I64d\n", iValue );
printf( "dValue = %f\n", dValue );

dValue = ( double )iValue;
printf( "dValue = %f\n", dValue );
iIsNan = _isnan( dValue );
if( iIsNan == 0 )
printf( "dValue is Not NaN\n" );
else
printf( "dValue is NaN\n" );
...

[ Output ]

dValue = -1.000000
dValueLn = 0.000000
iValue = 0
dValueLn = -1.#IND00
iValue = -9223372036854775808
dValueLn is NaN
dValue = 9223372036854775800.000000
iValue = 9223372036854775800
dValue = 0.000000
dValue = 9223372036854775800.000000
dValue is Not NaN

Please let me know if you find any problems with the test-case.

Best regards,
Sergey

{ UPDATED }Fixed:
printf( "iValue = %f\n", iValue );
to
printf( "iValue = %I64d\n", iValue );

Hello cagribal,
I assume when you say 'integers' you do mean 4 byte signed variables... so 32bit and includes one sign bit.
The double precision IEEE mantissa is 53 bits plus one sign bit.
If the question is, can every 32bit integer value be converted to double and, when I convert back to integer, will I get back the original integer?
The answer to this is yes.
If you are just doing compares (that is, not changing the value of your converted 32bit INTs) in your AVX code, you will not get NANs, and you will get the compare results you expect (there will be no unordered results).
Pat
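
A minimal sketch of that 32-bit round trip, assuming the value is converted with an ordinary cast (not a pointer cast). The function name roundtrip_int32_via_double is hypothetical. Every int32_t needs at most 32 significant bits, which fits in the 53-bit significand of a double, so the conversion is exact in both directions.

```c
#include <stdint.h>

/* Converts a 32-bit integer to double by value and back again.
   Hypothetical helper, for illustration only. */
int32_t roundtrip_int32_via_double(int32_t v)
{
    double d = (double)v;   /* exact: at most 32 significant bits are needed */
    return (int32_t)d;      /* d is an integer within int32_t range */
}
```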

Hi everybody,

There are cases ( I detected 3 so far ) when 64-bit integer ( boundary signed & unsigned ) and double-precision values do not match. Please take a look at cases 2.x:

[ Output ]

Test-Case 1
dValue = -1.000000
dValueLn = 0.000000
iValue = 0
dValueLn = -1.#IND00
iValue = -9223372036854775808
dValueLn is NaN
dValue = 9223372036854775800.000000

Verifications for Boundary values ( signed and unsigned ) of 64-bit range:
Test-Case 2.1
iValueS = 9223372036854775807
dValue = 0.000000
dValue = 9223372036854775800.000000
dValue is Not NaN

Test-Case 2.2
iValueS = -9223372036854775808
dValue = 0.000000
dValue = -9223372036854775800.000000
dValue is Not NaN

Test-Case 2.3
iValueU = 9223372036854775807
dValue = 0.000000
dValue = 9223372036854775800.000000
dValue is Not NaN

Test-Case 2.4
iValueU = 0
dValue = 0.000000
dValue = 0.000000
dValue is Not NaN

I'll post source codes of my quick test later after additional verification.

64bit integers (if the span of non-zero bits in the 64bit integer is more than 53 bits) cannot be represented without a loss of precision.
That is, converting a 64bit integer to double and back to 64bit may or may not give you back the original 64bit integer, depending on how many bits are used in the original 64bit integer.
But 32bit integers will be okay.
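
The "span of non-zero bits" rule can be sketched as a small check. This is only an illustration under the assumption of IEEE 754 binary64 doubles; the name fits_exactly_in_double is made up. It drops trailing zero bits from the magnitude and tests whether what is left fits in 53 bits.

```c
#include <stdint.h>

/* Returns 1 when v can be converted to double by value without rounding,
   i.e. when the span from its highest to its lowest set bit is at most
   53 bits wide. Hypothetical helper, for illustration only. */
int fits_exactly_in_double(int64_t v)
{
    /* Magnitude as unsigned; the subtraction also handles INT64_MIN. */
    uint64_t mag = (v < 0) ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    if (mag == 0)
        return 1;
    while ((mag & 1u) == 0)     /* trailing zeros only move the exponent */
        mag >>= 1;
    return mag < (UINT64_C(1) << 53);
}
```

With this check, 1LL << 55 is reported as exact while (1LL << 55) + 1 is not, because the second value needs 56 significant bits.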

>>...64bit integers (if the span of non-zero bits in the 64bit integer is more than 53 bits) cannot be represented without a loss of precision...

Exactly and this is how it looks like:

>>...
>>Test-Case 2.1
>>iValueS = 9223372036854775807
>>...
>>dValue = 9223372036854775800.000000
>>...

Thanks Patrick for the comment!

>>...What corner cases should I be concerned with?

See Patrick's post for the case with 32-bit integers.

There are 2 generic cases when 64-bit integer ( boundary signed & unsigned ) and double-precision values do not match ( a 64-bit value is converted to a 53-bit DP significand, as Patrick mentioned in his post ). You need to verify some range of boundary integer values ( next to the min and max values ).

>>...Would the comparisons always be correct or will there be some integer values where the AVX floating point comparison
>>would not work?

Yes, provided the precision of the source integer value is not lost during the conversion.

Does it make sense?

Hi all,

thanks for your replies.

@Patrick: Actually, as integer I meant 64-bit signed integers. So as I understood, it is possible that some 64-bit integer might have bit pattern of NaN and might result in an incorrect result.

Here are small test cases that I'm using:


    double NaN;
    *(uint64_t *)(&NaN) = 0x7FF0000000000001;   // exponent all ones, fraction non-zero

    // Test.1) Prints "NEQ : nan", as NaN != NaN
    if(NaN == NaN)
        printf("EQ : %.20f\n", NaN);
    else
        printf("NEQ : %.20f\n", NaN);

    double x = 87.0;

    // Test.2) Prints Unordered as comparison with a NaN is always Unordered
    if(NaN < x)
        printf("LT\n");
    else if(NaN > x)
        printf("GT\n");
    else if(NaN == x)
        printf("EQ\n");
    else
        printf("Unordered\n");

    // Test.3) Comparisons with AVX, basically min(10, NaN) returns NaN (?)
    int64_t arr1[4] = {10, 20, 30, 40};
    int64_t arr2[4] = {50, 20, 40, 10};
    *(double *)(&arr2[0]) = NaN;

    __m256d a = _mm256_loadu_pd((double *) arr1);
    __m256d b = _mm256_loadu_pd((double *) arr2);
    printf("A = "); p256i(a);   // A = AVXVector: {10 ; 20 ; 30 ; 40}
    printf("B = "); p256i(b);   // B = AVXVector: {9218868437227405313 ; 20 ; 40 ; 10}

    __m256d ret = _mm256_min_pd (a, b);
    printf("MIN = "); p256i(ret);   // MIN = AVXVector: {9218868437227405313 ; 20 ; 30 ; 10}

Hello Cagribal,
Yes, one can certainly generate double precision NANs from 64bit bit patterns.
And one can generate 64bit ints which won't convert to doubles without loss of precision (such as bigint = (1LL << 55) + 1.)
From my old PhD days, there were whole sections dedicated to what can/can't be represented/converted and back.
You will need to check that your 64bit integer ranges do not exceed the 53 bit mantissa of the double precision value.
Pat

Hi everybody,

>>...I'll post source codes of my quick test later after additional verification...

Here it is:
...
int iIsNaN = 0;

// Test-Case 1
printf( "Test-Case 1\n" );
double dValue = -1.0;
double dValueLn = 0.0L;
unsigned __int64 iValue = 0U;
printf( "\tdValue = %f\n", dValue );
printf( "\tdValueLn = %f\n", dValueLn );
printf( "\tiValue = %I64d\n", iValue );

dValueLn = CrtLog( dValue );
printf( "\tdValueLn = %f\n", dValueLn );
iValue = ( unsigned __int64 )dValueLn;
printf( "\tiValue = %I64d\n", iValue );
iIsNaN = _isnan( dValueLn );
if( iIsNaN == 0 )
printf( "\tdValueLn is Not NaN\n" );
else
printf( "\tdValueLn is NaN\n" );
dValue = ( double )iValue;
printf( "\tdValue = %f\n", dValue );

printf( "Verifications for Boundary values ( Signed and UnSigned ) of 64-bit range:\n" );

__int64 iValueS = 0LL;
unsigned __int64 iValueU = 0ULL;

// Test-Case 2.1
printf( "Test-Case 2.1\n" );
iValueS = ( 9223372036854775807LL );
dValue = 0.0L;
printf( "\tiValueS = %I64d\n", iValueS );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueS;
printf( "\tdValue = %f\n", dValue );
iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
printf( "\tdValue is Not NaN\n" );
else
printf( "\tdValue is NaN\n" );

// Test-Case 2.2
printf( "Test-Case 2.2\n" );
iValueS = ( -9223372036854775807LL - 1 );
dValue = 0.0L;
printf( "\tiValueS = %I64d\n", iValueS );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueS;
printf( "\tdValue = %f\n", dValue );
iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
printf( "\tdValue is Not NaN\n" );
else
printf( "\tdValue is NaN\n" );

// Test-Case 2.3
printf( "Test-Case 2.3\n" );
iValueU = ( 9223372036854775807ULL );
dValue = 0.0L;
printf( "\tiValueU = %I64d\n", iValueU );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueU;
printf( "\tdValue = %f\n", dValue );
iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
printf( "\tdValue is Not NaN\n" );
else
printf( "\tdValue is NaN\n" );

// Test-Case 2.4
printf( "Test-Case 2.4\n" );
iValueU = ( 0ULL );
dValue = 0.0L;
printf( "\tiValueU = %I64d\n", iValueU );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueU;
printf( "\tdValue = %f\n", dValue );
iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
printf( "\tdValue is Not NaN\n" );
else
printf( "\tdValue is NaN\n" );
...

>>...it is possible that some 64-bit integer might have bit pattern of NaN and might result in an incorrect result...

I'll do a couple of tests and I'll be back. Thanks guys for that really nice discussion!

>>>Exactly and this is how it looks like:

>>...
>>Test-Case 2.1
>>iValueS = 9223372036854775807
>>...
>>dValue = 9223372036854775800.000000
>>...

Please bear in mind that the exact implementation of printf() (I mean the formatting performed by this function) should also be taken into account when primitive types are converted from one type to another. The best example of such a conversion, albeit not applicable to your case, is the reduction of the 80-bit long double type to 64 bits, which is performed by the MSVCRT printf() function.

>>...I'm very surprised when Intel engineers make some statements without any real verification(s)...

Perhaps you missed this part of the original post: "However my data is actually integers, which I load by casting integer pointers to double pointers..."

If one of those "doubles" now points to 64 bits which hold the long int value 92211202370041090560 (= 0x7ff8000000000000), it will be interpreted as a (quiet) NaN, and it will compare as "unordered" with any other value.

>>...in mind that exact implementation of printf()(I mean here some kind of formatting performed by this function) should be also taken into account...

It affects only how the value is displayed, not how it is stored.

Make that 9221120237041090560 and not 92211202370041090560.

>>...I'll do a couple of tests and I'll be back...

Here is a small Test-Case 1.2

...
// Test-Case 1.2
printf( "Test-Case 1.2\n" );

unsigned __int64 iNaNIntValue = 0ULL;

// iNaNIntValue = 0x1020304050607080;

dValueLn = 0;
iNaNIntValue = 18444492273895866368ULL; // 0xfff8000000000000 = NaN-raw-value ( binary representation )
dValueLn = ( double )iNaNIntValue;
iIsNaN = _isnan( dValueLn );
if( iIsNaN == 0 )
printf( "\tdValueLn is Not NaN\n" );
else
printf( "\tdValueLn is NaN\n" );
...

When debugging, this is what the variables look like in a Visual Studio 'Memory' window:

[ 'double' with NaN value ]
...
00 00 00 00 00 00 f8 ff
...

[ '__int64' after assignment from 'double' with NaN value ]
...
00 00 00 00 00 00 00 80
...

So, it looks like a developer should watch out for a 0xfff8000000000000 or 18444492273895866368 value. Not quite; let me continue. If a developer then converts that integer to 'double' by value, the result is 0x43efff0000000000 ( 4895411695440101376 when read as a raw integer ), and that is done by the C++ compiler (!). It looks like magic, but actually there is no uncertainty here: a value cast converts the number numerically and rounds it to a 53-bit significand, so the part of the 64-bit integer which is "responsible" for the NaN code is not re-created in the 'double'.

So, it is not possible to create a NaN value in a double-precision variable from a 64-bit integer variable by doing a simple cast, like:

...
dValueLn = 0;
iNaNIntValue = 18444492273895866368ULL;
dValueLn = ( double )iNaNIntValue;
...

unless a developer copies these 8 bytes with a 'memcpy' CRT function directly.

>>...
>>unless a developer copies these 8 bytes with a 'memcpy' CRT function directly.

Something like that:

...
// Test-Case 1.3
printf( "Test-Case 1.3\n" );

void *pdValueLn = &dValueLn;
void *piNaNIntValue = &iNaNIntValue;
memcpy( ( void * )pdValueLn, ( const void * )piNaNIntValue, 8 );
iIsNaN = _isnan( dValueLn );
if( iIsNaN == 0 )
printf( "\tdValueLn is Not NaN\n" );
else
printf( "\tdValueLn is NaN\n" );
...

[ Output ]

...
Test-Case 1.3
dValueLn is NaN
...

Once again, it is not possible to create a NaN value in a double-precision variable from a 64-bit integer variable by doing a simple cast.

>>>It affects only how the value is displayed, not how it is stored.>>>
Yes, but the stored value is encoded by the compiler and/or hardware, so the compiler's vendor can implement it differently. Look at the case of the Intel primitive long double type and its truncation to the 64-bit double-precision type.

>>...Yes , but the stored value is encoded by the compiler and/or hardware so the compiler's vendor can implement it differently...

No, not when it comes to conversion from int to double in accordance with the IEEE 754 Standard, unless some vendor violates that standard.

Only when the IEEE 754 Standard is concerned. Moreover, you must also take into account the unpredictable possibility of hardware units clock inaccuracies and/or data (memory) bus timing errors, which could pollute the results with random values. I know that I'm too rigorous here :), but such hardware-related errors are quite possible.

If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.

>>...Once again, it is not possible to create a NaN value in a double-precision variable from a 64-bit integer variable by doing a simple cast.

Of course. The original question however involved casting pointers, not data values: _mm256_loadu_pd((double*)intPtr).

>>>If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.>>>
Yes, in the past I experienced such behaviour with a faulty CPU.
>>>hw units clock inaccuracies (not sure what that means)>>>
I mean minuscule shifts in the phase of the clock frequency.
>>>then the hardware has bigger problems than can be addressed here.>>>
I know that pretty well. My intention was to emphasize the fact that sometimes a wrong result when converting between primitive types could stem from a hardware error.

[ From Jeff ]
>>...Of course. The original question however involved casting pointers, not data values...

Jeff, sorry for repeating that statement made by cagribal:

>>...my data is actually integers, which I load by casting integer pointers to double pointers...

and after the data is loaded, cagribal does some processing, and his concern is about, I would say, "unsafe" comparisons, i.e. the correctness of comparisons of double-precision data values, not pointers.

Best regards,
Sergey

There are 2 cases:
1) casting an int64 to a double.
This always works and never generates a NAN, but you can lose precision.
2) casting an int64 pointer to a double pointer (which is basically a memcpy(&double_var, &int64_var, 8); ).
This also always 'works' but can generate a NAN. Basically you are not converting an int64 to a double, you are just copying bits.
I say 'not converting an int64 to double' because, unless your int64 bit pattern just happens to also be the correct 64bit double encoding, then you are not going to get the correct double encoding for your int64 number.
Does that make sense?
Pat

Hi all,

Thanks for the comments. Patrick has clearly summarized all the cases. However, questions I still have are:

a) Why do AVX _mm256_min_pd() and _mm256_max_pd() return NaN for comparisons with a NaN? (Please see Test 3 in the code snippet I posted above.)

b) My understanding is that if the integers do not contain all 1's in the exponent field, i.e. bits 62-52, then all double comparisons over the raw bits (treated as double by copying or so) will always be correct. The implication is that by restricting my integers to use at most 62 bits (i.e. by leaving the MSB exponent bit always 0), I can assure that comparisons will always be correct. Any comments on this?
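
A hedged sketch of the check in (b), in portable C. The names raw_compare_is_safe and compare_raw_bits are hypothetical. It assumes IEEE 754 binary64 and that denormals are not treated as zero (integers below 2^52 are subnormal when their bits are read as a double). It also restricts the safe set to non-negative integers: negative integers have the sign bit set, so their bits are either NaN patterns or sign-magnitude values that order in the opposite direction from two's complement.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when v is non-negative and its exponent field (bits 62..52)
   is not all ones. Hypothetical helper, for illustration only. */
int raw_compare_is_safe(int64_t v)
{
    return v >= 0 && (((uint64_t)v >> 52) & 0x7FFu) != 0x7FFu;
}

/* Compares the raw bits of a and b as doubles (bit copy, no conversion).
   Returns -1, 0 or 1 for less, equal, greater, and 2 for unordered. */
int compare_raw_bits(int64_t a, int64_t b)
{
    double x, y;
    memcpy(&x, &a, sizeof x);
    memcpy(&y, &b, sizeof y);
    if (x < y)  return -1;
    if (x > y)  return 1;
    if (x == y) return 0;
    return 2;   /* at least one operand is a NaN pattern */
}
```

For two large negative inputs such as INT64_MIN + (1LL << 60) and INT64_MIN + (1LL << 61), the integer order is "less" but compare_raw_bits reports "greater", which is why the sketch excludes negative values.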

Thanks,

a) IEEE754 defines a comparison against NaN to return NaN. These are floating-point operations.
b) I suppose you must ensure the DAZ bit is set correctly in order to use comparisons on values with zero exponent bits.

According to the instruction set reference manual (see the description of the VMINPD instruction which is what the documentation says the _mm256_min_pd intrinsic generates):

If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned). If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result.

In your case, one of the elements of the 2nd operand is a NaN, so that NaN is forwarded to the destination operand.
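
The rule quoted from the manual can be modelled in scalar C. This is a sketch of the documented behavior for a single lane, not the intrinsic itself; minpd_lane_model and double_from_bits are hypothetical names. The first operand is returned only when a < b is true; in every other case, including any NaN operand, the second operand is returned.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one VMINPD lane as described in the manual:
   returns a only when a < b holds, otherwise returns b. */
double minpd_lane_model(double a, double b)
{
    return (a < b) ? a : b;
}

/* Builds a double from a raw 64-bit pattern (bit copy, no conversion).
   Hypothetical helper, for illustration only. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Note that the result depends on operand order: with a NaN as the second operand the NaN comes back, while with a NaN as the first operand the other value comes back. Swapping the arguments of _mm256_min_pd in Test 3 would therefore be expected to change lane 0 of the result.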

As to your second point: depending on your floating-point environment, subnormals (exponent == 0, significand != 0; i.e., non-zero integers with "small" absolute value) might cause exceptions to be raised. I don't know what would happen if you have "flush-to-zero" enabled and you compare two vectors of small, non-zero integers. I'm sure the behavior is defined; I just don't know what it is.

Hello cagribal,
Adding a little more to
For a), I assume that it is part of the IEEE floating-point (754?) standard to return a NAN if you are comparing NANs.
For b), it depends on what you mean by 'correct'.
1) If you are just casting int64 to double and you test that the int64 value isn't > 52 bits then the value will be correct.
2) If you are copying (instead of casting) then your result will probably be wrong even if you are only using bits 0-52.

Here is an example of 'copying an int64 to double' not working... not working in the sense that the number in the double does not equal the number in the int64.
Using msvc:

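C:\tst>type fltpt.c

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        double x, y;
        long long int myll;
        unsigned long long bits;

        myll = ( 0x3LL << 40) + 1;
        printf("myll = 0x%llx, %lld\n", myll, myll);

        // value conversion: x holds the same number as myll
        x = (double)myll;
        memcpy(&bits, &x, sizeof(bits)); // raw encoding of x, for the hex print
        printf("dbl x val by casting= %f, in hex= 0x%llx\n", x, bits);

        // bit copy: y gets the same 8 bytes as myll, not the same number
        memcpy(&y, &myll, sizeof(y));
        //y = *(double *)(long long int *)&myll; // this line is same as memcpy above
        memcpy(&bits, &y, sizeof(bits)); // raw encoding of y, for the hex print
        printf("dbl y val by copying= %f, in hex= 0x%llx\n", y, bits);

        return 0;
}

C:\tst>fltpt.exe
myll = 0x30000000001, 3298534883329
dbl x val by casting= 3298534883329.000000, in hex= 0x4288000000000800
dbl y val by copying= 0.000000, in hex= 0x30000000001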

You can see a description of what happens during the 'int64->dbl' casting at http://www.cs.binghamton.edu/~reckert/220/floatpt.htm
Pat

cagribal,

What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting the number of atoms in the Universe ( ~10^80 ), a definition of two ranges, safe and not-safe, for your numbers should bring clarity to your uncertainties.

Let's say you have a data set. Define safe and not-safe ranges. Pre-scan the data set and verify that all numbers are in the safe range, and only after that do the rest of the processing. If some numbers are not-safe, then create a vector of not-safe numbers and save all indexes of these numbers for additional analysis. If you don't need to do the additional analysis, then simply truncate all unsafe numbers to the max or min values of the safe range. This is what I would do, and I use that solution in a real implementation of a Pigeonhole Sorting algorithm to sort only positive integer numbers.
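
A minimal sketch of that pre-scan step, assuming the safe range is given as two inclusive bounds chosen by the caller. The function name collect_unsafe_indexes and its parameters are made up for illustration; this is not the Pigeonhole Sorting implementation mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Scans data[0..n) and writes the index of every element outside
   [safe_min, safe_max] into unsafe_idx, which must have room for n entries.
   Returns the number of indexes written. Hypothetical helper. */
size_t collect_unsafe_indexes(const int64_t *data, size_t n,
                              int64_t safe_min, int64_t safe_max,
                              size_t *unsafe_idx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < safe_min || data[i] > safe_max)
            unsafe_idx[count++] = i;
    }
    return count;
}
```

If the returned count is zero, the whole data set is in the safe range and can go through the AVX path; otherwise the saved indexes identify the elements that need the additional analysis or clamping.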

I would move ahead with a practical implementation of the needed processing and, as I already mentioned, I would define safe and not-safe ranges first of all. Also, if your software is mission critical ( healthcare, finance, defense, aerospace, etc ), then the problem has to be treated seriously, with as many verifications as possible by different software developers. If your software is not mission critical ( R&D, thesis, do-it-because-have-nothing-else-to-do, etc ), some number of simple verifications will provide everything you need.

Best regards,
Sergey

@Sergey
Great post.

Not to beat a horse to death but...


>>...What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting the number of atoms in the Universe ( ~10^80 ), a definition of two ranges, safe and not-safe, for your numbers should bring clarity to your uncertainties.


In a double precision number you have about 15 digits of precision.
The US GDP is $15 trillion (14 digits). In Indian rupees, the number exceeds the precision of a double.
So it is actually not too hard to exceed the number of significant digits in a double... depending on the area in which one is working.
The rest of the advice is pretty good.
I was assuming that cagribal was just loading the INTs into AVX for sorting (so no modification of the data... pure-read access). If this is true then he can do simple range checking when he gets ready to sort the data.
Pat

Patrick,

I did a search in Intel(R) AVX compiler intrinsics header immintrin.h and I wonder if another intrinsic function could be used instead of:

...
__m256d reg = _mm256_loadu_pd( ( double * )intPtr );
...

Since there is a union __m256i then I would expect an intrinsic function that does a similar operation like:

...
__m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr );
...

Could a _mm256_set1_epi64x ( or some similar intrinsic function ) do the same without all issues & problems related to __int64-to-double cast?

Hey Sergey,
I'm not quite sure I understand... the __int64-to-double cast is working as expected (as far as I can tell).
Other than having int64 AVX instructions, what would you like the new intrinsic to do?
Pat

>>... what would you like the new intrinsic to do?

Exactly the same operation, that is to load 4 __int64 values into the reg variable of type __m256i:

Instead of

__m256d reg = _mm256_loadu_pd( ( double * )intPtr );

to use this

__m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr );

Could you take a look at declarations of __m256d and __m256i C unions in immintrin.h header file?
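
For what it is worth, a sketch of one way such a load could look. As far as I know, _mm256_loadu_si256 and _mm256_castsi256_pd are both declared in immintrin.h as part of AVX (not AVX2); the wrapper name load4_int64_bits and the scalar fallback are assumptions added here so the example builds without AVX enabled. The cast only changes the C type of the register, so this tidies up the pointer cast but does not change how _mm256_min_pd and friends interpret the bits.

```c
#include <stdint.h>
#include <string.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Loads four 64-bit integers and stores their raw bits into out[0..3].
   Hypothetical wrapper, for illustration only. */
void load4_int64_bits(const int64_t *src, double out[4])
{
#if defined(__AVX__)
    __m256i vi = _mm256_loadu_si256((const __m256i *)src); /* integer-typed load */
    __m256d vd = _mm256_castsi256_pd(vi);                  /* reinterpret, no conversion */
    _mm256_storeu_pd(out, vd);
#else
    memcpy(out, src, 4 * sizeof(double));                  /* same bit copy without AVX */
#endif
}
```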
