cast __m512 to __m512d


Hey all,


simple question:


How does the cast operation _mm512_castps_pd work?

A __m512 data type holds 16 floats, i.e. 16 elements. A __m512d data type, by contrast, can only hold 8 elements, so what happens if I use the following instructions:

__m512   a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );


Is it possible to load data from memory with _mm512_load_ps and then do a "cast operation" from float to double precision into two __m512d registers?





If this specific cast is not possible, how can I load data from a 64-byte-aligned float array into a __m512d register? I want to perform my FLOPs in double precision but load/store the data in single precision. I have tried _mm512_extload_pd, but there is no _MM_UPCONV_PD_ENUM value that converts from single precision.

Cast intrinsics are the equivalent of a C++ reinterpret_cast. They do not correspond to any actual assembly instruction: all they do is inhibit C's type checking. So _mm512_castps_pd reinterprets the binary representation of each pair of floats as a double.
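To see what that reinterpretation does to the values without Knights Corner hardware, here is a portable C sketch that mimics _mm512_castps_pd on plain arrays: 16 floats stand in for a __m512, 8 doubles for a __m512d, and memcpy (the standard-conforming way to reinterpret bytes in C) plays the role of the cast. The helper names are hypothetical, and the sketch assumes IEEE 754 binary32/binary64 types and a little-endian layout, as on x86/KNC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: arrays stand in for 512-bit registers (64 bytes each).
   Assumes IEEE 754 binary32 float and binary64 double. */
static_assert(16 * sizeof(float) == 64, "expects 32-bit float");
static_assert(8 * sizeof(double) == 64, "expects 64-bit double");

/* Hypothetical stand-in for _mm512_castps_pd: same 64 bytes, read as
   8 doubles. No value is converted. On a little-endian machine, double i
   is built from float 2i (low 32 bits) and float 2i+1 (high 32 bits). */
static void cast_ps_pd(const float src[16], double dst[8])
{
    memcpy(dst, src, 64);
}

/* Hypothetical stand-in for the reverse cast (_mm512_castpd_ps). */
static void cast_pd_ps(const double src[8], float dst[16])
{
    memcpy(dst, src, 64);
}

/* Returns the raw bit pattern of a double, for inspection. */
static uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Models _mm512_set1_ps(2.0) followed by the cast; returns lane 0 of
   the "double" view. Each float 2.0f has bit pattern 0x40000000. */
static double set1_2_then_cast(void)
{
    float a[16];
    double b[8];
    for (int i = 0; i < 16; ++i)
        a[i] = 2.0f;
    cast_ps_pd(a, b);
    return b[0];
}
```

Each double lane ends up with the bit pattern 0x4000000040000000, which reads as 2 + 2^-21 (about 2.00000047683716), not 2.0: the cast garbles values rather than converting them. Casting back with cast_pd_ps restores the original 16 floats bit for bit, since nothing was ever changed.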

What you need is a conversion: _mm512_cvtpslo_pd (and _mm512_cvtpd_pslo for the reverse direction).

Since there is no _mm512_cvtpshi_pd intrinsic, you will have to use a swizzle or permute operation to move the high-order half of your float vector into the low half before converting it.
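A scalar C sketch of this conversion route, again using arrays in place of registers: widen_lo mirrors what _mm512_cvtpslo_pd does (convert the low 8 floats), widen_hi models the "permute the upper half down, then convert the low half" workaround, and narrow mirrors _mm512_cvtpd_pslo. axpy_in_double shows the poster's use case of loading and storing in single precision while doing the arithmetic in double. All function names are hypothetical. On KNC, one candidate for the permute step is _mm512_permute4f128_ps with _MM_PERM_DCDC, but that choice is an assumption and is not exercised here.

```c
#include <assert.h>

/* Sketch only: float[16] plays the role of __m512, double[8] of __m512d. */
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "expects IEEE binary32/binary64 sizes");

/* Models _mm512_cvtpslo_pd: convert the LOW 8 floats (lanes 0..7). */
static void widen_lo(const float v[16], double out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (double)v[i];
}

/* No cvtpshi exists; this models "move lanes 8..15 down, then convert low".
   (On KNC the move might be _mm512_permute4f128_ps(v, _MM_PERM_DCDC);
   that is an untested assumption.) */
static void widen_hi(const float v[16], double out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (double)v[i + 8];
}

/* Models _mm512_cvtpd_pslo: round 8 doubles to 8 floats. */
static void narrow(const double in[8], float out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (float)in[i];
}

/* y = y + a * x on one 16-float vector: single-precision storage,
   double-precision arithmetic. */
static void axpy_in_double(float y[16], const float x[16], double a)
{
    double xl[8], xh[8], yl[8], yh[8];
    widen_lo(x, xl);
    widen_hi(x, xh);
    widen_lo(y, yl);
    widen_hi(y, yh);
    for (int i = 0; i < 8; ++i) {
        yl[i] += a * xl[i];
        yh[i] += a * xh[i];
    }
    narrow(yl, y);      /* results for lanes 0..7 */
    narrow(yh, y + 8);  /* results for lanes 8..15 */
}
```

One 16-float load therefore feeds two 8-double computations, and the results are rounded back to float only at the store. On real hardware the data would be loaded once with _mm512_load_ps and converted in registers; the loops here only model which lanes go where.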

Echoing Sylvain's reply, here is the guidance I received from our intrinsics developer:

512-bit vectors are represented in a C/C++ program by one of the following types: __m512, __m512i and __m512d.

There is a set of “cast” intrinsics, _mm512_castps_pd among them, that do nothing except let you treat a 512-bit vector as one of these types.
These intrinsics do not change any bits in the vector. So, if you write:

__m512   a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );

then the vectors a_ and b_ will be bitwise identical, but a_ treats the 512 bits as 16 single-precision floating-point values, while b_ treats the same 512 bits as 8 double-precision floating-point values.

Additional note: if a user wants a real conversion of vector elements from float to double, the following intrinsic should be used on KNC:

extern __m512d  _mm512_cvtpslo_pd(__m512);

This intrinsic returns 8 double-precision elements (the low 8 single-precision elements of the source vector are converted to double precision).

Thanks for that information. The instruction _mm512_cvtpslo_pd works fine!
