__m128/256 are neither class nor fundamental types on ICC Linux

Matthias Kretz

Hi,

I just tried to make use of the scalar access members defined for all __m128/256 types in the ICC intrinsics headers. But ICC thinks those types are classes and, at the same time, that they are not classes. See for yourself:

#include <xmmintrin.h>
int main() {
    __m128 v;
    v.m128_f32[0] = 0.f;
    v.foo[0] = 0.f;
    return 0;
}

When I compile this with 'icpc main.cpp' I get:

main.cpp(4): error: expression must have class type
      v.m128_f32[0] = 0.f;
      ^
main.cpp(5): error: class "__m128" has no member "foo"
      v.foo[0] = 0.f;
        ^
main.cpp(5): error: expression must have class type
      v.foo[0] = 0.f;
      ^
compilation aborted for main.cpp (code 2)

Note how ICC says 'class "__m128"...' and for the same object 'must have class type'. If __m128 is not class type, what else could it be? A fundamental type. I'd be happy if it were, but it's not. Something is not as it was meant to be, I'd say...

Vc: SIMD Vector Classes for C++ http://code.compeng.uni-frankfurt.de/projects/vc
Georg Zitzlsberger (Intel)

Hello,

Yes, that's contradictory information. This line


main.cpp(5): error: class "__m128" has no member "foo"


...is the problem here. Claiming that "__m128" is a class type is incorrect. The other two are correct.

I've filed a defect ticket for engineering to improve the error message in question (or drop it). (edit: DPD200240493)

Thank you for your feedback!

Best regards,

Georg Zitzlsberger

Matthias Kretz

OK, that's a bit unexpected. So what is __m128 then? And what are the m128_f32 and friends members for if they can't be accessed?

>>...that's a bit unexpected. So what is __m128 then?

It is declared as a union ( a fundamental type in C and C++ languages ).

All the other SIMD ( Single Instruction Multiple Data ) types declared in the xxxintrin.h header files are also declared as unions.

Example of declaration

[ xmmintrin.h ]
...
typedef union __declspec(intrin_type) _CRT_ALIGN(16) __m128 {
float m128_f32[4];
unsigned __int64 m128_u64[2];
__int8 m128_i8[16];
__int16 m128_i16[8];
__int32 m128_i32[4];
__int64 m128_i64[2];
unsigned __int8 m128_u8[16];
unsigned __int16 m128_u16[8];
unsigned __int32 m128_u32[4];
} __m128;
...

Example of application
...
// Single-Precision
__m128 a1 = { -1.388539f, 0.0f, 0.0f, 0.0f };
__m128 b1 = { 57.29578f, 0.0f, 0.0f, 0.0f };
__m128 mmResult1 = _mm_mul_ps( a1 , b1 );

// Double-Precision
__m128d a2 = { -1.388539L, 0.0L };
__m128d b2 = { 57.29578L, 0.0L };
__m128d mmResult2 = _mm_mul_pd( a2 , b2 );
...

>>...If __m128 is not class type, what else could it be?...

If you're interested in class wrappers around SIMD types please take a look at fvec.h and ivec.h header files.

Matthias Kretz

Quote:

Sergey Kostrov wrote:

>>...that's a bit unexpected. So what is __m128 then?

It is declared as a union ( a fundamental type in C and C++ languages ).

But it doesn't behave like one, which is the whole point of the discussion. Anyway, if you look closely at the xmmintrin.h file: on Linux __m128 is a struct. A union, btw, is not a fundamental type. Fundamental types are: float, double, int, etc.
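
The distinction drawn here can be checked with the standard type traits. The following is a minimal sketch and is not taken from any intrinsics header: DemoUnion and DemoStruct are hypothetical stand-ins for the Windows-style and Linux-style declarations, because how a particular compiler classifies the real __m128 is exactly what is in dispute in this thread.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a union-based declaration (Windows style).
union DemoUnion {
    float f32[4];
    int   i32[4];
};

// Hypothetical stand-in for a struct-based declaration (Linux style).
struct DemoStruct {
    float f32[4];
};

// Neither type is fundamental; both are compound types with members.
static_assert(!std::is_fundamental<DemoUnion>::value,  "a union is not fundamental");
static_assert(!std::is_fundamental<DemoStruct>::value, "a struct is not fundamental");
static_assert(std::is_union<DemoUnion>::value,   "declared with 'union'");
static_assert(std::is_class<DemoStruct>::value,  "declared with 'struct'");
static_assert(std::is_fundamental<float>::value, "float is fundamental");

// Member access with '.' is valid for both kinds of type.
inline void demo_member_access() {
    DemoUnion u;
    u.f32[0] = 1.5f;
    DemoStruct s;
    s.f32[0] = u.f32[0];
    assert(s.f32[0] == 1.5f);
}
```

If a real __m128 were either of these, v.m128_f32[0] would at least get past the "expression must have class type" check, which is why the pair of diagnostics in the first post looks contradictory.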

Matthias Kretz

Quote:

Sergey Kostrov wrote:

If you're interested in class wrappers around SIMD types please take a look at fvec.h and ivec.h header files.

Maybe you want to look at my signature. :) You think I wouldn't know about those classes...?

>>...on Linux __m128 is a struct...

I'd like to see that header file. Could you attach it, please?

Also, what version of Linux do you mean?

By "design", struct members do not share the same memory block ( each is located in its own part of the struct ), whereas union members do share the same memory block.

If, for example, __m128 type is declared as a struct then sizeof( __m128 ) won't be equal to 16 ( that is 128 bits / that is why it is named as __m128 ).
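
A small sketch of the size argument, under the assumption of 4-byte float and int (checked by a static_assert). All type names are hypothetical. It suggests the claim only holds when the struct carries several of the arrays: a struct with a single float[4] member, like the Linux declaration quoted further down, still has size 16.

```cpp
#include <cassert>

static_assert(sizeof(float) == 4 && sizeof(int) == 4,
              "sketch assumes 4-byte float and int");

// Struct with a single 16-byte member (hypothetical name).
struct alignas(16) OneArrayStruct {
    float m128_f32[4];
};

// Union whose members all overlay the same 16 bytes (hypothetical name).
union alignas(16) OverlayUnion {
    float         m128_f32[4];
    int           m128_i32[4];
    unsigned char m128_u8[16];
};

// Struct carrying two arrays side by side: this one does grow.
struct alignas(16) TwoArrayStruct {
    float m128_f32[4];
    int   m128_i32[4];
};

static_assert(sizeof(OneArrayStruct) == 16, "single member: still 16 bytes");
static_assert(sizeof(OverlayUnion)   == 16, "overlaid members: 16 bytes");
static_assert(sizeof(TwoArrayStruct) == 32, "two members: 32 bytes");

inline void check_layout() {
    OverlayUnion u;
    u.m128_f32[0] = 0.0f;
    // All union members start at the same address...
    assert(static_cast<void*>(u.m128_f32) == static_cast<void*>(u.m128_i32));
    TwoArrayStruct t = {};
    // ...while struct members are laid out one after another.
    assert(static_cast<void*>(t.m128_f32) != static_cast<void*>(t.m128_i32));
}
```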

A message to Intel software engineers / developers:

Matthias is absolutely right and I'm absolutely frustrated that you've done that:

[ xmmintrin.h ]
...
#if defined(__INTEL_COMPILER) && defined(_MM_FUNCTIONALITY)
# include "xmm_func.h"
#else
# if defined(_MSC_VER) && _MSC_FULL_VER >= 140040310
typedef union _MMINTRIN_TYPE(16) __m128 {
/*
* Although we do not recommend using these directly, they are here
* for better MS compatibility.
*/
float m128_f32[4];
unsigned __int64 m128_u64[2];
__int8 m128_i8[16];
__int16 m128_i16[8];
__int32 m128_i32[4];
__int64 m128_i64[2];
unsigned __int8 m128_u8[16];
unsigned __int16 m128_u16[8];
unsigned __int32 m128_u32[4];

/*
* This is what we used to have here alone.
* Leave for backward compatibility.
*/
float f[4];
} __m128;
# else
typedef struct _MMINTRIN_TYPE(16) __m128 {
float m128_f32[4];
} __m128;
# endif
#endif
...

In other words, compatibility of members is broken for SIMD type __m128 on Windows and Linux platforms.

iliyapolak

The __m128 data type should by design map directly to the XMMn registers, so the natural component of such a data type is an array of a primitive type. With a union, all members occupy the same memory location ( address ), and only one of those types can be used at a time.

>>...In other words, compatibility of members is broken for SIMD type __m128 on Windows and Linux platforms...

A simple workaround could be applied:

Declare your own union-based declaration for __m128 SIMD type controlled by a macro, for example, _FULLY_COMPATIBLE_SIMD_TYPES_:

[ In xmmintrin.h for Linux ]
...
#if defined ( __INTEL_COMPILER ) && defined( _MM_FUNCTIONALITY )

# include "xmm_func.h"

#else
# if defined ( _MSC_VER ) && _MSC_FULL_VER >= 140040310

typedef union _MMINTRIN_TYPE(16) __m128
{
...
// All the rest union members
...
} __m128;

# else

#if defined ( _FULLY_COMPATIBLE_SIMD_TYPES_ )

typedef union _MMINTRIN_TYPE(16) __m128
{
...
// All the rest union members
...
} __m128;

#else

typedef struct _MMINTRIN_TYPE(16) __m128
{
float m128_f32[4];
} __m128;

#endif

#endif
#endif
...
[ Test-case ]
...
#define _FULLY_COMPATIBLE_SIMD_TYPES_

#include "xmmintrin.h"
...
int main( void )
{
__m128 mmVar = { 0 };

mmVar.m128_i32[0] = 0;
mmVar.m128_i32[1] = 1;
mmVar.m128_i32[2] = 2;
mmVar.m128_i32[3] = 3;
}

iliyapolak

My approach is to use a custom typedef based on a 16-byte-aligned structure of double or float arrays.

Matthias Kretz

I already have solved the whole problem. What I was after was to improve the implementation because ICC claims it can do better. So I tried and ICC failed. Which is what I reported.

If you're interested to see how to portably and efficiently access scalar components of SSE/AVX types then look at common/storage.h in Vc (see signature).
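
One way to reach single lanes without relying on any member names is to round-trip through a float array using only the documented load/store intrinsics. The following is a sketch of that idea only; it is not the technique used in Vc's common/storage.h, which is not shown in this thread. The helper names extract_lane and insert_lane are hypothetical, and the code assumes an x86 target with SSE enabled.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; requires an x86 target with SSE

// Hypothetical helper: returns lane i (0..3) of v.
inline float extract_lane(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);     // spill all four lanes to aligned memory
    return tmp[i];
}

// Hypothetical helper: returns a copy of v with lane i replaced by x.
inline __m128 insert_lane(__m128 v, int i, float x) {
    assert(i >= 0 && i < 4);
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);
    tmp[i] = x;
    return _mm_load_ps(tmp);  // reload the modified lanes
}
```

Because it never names m128_f32 or any other member, this compiles the same way whether the header declares __m128 as a union, a struct or a built-in vector type. Whether the compiler optimizes away the memory round trip is not guaranteed.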

Georg Zitzlsberger (Intel)

Hello Sergey,

I cannot entirely follow your argumentation:

1. Intrinsics are available on both GNU GCC* and Microsoft Windows* (more precisely: Microsoft Visual Studio*). The two implementations differ, and we have to follow each of them. Just compare the *mmintrin.h headers on both platforms!
2. Direct use of the (union) members is not supported by the Intel Compiler for Linux because GCC does not support it either. It's only possible on Windows* for compatibility reasons.
3. "Portability" is not an issue here because no one will ever exchange binaries/objects between different OSes.

So, the only remaining question is: Why can one not access the struct member m128_f32[4] (type __m128) directly on Linux? Same for the other intrinsic types.
Answer: GNU GCC does not offer this either. So, that's not a compatibility issue. However, I agree that it's not as elegant as it could be, and I will pass this feedback on to engineering. We might remove the member and replace it with a vector_size attribute to avoid confusion here.

Best regards,

Georg Zitzlsberger
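
For readers unfamiliar with the vector_size attribute mentioned above, here is a minimal sketch of what such a type looks like with the GCC/Clang vector extension. The type name float4 and the helper names are hypothetical, and this is not a statement about how any particular ICC version declares __m128. With this extension the type has no members at all, and lanes are reached by subscripting.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats in one 16-byte vector type.
// 'float4' is a hypothetical name chosen for this sketch.
typedef float float4 __attribute__((vector_size(16)));

// Reads one lane; vector types can be subscripted like arrays.
inline float get_lane(float4 v, int i) {
    assert(i >= 0 && i < 4);
    return v[i];
}

// Returns a copy of v with lane i replaced by x.
inline float4 with_lane(float4 v, int i, float x) {
    assert(i >= 0 && i < 4);
    v[i] = x;
    return v;
}
```

Such a type is neither a class nor a union, so an expression like v.m128_f32[0] cannot compile against it, while v[0] can. This extension is not available in Microsoft's compiler.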

>>...3. "Portability" is not an issue here because no one will ever exchange binaries/objects between different OSes...

It is the big issue, Georg. I was talking about portability of source codes, not binaries.

There are several issues with portability of intrinsics between Linux and Windows. Intel compilers mask some but not all of them, so it seems necessary to test all compilers that are to be supported. Usually there is some least-common-denominator version that is de facto portable, but the intrinsics clearly aren't portable in the sense of being covered by any standard.

Georg Zitzlsberger (Intel)

Sergey,

as mentioned above, the different OSes have their own ways to implement intrinsics. This obviously is not portable in source code and nothing we can do about it. We have to follow the existing frameworks.
Hence, I don't see any justification of this, regarding portability:

>> A message to Intel software engineers / developers:
>> Matthias is absolutely right and I'm absolutely frustrated that you've done that:

Best regards,

Georg Zitzlsberger

I don't consider that as a matter of different OSes.

>>>>...I don't see any justification of this, regarding portability:
>>
>> A message to Intel software engineers / developers:
>> Matthias is absolutely right and I'm absolutely frustrated that you've done that:

Georg, once again: Matthias is absolutely right.

I can only assume that you don't do real programming in the real world and don't work on a project that requires support for many platforms ( 5 or more ), C/C++ compilers ( 5 or more ) and different Intel CPUs ( starting with the Intel 486 ). Because of this you simply don't know what problems we're dealing with almost every week when it comes to portability of source code. Should I submit a report of the different issues / problems I've personally detected during the last 15+ years? I don't think so, and nobody is interested in that.

SSE technology and SIMD types are Intel's inventions, a response to AMD's 3D Now! technology. So Intel is the sole owner of that stuff, and if GCC developers did not implement support for SSE technology and SIMD types the right way, according to Intel's specs, it was in Intel's interest to contact leading GCC developers and convince them that full compatibility for SIMD types must be provided.

By the way, the MinGW C/C++ compiler ( a GCC for Windows, in other words ) can compile union types. Here is a test case:

[ Test-case ]
...
typedef union tagTESTTYPE
{
int iA[4];
unsigned int uiA[4];
float fA[4];
double dA[2];
long double ldA[1];
} TESTTYPE;
...

[ Compiler output ]
...
Performing Makefile project actions
*** ScaLib Message: Compiling with MinGW v3.4.2 ***
*** ScaLib Message: Configuration - Desktop - _WIN32_MGW - DEBUG ***
MgwTestApp - 0 error(s), 0 warning(s)
...

So, it is not just a matter of union type. It is a matter of staying firm when it comes to standards invented by one company and convincing another company to fix problems if standards are broken.

Best regards,
Sergey

This is a follow up...

Georg, please take a look at my findings; I finally understand the origins of the problem. In essence, the "wrong" piece of code we were talking about has to be wrapped with the _MM2_FUNCTIONALITY macro. Here is a quote:

...the m128 datatype provided using _MM2_FUNCTIONALITY mode is implemented as struct...

[ Complete Technical Details ]

On November 7th, 1996, the following comment was made by an Intel software developer in the xmmintrin.h header file ( the header was part of a Processor Pack update for Visual Studio 98 ):

[ Visual C++ v6 ( also known as VS98 / xmmintrin.h attached ) ]
...
/*
* xmmintrin.h
*
* Principal header file for Streaming SIMD Extensions intrinsics
*
* The intrinsics package can be used in 2 ways, based whether or not
* _MM_FUNCTIONALITY is defined; if it is, the C/x87 implementation
* will be used (the "faux intrinsics").
*
*
* Note that the m128 datatype provided using _MM2_FUNCTIONALITY mode is
* implemented as struct, will not be 128b aligned, will be passed
* via the stack, etc. MM_FUNCTIONALITY mode is not intended for
* performance, just semantics.
*
* 07 Nov 96 [mpg]
*/
...

The same comments could be found in xmmintrin.h files in the following Visual Studios:

[ VS2005 & VS2008 & VS2010 & VS2012 ]
...
/*
* xmmintrin.h
*
* Principal header file for Streaming SIMD Extensions intrinsics
*
* The intrinsics package can be used in 2 ways, based whether or not
* _MM_FUNCTIONALITY is defined; if it is, the C/x87 implementation
* will be used (the "faux intrinsics").
*
*
* Note that the m128 datatype provided using _MM2_FUNCTIONALITY mode is
* implemented as struct, will not be 128b aligned, will be passed
* via the stack, etc. MM_FUNCTIONALITY mode is not intended for
* performance, just semantics.
*
*/
...

Attached files:

xmmintrin.vc6.h (16.92 KB)
Georg Zitzlsberger (Intel)

Hello,

the original feature request (DPD200240493) will be implemented with the next major compiler version (should be 15.0 towards end of 2014).
The changes were quite complex with a high risk of jeopardizing stability, hence we did not provide a solution for 14.x.

Best regards,

Georg Zitzlsberger
