__m128xxx data types within a class in C++

Good Morning,

We are trying to use __m128xxx data types within a class in C++, and we see that it does not work unless we first define the code below in our header files. The compiler assumes that all loads and stores on these data types are aligned (unless an explicit unaligned load/store intrinsic is used). In C++, however, dynamically allocated objects are not guaranteed to be 16-byte aligned unless you overload the dynamic allocation operators.

Now, is this always a safe workaround, or are there any better ways of doing this?

Best Regards,

Lars Petter Endresen

//==================================================================
// Fix to make _mm_ functions work within classes in Microsoft C++
#include
#define _aligned_free(a) _mm_free(a)
#define _aligned_malloc(a, b) _mm_malloc(a, b)
void* operator new(size_t bytes) { return _mm_malloc(bytes,16); }
void* operator new[](size_t bytes) { return _mm_malloc(bytes,16); }
void operator delete(void* ptr) { _mm_free(ptr); }
void operator delete[](void* ptr) { _mm_free(ptr); }
//==================================================================

A struct that is declared (rather than dynamically allocated) can be aligned like this:

__declspec( align(16) )
struct FOO
{
...
};

Alignment within the structure can be attained using the same technique.
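As a rough illustration of the point above, the sketch below uses the standard C++11 alignas specifier, which plays the same role as __declspec( align(16) ) on compilers that lack the Microsoft extension. The struct name Foo and its members are made up for the example; only the alignment behaviour matters.

```cpp
#include <cstddef> // std::size_t

// Hypothetical example type. alignas(16) is the standard C++11 spelling of
// what __declspec(align(16)) requests on Microsoft compilers.
struct alignas(16) Foo {
    float x;   // 4 bytes of payload
    char  tag; // 1 byte of payload
};

// The compiler pads the struct so that its size is a multiple of its
// alignment, which keeps every element of an array Foo[n] aligned too.
static_assert(alignof(Foo) == 16, "Foo should be 16-byte aligned");
static_assert(sizeof(Foo) % 16 == 0, "Foo should be padded to a multiple of 16");
```

These checks only cover objects the compiler places itself (static or on the stack); they say nothing about what a heap allocator returns.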

Now if you have an array of FOO, and if FOO is not of a size equal to a multiple of the alignment, and if #pragma pack(1) is in effect, you may have a problem.

Allocation of this structure via the default new may present a problem if new does not honor the alignment for a single FOO, or for an array of FOO if the structure size is not a multiple of the alignment requirement.

Overloading the new operator is a suitable workaround. This alignment problem may have been addressed in the latest version of VS (and/or ICC) together with updates.

Jim Dempsey

www.quickthreadprogramming.com

Hello,

Does this mean that overloading the new operator this way guarantees only the start of the class is aligned? If so, one could declare all the 16 byte __m128xxx data types in the beginning of the class to ensure that each of these is properly aligned.

Lars Petter

Lars,

The __m128xxx types, at least on MS VC++, implicitly have the __declspec( align(16) ).

If you are observing them not aligned on the 16-byte boundary with respect to the start of the structure, then placing them at the front of the structure is a good alternative. You may also create a derived type with the __declspec( align(16) ) that contains only the __m128xxx. Note the phrase "with respect to the start of the structure". If the structure itself is not aligned, then this is a different issue.
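One way to check the "with respect to the start of the structure" part is with offsetof. The following is a minimal sketch, assuming a compiler where __m128 carries an implicit 16-byte alignment; the struct Mixed and its members are hypothetical.

```cpp
#include <cstddef>     // offsetof
#include <xmmintrin.h> // __m128

// Hypothetical layout: the __m128 member is deliberately NOT first.
struct Mixed {
    char   tag;   // 1 byte, followed by padding inserted by the compiler
    __m128 v;     // placed at an offset that is a multiple of 16
    int    count;
};

// These hold relative to the start of the object only. The object itself
// must still begin on a 16-byte boundary for v to be usable with aligned
// loads and stores.
static_assert(offsetof(Mixed, v) % 16 == 0, "v is aligned relative to the start");
static_assert(alignof(Mixed) >= 16, "Mixed inherits the alignment of __m128");
static_assert(sizeof(Mixed) % 16 == 0, "arrays of Mixed keep v aligned");
```

If these assertions pass, member ordering is not the problem and the remaining question is where the allocator places the object.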

MSDN states for static or stack declared structures (and classes) with alignment restrictions that the desired alignment is maintained. However, MSDN also states the default new and malloc do not assure alignment is provided. Instead MSDN recommends using _aligned_malloc, and if desired overloading new and delete as you have done.

It is also worth noting the __alignof operator:

typedef __declspec(align(32)) struct { int a; double b; } S;
int n = 50; // array size
S* p = (S*)_aligned_malloc(n * sizeof(S), __alignof(S));

This looks like a candidate for a template or for a macro:

#define NEW(T,n) (T*)_aligned_malloc((n) * sizeof(T), __alignof(T))
S* p = NEW(S,n);
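The template version mentioned above might look roughly like this. It is only a sketch: the names aligned_array_alloc and aligned_array_free are invented here, standard alignof stands in for __alignof, and _mm_malloc is used instead of _aligned_malloc so that the same code also builds outside Microsoft's toolchain. Like the macro, it returns raw storage and runs no constructors.

```cpp
#include <cstddef>     // std::size_t
#include <xmmintrin.h> // _mm_malloc, _mm_free

// Hypothetical helper: allocate raw, uninitialized storage for n objects of
// type T, aligned to alignof(T). Returns a null pointer on failure.
template <typename T>
T* aligned_array_alloc(std::size_t n) {
    return static_cast<T*>(_mm_malloc(n * sizeof(T), alignof(T)));
}

// Matching release function: storage from _mm_malloc must go to _mm_free.
template <typename T>
void aligned_array_free(T* p) {
    _mm_free(p);
}

// Same shape as the struct S in the post, written with standard alignas.
struct alignas(32) S { int a; double b; };
```

Usage would be S* p = aligned_array_alloc<S>(50); followed later by aligned_array_free(p);.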

An alternative, when defining structures containing __m128xxx variables, is to define the new and delete operators for that specific structure.
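A per-class overload could be sketched as follows. The class Particle and its members are hypothetical, and the alignment of 16 is hard-coded on the assumption that only __m128 members are involved. The global operator new is left untouched, so the rest of the program is unaffected.

```cpp
#include <cstddef>     // std::size_t
#include <new>         // std::bad_alloc
#include <xmmintrin.h> // __m128, _mm_malloc, _mm_free, _mm_set1_ps

// Hypothetical class: only objects of this type use the aligned allocator.
class Particle {
public:
    __m128 position;
    __m128 velocity;

    Particle() : position(_mm_set1_ps(0.0f)), velocity(_mm_set1_ps(0.0f)) {}

    static void* operator new(std::size_t bytes)   { return allocate(bytes); }
    static void* operator new[](std::size_t bytes) { return allocate(bytes); }
    static void operator delete(void* p) noexcept   { _mm_free(p); }
    static void operator delete[](void* p) noexcept { _mm_free(p); }

private:
    static void* allocate(std::size_t bytes) {
        void* p = _mm_malloc(bytes, 16);
        if (!p) throw std::bad_alloc(); // operator new must not return null
        return p;
    }
};
```

With this in place, new Particle and new Particle[n] go through _mm_malloc, while every other type keeps the default allocator.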

Jim Dempsey
www.quickthreadprogramming.com

Thanks Jim,

If I do not redefine _aligned_malloc and _aligned_free, I see that calls to _mm_malloc and _mm_free are redirected to _aligned_malloc and _aligned_free by some MS header file. In certain cases I have seen that the Intel Compiler performs better with _mm_malloc and _mm_free, which are part of the Intel Compiler installation, than with _aligned_malloc and _aligned_free, which are foreign to the Intel Compiler.

Lars Petter

Overrule _aligned_malloc and _aligned_free

//==================================================================
// Fix to make _mm_ functions work within classes in Microsoft C++
#include
#define _aligned_free(a) _mm_free(a)
#define _aligned_malloc(a, b) _mm_malloc(a, b)
void* operator new(size_t bytes) { return _mm_malloc(bytes,16); }
004011B4 push ebp
004011B5 mov ebp,esp
004011B7 push esi
004011B8 push eax
004011B9 push edi
004011BA push ecx
004011BB mov edi,ebp
004011BD sub edi,4
004011C0 mov ecx,1
004011C5 mov eax,0CCCCCCCCh
004011CA rep stos dword ptr es:[edi]
004011CC pop ecx
004011CD pop edi
004011CE pop eax
004011CF add esp,0FFFFFFF8h
004011D2 mov eax,dword ptr [bytes]
004011D5 mov dword ptr [esp],eax
004011D8 mov dword ptr [esp+4],10h
004011E0 call __mm_malloc (402F94h)
004011E5 add esp,8
004011E8 mov dword ptr [ebp-4],eax
004011EB mov eax,dword ptr [ebp-4]
004011EE add esp,4
004011F1 cmp ebp,esp
004011F3 call _RTC_CheckEsp (401590h)
004011F8 leave
004011F9 ret
Do not overrule _aligned_malloc and _aligned_free
//==================================================================
// Fix to make _mm_ functions work within classes in Microsoft C++
#include
//#define _aligned_free(a) _mm_free(a)
//#define _aligned_malloc(a, b) _mm_malloc(a, b)
void* operator new(size_t bytes) { return _mm_malloc(bytes,16); }
004011B4 push ebp
004011B5 mov ebp,esp
004011B7 sub esp,8
004011BA push eax
004011BB push edi
004011BC push ecx
004011BD mov edi,ebp
004011BF sub edi,8
004011C2 mov ecx,2
004011C7 mov eax,0CCCCCCCCh
004011CC rep stos dword ptr es:[edi]
004011CE pop ecx
004011CF pop edi
004011D0 pop eax
004011D1 mov eax,dword ptr [__imp___aligned_malloc (408350h)]
004011D6 mov edx,esp
004011D8 mov dword ptr [ebp-4],edx
004011DB add esp,0FFFFFFF8h
004011DE mov edx,dword ptr [bytes]
004011E1 mov dword ptr [esp],edx
004011E4 mov dword ptr [esp+4],10h
004011EC call eax
004011EE add esp,8
004011F1 mov edx,dword ptr [ebp-4]
004011F4 cmp esp,edx
004011F6 call _RTC_CheckEsp (401630h)
004011FB mov dword ptr [ebp-8],eax
004011FE mov eax,dword ptr [ebp-8]
00401201 add esp,8
00401204 cmp ebp,esp
00401206 call _RTC_CheckEsp (401630h)
0040120B leave
0040120C ret
0040120D nop

Lars,

>>If I do not redefine _aligned_malloc and _aligned_free, I see that calls to _mm_malloc and _mm_free are are redirected to _aligned_malloc and _aligned_free by some MS header file.

This redirection will not assure you get the alignment that you desired unless the alignment of the object being allocated is compatible with that requested. Note in your code example that the _mm_malloc is only passed the number of bytes desired for the allocation. From the code shown, it is unknown just what alignment will be requested once _aligned_malloc is called. I would assume only word alignment is requested by default. Depending on compiler options and/or linker options (implementation dependent) there may be an option to permit you to specify the alignment of unspecified mallocs.

Note, if this option is not available then you can make it available yourself. You can write functions named malloc and free which replace the C Runtime Library functions but which call the C Runtime Library functions _aligned_malloc and _aligned_free, and which provide a default alignment of your choice (8 or 16 or ??). This would be a one-line function, something you could quite easily do.
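The one-line functions described above could be sketched like this. To keep the example self-contained, the wrappers get the invented names malloc16 and free16 rather than actually replacing malloc and free, since whether the runtime's own functions can be replaced depends on the toolchain and linker. _mm_malloc is used here in place of _aligned_malloc.

```cpp
#include <cstddef>     // std::size_t
#include <xmmintrin.h> // _mm_malloc, _mm_free

// Assumed default alignment; 16 matches the __m128xxx types.
const std::size_t kDefaultAlignment = 16;

// Hypothetical wrappers: allocate with a fixed default alignment, and
// release with the matching free function.
inline void* malloc16(std::size_t bytes) { return _mm_malloc(bytes, kDefaultAlignment); }
inline void  free16(void* p)             { _mm_free(p); }
```

Memory obtained from malloc16 must be released with free16, never with the ordinary free.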

You still may have the problem of arrays of structures if you were not careful with the compiler options when compiling the structures. The best way to assure the correct options for the sensitive structures is to use #pragma pack to save the current packing method (push), set the desired packing method (pack), define the structure, and then restore the command-line compiler options (pop).
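The push / pack / pop sequence could look roughly like the sketch below. The struct name Sensitive and its members are hypothetical, and a packing value of 16 is assumed because that is what the __m128xxx types need.

```cpp
#include <cstddef>     // offsetof
#include <xmmintrin.h> // __m128

#pragma pack(push)   // save the packing currently in effect
#pragma pack(16)     // allow members to be aligned on up to 16 bytes
struct Sensitive {   // hypothetical structure holding SSE data
    char   tag;
    __m128 v;
};
#pragma pack(pop)    // restore whatever packing was in effect before

// With 16-byte packing, v sits on a 16-byte offset and the struct size is a
// multiple of 16, so arrays of Sensitive keep every v aligned.
static_assert(offsetof(Sensitive, v) % 16 == 0, "v must be 16-byte aligned");
static_assert(sizeof(Sensitive) % 16 == 0, "array elements must stay aligned");
```

Wrapping only the sensitive definitions this way leaves the packing of all other structures as set on the command line.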

Jim Dempsey

www.quickthreadprogramming.com

Hello Jim,

I am a little confused, Jim. Could you please provide a small compilable code example where _mm_malloc fails and _aligned_malloc works? This is to ensure that using _mm_malloc in our code does not cause any problems. We have not seen any 16-byte alignment problems by redirecting _aligned_malloc to _mm_malloc in our code, so I assume that it is safe in our special case, but I am not completely sure after reading your posting.

Best Regards

Lars Petter

Lars,

You said in your prior post

malloc calls _aligned_malloc

You did not say what alignment was used in _aligned_malloc when _aligned_malloc is called via malloc.

When you used the debugger and Disassembly window to observe the call to _aligned_malloc, were you able to discern what the alignment factor was?

My guess is on x64 platform malloc(n) calls _aligned_malloc(n,8). And that won't necessarily work for you. You want _aligned_malloc(n,16).

Note, even if the _aligned_malloc(n,8) is returning a 16 byte aligned pointer today, the next revision of the runtime system tomorrow might not be so kind to do so. It is always best to explicitly state what you want.
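To make "explicitly state what you want" concrete, here is a small sketch that both requests 16-byte alignment and checks the result, instead of relying on what the runtime happens to return. The helper names is_aligned and alloc_for_sse are invented for the example.

```cpp
#include <cassert>     // assert
#include <cstddef>     // std::size_t
#include <cstdint>     // std::uintptr_t
#include <xmmintrin.h> // _mm_malloc, _mm_free

// Hypothetical helper: true if the address p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical allocator that states the required alignment explicitly and
// verifies it in debug builds rather than trusting a default.
inline void* alloc_for_sse(std::size_t bytes) {
    void* p = _mm_malloc(bytes, 16);
    assert(p == nullptr || is_aligned(p, 16));
    return p;
}
```

Note that an address such as 40 passes is_aligned(p, 8) but fails is_aligned(p, 16), which is exactly the case where an 8-byte-aligned allocation would break aligned SSE loads.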

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim.

I did not say that malloc calls _aligned_malloc, as you state, but that _mm_malloc is redirected to _aligned_malloc in an MS header file. As written in malloc.h:

#define _mm_free(a) _aligned_free(a)

#define _mm_malloc(a, b) _aligned_malloc(a, b)

Now, when I write an _mm_malloc statement in my software, or redirect new to _mm_malloc, I intend to place _mm_malloc in the .asm and .exe files. Therefore I prefer to redirect _aligned_malloc back to _mm_malloc again, before malloc.h is included directly or indirectly during compilation. Then I see the _mm_malloc call all over my .asm files and then I am very happy. Get it?

So my resolution to this problem is to use the workaround below and declare all my __m128i variables in the beginning of each class/struct.

//==================================================================
// Fix to make _mm_ functions work within classes in Microsoft C++
#include
#define _aligned_free(a) _mm_free(a)
#define _aligned_malloc(a, b) _mm_malloc(a, b)
void* operator new(size_t bytes) { return _mm_malloc(bytes,16); }
void* operator new[](size_t bytes) { return _mm_malloc(bytes,16); }
void operator delete(void* ptr) { _mm_free(ptr); }
void operator delete[](void* ptr) { _mm_free(ptr); }
//==================================================================

As you see, I write 16 here. This is required because, for instructions like "pand" to work, the data must be aligned.
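For readers following along, the difference between aligned and unaligned access can be sketched with the SSE2 intrinsics below. The function names and_aligned and and_unaligned are made up for the example. The first variant assumes both pointers are 16-byte aligned, which is what the overloaded new above is meant to provide; the second makes no such assumption.

```cpp
#include <emmintrin.h> // SSE2: __m128i, _mm_load_si128, _mm_loadu_si128, _mm_and_si128

// Aligned variant: both pointers must be 16-byte aligned, otherwise the
// aligned loads may fault at run time.
inline __m128i and_aligned(const __m128i* a, const __m128i* b) {
    return _mm_and_si128(_mm_load_si128(a), _mm_load_si128(b));
}

// Unaligned variant: works for any address by using explicit unaligned loads.
inline __m128i and_unaligned(const void* a, const void* b) {
    return _mm_and_si128(_mm_loadu_si128(static_cast<const __m128i*>(a)),
                         _mm_loadu_si128(static_cast<const __m128i*>(b)));
}
```

When the compiler is handed a plain __m128i object, it is free to assume the aligned form, which is why the storage holding such objects has to be 16-byte aligned in the first place.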

Best Regards

Lars Petter Endresen
