ICPC 14 (Composer XE 2013 SP1) only honors __attribute__((align(n))) for first function parameter


I have a set of computation kernels that are templated on the data type and the variable alignment. As ICC does not accept a template parameter as the alignment value in __assume_aligned() (it insists on a literal), I use the new support for __attribute__((align())) in ICPC 14. Unfortunately, the compiler sometimes ignores that attribute. More specifically, it seems to only apply the attribute to the first function argument.

Here is a reduced test case with a simple three argument kernel:

template<typename T, int align_>
using aligned_ptr = T* __attribute__((align(align_)));

template<typename T, int align_>
void add(aligned_ptr<T,align_> __restrict__ a,
         aligned_ptr<T,align_> __restrict__ b,
         aligned_ptr<T,align_> __restrict__ c,
         int n)
{
  for (int i = 0; i < n; ++i)
    a[i] += b[i] * c[i];
}

// example instantiation
template
void add<double,32>(aligned_ptr<double,32> __restrict__ a,
                    aligned_ptr<double,32> __restrict__ b,
                    aligned_ptr<double,32> __restrict__ c,
                    int n);

I used a template alias for the pointers to improve readability, but I verified that the problem also exists if the attribute is added explicitly to every single parameter.

My compiler is "icpc version 14.0.0 (gcc version 4.7.3 compatibility)" on OS X, but I also checked GCC 4.6 and 4.8. Compiling the example above yields the following output:

$ icc -no-use-clang-env -gcc-name=gcc-mp-4.7 -std=c++11 -c -xCORE-AVX2 -O3 align.cc -vec-report6 -S
align.cc(9): (col. 5) remark: vectorization support: reference a has aligned access
align.cc(9): (col. 5) remark: vectorization support: reference a has aligned access
align.cc(9): (col. 5) remark: vectorization support: reference b has unaligned access
align.cc(9): (col. 5) remark: vectorization support: reference c has unaligned access
align.cc(9): (col. 5) remark: vectorization support: unaligned access used inside loop body
align.cc(8): (col. 3) remark: vectorization support: unroll factor set to 4
align.cc(8): (col. 3) remark: LOOP WAS VECTORIZED
$

As you can see, only references to variable a (the first parameter) are treated as aligned. The behavior is not restricted to three-argument functions; it also appears with only two arguments. Is this a known optimizer problem?

I was present at a customer meeting some months ago where the request to support such syntax was made, so I understand the attraction of it.

If this syntax were consistently effective in asserting alignment, it could become more popular than the recommendation I've seen of placing __assume_aligned assertions as close as possible to the critical loops or CEAN assignments. I see a fairly good success rate for that practice on 64-bit Linux, not so good on 64-bit Windows. Still, I'm looking at a case where one of five arguments is reported both aligned and unaligned in consecutive assignments and is not fully optimized, even on Linux, although its alignment is asserted both by __assume_aligned and by #pragma vector aligned. It is neither the first nor the last argument.

I'm sure I'm not the only one whose normal limit for scanning compiler reports is -opt-report.  I hadn't looked into this particular case of mine until I saw your report.

My preference would be for the aligned attribute to be part of the function signature:

double fooSum(double* vec __attribute__((align(64))), size_t n); // known aligned
double fooSum(double* vec, size_t n); // unknown alignment

Declaring a template to define a type (with alignment) may work, but it is a lot of work to set up and use.

Jim Dempsey

www.quickthreadprogramming.com

That's a matter of preference, of course, but ICC unfortunately ignores the explicit variant as well (I tried). The release notes for ICPC 14.0 state that the compiler should automatically insert an __assume_aligned() statement at the beginning of the function for each parameter with an alignment attribute, but that only seems to work for the first one. I would be perfectly fine with having to spell out the alignment attribute explicitly in the function signature...

OTOH, GCC 4.8 is perfectly fine with those attributes. That's what makes them really attractive, as we have to support both GCC and ICC and this mechanism avoids having to fiddle with two different mechanisms (ICC-style __assume_aligned and GCC-style __builtin_assume_aligned, which requires an extra cast and assignment). Moreover, ICC doesn't let me use a template parameter value in __assume_aligned.
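For comparison, a minimal sketch of the GCC-style mechanism described above, assuming GCC 4.7 or later (or a Clang that provides the builtin). `__builtin_assume_aligned` returns `void*`, which is where the extra cast and assignment come from; its alignment argument only has to be a constant expression, so the template parameter `align_` is accepted. The kernel name `add_gcc` is made up for this illustration.

```cpp
// Sketch only: GCC/Clang-specific builtin, not accepted as-is by MSVC.
template <typename T, int align_>
void add_gcc(T* __restrict__ a, const T* __restrict__ b,
             const T* __restrict__ c, int n)
{
  // Each pointer needs a cast from void* and a new local variable.
  // align_ is a constant expression, which is all the builtin requires.
  T* __restrict__ pa = static_cast<T*>(__builtin_assume_aligned(a, align_));
  const T* __restrict__ pb =
      static_cast<const T*>(__builtin_assume_aligned(b, align_));
  const T* __restrict__ pc =
      static_cast<const T*>(__builtin_assume_aligned(c, align_));
  for (int i = 0; i < n; ++i)
    pa[i] += pb[i] * pc[i];
}
```

As with __assume_aligned, the promise is not checked: if a pointer is not actually aligned, the behavior is undefined, so the caller has to guarantee alignment (for example with alignas or an aligned allocator).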

>>That's a matter of preference

C++ is supposedly strongly typed. Wouldn't you expect that, if you write a function that requires aligned arguments, the compiler would check that calls to the function pass aligned arguments? If you are not particularly interested in enforcement, then use __assume_aligned.

Jim Dempsey
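Jim's idea of making alignment part of the type can be approximated in standard C++ with a wrapper class. The class `aligned_ptr_checked` and the kernel `sum32` below are hypothetical, not an ICC or GCC facility; the sketch assumes GCC or Clang for `__builtin_assume_aligned`. The alignment is a template argument of the parameter type, the constructor checks it at run time with `assert`, and `get()` passes the guarantee on to the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical strongly typed pointer: the alignment is part of the type.
template <typename T, std::size_t Align>
class aligned_ptr_checked {
  static_assert(Align != 0 && (Align & (Align - 1)) == 0,
                "alignment must be a nonzero power of two");
  T* p_;
public:
  explicit aligned_ptr_checked(T* p) : p_(p) {
    // Enforce the contract where the pointer enters (active unless NDEBUG).
    assert(reinterpret_cast<std::uintptr_t>(p) % Align == 0);
  }
  T* get() const {
    // Tell GCC/Clang what the constructor already checked.
    return static_cast<T*>(__builtin_assume_aligned(p_, Align));
  }
};

// A kernel that can only be called with a 32-byte aligned argument.
inline double sum32(aligned_ptr_checked<const double, 32> v, int n)
{
  const double* x = v.get();
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}
```

Because the constructor is `explicit`, passing a raw `double*` to `sum32` does not compile, so every call site has to state the alignment; a misaligned pointer such as `buf + 1` trips the assertion in a debug build.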

Hello Steffen,

I think you may have gone down a bit of a false path here in an effort to workaround a problem with __assume_aligned which is what you should use to do what I think you're trying to do. You say you had a problem passing a template parameter to __assume_aligned. Trying to do that seems to work for me. Can you provide a test case that shows your problem using __assume_aligned?

$ cat test-assume-aligned.cpp
template <int a>
void foo(double * d) {
   __assume_aligned(d, a);
}

int main() {
   double x[100];
   foo<47>(x);
   return(0);
}

$ icc -V -c test-assume-aligned.cpp
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

Edison Design Group C/C++ Front End, version 4.6 (Oct  8 2013 19:20:39)
Copyright 1988-2013 Edison Design Group, Inc.

test-assume-aligned.cpp(3): error: invalid alignment value
     __assume_aligned(d, a);
                         ^

compilation aborted for test-assume-aligned.cpp (code 2)

Brandon Hewitt
Technical Consulting Engineer

For 1:1 technical support: http://premier.intel.com

Software Product Support info: http://www.intel.com/software/support

Hi Brandon,

Thank you for looking into this. Your example is already pretty much a perfect test case, apart from the odd alignment value. I would like the following code to work:

template <int a>
void foo(double *d)
{
  __assume_aligned(d,a);
}
void bar(double* d)
{
  foo<64>(d);
}

Unfortunately, that yields exactly the same error you have posted above (I'm on icpc 14.0.1.106 on OS X 10.9). The following code works on the other hand:

void foo(double *d)
{
  __assume_aligned(d,64);
}

So it looks to me like the compiler doesn't correctly reduce the template parameter value to a constant inside __assume_aligned(). When I discovered that ICPC 14 supports __attribute__((align(...))), I was excited that the attribute did accept a template value without a compilation failure, but in that case the alignment is only honored for the first function argument. So I would like one of two things:

  1. Allow template parameter values inside __assume_aligned().
  2. Have ICPC correctly honor __attribute__((align(...))) for all function arguments. I think that should be fixed anyway, because the compiler's release notes give the impression that __attribute__((align(...))) is equivalent to placing a corresponding __assume_aligned() at the start of the function body for each parameter that carries the attribute, which, as my initial post shows, does not happen for all parameters.

Steffen, my main concern is your initial statement:

>>...
>>Unfortunately, the compiler sometimes ignores that attribute. More specifically, it seems to only
>>apply the attribute to the first function argument.
>>...

So, I simply want you to verify alignments for ALL pointers a, b and c inside (!) the template-based function add.
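One way to do such a check is an assertion at the top of the kernel. The helper `is_aligned` and the kernel name `add_checked` are hypothetical, and plain `T*` parameters are assumed instead of the attribute alias. The check only verifies in a debug build that the data really is aligned; it gives ICC no extra information, but it separates "b and c are actually misaligned" from "the compiler ignored the attribute".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the address of p is a multiple of align bytes.
template <int align>
bool is_aligned(const void* p)
{
  return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

template <typename T, int align_>
void add_checked(T* __restrict__ a, T* __restrict__ b, T* __restrict__ c,
                 int n)
{
  // Check every pointer, not only the first one (no-op with NDEBUG).
  assert(is_aligned<align_>(a));
  assert(is_aligned<align_>(b));
  assert(is_aligned<align_>(c));
  for (int i = 0; i < n; ++i)
    a[i] += b[i] * c[i];
}
```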

However, to get as much benefit as possible from vectorization (is that what you want?), a different declaration could be used; I'll provide more details later.

Note: the declaration of the template-based function add looks somewhat over-complicated to me.

>>...we have to support both GCC and ICC...

Since you need to support two compilers, and some differences have already been detected, I wouldn't rely on one universal version built around __restrict__ or __aligned__-like keywords. I solved a similar problem for five different C++ compilers by implementing five different core processing parts for some algorithms, since each of those compilers uses different techniques for optimization and vectorization. You could also try to create a universal version by wrapping all the differences in some smart macros; that is not too difficult in your case (and it is actually what I would do).
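A minimal sketch of the smart-macro approach, covering only ICC, GCC 4.7+ or a reasonably recent Clang, and a no-op fallback for anything older (such as the MinGW 3.4.2 compiler mentioned below, which lacks the builtin). The macro names and the kernel `add64` are hypothetical. ICC is tested first because it also defines `__GNUC__`; per the reports in this thread, the ICC branch still needs a literal alignment, so the kernel uses 64 rather than a template parameter.

```cpp
// Hypothetical portability macros; names are illustrative only.
#if defined(_MSC_VER)
#  define KERNEL_RESTRICT __restrict
#else
#  define KERNEL_RESTRICT __restrict__
#endif

#if defined(__INTEL_COMPILER)
   // ICC: scope-based assertion; needs a literal n (see this thread).
#  define KERNEL_ASSUME_ALIGNED(T, p, n) __assume_aligned(p, n)
#elif defined(__clang__) || (defined(__GNUC__) && \
      (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
   // GCC/Clang: the builtin returns void*, so cast and reassign.
#  define KERNEL_ASSUME_ALIGNED(T, p, n) \
     (p = static_cast<T*>(__builtin_assume_aligned(p, n)))
#else
   // Older compilers: no alignment hint at all.
#  define KERNEL_ASSUME_ALIGNED(T, p, n) ((void)0)
#endif

template <typename T>
void add64(T* KERNEL_RESTRICT a, T* KERNEL_RESTRICT b,
           T* KERNEL_RESTRICT c, int n)
{
  KERNEL_ASSUME_ALIGNED(T, a, 64);
  KERNEL_ASSUME_ALIGNED(T, b, 64);
  KERNEL_ASSUME_ALIGNED(T, c, 64);
  for (int i = 0; i < n; ++i)
    a[i] += b[i] * c[i];
}
```

The kernel body stays identical for all compilers; only the macro definitions differ, which keeps the per-compiler code in one place.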

I reproduced that problem recently with MinGW version 3.4.2 and here is a compilation output:
...
Test046.cpp: In function `void AddVectorsA(float*, float*, size_t)':
Test046.cpp:43: error: `__builtin_assume_aligned' undeclared (first use this function)
Test046.cpp:43: error: (Each undeclared identifier is reported only once for each function it appears in.)
Test046.cpp: In function `void AddVectorsB(float*, float*, float*, size_t)':
Test046.cpp:83: error: `__builtin_assume_aligned' undeclared (first use this function)
...

By the way, MinGW C++ compiler version 4.8.1 Release 4 successfully compiles codes when '__builtin_assume_aligned' is used.
