Compiler warning: remark: unroll pragma will be ignored due to unrolling factor mismatched

Richard H.

Hi guys,

Do you know how to disable the following warning? It's annoying. The example is quite similar to the one in http://software.intel.com/sites/products/documentation/studio/composer/e...

void unroll(int a[], int b[], int c[], int d[])
{
  int i;
  #pragma unroll(4)
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}

$ icc --version
icc (ICC) 13.1.2 20130514

$ icc -c test.c -o test.o
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(6): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.

The warning disappears if the loop trip count is changed to 32:
void unroll(int a[], int b[], int c[], int d[])
{
  int i;
  #pragma unroll(4)
  for (i = 0; i < 32; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}
Actually, we have a pragma abstraction in our projects, so having unroll(4) makes sense for other compilers (e.g. TI cl6x).
Could you suggest a solution? At the very least, could you tell me how to disable the remark, since it has no remark number?

Thanks,
Richard

QIAOMIN Q. (Intel)

This is due to a potential OUTPUT dependence in the code

b[i] = a[i] + 1;
d[i] = c[i] + 1;

So please add "#pragma simd" under the #pragma unroll(4) if you are sure that there is no aliasing among these four arrays/pointers.

Then the world is quiet.

Thank you. -- QIAOMIN.Q

Intel Developer Support

Richard H.

Hi QIAOMIN,

Thanks for your feedback. However, adding "#pragma simd" under the #pragma unroll(4) did not help. I added "-vec-report2" to the compiler options, and the output is as follows.
$ icc -c -vec-report2 test.c -o test.o
test.c(7): (col. 3) remark: loop was not vectorized: low trip count.
test.c(7): (col. 3) warning #13379: loop was not vectorized with "simd"
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists.
test.c(7): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched.

Do you have other suggestions?

Thanks,
Richard

QIAOMIN Q. (Intel)

Hello, there is no problem with compiler 14.0.1.

$ icc -V

Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008

Part of the optimization report as below:

//

<488977u.c;-1:-1;hpo_vectorization;unroll;0> HPO Vectorizer Report (unroll)

488977u.c(6:7-6:7):VEC:unroll:  LOOP WAS VECTORIZED

<488977u.c;6:6;hlo_linear_trans;unroll;0>

//

Thanks,

Qiaomin

Richard H.

Hi Qiaomin,

Thanks for your reply. I've verified that with icc 14.0.1. However, as you can imagine, I can't add "#pragma simd" in every place where "unroll" is used. What do you think? Are there other solutions?

Thanks,
Richard

QIAOMIN Q. (Intel)

Hello Richard

Actually, you don't need to add "#pragma simd" and "#pragma unroll" in all cases; the compiler will unroll loops based on its default heuristics. In this specific sample code, there are vector dependences among the four pointers (int a[], int b[], int c[], int d[]), so you see 'loop was not vectorized' in the vectorization report. Add "#pragma simd" or "#pragma vector always" only when you are sure that there is no pointer aliasing and no computational dependence in the loop.

The unroll pragma is supported only when option O3 is set, and adding -unroll-aggressive enables more aggressive unrolling heuristics.

However, you should add explicit simd and unroll pragmas only when needed, because in most cases the compiler does a good job by default on these two things. Unrolling a loop may also increase register pressure and code size in some cases.
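
For a project that wraps pragmas in a macro layer, as Richard describes, one way to follow this advice is to let the wrapper expand to nothing for icc, so the loop is left to the compiler's heuristics and no unroll pragma is there to be ignored. A minimal sketch follows; the names LOOP_UNROLL, PRAGMA_STR_, PRAGMA_EMIT_ and unroll_wrapped are hypothetical, and the TI branch assumes that cl6x accepts its UNROLL pragma through the C99 _Pragma operator.

```c
/* Hypothetical pragma wrapper: all macro names are illustrative only. */
#define PRAGMA_STR_(x)  #x
#define PRAGMA_EMIT_(x) _Pragma(PRAGMA_STR_(x))

#if defined(__TI_COMPILER_VERSION__)
  /* Assumption: TI cl6x spells the hint "#pragma UNROLL(n)". */
  #define LOOP_UNROLL(n) PRAGMA_EMIT_(UNROLL(n))
#else
  /* icc and all other compilers: emit no pragma, keep default unrolling. */
  #define LOOP_UNROLL(n)
#endif

void unroll_wrapped(const int a[], int b[], const int c[], int d[])
{
  int i;
  LOOP_UNROLL(4)
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}
```

The design choice here is to keep the hint in the source for the compilers that benefit from it while giving icc nothing to report on.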

Regards,

Qiao

 

iliyapolak

>>>,there are vector dependence among the four pointers -(int a[], int b[], int c[], int d[]) >>>

Do you mean pointer aliasing which cannot be known at compile time?

Tim Prince


Intel compilers in the past haven't always unrolled automatically as much as is desirable. Prior to Core i7 "Nehalem," aggressive unrolling (by more than 4) could often be useful. Even with Core i7-2 and -3, non-vectorizable loops frequently benefited from unrolling by 4, even though the compiler chose not to unroll. With Core i7-4 "Haswell," I don't see a benefit in many cases from unrolling by more than the Intel compiler chooses. I haven't seen documentation on why this would be. For Core i7-2 and -3, the combined working of the loop stream detector and micro-op cache had been improved so as to reduce the need for unrolling at compile time and to deliver full performance across a range of loop counts and instruction and data alignments, so maybe these have been improved further. As Qiao said, unroll directives are less important now.

The 14.0.1 compiler takes advantage of __restrict pointer definitions more frequently than previous icc versions did. If the compiler reports a dependence, it often means simply that there isn't sufficient information (such as a __restrict qualifier) to support disambiguation. #pragma omp simd is one of several ways to overrule the compiler's finding of potential aliasing.
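
As a minimal sketch of the __restrict route, here is the loop from the original question with qualified pointers. The function name unroll_restrict is hypothetical, and whether a particular icc version then vectorizes the loop is not guaranteed; the qualifier only promises the compiler that the four arrays do not overlap.

```c
/* __restrict is an extension keyword accepted by icc, gcc and clang.
   The caller must guarantee that the four arrays never overlap. */
void unroll_restrict(const int *__restrict a, int *__restrict b,
                     const int *__restrict c, int *__restrict d)
{
  int i;
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }
}
```

Calling this with overlapping arrays breaks the promise and makes the behavior undefined, so the qualifier belongs only where the no-overlap guarantee really holds.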

QIAOMIN Q. (Intel)

When coding like this

  #pragma unroll(4)
  #pragma ivdep           // when arrays a, b, c, d point to non-aliasing memory locations; alternatively, the restrict keyword can be used
  #pragma vector aligned  // when arrays a, b, c, d are aligned
  for (i = 0; i < 16; i++) {
    b[i] = a[i] + 1;
    d[i] = c[i] + 1;
  }

and compiling with $ icc 488977u.c -c -vec-report6 -O3

the output is:

488977u.c(8): (col. 2) remark: vectorization support: reference b has aligned access
488977u.c(8): (col. 2) remark: vectorization support: reference a has aligned access
488977u.c(9): (col. 2) remark: vectorization support: reference d has aligned access
488977u.c(9): (col. 2) remark: vectorization support: reference c has aligned access
488977u.c(7): (col. 2) remark: loop was completely unrolled
488977u.c(7): (col. 2) remark: vectorization support: unroll factor set to 4
488977u.c(7): (col. 2) remark: LOOP WAS VECTORIZED

Without the ivdep pragma, you get remarks like:
488977u.c(6): (col. 2) remark: loop was not vectorized: existence of vector dependence
488977u.c(8): (col. 2) remark: vector dependence: assumed FLOW dependence between d line 8 and a line 7
488977u.c(7): (col. 2) remark: vector dependence: assumed ANTI dependence between a line 7 and d line 8

488977u.c(7): (col. 2) remark: vector dependence: assumed OUTPUT dependence between b line 7 and d line 8
488977u.c(8): (col. 2) remark: vector dependence: assumed OUTPUT dependence between d line 8 and b line 7

The compiler cannot safely vectorize a loop if there is even a potential dependency. Consider the following example:

for (i = 0; i < size; i++) {  c[i] = a[i] * b[i];}

In the above example, the compiler needs to determine whether, for some iteration i, c[i] might refer to the same memory location as a[i] or b[i] for a different iteration. (Such memory locations are sometimes said to be "aliased".) For example, if a[i] pointed to the same memory location as c[i-1], there would be a read-after-write dependency (FLOW dependence). If the compiler cannot exclude this possibility, it will not vectorize the loop unless you provide the compiler with hints.
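
The c[i-1] case can be made concrete with a small sketch. It is only an illustration: mul_in_order and mul_four_wide are hypothetical names, and the second function merely emulates a 4-wide vector loop in plain C by doing four loads before four stores; it is not what icc actually generates.

```c
#include <stddef.h>

/* Scalar reference: every iteration finishes before the next one starts. */
void mul_in_order(int *c, const int *a, const int *b, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    c[i] = a[i] * b[i];
}

/* Emulated 4-wide vector loop: four products are computed from memory
   first, and only then are the four results stored. */
void mul_four_wide(int *c, const int *a, const int *b, size_t n)
{
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    int t0 = a[i]     * b[i];
    int t1 = a[i + 1] * b[i + 1];
    int t2 = a[i + 2] * b[i + 2];
    int t3 = a[i + 3] * b[i + 3];
    c[i]     = t0;
    c[i + 1] = t1;
    c[i + 2] = t2;
    c[i + 3] = t3;
  }
  for (; i < n; i++)  /* remainder iterations */
    c[i] = a[i] * b[i];
}
```

If both functions are called with c pointing one element past a (so that a[i] is the same location as c[i-1]), the in-order version reads the value written by the previous iteration while the four-wide version reads stale values, and the results differ. With non-overlapping arrays the two agree, which is exactly what a hint such as ivdep or restrict asserts.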

iliyapolak

Unrolling by more than four can increase register pressure, and as @Tim mentioned, for small loops that fit in the LSD, the LSD coupled with the micro-op cache can probably do a better job than aggressive unrolling.

Richard H.

Hi all,

Thanks for all your input. From the above, I conclude that unroll() is not that useful, but the restrict keyword is.
So I checked my code again. It seems icc doesn't complain when the restrict keyword is present.
However, I think I may have found an icc bug. Please have a look at the following code.

void test01(float *__restrict a, float *__restrict b)
{
  int i;
  #pragma unroll(2)
  for (i = 0; i < 8; i++)
  {
    b[2*i] = a[2*i];
    b[2*i + 1] = a[2*i + 1];
  }
}

typedef float * __restrict DLB_CLVEC;
void test02(DLB_CLVEC a, DLB_CLVEC b)
{
  int i;
  #pragma unroll(2)
  for (i = 0; i < 8; i++)
  {
    b[2*i] = a[2*i];
    b[2*i + 1] = a[2*i + 1];
  }
}

$icc --version
icc (ICC) 14.0.1 20131008
$ icc -O3 -vec-report2 -c test.c
test.c(19): (col. 3) remark: LOOP WAS VECTORIZED
test.c(31): (col. 3) remark: loop was not vectorized: existence of vector dependence
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unroll factor exists
test.c(31): (col. 3) remark: unroll pragma will be ignored due to unrolling factor mismatched

I expect that test01 and test02 are exactly the same. However, it seems that icc ignores the restrict keyword in the typedef.
What do you think?

Thanks,
Richard

iliyapolak

Without the typedef, as shown in your code, was your loop vectorized?

Richard H.

Yes. It was correctly vectorized as "test.c(19): (col. 3) remark: LOOP WAS VECTORIZED". It seems the difference is only due to typedef.

Richard

 

QIAOMIN Q. (Intel)

Thanks. This bug ("typedef float * __restrict DLB_CLVEC;" doesn't take effect) has been entered in our bug-tracking system. I will keep you posted whenever there is progress on this.

 

Thank you.
--
QIAOMIN.Q
Intel Developer Support

User forums:                   http://software.intel.com/en-us/forums/

Richard H.

Hi QIAOMIN, 

Great!

Thanks,
Richard

QIAOMIN Q. (Intel)

The problem only happens with icc, not icpc. The fix will ship in an upcoming release of Compiler 14.0. Thanks for submitting the issue.
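
For code that has to build with an affected icc in the meantime, one possible variant is to keep the typedef unqualified and put __restrict on each parameter instead. This is an untested assumption rather than a confirmed workaround: whether icc 14.0.1 honors this form has not been checked here, and the names VEC_PTR and test03 are hypothetical.

```c
/* Hypothetical: plain pointer typedef with no qualifier baked in. */
typedef float *VEC_PTR;

/* The qualifier is applied per parameter rather than inside the typedef. */
void test03(VEC_PTR __restrict a, VEC_PTR __restrict b)
{
  int i;
  for (i = 0; i < 8; i++) {
    b[2*i] = a[2*i];
    b[2*i + 1] = a[2*i + 1];
  }
}
```

Checking the result with -vec-report2, as in the test01/test02 comparison, would show whether the loop is vectorized in this form.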

 

Thank you.
--
QIAOMIN.Q
Intel Developer Support

Richard H.

That's great!

Thanks,

Richard
