Problem with the compilation on the Linux cluster

I'm compiling a code, already tested on Windows, on our cluster (CentOS 7, with Intel Xeon CPUs). Everything compiles fine, but the compiler gives this error output:

LLL.f90(3927): (col. 38) remark: unroll pragma will be ignored due to loop cannot be unrolled
LLL.f90(4825): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4811): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4862): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4886): (col. 9) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4913): (col. 13) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4940): (col. 9) remark: nounroll pragma will be ignored due to nounroll_and_jam pragma expected
LLL.f90(4943): (col. 17) remark: unroll pragma will be ignored due to unroll_and_jam pragma expected
LLL.f90(4889): (col. 17) remark: unroll pragma will be ignored due to loop cannot be unrolled

 

I used exactly the same flags as for the Windows build, where I never received such a warning:

ifort -O3 -ipo-separate -unroll=50 -parallel -threads -qopt-prefetch=3 -qopt-matmul  -assume byterecl -qopenmp  -c LLL.f90

Since I don't want to change the entire code if it's not necessary, would it be possible to still use !DIR$ NOUNROLL and !DIR$ UNROLL=n instead of !DIR$ NOUNROLL_AND_JAM? Should I add a specific flag to enforce it? I looked in the documentation, but I couldn't find anything.
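For readers following along, the directives in question are placed immediately before the DO statement they apply to. The following is a minimal sketch, not an excerpt from LLL.f90: the program name, array names, and sizes are all hypothetical, and it only illustrates where `!DIR$ NOUNROLL` and `!DIR$ UNROLL=n` go.

```fortran
program unroll_demo
  implicit none
  ! Hypothetical sizes and arrays, chosen only for illustration.
  integer, parameter :: n = 8, m = 1000
  real :: a(m, n), col_sum(n)
  integer :: i, j

  a = 1.0
  col_sum = 0.0

  do j = 1, n
     ! Request that the inner loop is not unrolled.
     ! The directive sits directly before the DO it applies to.
     !DIR$ NOUNROLL
     do i = 1, m
        col_sum(j) = col_sum(j) + a(i, j)
     end do
  end do

  ! Request an unroll factor of 4 for this loop (the "=n" form).
  !DIR$ UNROLL=4
  do j = 1, n
     col_sum(j) = 2.0 * col_sum(j)
  end do

  print *, col_sum(1)
end program unroll_demo
```

Compilers that do not recognize the `!DIR$` prefix treat these lines as ordinary comments, so the sketch should build with other Fortran compilers as well. Whether ifort honors a given directive or reports it as ignored depends on what the optimizer decides for that particular loop.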

For information, I used ifort 15 and ifort 17 to compile on Windows and ifort 13/15/17 to try to compile on Linux, and I received the same error with all three compilers on CentOS.

Thanks,

Marco

 

PS: NOUNROLL in those particular loops is necessary because of how the code has been written and in order to compare the performance. Here are the compilation flags I used on Windows:
 

/nologo /MP /O3 /Qunroll:20 /Qparallel /Qopt-prefetch=3 /Qipo /Qopt-matmul /I"..\lib" /Qopenmp /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:static /threads /c

/MP is Windows-specific, but the rest are the same, and no warning or error was generated.

 

**EDIT** 

I tried changing to NOUNROLL_AND_JAM and UNROLL_AND_JAM = 10. Now the error becomes:

LLL.f90(3397): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(3397): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4687): (col. 17) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4125): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4101): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)
LLL.f90(4183): (col. 25) remark: unroll_and_jam pragma will be ignored due to (null)

 

Those aren't errors. They are simply warnings that a pragma you have entered will have no effect because it is redundant. There would be no unroll in those positions even without your directive. Automatic unroll_and_jam rarely happens in my experience; it applies only to situations that match ones the compiler has been taught to recognize for major benchmarks. As the warnings indicate, you would not want additional unrolling where unroll_and_jam happens.

It seems unlikely that so much unrolling would be useful, even if the compiler were to implement it. The default unroll for vectorized sum reduction is excessive in many situations but doesn't get reduced by any options permitting vectorization, in my experience. Setting Qunroll:4 can be beneficial in many cases where the default unroll is less and there is no parallelization. I assume you don't have an old CPU (one which doesn't support SSE4.2) which might benefit from more unrolling than the current CPUs do.

According to the compiler warning, the limit for the unroll is 16. I chose 10, and now 2, to test both in the directives in the code and in the compiler flag.

The problem is that the compiler seems to do the unrolling requested by the compiler flag (the only warnings are for the explicit directives in the code).

 

Moreover, in my code, ignoring the NOUNROLL WILL result in wrong code because of how the OMP directive is written to take advantage of memory locality. So I need the compiler not to ignore that part.

Also, the CPUs on the cluster are 6-10 years old, depending on the node, and the one in my desktop is a non-Intel architecture, so I might benefit from more unrolling.

Marco

The "due to (null)" bothers me - the IPO processing (which is generating these messages) should do a better job with its diagnostics.

Steve (aka "Doctor Fortran") - Retired from Intel

That makes two of us... :)

Do you have any idea why the compiler would expect unroll_and_jam instead of unroll? I tried to find an answer in the documentation, but I couldn't find any.

Maybe Tim will have some idea - all that stuff is a mystery to me.

Steve (aka "Doctor Fortran") - Retired from Intel
