Why is the same loop reported as both not vectorized and vectorized inside itself?

I would like to understand why the report for the loop starts with the message "... was not vectorized" while, right inside that loop, there is another loop with the message "loop was vectorized". As I understand it, the inner loop is the same loop... Does anyone have a clue?

Did the compiler really nest the loop at (530,6) inside itself?

...

         LOOP BEGIN at suktmig2d_OpenMP.c(530,6) inlined into suktmig2d_OpenMP.c(302,3)
            remark #25399: memcopy generated
            remark #15542: loop was not vectorized: inner loop was already vectorized
            remark #25015: Estimate of max trip count of loop=8

            LOOP BEGIN at suktmig2d_OpenMP.c(530,6) inlined into suktmig2d_OpenMP.c(302,3)
               remark #15389: vectorization support: reference datalo[k-?] has unaligned access   [ suktmig2d_OpenMP.c(531,7) ]
               remark #15389: vectorization support: reference *(*(lowpass+nc*8)+(k+?-1)*4) has unaligned access   [ suktmig2d_OpenMP.c(531,21) ]
               remark #15381: vectorization support: unaligned access used inside loop body
               remark #15305: vectorization support: vector length 8
               remark #15309: vectorization support: normalized vectorization overhead 1.000
               remark #15300: LOOP WAS VECTORIZED
               remark #15450: unmasked unaligned unit stride loads: 1 
               remark #15451: unmasked unaligned unit stride stores: 1 
               remark #15475: --- begin vector cost summary ---
               remark #15476: scalar cost: 4 
               remark #15477: vector cost: 0.750 
               remark #15478: estimated potential speedup: 4.000 
               remark #15488: --- end vector cost summary ---
               remark #25015: Estimate of max trip count of loop=3
            LOOP END

            LOOP BEGIN at suktmig2d_OpenMP.c(530,6) inlined into suktmig2d_OpenMP.c(302,3)
            <Remainder loop for vectorization>
               remark #25015: Estimate of max trip count of loop=24
            LOOP END
         LOOP END

...


I don't think a definitive answer can be given without at least a working example, and with inlining, even that may leave things obscure.
It is usual for an outer loop not to be vectorized when the more useful inner-loop vectorization has been achieved. A memcpy call (remark #25399, "memcopy generated") is substituted where the compiler judges it preferable to inline vectorization.
If this loop takes enough time to be worth further optimization effort, the alignment remarks may be the most important hints to take from the report. For example, you might be able to assert alignment if you can guarantee it.

I see this behavior even without inlining. It looks like a loop inside the same loop.

Asserting alignment is not possible because the loop sometimes starts at index 0, sometimes at 1, and so on.

         LOOP BEGIN at suktmig2d_OpenMP.c(530,6)
            remark #25399: memcopy generated
            remark #15542: loop was not vectorized: inner loop was already vectorized
            remark #25015: Estimate of max trip count of loop=8

            LOOP BEGIN at suktmig2d_OpenMP.c(530,6)
               remark #15389: vectorization support: reference datalo[k-?] has unaligned access   [ suktmig2d_OpenMP.c(531,7) ]
               remark #15389: vectorization support: reference *(*(lowpass+nc*8)+(k+?-1)*4) has unaligned access   [ suktmig2d_OpenMP.c(531,21) ]
               remark #15381: vectorization support: unaligned access used inside loop body
               remark #15305: vectorization support: vector length 8
               remark #15309: vectorization support: normalized vectorization overhead 1.000
               remark #15300: LOOP WAS VECTORIZED
               remark #15450: unmasked unaligned unit stride loads: 1 
               remark #15451: unmasked unaligned unit stride stores: 1 
               remark #15475: --- begin vector cost summary ---
               remark #15476: scalar cost: 4 
               remark #15477: vector cost: 0.750 
               remark #15478: estimated potential speedup: 4.000 
               remark #15488: --- end vector cost summary ---
               remark #25015: Estimate of max trip count of loop=3
            LOOP END

            LOOP BEGIN at suktmig2d_OpenMP.c(530,6)
            <Remainder loop for vectorization>
               remark #25015: Estimate of max trip count of loop=24
            LOOP END
         LOOP END

The source code is on my GitHub:
https://github.com/rodrigo-prado/kirchhoff-ccpe-2018/blob/master/CodigoK...

The report is at:

https://github.com/rodrigo-prado/kirchhoff-ccpe-2018/blob/master/CodigoK...

Thanks for your answer!
