Improving Performance with Interprocedural Optimization

The compiler may be able to perform additional optimizations if it can optimize across function and source file boundaries. These optimizations include, but are not limited to, function inlining. Interprocedural optimization is enabled with the /Qipo option.
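To make the inlining remark concrete, the sketch below shows the kind of call that interprocedural optimization can inline. It is an illustration only: the names matvec and repeat_matvec, the fixed ROWS and COLS sizes, and the single-file layout are assumptions made here, not the tutorial's actual Multiply.c and Driver.c sources. In a multi-file build the two functions would sit in separate source files, and that is the case /Qipo is meant to help with.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4   /* illustrative size, chosen for this sketch */
#define COLS 4   /* illustrative size, chosen for this sketch */

/* Hypothetical callee. Picture it in its own source file (for example,
 * a Multiply.c). Computes b = a * x for a rows-by-COLS matrix. */
void matvec(size_t rows, double a[][COLS], double b[], double x[])
{
    for (size_t i = 0; i < rows; i++) {
        b[i] = 0.0;
        /* Inner loop: a candidate for vectorization. */
        for (size_t j = 0; j < COLS; j++) {
            b[i] += a[i][j] * x[j];
        }
    }
}

/* Hypothetical caller. Picture it in a different source file (for
 * example, a Driver.c). Without cross-file optimization the compiler
 * sees only a declaration of matvec here and cannot inline the call. */
void repeat_matvec(int repetitions, double a[ROWS][COLS],
                   double b[ROWS], double x[COLS])
{
    assert(repetitions >= 0);
    for (int r = 0; r < repetitions; r++) {
        matvec(ROWS, a, b, x);
    }
}
```

When a call like this is inlined, the optimization report can attribute the callee's loops to the call site, which is what the "inlined into" annotations in the report indicate.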

Rebuild the program using the /Qipo option to enable interprocedural optimization.

Select Optimization [Intel C++] > Interprocedural Optimization > Multi-file (/Qipo).

Note that the vectorization report now appears in ipo_out.optrpt.

LOOP BEGIN at Driver.c(152,9)
Driver.c(152,9):remark #15542: loop was not vectorized: inner loop was already vectorized

LOOP BEGIN at Multiply.c(37,5) inlined into Driver.c(150,9)
Multiply.c(37,5):remark #15542: loop was not vectorized: inner loop was already vectorized

LOOP BEGIN at Multiply.c(49,9) inlined into Driver.c(150,9)
Multiply.c(50,13):remark #15388: vectorization support: reference a[0][i][j] has aligned access
Driver.c(150,9):remark #15388: vectorization support: reference x[j] has aligned access
Multiply.c(49,9):remark #15305: vectorization support: vector length 2
Multiply.c(49,9):remark #15399: vectorization support: unroll factor set to 4
Multiply.c(49,9):remark #15309: vectorization support: normalized vectorization overhead 0.594
Multiply.c(49,9):remark #15300: LOOP WAS VECTORIZED
Multiply.c(49,9):remark #15448: unmasked aligned unit stride loads: 2 
Multiply.c(49,9):remark #15475: --- begin vector cost summary ---
Multiply.c(49,9):remark #15476: scalar cost: 9 
Multiply.c(49,9):remark #15477: vector cost: 4.000 
Multiply.c(49,9):remark #15478: estimated potential speedup: 2.000 
Multiply.c(49,9):remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at Multiply.c(49,9) inlined into Driver.c(150,9)
Remainder loop for vectorization
Multiply.c(50,13):remark #15388: vectorization support: reference a[0][i][j] has aligned access
Driver.c(150,9):remark #15388: vectorization support: reference x[j] has aligned access
Multiply.c(49,9):remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or /Qvec-threshold0 to override
Multiply.c(49,9):remark #15305: vectorization support: vector length 2
Multiply.c(49,9):remark #15309: vectorization support: normalized vectorization overhead 2.417
LOOP END
LOOP END
LOOP END

Note

Your line and column numbers may be different.

Now, run the executable and record the execution time.
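If you want to record timings from inside a program rather than with an external tool, one option is a small helper built on the standard clock() function. The sketch below is an assumption-laden illustration, not part of the tutorial sample: time_seconds, struct sum_job, and sum_job_run are hypothetical names introduced here, and the summation workload merely stands in for the real computation. Note that clock() reports processor time on most platforms, while Microsoft's C runtime reports elapsed time, so compare numbers only between runs on the same system.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in workload: sums the integers 0 .. n-1. */
struct sum_job {
    int n;
    double result;
};

void sum_job_run(void *arg)
{
    struct sum_job *job = (struct sum_job *)arg;
    double total = 0.0;
    for (int i = 0; i < job->n; i++) {
        total += (double)i;
    }
    job->result = total;
}

/* Hypothetical helper: calls work(arg) `repetitions` times and returns
 * the time in seconds as measured by clock(). */
double time_seconds(void (*work)(void *), void *arg, int repetitions)
{
    assert(work != NULL);
    clock_t start = clock();
    for (int r = 0; r < repetitions; r++) {
        work(arg);
    }
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Running the same measurement on builds made with and without /Qipo, with an identical repetition count, gives figures that can be compared directly.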