The compiler may be able to perform additional optimizations if it can optimize across source file boundaries. These include, but are not limited to, inlining a function defined in one source file into a caller in another. This interprocedural optimization across files is enabled with the /Qipo option.
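To see why file boundaries matter, here is a minimal sketch of the call structure. It assumes a matrix-vector product split the way the sample splits its code between Multiply.c and Driver.c; the names matvec and run_matvec and their signatures are hypothetical. When each file is compiled on its own, the caller sees only a prototype for matvec(), so the compiler cannot inline it. With /Qipo both bodies are visible at link time, so matvec() and its inner loop can be inlined into the caller and optimized in that context. Both parts appear in one listing for brevity. Splitting them at the marked comments into two files reproduces the cross-file case.

```c
#include <assert.h>
#include <stddef.h>

/* ===== Would live in Multiply.c (hypothetical signature) ===== */
/* b = A * x for a row-major rows x cols matrix A. */
void matvec(size_t rows, size_t cols, const double *a, double *b, const double *x)
{
    for (size_t i = 0; i < rows; i++) {
        b[i] = 0.0;
        for (size_t j = 0; j < cols; j++)   /* inner loop: the vectorization candidate */
            b[i] += a[i * cols + j] * x[j];
    }
}

/* ===== Would live in Driver.c ===== */
/* Compiled separately, Driver.c would see only this prototype, not the body. */
void matvec(size_t rows, size_t cols, const double *a, double *b, const double *x);

/* Calls matvec() repeatedly, as a benchmark driver would, and returns sum(b). */
double run_matvec(size_t rows, size_t cols, const double *a, double *b,
                  const double *x, int repeats)
{
    assert(repeats > 0);
    for (int r = 0; r < repeats; r++)
        matvec(rows, cols, a, b, x);   /* cross-file call: inlinable only with IPO */
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        total += b[i];
    return total;
}
```

For a 2 x 3 matrix holding 1 through 6 and x = {1, 1, 1}, run_matvec leaves b = {6, 15} and returns 21.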
Rebuild the program using the /Qipo option to enable interprocedural optimization.
Select Optimization [Intel C++] > Interprocedural Optimization > Multi-file (/Qipo).
Note that the vectorization report now appears in ipo_out.optrpt, and that it shows the loops from Multiply.c inlined into Driver.c:
LOOP BEGIN at Driver.c(152,9)
   Driver.c(152,9):remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at Multiply.c(37,5) inlined into Driver.c(150,9)
      Multiply.c(37,5):remark #15542: loop was not vectorized: inner loop was already vectorized

      LOOP BEGIN at Multiply.c(49,9) inlined into Driver.c(150,9)
         Multiply.c(50,13):remark #15388: vectorization support: reference a[0][i][j] has aligned access
         Driver.c(150,9):remark #15388: vectorization support: reference x[j] has aligned access
         Multiply.c(49,9):remark #15305: vectorization support: vector length 2
         Multiply.c(49,9):remark #15399: vectorization support: unroll factor set to 4
         Multiply.c(49,9):remark #15309: vectorization support: normalized vectorization overhead 0.594
         Multiply.c(49,9):remark #15300: LOOP WAS VECTORIZED
         Multiply.c(49,9):remark #15448: unmasked aligned unit stride loads: 2
         Multiply.c(49,9):remark #15475: --- begin vector cost summary ---
         Multiply.c(49,9):remark #15476: scalar cost: 9
         Multiply.c(49,9):remark #15477: vector cost: 4.000
         Multiply.c(49,9):remark #15478: estimated potential speedup: 2.000
         Multiply.c(49,9):remark #15488: --- end vector cost summary ---
      LOOP END

      LOOP BEGIN at Multiply.c(49,9) inlined into Driver.c(150,9)
         Remainder loop for vectorization
         Multiply.c(50,13):remark #15388: vectorization support: reference a[0][i][j] has aligned access
         Driver.c(150,9):remark #15388: vectorization support: reference x[j] has aligned access
         Multiply.c(49,9):remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or /Qvec-threshold0 to override
         Multiply.c(49,9):remark #15305: vectorization support: vector length 2
         Multiply.c(49,9):remark #15309: vectorization support: normalized vectorization overhead 2.417
      LOOP END
   LOOP END
LOOP END
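The report says the inner loop at Multiply.c(49,9) was vectorized with a vector length of 2 and an unroll factor of 4. It also says that a separate remainder loop handles the leftover iterations but was itself left scalar. One rough way to picture that code shape is the hand-written C below. It is a model, not the compiler's actual output. VL and UNROLL are hypothetical constants chosen to mirror the report, and the loop body is an assumed dot-product form of the sample's inner loop. The innermost k loop stands in for what would be vector instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants mirroring the report: "vector length 2", "unroll factor set to 4". */
#define VL     2
#define UNROLL 4

/* Scalar reference for the inner loop, in an assumed dot-product form. */
double dot_scalar(size_t n, const double *a, const double *x)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        sum += a[j] * x[j];
    return sum;
}

/* Hand-written model of the vectorized shape: a main loop that consumes
 * VL * UNROLL elements per iteration, then a scalar remainder loop. */
double dot_vector_model(size_t n, const double *a, const double *x)
{
    assert(n == 0 || (a != NULL && x != NULL));
    enum { STEP = VL * UNROLL };
    double acc[STEP] = {0.0};              /* UNROLL accumulators of VL lanes each */
    size_t main_end = n - n % STEP;
    size_t j = 0;

    for (; j < main_end; j += STEP)        /* main (vectorized) loop */
        for (size_t k = 0; k < STEP; k++)  /* each run of VL k's ~ one vector multiply-add */
            acc[k] += a[j + k] * x[j + k];

    double sum = 0.0;
    for (size_t k = 0; k < STEP; k++)      /* combine the partial sums */
        sum += acc[k];

    for (; j < n; j++)                     /* remainder loop: fewer than STEP iterations */
        sum += a[j] * x[j];
    return sum;
}
```

With n = 11, the main loop covers the first 8 elements and the remainder loop the last 3. A loop that runs only a few iterations is plausibly why the compiler judged vectorizing the remainder inefficient. Because the partial sums change the order of the additions, results can differ in the last bits from the scalar loop for general inputs. Small integer-valued inputs agree exactly.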
Now, run the executable and record the execution time.
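When recording the time, repeat the work enough times for the measurement to be meaningful. The following is a minimal timing sketch using the standard C clock() function. The names time_repeats, scale_kernel, and scale_args are hypothetical, and scale_kernel merely stands in for the sample's own computation. Note that clock() reports processor time on POSIX systems, while the Microsoft C runtime implements it as elapsed wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(void *ctx);

/* Runs fn(ctx) `repeats` times and returns the seconds measured with clock(),
 * or -1.0 if clock() is unavailable. */
double time_repeats(kernel_fn fn, void *ctx, int repeats)
{
    assert(fn != NULL && repeats > 0);
    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (int r = 0; r < repeats; r++)
        fn(ctx);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical stand-in kernel: out[i] = 2 * in[i]. */
struct scale_args { size_t n; const double *in; double *out; };

static void scale_kernel(void *p)
{
    struct scale_args *s = p;
    for (size_t i = 0; i < s->n; i++)
        s->out[i] = 2.0 * s->in[i];
}
```

Passing the kernel as a function pointer keeps the timing code independent of the computation. The same harness can therefore time the builds with and without /Qipo for a like-for-like comparison.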