VecAnalysis Python* Script for Annotating Intel® C++ and Fortran Compiler Vectorization Reports

 

This is the Python* script used to annotate Intel® C++ and Fortran compiler 13.1 (Intel® C++/Fortran/Visual Fortran Composer XE 2013 Update 2 and later) vectorization reports produced at -vec-report7.  The attached zip file contains:

  • vecanalysis.py 
  • vecmessages.py
  • README-vecanalysis.txt

NOTE: You will need Python* version 2.6.5 or higher. For more information and download instructions, please click here.

The -vec-report7 compiler option on Linux* (/Qvec-report7 on Windows*), new in Intel® C++ and Fortran compilers version 13.1, makes the compiler emit vector code quality messages for vectorized loops, each with its message ID and data values.  The messages provide information such as the expected speedup, memory access patterns, and the number of vector idioms for vectorized loops.  Below is a sample of the types of messages the compiler emits at -vec-report7:

  • loop was vectorized (with peel / with remainder)
  • unmasked aligned unit stride loads: 4
  • unmasked aligned unit stride stores: 2
  • saturating add/subtract: 3
  • estimated potential speedup: 6.270000
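
In the raw compiler output, each of these messages arrives as a remark carrying a message ID and a value, for example `satSub.c(9): (col. 3) remark: VEC#00101UASL 4.` (taken from the sample session in this article). The following is a minimal sketch of how such a line could be split into its parts. It is not the code in vecanalysis.py; the names `REMARK_RE`, `MESSAGES`, and `parse_remark` are hypothetical, and the small ID-to-text table is inferred from the sample output only.

```python
import re

# Matches remark lines such as:
#   satSub.c(9): (col. 3) remark: VEC#00101UASL 4.
REMARK_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+)\): \(col\. (?P<col>\d+)\) remark: (?P<text>.+)$"
)

# ASSUMPTION: partial ID-to-text table inferred from the sample session;
# the table shipped with the script may be organized differently.
MESSAGES = {
    "VEC#00101UASL": "unmasked aligned unit stride loads",
    "VEC#00101UASS": "unmasked aligned unit stride stores",
    "VEC#00203": "estimated potential speedup",
    "VEC#00405": "saturating add/subtract",
}

def parse_remark(raw):
    """Return (file, line, col, message), or None if raw is not a remark."""
    m = REMARK_RE.match(raw.strip())
    if m is None:
        return None
    text = m.group("text")
    if text.startswith("VEC#"):
        # Drop the sentence-ending period, then split the ID from its value.
        parts = text.rstrip(".").split(None, 1)
        msg_id = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        label = MESSAGES.get(msg_id, msg_id)  # fall back to the raw ID
        text = "%s: %s." % (label, value)
    return m.group("file"), int(m.group("line")), int(m.group("col")), text

print(parse_remark("satSub.c(9): (col. 3) remark: VEC#00101UASL 4."))
# ('satSub.c', 9, 3, 'unmasked aligned unit stride loads: 4.')
```

Remarks that carry plain text instead of an ID, such as `SIMD LOOP WAS VECTORIZED.`, pass through unchanged, and lines that are not remarks at all return `None`.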

The attached Python* script takes the message IDs produced by the compiler as input and produces a .txt file containing the original source code annotated with the -vec-report7 messages.  This gives more insight into the quality of the generated vector code without the need to analyze the assembly code.  The output file is named filename_extension_vr.txt; for example, the output file corresponding to satSub.c is satSub_c_vr.txt.

The compiler does not invoke the Python script automatically.  The user must pipe the compiler output into the script manually, as shown below.  The command below assumes the vecanalysis Python script files are located in the "vecanalysis" directory:

Example: icc -c -vec-report7 satSub.c 2>&1 | ./vecanalysis/vecanalysis.py --list
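
The output naming convention described above can be sketched in a few lines of Python. This is only an illustration of the rule (replace the dot before the extension with an underscore and append `_vr.txt`); the helper name `report_name` is hypothetical, and stripping the directory part is an assumption, not documented behavior of the script.

```python
import os

def report_name(source_path):
    """Return the annotated-report file name for a source file."""
    # ASSUMPTION: only the base name is used, without any directory part.
    base = os.path.basename(source_path)
    stem, ext = os.path.splitext(base)
    # ext includes the leading dot (".c"); drop it before joining.
    return "%s_%s_vr.txt" % (stem, ext.lstrip("."))

print(report_name("satSub.c"))  # satSub_c_vr.txt
```

Files without an extension are not handled specially in this sketch.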

For more information, please see the README-vecanalysis.txt file provided.

$ python
Python 2.6.5 (r265:79063, Jul  5 2010, 11:46:13)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.

$ icc -c -vec-report7 satSub.c 2>&1 | ./vecanalysis/vecanalysis.py --list
satSub.c(9): (col. 3) remark: SIMD LOOP WAS VECTORIZED.
satSub.c(9): (col. 3) remark: VEC#00001WPWR 1.
satSub.c(9): (col. 3) remark: VEC#00052 1.
satSub.c(9): (col. 3) remark: VEC#00101UASL 4.
satSub.c(9): (col. 3) remark: VEC#00101UASS 2.
satSub.c(9): (col. 3) remark: VEC#00101UUSL 2.
satSub.c(9): (col. 3) remark: VEC#00101UUSS 1.
satSub.c(9): (col. 3) remark: VEC#00201 5.
satSub.c(9): (col. 3) remark: VEC#00202 0.310000.
satSub.c(9): (col. 3) remark: VEC#00203 6.270000.
satSub.c(9): (col. 3) remark: VEC#00204 15.
satSub.c(9): (col. 3) remark: VEC#00405 3.
Writing satSub_c_vr.txt ... done
Statistics for all files

// Below is the vectorization summary for satSub.c
                                                    Source Locations
Message                                                 Count      %

// This line says there were 3 saturating add/subtract operations.
// 100% means the message refers to a single location/loop in the program.
// (Count = 1) means there is one instance of this message for the loops in the program.
saturating add/subtract: 3.                                 1 100.0%
unmasked unaligned unit stride loads: 2.                    1 100.0%
loop was vectorized (with peel/with remainder)              1 100.0%
unmasked aligned unit stride stores: 2.                     1 100.0%

// 100% of all loops (in this case a single loop) in the program were vectorized.
// If there were 10 loops out of which 6 got vectorized, the % would be 60%.

SIMD LOOP WAS VECTORIZED.                                   1 100.0%
unmasked aligned unit stride loads: 4.                      1 100.0%
scalar loop cost: 5.                                        1 100.0%
lightweight vector operations: 15.                          1 100.0%
vector loop cost: 0.310000.                                 1 100.0%
loop inside vectorized loop at nesting level: 1.            1 100.0%
unmasked unaligned unit stride stores: 1.                   1 100.0%
estimated potential speedup: 6.270000.                      1 100.0%
Total Source Locations:                                     1

$ more satSub_c_vr.txt
VECRPT satSub.c
VECRPT                                                     Source Locations
VECRPT Message                                                 Count      %
VECRPT saturating add/subtract: 3.                                 1 100.0%
VECRPT unmasked unaligned unit stride loads: 2.                    1 100.0%
VECRPT loop was vectorized (with peel/with remainder)              1 100.0%
VECRPT unmasked aligned unit stride stores: 2.                     1 100.0%
VECRPT scalar loop cost: 5.                                        1 100.0%
VECRPT unmasked aligned unit stride loads: 4.                      1 100.0%
VECRPT SIMD LOOP WAS VECTORIZED.                                   1 100.0%
VECRPT lightweight vector operations: 15.                          1 100.0%
VECRPT vector loop cost: 0.310000.                                 1 100.0%
VECRPT loop inside vectorized loop at nesting level: 1.            1 100.0%
VECRPT unmasked unaligned unit stride stores: 1.                   1 100.0%
VECRPT estimated potential speedup: 6.270000.                      1 100.0%
VECRPT Total Source Locations:                                     1

   1: #define SAT_U8(x) ((x) < 0 ? 0 : (x))
   2: void satsub(
   3:   unsigned char *a,
   4:   unsigned char *b,
   5:   int n
   6: ){
   7:   int i;
   8: #pragma simd
VECRPT (col. 3) SIMD LOOP WAS VECTORIZED.
VECRPT (col. 3) estimated potential speedup: 6.270000.
VECRPT (col. 3) lightweight vector operations: 15.
VECRPT (col. 3) loop inside vectorized loop at nesting level: 1.
VECRPT (col. 3) loop was vectorized (with peel/with remainder)
VECRPT (col. 3) saturating add/subtract: 3.
VECRPT (col. 3) scalar loop cost: 5.
VECRPT (col. 3) unmasked aligned unit stride loads: 4.
VECRPT (col. 3) unmasked aligned unit stride stores: 2.
VECRPT (col. 3) unmasked unaligned unit stride loads: 2.
VECRPT (col. 3) unmasked unaligned unit stride stores: 1.
VECRPT (col. 3) vector loop cost: 0.310000.
   9:   for (i=0; i<n; i++){
  10:     a[i] = SAT_U8(a[i] - b[i]);
  11:   }
  12: }
$
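
The annotated listing above places each loop's messages, prefixed with VECRPT, directly before the source line they refer to. A rough sketch of that interleaving step follows. It is an illustration only, not the script's actual implementation; the function name `annotate` and the shape of its inputs are assumptions, and the alphabetical ordering of messages is inferred from the sample output.

```python
def annotate(source_lines, messages_by_line):
    """Interleave VECRPT messages with numbered source lines.

    source_lines: list of source lines without trailing newlines.
    messages_by_line: dict mapping a 1-based line number to a list of
        (column, text) pairs reported for that line.
    """
    out = []
    for number, code in enumerate(source_lines, 1):
        # Messages go before the line they refer to, sorted by text
        # (ASSUMPTION: plain string order, as the sample output suggests).
        notes = sorted(messages_by_line.get(number, []),
                       key=lambda item: item[1])
        for col, text in notes:
            out.append("VECRPT (col. %d) %s" % (col, text))
        out.append("%4d: %s" % (number, code))
    return out

# Hypothetical three-line input, shortened from the satSub.c example.
source = ["int i;", "for (i=0; i<n; i++){", "}"]
notes = {2: [(3, "scalar loop cost: 5."), (3, "SIMD LOOP WAS VECTORIZED.")]}
for row in annotate(source, notes):
    print(row)
```

With this input, the two VECRPT lines appear between source lines 1 and 2, with the uppercase message first.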

For more complete information about compiler optimizations, see our Optimization Notice.