Intel® Advanced Vector Extensions

Processing of data in SSE/AVX/AVX2

Hello!

I'm working on my project and I'm looking for an answer:

When I'm processing 256 bits of data on one core, is it better to use one whole YMMx register, or to split the data into 2x128 bits and process the halves in two XMMx registers on different ports, and hence on different SSE/AVX units (in Sandy Bridge there are 3 ports per core for AVX)? Which option is faster?
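To make the two options concrete, here is a minimal sketch in C intrinsics, assuming the data is eight single-precision floats and the operation is a simple add. The function names `add8_ymm` and `add8_2xmm` are hypothetical, chosen only for illustration, and the `target("avx")` attribute assumes GCC or Clang on an x86-64 CPU with AVX. The sketch only shows what each variant looks like; it does not measure which one is faster.

```c
#include <immintrin.h>

/* Option A: one 256-bit operation on a whole YMM register.
   The target attribute (GCC/Clang specific, an assumption here)
   enables AVX code generation for this function only. */
__attribute__((target("avx")))
void add8_ymm(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_loadu_ps(a);   /* load 8 floats, unaligned */
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
}

/* Option B: the same 256 bits split into two independent
   128-bit operations on XMM registers. */
void add8_2xmm(const float *a, const float *b, float *out)
{
    __m128 lo = _mm_add_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b));
    __m128 hi = _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4));
    _mm_storeu_ps(out,     lo);       /* lower 4 floats */
    _mm_storeu_ps(out + 4, hi);       /* upper 4 floats */
}
```

Both functions produce the same eight results, so they can be swapped in a timing loop to compare the two approaches on a given CPU.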

Diagnostic 15527: loop was not vectorized: function call to xxx cannot be vectorized

Product Version: Intel(R) Visual Fortran Compiler XE 15.0 or a later version

Cause:

The vectorization report generated with the Visual Fortran Compiler's optimization options (/O2 /Qopt-report:2 /Qopt-report-phase:vec) states that the loop was not vectorized because a loop containing a function call cannot be vectorized.

Example:

The example below generates the following remark in the optimization report:
