Resolve Issue

In the Source pane, you identified that line 39 of the multiply1 function accounted for the highest Clockticks event values. To resolve this issue, do the following:

  1. Change the multiplication algorithm and, if using the Intel® compiler, enable vectorization.

  2. Re-run the analysis to verify optimization.

Change Algorithm

Note

The proposed solution is one of several ways to optimize memory access and is used for demonstration purposes only.

  1. Open the matrix.c file from the Source Files of the matrix project.

    For this sample, the matrix.c file is used to initialize the functions used in the multiply.c file.

  2. In line 90, replace the multiply1 function name with multiply2.

    The multiply2 function uses loop interchange, which optimizes memory access in the code.

    For the proposed optimization, use the Intel C++ Compiler to build the code. The Intel compiler can vectorize the loop, which means that it generates SIMD instructions that operate on several data elements simultaneously. If only one source file is used, the Intel compiler enables vectorization automatically. Because the current sample uses several source files, the multiply2 function uses #pragma ivdep to instruct the compiler to ignore assumed vector dependencies. This information lets the compiler use the Supplemental Streaming SIMD Extensions 3 (SSSE3).

  3. Save the files and rebuild the project using the compiler of your choice.

    If you have Intel Composer XE installed, you can use it to build the project with the Intel C++ Compiler XE. To do this, select Intel Composer XE > Use Intel C++... from the Visual Studio Project menu and then select Build > Rebuild matrix.

Verify Optimization

  1. Re-run the General Exploration analysis:

    • From the Visual Studio IDE: On the VTune Amplifier toolbar, click the down arrow next to the New Analysis button and select General Exploration - Nehalem / Westmere Analysis from the drop-down menu.

    • From the Standalone GUI: From the File menu, select New > General Exploration - Nehalem / Westmere Analysis.

    VTune Amplifier reruns the General Exploration analysis for the updated matrix target and creates a new result, r001ge, that opens automatically.

  2. In the r001ge result, click the Summary tab to see the Elapsed time value for the optimized code:

    The Elapsed time has dropped from 56.740 seconds to 9.122 seconds, roughly a 6.2x speedup, and VTune Amplifier now identifies only two types of performance issues for the application: high CPI Rate and Retire Stalls.

Next Step

Resolve Next Issue

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

For complete information about compiler optimizations, see the Optimization Notice.