Resolve Issue

In the Source pane, you identified that line 51 in the multiply1 function accounts for the highest values of the Clockticks event. To resolve this issue, do the following:

  1. Change the multiplication algorithm and, if using the Intel® compiler, enable vectorization.

  2. Re-run the analysis to verify optimization.

Change Algorithm


The proposed solution is one of several ways to optimize memory access and is used for demonstration purposes only.

  1. Open the multiply.h file from the Source Files of the matrix project.

    For this sample, the multiply.h file is used to define the functions used in the multiply.c file.

  2. In line 36, replace the multiply1 function name with multiply2.

    This new function uses loop interchange to optimize memory access in the code.

    For the proposed optimization, use the Intel C++ Compiler to build the code. The Intel compiler can vectorize the loop, which means that it uses SIMD instructions that operate on several data elements simultaneously. If only one source file is used, the Intel compiler enables vectorization automatically. The current sample uses several source files, which is why the multiply2 function uses #pragma ivdep to instruct the compiler to ignore assumed vector dependencies. This information lets the compiler use the Supplemental Streaming SIMD Extensions 3 (SSSE3).

  3. Save files and rebuild the project using the compiler of your choice.

    If you have the Intel C++ Compiler installed and integrated into Visual Studio, select Project > Intel Compiler > Use Intel C++... and then Build > Rebuild matrix.
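
    The loop interchange described in step 2 can be sketched as follows. This is a simplified illustration, not the sample's actual source: the function names multiply_ijk and multiply_ikj, the matrix size N, and the double element type are assumptions made for this sketch. The first function uses the original loop order, where the innermost loop reads b down a column. The second swaps the two inner loops so that the innermost loop reads b and writes c along a row, which is contiguous in memory and easier for a compiler to vectorize.

```c
#include <stddef.h>

#define N 64  /* hypothetical matrix size, chosen for this sketch only */

/* Original loop order (i, j, k): the innermost loop reads b[k][j] with k
   changing, so consecutive iterations touch addresses N elements apart. */
void multiply_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Interchanged loop order (i, k, j): the innermost loop now walks b[k][j]
   and c[i][j] with j changing, so memory is accessed sequentially. */
void multiply_ikj(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t k = 0; k < N; k++) {
            /* Assumption: the pragma is guarded so that other compilers
               do not warn about an unknown pragma. It tells the Intel
               compiler to ignore assumed vector dependencies. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma ivdep
#endif
            for (size_t j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
        }
    }
}
```

    Both functions add the same products to each element of c in the same order of k, so they produce the same result when c starts at zero. Only the memory access pattern differs.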

Verify Optimization

  1. Re-run the General Exploration analysis:

    • From Visual Studio IDE: From the VTune Amplifier toolbar, click the down arrow next to the New Analysis button and select General Exploration from the drop-down menu.

    • From the Standalone GUI: From the File menu, select New > General Exploration.

    VTune Amplifier reruns the General Exploration analysis for the updated matrix target and creates a new result, r001ge, that opens automatically.

  2. In the r001ge result, click the Summary tab to see the Elapsed time value for the optimized code:

    You see that the Elapsed time has dropped from 3.425 seconds to 1.471 seconds, a speedup of roughly 2.3x. The CPI Rate and LLC Miss count are still issues, though both have decreased significantly.

Next Step

Resolve Next Issue

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

For more detailed information about compiler-based optimizations, see our Optimization Notice.