For Intel® System Studio 2016, find the corresponding article here.
In this article, we enable and use Intel® Integrated Performance Primitives (Intel® IPP), Intel® Threading Building Blocks (Intel® TBB) and the Intel® C++ Compiler on Linux* (Ubuntu 14.04 LTS 64-bit). We will build and run one of the examples that come with Intel IPP, then apply Intel TBB and the Intel C++ Compiler to it and observe the performance improvement these Intel® System Studio features bring.
The version used for this article is Intel System Studio 2015 Update 2 Ultimate Edition for a Linux host, which contains the following components:
- Intel Integrated Performance Primitives 8.2 Update 1
- Intel Threading Building Blocks 4.3 Update 3
- Intel C++ Compiler 15.0 Update 2
This example was tested on an Intel® Core™ i5 dual-core platform.
Building the Intel IPP example with Intel TBB libraries and Intel C++ Compiler
STEP 1. Set up the environment variables for Intel IPP, Intel TBB and Intel C++ Compiler
Intel IPP, Intel TBB and the Intel C++ Compiler need environment variables to be set before they work properly. Run the following three commands in a terminal to set them. Each command needs the right target architecture, for example 'ia32' for an IA-32 target or 'intel64' for an Intel® 64 target. The compiler script additionally needs a platform type, for example 'linux' for a Linux* target or 'android' for an Android* target. Finally, do not forget to type a dot and a space ('. ') at the beginning of each command: this sources the script into the current shell, so the variables it sets remain available after the script finishes.
- . /opt/intel/system_studio_2015.x.xxx/ipp/bin/ippvars.sh <arch type>
- . /opt/intel/system_studio_2015.x.xxx/tbb/bin/tbbvars.sh <arch type>
- . /opt/intel/system_studio_2015.x.xxx/bin/iccvars.sh -arch <arch type> -platform <platform type>
To verify that the commands ran correctly, type 'printenv' and check that 'IPPROOT' and 'TBBROOT' are listed and point to the Intel IPP and Intel TBB install directories, and that 'PATH' includes '/opt/intel/system_studio_2015.x.xxx/bin/<arch type>'. For future use, it is recommended to write a bash script that enables multiple Intel System Studio features at once.
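Following that recommendation, a minimal bash sketch of such a script might look like the one below. The file name 'iss_env.sh', the ISS_ROOT variable and the iss_source helper are hypothetical, and '/opt/intel/system_studio_2015.x.xxx' is the same placeholder used above, to be replaced with your actual install directory. Like the vars scripts themselves, it has to be sourced ('. ./iss_env.sh intel64 linux') so that the variables stay in your shell.

```shell
# iss_env.sh (hypothetical name): enable Intel IPP, Intel TBB and the Intel C++ Compiler at once.
# Source it, do not execute it:  . ./iss_env.sh [arch type] [platform type]
# ISS_ROOT is a placeholder; set it to your actual Intel System Studio install directory.
ISS_ROOT="${ISS_ROOT:-/opt/intel/system_studio_2015.x.xxx}"
ISS_ARCH="${1:-intel64}"       # 'ia32' or 'intel64'
ISS_PLATFORM="${2:-linux}"     # 'linux' or 'android'

# Source one vars script, reporting a wrong path instead of silently skipping it.
iss_source() {
    local script="$1"
    shift
    if [ ! -f "$script" ]; then
        echo "iss_env.sh: not found: $script" >&2
        return 1
    fi
    . "$script" "$@"
}

iss_source "$ISS_ROOT/ipp/bin/ippvars.sh" "$ISS_ARCH"
iss_source "$ISS_ROOT/tbb/bin/tbbvars.sh" "$ISS_ARCH"
iss_source "$ISS_ROOT/bin/iccvars.sh" -arch "$ISS_ARCH" -platform "$ISS_PLATFORM"

# Same check as the printenv step: both variables should point into ISS_ROOT.
echo "IPPROOT=${IPPROOT:-<not set>}"
echo "TBBROOT=${TBBROOT:-<not set>}"
```

If a path is wrong, the script prints which vars script it could not find instead of leaving the environment half configured without notice.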
STEP 2. Find the example
First, locate the Intel IPP example and prepare to build it with additional Intel System Studio features applied, such as Intel TBB and the Intel C++ Compiler.
When you install Intel System Studio 2015 with default settings, its installation directory is the following:
and the Intel IPP example archive file is located at
You will find 'ipp-examples.tgz' in that location. Extract it wherever you like and find the 'ipp_resize_mt' example folder; that is the example used here. After extraction, additional documentation is available at '<Extracted Examples>/documentation/ipp-examples.html'.
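If you prefer the command line, the extraction can be sketched as a small bash function. The function name extract_ipp_examples and the destination directory are hypothetical. Since this article does not spell out the layout inside the archive, the sketch searches for the 'ipp_resize_mt' folder rather than assuming its path.

```shell
# Hypothetical helper: extract ipp-examples.tgz and print where ipp_resize_mt ended up.
# Usage: extract_ipp_examples <path to ipp-examples.tgz> <destination directory>
extract_ipp_examples() {
    local tgz="$1" dest="$2"
    if [ ! -f "$tgz" ]; then
        echo "extract_ipp_examples: archive not found: $tgz" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$tgz" -C "$dest" || return 1
    # Locate the example folder instead of hard-coding the archive layout.
    find "$dest" -maxdepth 3 -type d -name ipp_resize_mt
}
```

The printed directory is the '<Extracted Examples>/ipp_resize_mt' folder used in the following steps.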
STEP 3. Build the Example
To build the example without Intel TBB and the Intel C++ Compiler, simply run 'make' in '<Extracted Examples>/ipp_resize_mt' and save the resulting binary for later comparison. Because the Intel IPP environment was already set up in STEP 1, the example should build without problems.
Next, add Intel TBB and the Intel C++ Compiler to build a faster version of the original example. Comments in the example's 'Makefile' explain how to enable Intel TBB and the Intel C++ Compiler for the build.
Type 'export CC=icc && export CXX=icpc && export CXXFLAGS=-DUSE_TBB', then run 'make' in the 'ipp_resize_mt' folder to build the example. CXXFLAGS has to be exported as well: a plain assignment such as 'CXXFLAGS=-DUSE_TBB' only sets a shell variable, which is not passed to make through the environment unless CXXFLAGS was already exported.
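Saving the baseline binary can be done with a short bash function such as the hypothetical save_baseline below. The output location of 'make' is not stated in this article, so the sketch searches the current directory tree for the 'ipp_resize_mt' executable and copies it next to itself with a '.gcc' suffix (the suffix is an arbitrary choice).

```shell
# Hypothetical helper: keep a copy of the GCC-built, single-threaded binary for later comparison.
# Run it in <Extracted Examples>/ipp_resize_mt right after the plain 'make'.
save_baseline() {
    local bin
    # -perm -100: only files with the owner-execute bit set, i.e. the built executable.
    bin=$(find . -type f -name ipp_resize_mt -perm -100 | head -n 1)
    if [ -z "$bin" ]; then
        echo "save_baseline: no ipp_resize_mt executable found; run 'make' first" >&2
        return 1
    fi
    cp -p "$bin" "$bin.gcc"
    echo "$bin.gcc"
}
```

Because the copy has a different name, the later Intel TBB build, which writes 'ipp_resize_mt', should not overwrite it.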
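The same build can be wrapped in a bash function, sketched below under the assumption that the Makefile reads CC, CXX and CXXFLAGS from the environment, as its comments describe. The name build_tbb_icc is hypothetical. Running make in a subshell keeps the three variables out of your interactive shell, so a later plain 'make' still uses the default compiler.

```shell
# Hypothetical helper: build ipp_resize_mt with Intel TBB enabled and the Intel C++ Compiler.
# Call it in <Extracted Examples>/ipp_resize_mt after the STEP 1 scripts have been sourced.
build_tbb_icc() {
    local tool
    for tool in icc icpc; do
        # Stop early if iccvars.sh was not sourced, instead of letting make fail later.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "build_tbb_icc: $tool not found on PATH; source iccvars.sh first" >&2
            return 1
        fi
    done
    # Subshell: CC/CXX/CXXFLAGS apply to this make run only.
    (
        export CC=icc CXX=icpc CXXFLAGS=-DUSE_TBB
        make
    )
}
```

Depending on the Makefile, 'make' may report that nothing needs to be done while the earlier GCC build is still in place; in that case, remove the previous build output (after saving it) so that the example is rebuilt with the new settings.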
Simple Performance Comparison
The Intel IPP example reports its own performance as the average time it spends resizing one image.
Refer to the following for the options and arguments that can be used to run the resize example.
Without Intel TBB, the resize function runs on a single thread, so the example does not fully exploit multiple cores. The following is the result of running the resize example with the command './ipp_resize_mt -i ../../lena.bmp -r 960x540 -p 1 -T AVX2 -l 5000'. This command means 'resize ../../lena.bmp to 960x540 5000 times, using linear interpolation and Intel® Advanced Vector Extensions 2 (Intel® AVX2)'.
As shown above, resizing a single image takes about 2.189 ms on average. With this result as the baseline, we will test the same example with Intel TBB using both cores. If Intel TBB has been enabled successfully, the thread option ('-t') appears in the help page.
With Intel TBB, the resize function runs on 2 threads simultaneously. The following is the result of running the resize example with the command './ipp_resize_mt -i ../../lena.bmp -r 960x540 -p 1 -T AVX2 -t 2 -l 5000'.
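To repeat both measurements back to back, the two commands can be put into one bash function. SERIAL_BIN and TBB_BIN are placeholders for the saved single-threaded binary and the Intel TBB build (their default paths are assumptions), and compare_resize is a hypothetical name; the options are exactly those used above.

```shell
# Hypothetical helper: run both builds with the options used in this article.
# Placeholders: point SERIAL_BIN at the saved GCC build and TBB_BIN at the Intel TBB build.
SERIAL_BIN="${SERIAL_BIN:-./ipp_resize_mt.gcc}"
TBB_BIN="${TBB_BIN:-./ipp_resize_mt}"

compare_resize() {
    local bin
    for bin in "$SERIAL_BIN" "$TBB_BIN"; do
        if [ ! -x "$bin" ]; then
            echo "compare_resize: $bin is missing or not executable" >&2
            return 1
        fi
    done
    echo "== without Intel TBB (1 thread) =="
    "$SERIAL_BIN" -i ../../lena.bmp -r 960x540 -p 1 -T AVX2 -l 5000
    echo "== with Intel TBB (2 threads) =="
    "$TBB_BIN" -i ../../lena.bmp -r 960x540 -p 1 -T AVX2 -t 2 -l 5000
}
```

Run it from the directory that contains the binaries, so that '../../lena.bmp' resolves the same way as in the commands above.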
Running 2 threads at the same time exploited both cores and increased performance by about 76%.
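As a hedged reading of the 76% figure, assuming it refers to throughput (images resized per unit of time) rather than to a reduction in time, it corresponds to the speedup S below, with p = 2 threads and the single-thread time t_1 stated above. The 2-thread time is not restated in the text, so t_2 here is only the value implied by those two numbers, not a measurement.

```latex
S = \frac{t_1}{t_2} \approx 1.76, \qquad t_1 = 2.189\ \text{ms}
\;\Rightarrow\; t_2 \approx \frac{2.189\ \text{ms}}{1.76} \approx 1.24\ \text{ms},
\qquad E = \frac{S}{p} = \frac{1.76}{2} = 0.88
```

E is the parallel efficiency; a value somewhat below 1 is typical, since thread management and any serial parts of the example do not scale with the number of threads.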
To verify that the example actually uses both cores simultaneously, we can investigate it with Intel® VTune™ Amplifier. The following picture shows the number of CPUs utilized during each run (blue = resize example without Intel TBB, yellow = resize example with Intel TBB).
The yellow bar at 2.00 shows that 2 CPUs were running simultaneously for about 4.4 s.
The Intel VTune Amplifier results also show how the threads were working on specific tasks. The results for the functions used for resizing are extracted below.
Without Intel TBB, only a single thread handles the resize function, and it carries a heavy load. In situations like this, parallelizing the work across multiple threads should be considered. The following are the results of the version with Intel TBB.
As expected, 2 threads were running simultaneously for about 4.4 s during the task, which increased the performance.
We saw how easily an Intel IPP example can be built and tested with other features of Intel System Studio. To learn how to program with Intel IPP and Intel TBB, it is worth taking a close look at the example's source code. Here, Intel TBB parallelizes the work across the dual-core processor and increases performance.
As for the Intel C++ Compiler, simply switching from GCC to the Intel C++ Compiler did not bring a large benefit in this example, because the Intel IPP resize function is already optimized with SIMD instructions and its loops are parallelized by Intel TBB. That leaves few other tasks for the Intel C++ Compiler to optimize. If the application contained additional functions and loops that could be vectorized or parallelized, using SIMD instructions, OpenMP* or Intel® Cilk™ Plus with the compiler, there would be further opportunities to optimize it.