Solution

The following Intel® System Studio tools considerably improve the performance of the reference code.

Using the Intel® Integrated Performance Primitives (Intel® IPP), the optimized building block libraries

Delivering efficient multiple-input and multiple-output (MIMO) code is an important part of the application. The MIMO algorithm spreads the same total transmit power over multiple antennas to achieve an array gain, which enables reliable operation, low energy consumption, and high data rates within the limited bandwidth.

Intel® IPP provides ready-to-use, efficient functions for the MIMO algorithm. The Intel® IPP MIMO functions have simple interfaces: they take a received signal as the input and return an estimate of the transmitted signal that minimizes the mean square error. By calling these functions, you get highly efficient code without having to do low-level code tuning yourself.
media/image11.png
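
To make the idea of a minimum mean square error (MMSE) estimate concrete, the following is a minimal sketch in plain C for the simplest case of one transmit antenna and two receive antennas. It is not the Intel® IPP implementation or its interface: the type `cplx32` and the function `mmse_estimate_1x2` are hypothetical names introduced here for illustration, and the sketch assumes unit-power transmit symbols, floating-point data, and a known channel `h` and noise variance. Under those assumptions the estimate is x = (h^H y) / (h^H h + noise_var).

```c
#include <assert.h>

/* Hypothetical complex type for this sketch (not an Intel IPP type). */
typedef struct {
    float re;
    float im;
} cplx32;

/*
 * Hypothetical helper, not an Intel IPP function.
 * MMSE estimate of one unit-power transmit symbol observed on two
 * receive antennas:  x_hat = (h^H y) / (h^H h + noise_var)
 *   h         channel coefficients for the two receive antennas
 *   y         received samples on the two receive antennas
 *   noise_var noise variance per receive antenna (must be >= 0)
 */
cplx32 mmse_estimate_1x2(const cplx32 h[2], const cplx32 y[2], float noise_var)
{
    cplx32 num = {0.0f, 0.0f};
    float den = noise_var;

    assert(noise_var >= 0.0f);

    for (int k = 0; k < 2; ++k) {
        /* conj(h[k]) * y[k] */
        num.re += h[k].re * y[k].re + h[k].im * y[k].im;
        num.im += h[k].re * y[k].im - h[k].im * y[k].re;
        /* |h[k]|^2 */
        den += h[k].re * h[k].re + h[k].im * h[k].im;
    }

    /* The denominator is zero only for an all-zero channel with no noise. */
    assert(den > 0.0f);

    cplx32 x_hat = { num.re / den, num.im / den };
    return x_hat;
}
```

With `noise_var` set to zero the estimate reduces to maximum-ratio combining and recovers the transmitted symbol exactly on a noiseless channel. A larger noise variance shrinks the estimate toward zero. A library routine would typically operate on fixed-point data and on whole blocks of subcarriers rather than on one symbol at a time.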

Intel® IPP also provides a diverse range of other signal processing functions, including Discrete Fourier Transform, signal filtering, convolution, and sampling. These functions provide easy and efficient ways to process signal data in embedded applications. You only need to focus on implementing the high-level functionality, while Intel® IPP provides the low-level, highly optimized building blocks for high-performance code.

Using the Intel® C++ Compiler – a leading compiler for code vectorization

High-performance wireless communication code needs to use Single Instruction Multiple Data (SIMD) instructions to process data in parallel. The Intel® C++ Compiler offers a rich set of machine-independent and machine-specific optimizations to maximize performance. The Intel® C++ Compiler generates vectorized code automatically and also supports intrinsics, which enable you to write your own implementation with SIMD instructions directly.

Using the Intel® C++ Compiler to compile the Intel wireless eNodeB signal processing reference design code provides a substantial performance increase due to autovectorization. For example, the major data type in the reference design code is 16-bit fixed point. A 256-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) register holds sixteen 16-bit values, so each vector operation can handle 16 data values instead of the single value handled by a scalar operation, as shown below.


media/image12.png
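
As an illustration of the kind of loop that benefits from autovectorization, the following is a minimal sketch of an element-wise saturating addition on 16-bit fixed-point data. The function name `add_sat_i16` is hypothetical and is not taken from the reference design. Whether a compiler actually vectorizes this loop, and with which instructions, depends on the compiler, its options, and the target processor. The `restrict` qualifiers are an assumption that the three arrays do not overlap, which is what usually lets a compiler vectorize without run-time alias checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example, not part of the reference design.
 * Element-wise saturating addition of two 16-bit fixed-point arrays.
 * Each iteration is independent of the others, so a vectorizing
 * compiler may process many elements per instruction (up to sixteen
 * 16-bit values in one 256-bit register).
 */
void add_sat_i16(const int16_t *restrict a,
                 const int16_t *restrict b,
                 int16_t *restrict out,
                 size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    for (size_t i = 0; i < n; ++i) {
        /* Widen to 32 bits so the sum cannot overflow. */
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];

        /* Clamp to the representable 16-bit range. */
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;

        out[i] = (int16_t)sum;
    }
}
```

Writing the loop in this plain form leaves the choice of instructions to the compiler. The same operation can also be written by hand with intrinsics when finer control over the generated code is needed.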

Using the Intel® VTune™ Amplifier for Systems for system performance and power analysis

The Intel® VTune™ Amplifier for Systems identifies and locates performance bottlenecks in embedded application code. It enables you to analyze CPU, GPU, and System-on-Chip (SoC) activities and events in depth. To help you understand how power is consumed in the application, the Intel® Energy Profiler can identify wake-up causes, timers triggered by the application, and interrupts. Intel® VTune™ Amplifier shows each problem alongside the related source code, so you can quickly understand the performance problems of the application.

For more complete information about compiler optimizations, see our Optimization Notice.