Code Example of Power/Performance Optimization on Android* Using Intel® Intrinsics

Introduction

Battery life is critically important to users, especially on mobile devices. We have all lost power right when we needed it most: navigating a new city, in the middle of an important call, and so on. It may not be intuitive, but optimizing application performance generally reduces power consumption as well, because work that finishes sooner lets the processor return to a low-power state earlier. That helps users.

Analyzing Apps with a Combination of Intel® Graphics Performance Analyzers + Intel® VTune™ Amplifier

What is the first step in improving the power/performance of your application? First, determine whether your app is CPU bound or GPU bound. You can do this with a combination of Intel® tools:

Intel® Graphics Performance Analyzers (Intel® GPA) is a tool for graphics analysis and optimization of Microsoft DirectX* applications and Android* OpenGL ES* applications. You can find out more about it here: https://software.intel.com/en-us/articles/gpa-which-version

For Android optimization, I prefer the Intel® GPA console client. You can read about it here: https://software.intel.com/en-us/android/articles/using-intel-graphics-performance-analyzers-console-client-for-android-application

VTune™ Amplifier helps you analyze algorithm choices and identify where and how your application can benefit from available hardware resources. Use VTune Amplifier to locate or determine the following:

  • The most time-consuming functions (hotspots) in your application and/or on the whole system
  • Sections of code that do not effectively utilize available processor time
  • The best sections of code to optimize for sequential performance and for threaded performance
  • Synchronization objects that affect the application performance
  • Whether, where, and how your application spends time on input/output operations
  • The performance impact of different synchronization methods, different numbers of threads, or different algorithms
  • Thread activity and transitions
  • Hardware-related bottlenecks in your code

Configure data collection on the host system (Linux*, OS X*, or Windows*) and run the analysis on a remote system (Linux or Android). Remote analysis on Android and embedded Linux systems is supported only by VTune Amplifier for Systems.

The figure below shows how to use a combination of Intel GPA and Intel® VTune™ Amplifier to analyze and optimize your application.

What are Intel® Intrinsics?

Intel® intrinsics are assembly-coded functions that allow you to use C/C++ function calls and variables instead of assembly instructions. Intrinsics provide access to instructions that cannot be generated using the standard constructs of the C and C++ languages.

Intrinsics are expanded inline, eliminating function call overhead. Providing the same benefit as using inline assembly, intrinsics improve code readability, assist instruction scheduling, and help reduce debugging.
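As a rough sketch of what this looks like in practice, the snippet below adds two groups of four floats with a single SSE addition. The helper name add4 is hypothetical and chosen only for illustration; the intrinsics _mm_loadu_ps, _mm_add_ps, and _mm_storeu_ps and the __m128 type come from the SSE header xmmintrin.h.

```cpp
#include <xmmintrin.h>  // SSE intrinsics and the __m128 vector type

// Hypothetical helper (not from the original article):
// out[i] = a[i] + b[i] for i = 0..3, computed with one packed add.
inline void add4(const float* a, const float* b, float* out)
{
    const __m128 va = _mm_loadu_ps(a);       // load 4 floats, no alignment required
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // packed add, then store 4 floats
}
```

The code reads like ordinary C++ function calls, yet each call maps to a specific SSE instruction, so there is no inline assembly to maintain.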

You can read more here: User and Reference Guide for the Intel® C++ Compiler 15.0 - Intrinsics

How Do You Find the Intel® C++ Compiler for Android* OS and Connect It to Your Project?

Intel® C++ Compiler for Android* integrates into the Android NDK and provides an optimized alternative for compiling x86 libraries.

Download and install Intel C++ Compiler for Android. During installation, provide the path to your NDK directory so that the installer can integrate the compiler into the Android NDK.

After a successful installation, the Intel® C++ Compiler for Android is integrated into the Android NDK toolchain automatically and compiles optimized libraries for the x86 architecture.

Example

To demonstrate the usage of Intel intrinsics, let’s look at the C++ code:

float x = 1.0f / sqrtf( y );

Code like this often appears in hotspots, especially in physics algorithms.

If you analyze this line in VTune Amplifier, the profile shows that the compiler generates sqrt + div instead of rsqrt.

The fix is to use Intel intrinsics:

float x = rsqrt( y );

Where rsqrt is:

         #include <xmmintrin.h>  // SSE intrinsics: _mm_load_ss, _mm_rsqrt_ss, _mm_store_ss

         …

         inline float rsqrt(const float x)
         {
             float r;
             _mm_store_ss(&r, _mm_rsqrt_ss( _mm_load_ss(&x)));
             return r;
         }
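One caveat worth keeping in mind: _mm_rsqrt_ss returns an approximation (Intel documents a maximum relative error of 1.5*2^-12), which is presumably why the compiler does not substitute it for sqrt + div on its own. If your algorithm needs more precision, a common approach is to follow the approximation with one Newton-Raphson step. The sketch below is an illustration under those assumptions, not part of the original sample: it assumes x is positive and finite, and the names rsqrt_fast and rsqrt_refined are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ss, _mm_rsqrt_ss, _mm_store_ss

// Fast approximation of 1/sqrt(x); same body as the rsqrt() helper above.
inline float rsqrt_fast(const float x)
{
    float r;
    _mm_store_ss(&r, _mm_rsqrt_ss(_mm_load_ss(&x)));
    return r;
}

// Hypothetical helper: refines the approximation with one Newton-Raphson
// iteration for f(r) = 1/(r*r) - x, i.e. r' = r * (1.5 - 0.5 * x * r * r).
// Assumes x > 0 and finite.
inline float rsqrt_refined(const float x)
{
    const float r = rsqrt_fast(x);
    return r * (1.5f - 0.5f * x * r * r);
}
```

Whether the refinement is worth its extra multiplications depends on how much accuracy your code needs, so it is worth profiling both variants in VTune Amplifier before choosing one.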

References

For more information, watch my video: https://videoportal.intel.com/media/0_qgvcof5s

 

About the Author

Stanislav Pavlov works in the Software & Service Group at Intel Corporation. He has 10+ years of experience in technologies. His main interests are optimization of performance and power consumption, and parallel programming. In his current role as a Senior Application Engineer providing technical support for Intel®-based devices, Stanislav works closely with software developers and SoC architects to help them achieve the best possible performance on Intel® platforms. Stanislav holds a Master's degree in Mathematical Economics from the National Research University Higher School of Economics. He is currently pursuing an MBA at the Moscow Business School.

1 comment

The video about Chernobyl and intrinsics is great :)
