Case Study: Porting Stream to Android*

Published: 12/12/2013, Last Updated: 12/12/2013


This document demonstrates how to port the Stream benchmark app to an x86 platform by creating an Android* application that uses a native shared library. See for the latest app details.

This article serves as a guide for a more advanced use case of porting an app to Android* using the NDK. Specifically, the Stream benchmark will be ported. Stream has been around for quite some time and has set a standard for demonstrating "real world" memory bandwidth (as opposed to "theoretical bandwidth" metrics, which are more academic than representative of what is typically seen in practice).

Stream provides the developer with an option for multithreading. If the app is single-threaded (i.e., "Tuned"), a simple NDK compile will succeed. However, by default, the app uses OpenMP* as its method for multithreading. As of October 2011, Android* doesn't support OpenMP* at link time, so the developer has to port the application to use POSIX* threads instead. The NDK, which provides the infrastructure for using POSIX* threads (pthreads), is then used to compile the app as a native library.


This document assumes that the Android* SDK is properly set up for use with the Eclipse* IDE. The document also assumes the latest NDK is installed and configured as well. The developer uses the NDK to compile native C/C++ code into a native library that the wrapper Android* application can then use. NDK compilation and linking will be demonstrated in this document for Stream.

Refer to other Intel® Developer Zone documents that describe how to procure and install the Android* SDK and NDK.

Porting Steps: The File

Create a new project folder for Stream. This project will be used with the Android* NDK (NDK r6b was used for this exercise). Create a "jni" folder within it and place a template file in there (or create one from scratch). The key change is:

LOCAL_MODULE:= libstream

By convention, LOCAL_MODULE has a name starting with "lib", and "stream.c" is assumed to be the main application source file.

It is now time to enable OpenMP* for the app in the NDK build. This entails enabling it at both compile time and link time. Make the following changes in the same file as well:

LOCAL_LDLIBS := -ldl -llog -lgomp
LOCAL_CFLAGS := -fopenmp 

Attempt to build the project with this command:

ndk-build APP_ABI=x86

You will notice that the above operation results in a build error, as the NDK doesn't understand the -lgomp flag. OpenMP* linking wasn't enabled for Android* at the time. Luckily, we can port the app to use POSIX* threads (pthreads) instead.

The build time and link time flags need to be changed as follows:

Figure 3.1: POSIX* Thread Flags

Porting Steps: Rewriting Stream to use POSIX* Threads

Note: It isn't within the scope of this document to fully unravel the details of the pthread-enabled code. Only a high-level overview is given, with more focus on the NDK side.

As a robust starting point, it is a good idea to add a make flag for pthread support in Stream rather than doing away with the OpenMP* infrastructure. Here, the make flag is assumed to have the name _PTHREADS. Then, any time a code block for OpenMP* is seen in the form of #pragma omp parallel { … }, the semantically equivalent form in the pthread implementation could appear as follows:

    //<NDK porting>
    //#pragma omp parallel for
    /* Each thread initializes only its own THREAD_OFFSET-sized slice. */
    for (j = 0; j < THREAD_OFFSET; j++) {
        a[j + (THREAD_OFFSET * thread_ID)] = 1.0;
        b[j + (THREAD_OFFSET * thread_ID)] = 2.0;
        c[j + (THREAD_OFFSET * thread_ID)] = 0.0;
    }

Figure 4.1: Parallel Loop in POSIX*

In this case, THREAD_OFFSET is defined as N / MAX_THREADS, where N is the array problem size in the Stream source code. THREAD_OFFSET thus lets multiple threads work on different regions of an array concurrently. thread_ID is simply the ID of the thread executing this code; the IDs are stored in an array after thread creation.
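The partitioning described above can be sketched as follows. This is a minimal illustration rather than the actual Stream port: the values of N and MAX_THREADS and the names init_worker and init_arrays_parallel are assumptions made for the example, and N is assumed to divide evenly by MAX_THREADS.

```c
#include <pthread.h>
#include <stddef.h>

#define N             1024             /* assumed problem size (illustrative) */
#define MAX_THREADS   4                /* assumed thread count; divides N evenly */
#define THREAD_OFFSET (N / MAX_THREADS)

static double a[N], b[N], c[N];

/* Hypothetical worker: initializes the slice owned by thread_ID. */
static void *init_worker(void *arg)
{
    int thread_ID = *(int *)arg;
    int j;

    for (j = 0; j < THREAD_OFFSET; j++) {
        a[j + (THREAD_OFFSET * thread_ID)] = 1.0;
        b[j + (THREAD_OFFSET * thread_ID)] = 2.0;
        c[j + (THREAD_OFFSET * thread_ID)] = 0.0;
    }
    return NULL;
}

/* Hypothetical driver: returns 0 on success, -1 if a thread could not
 * be created or joined. */
static int init_arrays_parallel(void)
{
    pthread_t threads[MAX_THREADS];
    int ids[MAX_THREADS];              /* thread IDs kept in an array */
    int t, started = 0, status = 0;

    for (t = 0; t < MAX_THREADS; t++) {
        ids[t] = t;
        if (pthread_create(&threads[t], NULL, init_worker, &ids[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (t = 0; t < started; t++) {
        if (pthread_join(threads[t], NULL) != 0)
            status = -1;
    }
    return status;
}
```

Because each thread writes only to indices in its own slice, no locking is needed inside the loop; joining the threads is what guarantees that all three arrays are fully initialized before they are read.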

It may be appropriate in some cases to use mutex locks and unlocks. With these, I created my own synchronization barrier for the threads, since semantically I wanted all threads to sync together after a parallel code section, mimicking the end of a parallel section in the OpenMP* implementation. Note: implementing a barrier is NOT trivial because of the timing nuances of the threads. In fact, the developer is often left to implement one, because barriers are an optional part of POSIX* compliance.
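One common way to build such a barrier is sketched below. It is not the barrier used in this port: it assumes a mutex paired with a condition variable and a generation counter, and every name in it (simple_barrier, barrier_demo, and so on) is hypothetical.

```c
#include <pthread.h>

/* Hypothetical reusable barrier; not taken from the Stream port. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int             nthreads;    /* threads that must arrive */
    int             waiting;     /* threads arrived in the current round */
    unsigned        generation;  /* incremented each time the barrier opens */
} simple_barrier;

static int simple_barrier_init(simple_barrier *bar, int nthreads)
{
    if (nthreads <= 0)
        return -1;
    if (pthread_mutex_init(&bar->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&bar->all_arrived, NULL) != 0) {
        pthread_mutex_destroy(&bar->lock);
        return -1;
    }
    bar->nthreads = nthreads;
    bar->waiting = 0;
    bar->generation = 0;
    return 0;
}

static void simple_barrier_wait(simple_barrier *bar)
{
    unsigned my_generation;

    pthread_mutex_lock(&bar->lock);
    my_generation = bar->generation;
    if (++bar->waiting == bar->nthreads) {
        /* Last thread to arrive: reset the count and release the others. */
        bar->waiting = 0;
        bar->generation++;
        pthread_cond_broadcast(&bar->all_arrived);
    } else {
        /* The loop guards against spurious wakeups. */
        while (my_generation == bar->generation)
            pthread_cond_wait(&bar->all_arrived, &bar->lock);
    }
    pthread_mutex_unlock(&bar->lock);
}

#define DEMO_THREADS 4

static simple_barrier demo_barrier;
static int demo_ids[DEMO_THREADS];
static int phase1_done[DEMO_THREADS];
static int saw_all[DEMO_THREADS];

static void *demo_worker(void *arg)
{
    int id = *(int *)arg, t, count = 0;

    phase1_done[id] = 1;                  /* stands in for parallel work */
    simple_barrier_wait(&demo_barrier);   /* everyone syncs here */
    for (t = 0; t < DEMO_THREADS; t++)
        count += phase1_done[t];
    saw_all[id] = (count == DEMO_THREADS);
    return NULL;
}

/* Returns how many threads saw all phase-1 work finished after the
 * barrier, or -1 on setup failure. For brevity, a pthread_create
 * failure returns at once; real code would have to release the
 * threads already blocked at the barrier. */
static int barrier_demo(void)
{
    pthread_t threads[DEMO_THREADS];
    int t, total = 0;

    if (simple_barrier_init(&demo_barrier, DEMO_THREADS) != 0)
        return -1;
    for (t = 0; t < DEMO_THREADS; t++) {
        phase1_done[t] = 0;
        saw_all[t] = 0;
    }
    for (t = 0; t < DEMO_THREADS; t++) {
        demo_ids[t] = t;
        if (pthread_create(&threads[t], NULL, demo_worker, &demo_ids[t]) != 0)
            return -1;
    }
    for (t = 0; t < DEMO_THREADS; t++)
        pthread_join(threads[t], NULL);
    for (t = 0; t < DEMO_THREADS; t++)
        total += saw_all[t];
    return total;
}
```

The generation counter is what makes the barrier reusable: a thread that wakes up late from one round cannot be confused by threads that have already entered the next round.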

Finally, one thread was designated as the "master" thread. This thread was responsible for creating the pthreads, handling any non-parallel computation, and for interpreting / displaying the final benchmark results.
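The master thread's role can be sketched as below, using a Copy-style kernel (c[j] = a[j]) as the parallel work. This is an illustration under stated assumptions, not the code from the port: the names wall_seconds, copy_worker, and master_run_copy, the values of N and MAX_THREADS, and the use of gettimeofday for timing are all choices made for the example. The timed region here also includes thread creation and joining overhead.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

#define N             1024             /* assumed problem size (illustrative) */
#define MAX_THREADS   4                /* assumed thread count; divides N evenly */
#define THREAD_OFFSET (N / MAX_THREADS)

static double a[N], c[N];

/* Hypothetical wall-clock helper based on gettimeofday. */
static double wall_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* Hypothetical worker: copies the slice owned by thread_ID. */
static void *copy_worker(void *arg)
{
    int thread_ID = *(int *)arg;
    int j;

    for (j = 0; j < THREAD_OFFSET; j++)
        c[j + (THREAD_OFFSET * thread_ID)] = a[j + (THREAD_OFFSET * thread_ID)];
    return NULL;
}

/* Master thread: serial setup, thread creation, joining, reporting.
 * Returns 0 on success, -1 if a thread could not be created or joined. */
static int master_run_copy(double *elapsed_out)
{
    pthread_t threads[MAX_THREADS];
    int ids[MAX_THREADS];
    int t, started = 0, status = 0;
    double start, elapsed;

    for (t = 0; t < N; t++)            /* non-parallel setup */
        a[t] = (double)t;

    start = wall_seconds();
    for (t = 0; t < MAX_THREADS; t++) {
        ids[t] = t;
        if (pthread_create(&threads[t], NULL, copy_worker, &ids[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (t = 0; t < started; t++) {
        if (pthread_join(threads[t], NULL) != 0)
            status = -1;
    }
    elapsed = wall_seconds() - start;

    if (status == 0) {
        /* Copy moves one read plus one write per array element. */
        double bytes = 2.0 * sizeof(double) * N;

        printf("Copy: %.0f bytes in %.6f s\n", bytes, elapsed);
        if (elapsed > 0.0)
            printf("Copy rate: %.1f MB/s\n", 1.0e-6 * bytes / elapsed);
    }
    *elapsed_out = elapsed;
    return status;
}
```

Keeping result interpretation in a single thread avoids any need to synchronize the output, since the workers have all been joined before the master reads the arrays or prints anything.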

The developer can move on to the next section after the following:
- The POSIX* thread implementation has been completed
- The developer has verified the implementation
- The aforementioned NDK compilation is successful

After Compiling the Native Stream Code

Now, the typical process of calling the native (compiled) code from the wrapper (Java*-based) Android* package can be used. In this case, the use of JNI was no more difficult than a basic "Hello World" example.

Stream's entry method is simply modified to have a typical JNI signature, as follows:

Figure 5.1: Modified Stream Entry Method

This header assumes "" was added to an Eclipse* Android* project, where the source file is used to call the Stream app via a System.loadLibrary() (or System.load()) call. Of course, the developer may choose different names. Note also that in this simple example none of the method parameters are used, but the developer can choose otherwise depending on the application.


This article provided a high-level overview of how to ensure that the Stream application builds and links properly with multithreading support when it is used as part of an Android* package. It discussed porting the code to use POSIX* threads rather than OpenMP* when building and linking with the Android* NDK.

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804