How to use Renderscript on Intel® based devices

In this article I briefly describe Renderscript technology in Android™ and compare its performance with Dalvik* on a specific task on Intel based devices. I also discuss a simple Renderscript optimization.

The Renderscript API includes high-performance functions for 2D/3D rendering and mathematical calculations. It lets you describe a task as the same independent calculation applied over a large volume of data, and divide it into similar sub-tasks that can be executed quickly and in parallel on multi-core Android platforms.

This technology can improve the performance of Dalvik applications related to image processing, image recognition, physical models, etc., while the applications stay machine independent.

1. Renderscript technology within Android

Let’s briefly consider how Renderscript works in Android and what its advantages and disadvantages are.

1.1 Renderscript offline compilation

Renderscript appeared in Honeycomb* / Android 3.0 (API 11). The platform-tools directory of the Android SDK contains llvm-rs-cc, an offline compiler that compiles Renderscript (*.rs file) into byte code (*.bc file) and generates Java* classes (*.java files) for the structures and global variables within the Renderscript and for the Renderscript itself. llvm-rs-cc is based on Clang, a front end for the LLVM compiler, with a few changes for Android.



1.2 Renderscript run-time compilation

A new Android framework built on the LLVM back end is responsible for run-time compilation of the byte code, linking with the right libraries, and launching and controlling Renderscript. This framework consists of the following components:

  • libbcc initializes the LLVM context, parses pragmas and other metadata in the byte code, compiles the byte code, and dynamically links it with the needed libraries from libRS.
  • libRS contains libraries (math, time, drawing, ref-counting, ...), structures and data types (Script, Type, Element, Allocation, Mesh, various matrices, ...).



Pic 2. Run-time compiler flow

Advantages:

  • The application is machine independent: the Renderscript byte code included in the apk file is compiled at run time to native code for the CPU of the platform where it is launched.
  • High execution speed is achieved through parallelization of the computation, run-time compiler optimization and native code execution.

Disadvantages:

  • A lack of detailed documentation for Renderscript complicates application development. The available documentation is limited to the Renderscript run-time API reference.
  • There is no support for executing Renderscript threads on a GPU or DSP. You may encounter problems with run-time balancing of threads in heterogeneous runs and with the use of shared memory.

2. Dalvik vs. Renderscript in monochrome image conversion

2.1 Dalvik implementation

Let’s consider the Dalvik function Dalvik_MonoChromeFilter, which converts an RGB color image to a monochrome image:

    private void Dalvik_MonoChromeFilter() {
        float MonoMult[] = {0.299f, 0.587f, 0.114f};
        int mInPixels[] = new int[mBitmapIn.getHeight() * mBitmapIn.getWidth()];
        int mOutPixels[] = new int[mBitmapOut.getHeight() * mBitmapOut.getWidth()];
        mBitmapIn.getPixels(mInPixels, 0, mBitmapIn.getWidth(), 0, 0,
                mBitmapIn.getWidth(), mBitmapIn.getHeight());
        for (int i = 0; i < mInPixels.length; i++) {
            // getPixels returns packed ARGB: red in bits 16-23, green in 8-15, blue in 0-7
            float r = (float)((mInPixels[i] >> 16) & 0xff);
            float g = (float)((mInPixels[i] >> 8) & 0xff);
            float b = (float)(mInPixels[i] & 0xff);

            int mono = (int)(r * MonoMult[0] + g * MonoMult[1] + b * MonoMult[2]);

            // Write the gray value to all three channels and keep the original alpha
            mOutPixels[i] = mono + (mono << 8) + (mono << 16) + (mInPixels[i] & 0xff000000);
        }
        mBitmapOut.setPixels(mOutPixels, 0, mBitmapOut.getWidth(), 0, 0,
                mBitmapOut.getWidth(), mBitmapOut.getHeight());
    }

What can we say? There is a simple loop with independent iterations processing a bunch of pixels. Let's see how fast it works!
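To make the per-pixel arithmetic concrete, here is a minimal plain-Java sketch of one loop iteration that runs without Android. The class name MonoPixel and the method toMono are hypothetical names introduced for illustration; the sketch assumes the packed ARGB layout used by Android bitmaps (alpha in bits 24-31, red in 16-23, green in 8-15, blue in 0-7).

```java
class MonoPixel {
    // Luma weights from the article, in red, green, blue order.
    static final float[] MONO_MULT = {0.299f, 0.587f, 0.114f};

    // Converts one packed ARGB pixel to gray, keeping the original alpha.
    static int toMono(int argb) {
        float r = (argb >> 16) & 0xff;
        float g = (argb >> 8) & 0xff;
        float b = argb & 0xff;
        int mono = (int) (r * MONO_MULT[0] + g * MONO_MULT[1] + b * MONO_MULT[2]);
        return (argb & 0xff000000) | (mono << 16) | (mono << 8) | mono;
    }

    public static void main(String[] args) {
        int opaqueRed = 0xFFFF0000;
        // 255 * 0.299 = 76.245, truncated to 76 = 0x4c
        System.out.println(Integer.toHexString(toMono(opaqueRed))); // prints ff4c4c4c
    }
}
```

Each output pixel depends only on the input pixel at the same index, which is what makes the loop easy to split into parallel sub-tasks.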

For the experiment, we use the Megaphone* Mint, based on the Intel® Atom™ Z2460 1.6GHz with Android ICS 4.0.4, and a 600x1024 picture of a LEGO® robot carrying Christmas gifts.

The processing time is measured as follows:

    private long startnow;
    private long endnow;

    // uptimeMillis() returns milliseconds since boot, not counting deep sleep
    startnow = android.os.SystemClock.uptimeMillis();
    Dalvik_MonoChromeFilter();
    endnow = android.os.SystemClock.uptimeMillis();
    Log.d("Timing", "Execution time: " + (endnow - startnow) + " msec");

The message with the tag «Timing» can be read through ADB (Android Debug Bridge). We make dozens of measurements, rebooting the device beforehand, and check that the dispersion of the results is small.
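The repeated-measurement procedure can be sketched in plain Java as below. This is only an illustration under stated assumptions: the class TimingStats and its methods are hypothetical names, System.nanoTime() stands in for android.os.SystemClock.uptimeMillis() so that the code runs off-device, and the workload in main is a dummy loop rather than the image filter.

```java
class TimingStats {
    // Runs the task once and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Arithmetic mean of the samples.
    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) sum += s;
        return sum / samples.length;
    }

    // Population standard deviation, used here as the measure of dispersion.
    static double stdDev(long[] samples) {
        double m = mean(samples);
        double sq = 0;
        for (long s : samples) sq += (s - m) * (s - m);
        return Math.sqrt(sq / samples.length);
    }

    public static void main(String[] args) {
        final int[] data = new int[600 * 1024];
        long[] runs = new long[30];
        for (int i = 0; i < runs.length; i++) {
            // Dummy workload standing in for the filter under test.
            runs[i] = timeMillis(() -> {
                for (int j = 0; j < data.length; j++) data[j] = j * 31;
            });
        }
        System.out.println("mean = " + mean(runs) + " msec, stddev = " + stdDev(runs) + " msec");
    }
}
```

A small standard deviation relative to the mean suggests that a single reported figure is representative of the runs.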

The time for image processing by Dalvik is 353 msec.

Remark: Using multithreading (for example, the AsyncTask class, which runs tasks on separate threads) you can speed up the Dalvik version by at most 2x, because the Intel Atom Z2460 1.6GHz has two logical cores.

Remark: the performance of the Renderscript implementation is measured in the same way as for Dalvik.
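As a rough illustration of the multithreading remark, the sketch below splits the pixel array into equal chunks and converts each chunk on its own thread. It is not the AsyncTask approach named above: it swaps in a plain java.util.concurrent thread pool (Java 8+ syntax) so that it runs without Android, and the names ParallelMono, toMono, convertRange and convertParallel are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelMono {
    // Same per-pixel conversion as the Dalvik loop (packed ARGB input).
    static int toMono(int argb) {
        float r = (argb >> 16) & 0xff;
        float g = (argb >> 8) & 0xff;
        float b = argb & 0xff;
        int mono = (int) (r * 0.299f + g * 0.587f + b * 0.114f);
        return (argb & 0xff000000) | (mono << 16) | (mono << 8) | mono;
    }

    // Converts the half-open index range [from, to).
    static void convertRange(int[] in, int[] out, int from, int to) {
        for (int i = from; i < to; i++) out[i] = toMono(in[i]);
    }

    // Splits the array into one contiguous chunk per thread and waits for all of them.
    static int[] convertParallel(int[] in, int threads) {
        int[] out = new int[in.length];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int chunk = (in.length + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(in.length, t * chunk);
                final int to = Math.min(in.length, from + chunk);
                pending.add(pool.submit(() -> convertRange(in, out, from, to)));
            }
            for (Future<?> f : pending) f.get(); // wait and propagate worker failures
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] in = new int[600 * 1024];
        Arrays.fill(in, 0xFF336699);
        int[] out = convertParallel(in, 2); // two threads for two logical cores
        System.out.println(Integer.toHexString(out[0]));
    }
}
```

Because the chunks do not overlap, the threads never write to the same element and no locking is needed; the 2x bound comes from the number of logical cores, not from the algorithm.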

2.2 Renderscript implementation

Now let’s consider the function RS_MonoChromeFilter (a Renderscript implementation taken from the Android SDK, licensed under the Apache 2.0 license), which converts an RGB color image to a monochrome image:

    import android.renderscript.Allocation;
    import android.renderscript.RenderScript;
    …
    private RenderScript mRS;
    private Allocation mInAllocation;
    private Allocation mOutAllocation;
    private ScriptC_mono mScript;
    …
    private void RS_MonoChromeFilter() {
        // Create the Renderscript context
        mRS = RenderScript.create(this);
        // Allocate and initialize the shared memory
        mInAllocation = Allocation.createFromBitmap(mRS, mBitmapIn,
                Allocation.MipmapControl.MIPMAP_NONE,
                Allocation.USAGE_SCRIPT);
        mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
        // Create the script and bind it to the context
        mScript = new ScriptC_mono(mRS, getResources(), R.raw.mono);
        // Call the root function (executed by two SMP threads)
        mScript.forEach_root(mInAllocation, mOutAllocation);

        mOutAllocation.copyTo(mBitmapOut);
    }

//mono.rs
//or our small Renderscript
#pragma version(1)
#pragma rs java_package_name(com.example.hellocompute)

//multipliers to convert a RGB colors to black and white
const static float3 gMonoMult = {0.299f, 0.587f, 0.114f};

void root(const uchar4 *v_in, uchar4 *v_out) {
  //unpack a color to a float4
  float4 f4 = rsUnpackColor8888(*v_in);
  //take the dot product of the color and the multiplier
  float3 mono = dot(f4.rgb, gMonoMult);
  //repack the float to a color
  *v_out = rsPackColorTo8888(mono);
}

The measured time is 112 msec.

The performance gain is about 3.2x (Dalvik time divided by Renderscript time: 353/112 ≈ 3.2).

Remark: the measured time for Renderscript includes creating the Renderscript context, allocating and initializing the required memory, creating the script and binding it to the context, and executing the function root in mono.rs.

Remark: One point of interest for mobile developers is the size of the final apk file. In my case the Dalvik version is 404KB and the Renderscript version is 406KB; the extra 2KB is the Renderscript byte code (mono.bc) in the apk file.

3. Renderscript optimization

The performance of the Renderscript implementation can be improved by allowing more aggressive optimizations of floating-point operations. To enable them, add the pragma rs_fp_imprecise to the Renderscript:

//mono.rs
//or our small Renderscript
#pragma version(1)
#pragma rs java_package_name(com.example.hellocompute)
#pragma rs_fp_imprecise
//multipliers to convert a RGB colors to black and white
const static float3 gMonoMult = {0.299f, 0.587f, 0.114f};

void root(const uchar4 *v_in, uchar4 *v_out) {
  //unpack a color to a float4
  float4 f4 = rsUnpackColor8888(*v_in);
  //take the dot product of the color and the multiplier
  float3 mono = dot(f4.rgb, gMonoMult);
  //repack the float to a color
  *v_out = rsPackColorTo8888(mono);
}

As a consequence, the Renderscript implementation becomes roughly 10% faster: 112 msec -> 99 msec.

Remark: The resulting monochrome image is visually the same, without artifacts or distortion.

Remark: Unlike the NDK, Renderscript has no mechanism for explicit control of run-time compiler optimizations, because the compiler flags are predefined inside Android for each platform (x86, ARM, etc.).

4. Dependence of Dalvik/Renderscript processing time on image size

Let’s investigate the following question: how does the processing time of each implementation depend on the size of the processed image? We take 4 images of sizes 300x512, 600x1024, 1200x1024 and 1200x2048 and measure the monochrome processing time for each. The results are shown in the graph and the table below.

Image size            300x512   600x1024   1200x1024   1200x2048
Dalvik, msec               85        353         744        1411
Renderscript, msec         75         99         108         227
Gain                     1.13       3.56         6.8         6.2

Note that Dalvik’s time grows roughly linearly with image size, while Renderscript’s does not. The difference is explained by the fixed time needed for Renderscript context initialization.

The gain isn’t significant for relatively small images, because Renderscript context initialization alone takes about 50-60 msec. For the medium-sized images that are often used on Android devices, however, the measured gain is roughly 3.5x to 7x.
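As a rough sanity check of the 50-60 msec figure, one can fit a straight line (time = fixed overhead + cost per pixel × pixels) to the four Renderscript measurements in the table and read the fixed overhead off the intercept. This is only a sketch: the linear model is an assumption, four points make a very coarse fit, and the class OverheadEstimate and method fitLine are hypothetical names.

```java
class OverheadEstimate {
    // Ordinary least-squares fit of y = intercept + slope * x; returns {intercept, slope}.
    static double[] fitLine(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        // Pixel counts and Renderscript times (msec) taken from the table.
        double[] pixels = {300 * 512, 600 * 1024, 1200 * 1024, 1200 * 2048};
        double[] rsMsec = {75, 99, 108, 227};
        double[] fit = fitLine(pixels, rsMsec);
        System.out.printf("fixed overhead ~ %.1f msec, cost ~ %.2f msec per megapixel%n",
                fit[0], fit[1] * 1e6);
    }
}
```

With the table values the intercept comes out at about 54 msec, which is consistent with the stated initialization time.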

Conclusion

This article covered Dalvik and Renderscript implementations of monochrome processing for images of different sizes. For medium-sized images, Renderscript performs significantly better than Dalvik thanks to parallelization, compiler optimization and native code execution. With this small example I have tried to show when Renderscript can help increase the performance of an application that stays machine independent.

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.


Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel, the Intel logo, Atom are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others

Copyright© 2013 Intel Corporation. All rights reserved.

For more complete information about performance and benchmark results, visit www.intel.com/benchmarks

For more information about compiler optimizations, see the Optimization Notice.