Performance Assessment of Android* Applications

by Tuan H. Bui

Overview

To optimize an application for the best user experience, it is important to understand the performance demands the application places on a particular platform. On the Linux* operating system, one can use vmstat to monitor many performance aspects of an application, such as its memory footprint and its CPU and I/O demand. Windows* Task Manager provides similar capabilities for the Windows* operating system. In this note, we discuss how to obtain similar performance data on the Android* operating system.

 

 

Discussion

Android* is essentially a Linux* operating system and thus provides similar performance data. Getting to those data is somewhat difficult because the phone/tablet usage model allows only one application to run in the foreground.

The first task in monitoring the performance of an application on Android* is to run performance monitoring utilities while the application is running a workload of interest. For example, suppose we want to monitor a video playback application to understand why it is not running at the required frame rate. Since the application of interest, the video player in this case, is running in the foreground, we need another way to collect data while it runs. One option is to write an Android* service that runs continually in the background and records performance data to a file that can be examined later. A simpler approach is to use the Android* Debug Bridge (adb) utility. adb provides a debugging shell on the Android* device over either a USB cable or a TCP/IP connection. Use the following procedure to enable adb debugging over TCP/IP.

 

 

  1. Enable USB debugging in the Android* control panel.
  2. Connect the device to the host with a USB cable. If using a Windows* host, you will likely need to obtain the USB device driver from the device manufacturer. A Linux* host usually does not need a special device driver.
  3. Make sure the device can be seen by the host by issuing the command 'adb devices'. The command should return a list of the devices that are connected to the host.
  4. Enable TCP/IP debugging by issuing the command 'adb tcpip 5555'. This command restarts the adb daemon on the device so that it listens for adb connections on TCP port 5555.
  5. Disconnect the USB cable and reconnect to the device by issuing the command 'adb connect <device_ip_address>'.
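
The steps above can be collected into a short host-side sketch. The function name adb_over_tcp, the ADB override variable, and the IP address in the usage line are illustrative assumptions and not part of the original procedure; only the 'adb devices', 'adb tcpip' and 'adb connect' subcommands come from the adb tool itself.

```shell
#!/bin/sh
# Sketch: switch a USB-connected device to adb over TCP/IP.
# ADB may be overridden (for example ADB=echo) for a dry run; it defaults to adb.
adb_over_tcp() {
	# Device IP address supplied by the caller.
	ip="$1"
	# TCP port; 5555 matches the procedure above.
	port="${2:-5555}"
	adb_cmd="${ADB:-adb}"

	# Step 3: confirm the device is visible over USB.
	"$adb_cmd" devices || return 1
	# Step 4: restart the adb daemon on the device in TCP/IP mode.
	"$adb_cmd" tcpip "$port" || return 1
	# Give the daemon a moment to restart before connecting.
	sleep 2
	# Step 5: connect over the network (the article unplugs the USB cable first).
	"$adb_cmd" connect "$ip:$port"
}
```

A call such as 'adb_over_tcp 192.168.1.20' (a hypothetical address) followed by 'adb shell' then opens the debugging shell over the network.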

Once connected through adb, the user has access to a Unix shell (opened with 'adb shell') that can be used to run background monitoring commands such as 'top' and 'vmstat' while the application of interest is running on the main screen of the device. Keep in mind that running other commands in parallel with the application may degrade the performance of the application of interest. The 'top' command, for example, is fairly CPU intensive and should be used with care. The best way to monitor performance with minimum overhead is simply to collect the data that the OS already produces at regular intervals and store them in a file to be post-processed after the application has stopped. Android*, like Linux*, exposes most performance statistics through the /proc file system. The CPU performance data of interest are stored in /proc/stat. To sample system performance at a regular interval of 5 seconds, use a shell script similar to the following:

 

 

# Append a timestamped copy of /proc/stat to the log every 5 seconds.
OUT=/data/local/tmp/myvmstat.out
while :
do
	echo "Date: $(date +"%Y-%m-%d %H:%M:%S")" >> "$OUT"
	cat /proc/stat >> "$OUT"
	sleep 5
done

 


Here are two samples of /proc/stat taken 5 seconds apart while the Android* Media Player was playing a 720p H264 video on a Motorola Xoom*1 tablet. The data were imported into a spreadsheet for readability.



There is a lot of data in /proc/stat. The most pertinent information is in the first line, which reports the sum of all CPU activity. Each subsequent line starting with ‘cpuN’ reports the activity of that particular CPU. We can see that the example device has two processors. The meaning of each field is explained below:

1 * Other names and brands may be claimed as the property of others



Time units are given in USER_HZ, usually hundredths of a second. The values in these statistics are cumulative since the system was booted. To get performance data for a given interval, one needs to compute the difference between the data captured at the beginning and at the end of that interval.

 

 

User: time spent executing processes in user mode.
Nice: time spent executing processes in ‘niced’ user mode. Niced processes are processes running at a lower priority than the default (a positive nice value).
System: time spent executing in kernel or supervisory mode.
Idle: time spent in the idle task, when no runnable process is available.
Iowait: time spent waiting for I/O to complete.
Irq: time spent servicing interrupts.
Softirq: time spent servicing softIRQs.
Steal, Guest Time, Guest Nice Time: time related to virtualization. Steal is time taken by the hypervisor to run other guests; Guest Time and Guest Nice Time are time spent running virtual CPUs for guest OSes. These values are usually zero on Android*.
 

The rest of the data reported in /proc/stat are described below:

 

 

Intr: number of interrupts serviced by the system. The first column after the tag ‘intr’ is the total number of interrupts serviced since boot. Each subsequent column gives the count for one interrupt, identified by column position starting from interrupt 0.
Ctxt: number of context switches since boot.
Btime: system boot time, given in seconds since Jan 1, 1970 GMT.
Processes: number of processes created by the system since boot.
Procs_running: number of processes currently in a runnable state.
Procs_blocked: number of processes currently blocked waiting for I/O to complete.
Softirq: total number of softIRQs processed. SoftIRQs are non-time-critical software interrupts.
 

The table below shows the difference between the two data samples above. The Sum column shows the total number of time units spent in the various tasks. The CPU Util column is the sum of all non-idle tasks. The row directly below each CPU shows the same data as a percentage of the total time spent. Note that the time spent in the various tasks on each CPU adds up to 5 seconds, the length of the sampling interval. Since there are two processors in the system, the total available CPU time for each sample is 2 x 5 = 10 seconds.
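
The utilization figure described above can also be reproduced with a small host-side helper instead of a spreadsheet. This is a minimal sketch under stated assumptions: the function name cpu_util is hypothetical, utilization is taken as all non-idle time (so iowait counts as busy, following the definition above), and only the first eight counters (user through steal) are summed on the assumption that guest time is already counted within user time, as on current Linux kernels.

```shell
#!/bin/sh
# Sketch: CPU utilization between two 'cpu' lines taken from /proc/stat.
# Arguments are the earlier and the later line, each quoted as one string.
cpu_util() {
	printf '%s\n%s\n' "$1" "$2" | awk '
		# First line: remember the earlier cumulative counters.
		NR == 1 { for (i = 2; i <= NF && i <= 9; i++) before[i] = $i }
		# Second line: subtract to get the time spent during the interval.
		NR == 2 {
			total = 0
			for (i = 2; i <= NF && i <= 9; i++) {
				delta[i] = $i - before[i]
				total += delta[i]
			}
			# Field 5 is idle; everything else is counted as busy time.
			busy = total - delta[5]
			if (total > 0) printf "%.1f\n", 100 * busy / total
		}'
}
```

Passing the aggregate 'cpu' line gives utilization across all processors; passing a 'cpu0' or 'cpu1' line gives the figure for that processor alone.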



For a reasonably long-running workload such as video playback, we can build a time chart showing the CPU utilization of the workload over time. Studying such a time chart allows identification of the phases in which the workload might be taxing processor resources. Figure 1 below shows such a time chart for a graphics-oriented workload. We note that the maximum CPU utilization attained by the workload is only 50%. This is an indication that the workload was only able to utilize one CPU. If it is important to speed up this workload, it may be worthwhile to find ways to utilize the second CPU.
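
The data points for such a time chart can be extracted from the collected log with a short awk filter. The sketch below is one possible approach, not the tooling used for the figures; the function name util_series is hypothetical, and it assumes the log contains repeated /proc/stat dumps such as those collected in /data/local/tmp/myvmstat.out, ignoring any other lines in the file.

```shell
#!/bin/sh
# Sketch: turn a log of repeated /proc/stat dumps into one utilization
# value per sampling interval, suitable for charting in a spreadsheet.
util_series() {
	awk '
		# The aggregate line is tagged "cpu"; per-core lines are cpu0, cpu1, ...
		$1 == "cpu" {
			total = 0
			for (i = 2; i <= NF && i <= 9; i++) total += $i
			idle = $5
			# From the second sample on, print interval number and percent busy.
			if (seen && total > prev_total)
				printf "%d %.1f\n", n, 100 * (1 - (idle - prev_idle) / (total - prev_total))
			prev_total = total; prev_idle = idle; seen = 1; n++
		}' "$1"
}
```

For example, after copying the log to the host with 'adb pull /data/local/tmp/myvmstat.out', running 'util_series myvmstat.out' prints one line per interval containing the interval number and the percentage of non-idle time.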



Figure 1 - An3DBenchXL CPU Utilization

CPU utilization doesn’t tell the whole performance story. Modern processors can usually operate at several different frequencies, called pstates. The operating system can change the processor operating frequency to minimize power consumption and, in the case of mobile devices, extend battery life.

Figure 2 below shows CPU utilization along with the average processor operating frequency of the same tablet playing the same video clip encoded at 720p H264 and at 360p H264. As the data show, playing 720p H264 video is much more resource intensive than the difference in CPU utilization alone may indicate. Decoding 720p H264 video not only requires about 4X the CPU utilization but also runs the processor at a ~2.4X higher frequency.



Figure 2 – Video Playback Performance

Android* reports the time spent at each processor operating frequency, or pstate, in the /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state file, where X is the CPU number. The script below is modified to also collect processor pstate data.

 

 

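# Same loop, extended to log per-CPU pstate residency alongside /proc/stat.
OUT=/data/local/tmp/myvmstat.out
while :
do
	echo "Date: $(date +"%Y-%m-%d %H:%M:%S")" >> "$OUT"
	cat /proc/stat >> "$OUT"
	echo "CPU0 Pstate Residency" >> "$OUT"
	cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state >> "$OUT"
	echo "CPU1 Pstate Residency" >> "$OUT"
	cat /sys/devices/system/cpu/cpu1/cpufreq/stats/time_in_state >> "$OUT"
	sleep 5
done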


The pstate data returned for CPU0 of the Motorola Xoom are shown below.

 

CPU0 PState Residency
216000 4356448
312000 400626
456000 559897
608000 398231
760000 92081
816000 0
912000 0
1000000 1485891


Each data pair gives a pstate frequency, in kHz, and the amount of time, in units of a hundredth of a second, spent in that state since the system booted. The number of available pstates differs from processor to processor. In this case, the NVIDIA Tegra2 processor driving the Motorola Xoom provides 8 possible pstates ranging from 216 MHz to 1 GHz.

Applying the same technique used to compute CPU utilization, we can compute the percentage of time the processor spent in each pstate during a given sampling interval. We can also derive an average frequency for the same interval by computing a weighted sum of each pstate frequency and the percentage of time spent in that pstate. Such data were derived and are shown in Figure 2.
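
The residency and average-frequency calculation can be sketched as follows. The function name pstate_summary is hypothetical, and the sketch assumes the two time_in_state samples for one CPU have been saved to separate files, the earlier sample first; it illustrates the weighted-sum idea and is not the original analysis.

```shell
#!/bin/sh
# Sketch: pstate residency and average frequency over one interval, from two
# saved copies of time_in_state (the earlier file first, then the later one).
pstate_summary() {
	awk '
		# First file: remember the cumulative time for each frequency.
		NR == FNR { before[$1] = $2; next }
		# Second file: time spent at each frequency during the interval.
		{
			delta[$1] = $2 - before[$1]
			total += delta[$1]
			order[++n] = $1
		}
		END {
			if (total <= 0) exit
			for (k = 1; k <= n; k++) {
				f = order[k]
				# Share of the interval spent at this frequency.
				printf "%d kHz %.1f%%\n", f, 100 * delta[f] / total
				weighted += f * delta[f]
			}
			# Residency-weighted average frequency for the interval.
			printf "average %.0f kHz\n", weighted / total
		}' "$1" "$2"
}
```

Each output line gives a frequency and the share of the interval spent at it; the final line is the weighted average frequency. A processor that spent a quarter of an interval at 216000 kHz and the rest at 1000000 kHz, for instance, averages 804000 kHz.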

 

 

Summary

Using adb and leveraging the existing performance infrastructure that Android* inherited from Linux*, we can obtain insight into the performance characteristics of Android* applications. Such insights can guide application optimizations such as improving the user experience and reducing power consumption.