By Xiao-Feng Li
User experience evaluation is different from traditional performance evaluation. The scores of many performance benchmarks cannot reflect real user experience, because they measure only the steady state of the system, while user interactions usually involve system state transitions. In this document, we introduce the Android Workload Suite (AWS), a set of Android workloads that map user interactions to system behavior and then use software-stack metrics to measure the interaction scenarios. We expect AWS to reflect representative usage of Android client devices and to be used to evaluate and validate Android optimizations in performance, power, and user experience.
To systematically evaluate the user experience of client devices, we need a set of standard workloads that represent typical usage scenarios and report metric values to users. We construct such a workload suite in the following steps:
To define user interaction scenarios, we extensively surveyed available materials, including public documents from key industry players, popular applications from market stores, built-in applications from shipping devices, and form-factor usages of tablets and smartphones. We have also investigated the user interaction life-cycle and the software design of the Android source code. We partition client usages into four categories, as shown in Figure 1. All are important for a comprehensive workload suite.
Each usage model includes multiple use scenarios, according to the nature of the system computation. For example, the use scenarios of the browser usage life-cycle can be shown as a state transition graph, as in Figure 2.
Note that this browser interaction life-cycle covers common, general browsing rather than HTML5 technology; otherwise, the scenarios would be specific to the concrete web application developed in HTML5.
Every scenario in the client usage models can be roughly classified into one of the following three categories:
Each scenario may expose specific behaviors within the system, and hence requires specific metrics to measure it. The major metrics we consider essential for user experience fall into two kinds, with each kind having two measurement areas.
Based on the methodology described so far, we can refine the browser interaction lifecycle with the measurement areas for each scenario as shown in Figure 3.
In order for the workload suite to be representative of both tablets and smartphones, we have investigated their usage differences. Unsurprisingly, the key difference is size. A smartphone is usually used as a handy, Swiss Army® knife-like gadget, with the following features:
In terms of application design, smartphone applications are designed to fit the small screen size. Many smartphone games are cartoon-style and have lightweight animations. The sensor controls in games are usually simple, and many are based on the accelerometer: smartphone games tend to use "shaking" for control, because it is easy to shake a phone held in one hand. In comparison, tablet games tend to use slow "tilting" for control, by leveraging the gyroscope sensor. Besides the sensor usage difference, tablets, with their larger screen size, have the following additional characteristics:
Due to the differences between smartphones and tablets, some typical scenarios in one form factor may not be representative in the other. For example, smartphones usually have a status bar, whereas tablets use a system bar. The browser application on a smartphone usually switches its window when opening a new web page, while on a tablet it generally opens a new tab.
At the same time, some scenarios exist on both form factors but should have design variants for each. For example, a 2D game has more animated sprites in its tablet profile than in its smartphone profile. A browser scenario can load PC web pages on a tablet, while using mobile web pages on a smartphone.
With the use scenarios and measurement areas defined, we can construct workloads to reflect the interesting scenarios and to measure the user experience.
Before actually developing the workloads, we should understand the relation between workloads and tools. Workloads characterize the representative usage model of the system, while tools analyze the system behavior. A tool itself does not represent a use case of the device; it analyzes the use case. At the same time, the common part of multiple workloads can be abstracted into a tool so that it can be reused across workloads. Figure 4 below shows the various kinds of workloads.
The top half of Figure 4 shows the common user interaction scenarios in an Android system. The "Input" triggers the execution of "Activity 1", which in turn invokes "Service 1". Then "Service 1" communicates with "Service 2", which launches "Activity 2". The "Output" is extracted from "Activity 2".
The bottom half of Figure 4 shows the four ways in which we measure the system.
The workloads we construct for Android user experience evaluation include all four kinds, each serving a different purpose. We mostly expect our final workload suite to include kind 1 and kind 4, because kind 1 workloads are easy to use for white-box investigation, and kind 4 workloads are useful for investigating various devices as black boxes.
Once we decide upon a workload and its scenarios, we need to have a good understanding of the Android software stack for every scenario, and then choose the right metrics for every scenario.
Since our goal is to provide an engineering tool for engineers to evaluate and optimize Android user experience, we expect our evaluation methodology to be objective. We set up the following criteria for our measurement of user experience.
Take video playing evaluation as an example. Traditional performance benchmarks only measure video playback performance with metrics like FPS (frames per second) or frame drop rate. This methodology has at least two problems when evaluating user experience. The first is that video playback is only part of the user interactions in playing video. A typical life-cycle of user interaction usually includes at least the following stages: "launch player" → "start playing" → "seek progress" → "video playback" → "back to home screen", as shown in Figure 5. Good performance in video playback alone cannot characterize the real user experience of playing video; user interaction evaluation is a superset of traditional performance evaluation. The other problem is that FPS is not enough to evaluate the smoothness of video playback. We describe the common challenges in workload construction in the next section with a case study.
We use a browser scrolling scenario to discuss the workload construction process. Figure 6 shows the interactions when a user scrolls a page top-down in a browser.
As shown in Figure 6, at time T0, the finger presses on the screen and starts to scroll the page from position P0. When the finger reaches position P1 at time T1, the page content starts to move, following the finger. When the page content reaches position P1 at time T2, the finger has moved on to position P2. That is, during the finger movement, there is a lag distance between the page content and the finger. At time T3, the finger is released from the screen, and the page content finally reaches the position where the finger was released.
In this scenario, we choose to measure the following three aspects:
To measure the response time, we need an understanding of the software internals of page scrolling. The scrolling process is shown in Figure 7.
The figure has three rows, for raw input events, browser events, and browser drawing, respectively. The screen touch sensor detects the touch operation and generates raw events for the system. When the framework receives the raw events, it transforms them into motion events, such as ACTION_DOWN, ACTION_UP, and ACTION_MOVE. Every event has an associated (X, Y) coordinate pair. The browser computes the distance between the current move event and the starting position (the coordinate of the ACTION_DOWN event). If the distance reaches a platform-specified threshold value, the browser considers the move events part of a scroll gesture and starts to scroll the page content accordingly. The browser scrolls the page content by drawing new frames at the moved positions on the screen. The user then sees continuous scrolling of the page content.
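The scroll-detection logic above can be sketched as follows. This is a minimal illustration, not the framework's actual code: the threshold value and the event tuples are hypothetical (Android exposes the real threshold as `ViewConfiguration.getScaledTouchSlop()`).

```python
import math

# Hypothetical platform threshold in pixels; the real value comes from
# ViewConfiguration.getScaledTouchSlop() and varies by device.
TOUCH_SLOP = 8

def is_scroll_gesture(down_xy, move_xy, slop=TOUCH_SLOP):
    """Return True once a move event has traveled farther than the
    slop threshold from the ACTION_DOWN position, i.e. the browser
    should treat the move events as a scroll gesture."""
    dx = move_xy[0] - down_xy[0]
    dy = move_xy[1] - down_xy[1]
    return math.hypot(dx, dy) > slop

# A finger press at (100, 400) followed by a small jitter, then a real move:
down = (100, 400)
print(is_scroll_gesture(down, (103, 402)))  # → False (within slop)
print(is_scroll_gesture(down, (100, 430)))  # → True (scrolling starts)
```

Until the slop is exceeded, the move events are treated as part of a possible tap rather than a scroll, which is why the content does not begin moving at the very first ACTION_MOVE.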
Figure 8 shows how we measure the responsiveness of browser scrolling. The response time of browser scrolling is the time from when the first raw event is delivered to when the first scrolling frame is drawn.
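Given timestamped logs of raw events and drawn frames, the response time reduces to a simple difference. A sketch, assuming hypothetical log lists of millisecond timestamps:

```python
def scroll_response_time(raw_event_times_ms, frame_times_ms):
    """Response time = draw time of the first scrolling frame minus
    the delivery time of the first raw touch event (both in ms)."""
    return min(frame_times_ms) - min(raw_event_times_ms)

# Raw events delivered at 1000, 1010, 1020 ms; first frame drawn at 1045 ms.
print(scroll_response_time([1000, 1010, 1020], [1045, 1061, 1078]))  # → 45
```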
To measure the smoothness of page scrolling, we log the draw time of every scrolling frame, as shown in Figure 9. We then compute the maximal frame time, the number of frames longer than 30 ms, the FPS, and the frame-time variance to represent the smoothness.
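From the logged frame draw times, the four smoothness metrics can be computed as in the following sketch (the function name and log format are illustrative, not part of AWS):

```python
from statistics import pvariance

def smoothness_metrics(frame_timestamps_ms, long_frame_ms=30):
    """Compute smoothness metrics from the draw timestamps of the
    scrolling frames: maximal frame time, count of frames longer than
    the threshold, FPS, and frame-time variance."""
    ts = sorted(frame_timestamps_ms)
    # A "frame time" is the interval between two consecutive draws.
    frame_times = [b - a for a, b in zip(ts, ts[1:])]
    duration_s = (ts[-1] - ts[0]) / 1000.0
    return {
        "max_frame_time_ms": max(frame_times),
        "long_frames": sum(1 for t in frame_times if t > long_frame_ms),
        "fps": len(frame_times) / duration_s,
        "frame_time_variance": pvariance(frame_times),
    }

# Five frames drawn over 100 ms: three smooth 20 ms intervals and one
# janky 40 ms interval.
m = smoothness_metrics([0, 20, 40, 60, 100])
print(m["max_frame_time_ms"], m["long_frames"], m["fps"])  # → 40 1 40.0
```

Note how the variance and the long-frame count capture jank that an average FPS value hides: the example still averages 40 FPS despite the 40 ms stall.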
One tricky part of the smoothness measurement is determining which frames are scrolling frames. It is easy to determine the first frame; the difficulty is determining the last one. When the finger is released from the screen, a few more frames are still drawn as a result of the scrolling momentum (unless the finger moves very slowly). We can take the last frame to be either the frame drawn right before the finger release, or the real last frame, drawn when the browser re-renders the page. To simplify the design, we choose the former approach, on the assumption that the smoothness before the finger release adequately reflects the smoothness of the entire scrolling process.
To measure the lag distance between the fingertip and the page content, we log the coordinate values and timestamps of all the raw events and the drawn frames. We compute the maximal distance between a frame and the events that happened before the frame was drawn, and denote it Distance[k] for frame k. For the entire scrolling process, the lag distance is then the maximum of Distance[k] over all frames. The approach is illustrated in Figure 10.
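The computation above can be sketched as follows. This is one reading of the approach, assuming a top-down scroll in which the finger position leads the content position along one axis; the log tuples are hypothetical.

```python
def lag_distance(events, frames):
    """events: (timestamp_ms, y) finger positions from raw events;
    frames: (draw_timestamp_ms, y) content positions of drawn frames.
    Distance[k] is the maximal gap by which the finger led the content
    of frame k, among events delivered before the frame was drawn; the
    scenario lag is the maximum of Distance[k] over all frames."""
    lag = 0
    for f_ts, f_y in frames:
        for e_ts, e_y in events:
            if e_ts < f_ts:
                lag = max(lag, e_y - f_y)  # how far the finger is ahead
    return lag

# Finger positions over time, and the content frames trailing behind them.
events = [(0, 100), (16, 140), (32, 180)]
frames = [(10, 100), (26, 120), (42, 150)]
print(lag_distance(events, frames))  # → 30 (finger at 180, content at 150)
```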
To make the measurement repeatable, we use the Android UXtune* toolkit to generate a standard input gesture.
For different purposes, we have developed two versions of the browser scrolling measurement. One is a standalone workload that packages a browser together with the scenario driver, which triggers the automatic execution and measurement. The other has only the scenario driver, which drives the device's built-in browser through the measurement. The former can be used to compare the Android frameworks of different devices, while the latter is mainly used to compare the devices' built-in browsers.
Based on the methodology described in the preceding sections, we developed the Android Workload Suite (AWS) to drive and validate our Android user experience optimizations.
Table 1 shows the AWS 2.0 workloads. The use cases were selected based on our extensive survey of the mobile device industry, market applications, and user feedback.
Table 1. Android workload suite (AWS) v2.0
We have established a systematic methodology for Android user experience optimization. It includes the following steps.
For step 3, we rely primarily on the Android Workload Suite (AWS). For step 4, we have developed the Android UXtune toolkit to assist user interaction analysis in the software stack.
AWS is still evolving based on user feedback and Android platform changes.
Some online public websites have useful information on user interactions and experience.
The author thanks his colleagues Greg Zhu and Ke Chen for their great support in developing the methodology for Android user experience optimizations.
Xiao-Feng Li is a software architect in the System Optimization Technology Center of the Software and Services Group of Intel Corporation. Xiao-Feng has extensive experience in parallel software design and runtime technologies. Before joining Intel in 2001, Xiao-Feng was a manager at Nokia Research Center. Xiao-Feng enjoys ice-skating and Chinese calligraphy in his leisure time.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804