December 18, 2011 11:00 PM PST
Abstract
User experience evaluation is different from traditional performance evaluation. The scores of many performance benchmarks do not reflect real user experience, because they measure only the steady state of the system, whereas user interactions usually involve system state transitions. In this document, we introduce Android Workload Suite (AWS), which includes a set of Android workloads that map user interaction to system behavior and use software stack metrics to measure the interaction scenarios. We expect AWS to reflect the representative usage of Android client devices, and to be used to evaluate and validate Android optimizations in performance, power, and user experience.
User interaction scenarios with client device
To systematically evaluate the user experience of client devices, we need a set of standard workloads that represent the typical usage scenarios and return metric values to users. To construct such a workload suite, we take the following steps:
- Define the representative usage scenarios
- Map user behavior to system software operations
- Construct workloads
Figure 1. Client device usage categories
All usage models include multiple use scenarios according to the system computation nature. For example, use scenarios of browser usage life-cycle can be shown as a state transition graph in Figure 2.
Note that this browser interaction lifecycle covers common, general browsing and does not cover HTML5 technology. For HTML5, the scenarios should be specific to the concrete web application developed in HTML5.
Figure 2. Browser use scenarios
Every scenario in the client usage models can be roughly classified into one of the following three scenario categories:
- User operations. This category mainly includes the common interaction scenarios like browsing, gaming, authoring, setting, configuring, etc. Touch and sensors are the major input devices. This category also includes I/O and communication scenarios.
- Loading and rendering. This category mainly includes system computation scenarios like a browser loading a web page, an eBook document opening, gallery viewing of an image, etc. A rendering scenario is usually part of a loading scenario, and is sometimes considered a separate one after the loading process, including HTML5 rendering, media rendering, graphics rendering, etc.
- Task management. The two categories above cover the common application scenarios. This last category includes the scenarios for application management, such as application launch, task switching, notification alert, incoming call, and multitasking management.
- How a user controls a device. This aspect has mainly two measurement areas.
- Accuracy/fuzziness. It evaluates what accuracy, fuzziness, resolution, and range the system supports for inputs from the touch screen, sensors, and other sources. For example, how many pressure levels the system supports, how closely the sampled touch event coordinates follow the fingertip's track on the screen, how many fingers can be sampled at the same time, etc.
- Coherence. It evaluates the drag lag distance between the fingertip and the dragged graphic object on the screen. It also evaluates the coherence between the user operations and the sensor-controlled objects, e.g., the angular difference between tilt-controlled water flow and the device's tilt angle.
- How a device reacts to a user. This aspect also has two measurement areas:
- Responsiveness. It evaluates the time between an input being delivered to the device and the device showing a visible response. It also includes the time spent to finish an action.
- Smoothness. This area evaluates graphic transition smoothness with maximal frame time, frame time variance, FPS, frame drop rate, etc. FPS alone cannot capture the full user experience regarding smoothness.
Figure 3. Browser use scenarios with measurement areas for each scenario
In order for the workload suite to be representative of both tablets and smartphones, we have investigated their usage differences. Unsurprisingly, the key difference is size. A smartphone is usually used as a handy gadget, like a Swiss Army® knife, with the following features:
- Phone, in voice and video
- Music player, for music and podcasts
- Camera, for shooting photos and video, barcode scanning, and face recognition
- Navigation, with GPS, AGPS, compass, etc.
- Communicator, for chatting over text and multimedia
- Book/News reader
- Other utilities, like flashlight, night vision, etc.
A tablet, with its larger screen, adds the following:
- More realistic view experience in graphics and actions
- Easier or more controls through touch/sensors, such as virtual controllers in games, rich editor, and handwriting
- More space for content such as news, education, eBooks, etc.
- Support for more than one player in games and interactive education
- PC-like web access through the browser and other information portals such as RSS readers
- More small utility apps for daily use, kept on the user's desk for convenient access. By comparison, smartphones are used more in the user's pocket
At the same time, some scenarios exist on both form factors but should have design variants for each. For example, a 2D game has more animated sprites in its tablet profile than in its smartphone profile. A browser scenario can load PC web pages on a tablet, while using mobile web pages on a smartphone.
Workloads construction for Android user interaction evaluation
With the use scenarios and measurement areas defined, we can construct workloads to reflect the interesting scenarios and to measure the user experience.
Before developing the workloads, we should understand the relation between workloads and tools. Workloads characterize the representative usage model of the system, while tools analyze the system behavior. A tool itself does not represent a use case of the device, but analyzes the use case. At the same time, the common part of multiple workloads can be abstracted into a tool so that it can be reused across the workloads. Figure 4 shows the various kinds of workloads.
The top half of Figure 4 shows the common user interaction scenarios in an Android system. The "Input" triggers the execution of "Activity 1", which in turn invokes "Service 1". Then "Service 1" communicates with "Service 2", which launches "Activity 2". The "Output" is extracted from "Activity 2".
The bottom half of Figure 4 shows the four ways in which we measure the system.
- Standalone workload. It runs as a complete workload without the user giving inputs, and outputs the result when the execution finishes.
- Micro workload. It stresses only certain execution paths of the stack and is not a complete application of the platform.
- Measurement tool. It allows the engineer to provide inputs, and then returns the metric results. It is actually a tool that can process different inputs.
- Scenario driver of built-in app. It provides inputs to trigger the built-in applications and extracts outputs from them. One usage scenario is to provide standard inputs to different devices to measure their browser user experience.
Figure 4. Relation Between workloads and toolkit
The workloads we construct for Android user experience evaluation cover all four kinds, each for a different purpose. We expect our final workload suite to consist mostly of kind 1 and kind 4, because the kind 1 workloads are easy to use for white-box investigation, and the kind 4 workloads are useful for investigating various devices as black boxes.
Once we decide upon a workload and its scenarios, we need to have a good understanding of the Android software stack for every scenario, and then choose the right metrics for every scenario.
Since our goal is to provide a tool for engineers to evaluate and optimize Android user experience, we expect our evaluation methodology to be objective. We set up the following criteria for our measurement of user experience.
- Perceivable. The metric has to be perceivable by a human being. Otherwise, it is irrelevant to the user experience.
- Measurable. The metric should be measurable by different teams. It should not depend on special infrastructure that is available only to certain teams.
- Repeatable. The measured result should be repeatable in different measurements. Large deviations in the measurement mean that it is a bad metric.
- Comparable. The measured data should be comparable across different systems. Software engineers can use the metric to compare the different systems.
- Reasonable. The metric should help reason about the causality of software stack behavior. In other words, the metric should map to software behavior and be computable from software stack execution.
- Verifiable. The metric can be used to verify an optimization. The measured result before and after the optimization should reflect the change of the user experience.
- Automatable. For software engineering purposes, we expect that the metric can be measured largely unattended. This is especially useful in regression tests or pre-commit tests. This criterion is not strictly required, though, because it is not directly related to user experience analysis and optimization.
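As an illustration of the "Repeatable" criterion, one simple way to quantify run-to-run deviation is the coefficient of variation (standard deviation divided by mean) over repeated measurements. This is only a sketch: the function name, the sample values, and the 5% acceptance limit are assumptions made for the example, not part of AWS.

```python
from statistics import mean, stdev

# Hypothetical acceptance limit: treat more than 5% relative spread as "large".
MAX_RELATIVE_SPREAD = 0.05

def coefficient_of_variation(samples):
    """Sample standard deviation of the measurements divided by their mean."""
    if len(samples) < 2:
        raise ValueError("need at least two measurements")
    return stdev(samples) / mean(samples)

# Illustrative response times (ms) from five runs of the same scenario.
runs_ms = [102.0, 98.0, 100.0, 101.0, 99.0]
spread = coefficient_of_variation(runs_ms)
is_repeatable = spread <= MAX_RELATIVE_SPREAD
print(f"relative spread: {spread:.4f}, repeatable: {is_repeatable}")
```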
Figure 5. Use scenarios of video playing workload
Workload construction case study
We use a browser scrolling scenario to discuss the workload construction process. Figure 6 shows the interactions when a user scrolls a page top-down in a browser.
Figure 6. User interactions in scrolling a page in a browser
As shown in Figure 6, at time T0, the finger presses on the screen and starts to scroll the page from position P0. When the finger reaches position P1 at time T1, the page content starts to move, following the finger. When the page content reaches position P1 at time T2, the finger has moved to position P2. That is, during the finger movement, there is a lag distance between the page content and the finger. At time T3, the finger is released from the screen, and the page content finally reaches the position where the finger was released.
In this scenario, we choose to measure the following three aspects:
- Response time - How fast the content starts to move in response to finger scrolling
- Lag distance - How far the content movement lags behind the finger
- Smoothness - How smoothly the browser animates the scrolling
Figure 7. Android software stack internals for scrolling
The figure has three rows, for input raw events, browser events, and browser drawing respectively. The screen touch sensor detects the touch operation and generates raw events to the system. When the framework receives the raw events, it transforms them into motion events, such as ACTION_DOWN, ACTION_UP, and ACTION_MOVE. Every event has an associated coordinate pair (X, Y). The browser computes the distance between the current move event and the starting position (the coordinate of the ACTION_DOWN event). If the distance reaches a platform-specified threshold value, the browser considers the move events to be part of a scroll gesture, and starts to scroll the page content accordingly. The browser scrolls the page content by drawing new frames at the moved position on the screen. The user can then see continuous scrolling of the page content.
Figure 8 shows how we measure the responsiveness of browser scrolling. The response time of browser scrolling is the time from the delivery of the first raw event to the drawing of the first scrolling frame.
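The threshold test described above can be sketched as follows. This is a simplified illustration, not the browser's actual code: the event tuples, the function name, and the 16-pixel threshold are assumptions for the example, and the real threshold is whatever the platform specifies.

```python
import math

# Hypothetical threshold in pixels; the real value is platform specified.
SCROLL_THRESHOLD_PX = 16.0

def first_scroll_event_index(events, threshold=SCROLL_THRESHOLD_PX):
    """Return the index of the first ACTION_MOVE event whose distance from the
    ACTION_DOWN position reaches the threshold, or None if none does.

    events is a list of (action, x, y) tuples in delivery order.
    """
    start = None
    for index, (action, x, y) in enumerate(events):
        if action == "ACTION_DOWN":
            start = (x, y)          # remember where the gesture started
        elif action == "ACTION_UP":
            start = None            # gesture ended without scrolling
        elif action == "ACTION_MOVE" and start is not None:
            if math.hypot(x - start[0], y - start[1]) >= threshold:
                return index
    return None

# Illustrative event stream for a finger moving up the screen.
sample_events = [
    ("ACTION_DOWN", 100.0, 400.0),
    ("ACTION_MOVE", 100.0, 392.0),  # 8 px from start: below threshold
    ("ACTION_MOVE", 101.0, 380.0),  # about 20 px from start: scroll begins
    ("ACTION_MOVE", 101.0, 360.0),
    ("ACTION_UP", 101.0, 360.0),
]
print(first_scroll_event_index(sample_events))  # prints 2
```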
Figure 8. Browser scrolling response time measurement
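Given logged timestamps, the response time defined above reduces to a subtraction. The sketch below assumes two hypothetical logs in milliseconds on the same clock: the delivery times of the raw events and the draw times of the scrolling frames. The function name and log format are illustrative only.

```python
def scroll_response_time_ms(raw_event_times_ms, scroll_frame_times_ms):
    """Time from the first raw event to the first scrolling frame drawn after it.

    Returns None when no scrolling frame follows the first raw event.
    """
    if not raw_event_times_ms:
        return None
    first_event = min(raw_event_times_ms)
    later_frames = [t for t in scroll_frame_times_ms if t >= first_event]
    if not later_frames:
        return None
    return min(later_frames) - first_event

# Illustrative logs: first raw event at 1000 ms, first scrolling frame at 1045 ms.
print(scroll_response_time_ms([1000.0, 1008.0, 1016.0], [1045.0, 1062.0]))  # prints 45.0
```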
To measure the smoothness of page scrolling, we log all the scrolling frames' time when they are drawn, as shown in Figure 9. We then compute the maximal frame time, number of frames longer than 30ms, FPS, and frame time variance to represent the smoothness.
Figure 9. Browser scrolling smoothness measurement
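The four smoothness values can be computed from the logged draw timestamps roughly as below. This sketch assumes that "frame time" means the interval between consecutive frame draws and that timestamps are in milliseconds; the function name and the dictionary keys are illustrative, and the 30 ms limit is the one named in the text.

```python
from statistics import pvariance

def smoothness_metrics(frame_draw_times_ms, long_frame_ms=30.0):
    """Summarize scrolling smoothness from frame draw timestamps (ms, ascending)."""
    if len(frame_draw_times_ms) < 2:
        raise ValueError("need at least two frames")
    # Frame time = gap between one frame being drawn and the next.
    frame_times = [later - earlier for earlier, later
                   in zip(frame_draw_times_ms, frame_draw_times_ms[1:])]
    duration_ms = frame_draw_times_ms[-1] - frame_draw_times_ms[0]
    return {
        "max_frame_time_ms": max(frame_times),
        "frames_over_limit": sum(1 for t in frame_times if t > long_frame_ms),
        "fps": 1000.0 * len(frame_times) / duration_ms,
        "frame_time_variance": pvariance(frame_times),
    }

# Illustrative log with one long (48 ms) frame among 16 ms frames.
metrics = smoothness_metrics([0.0, 16.0, 32.0, 80.0, 96.0])
print(metrics)
```

In this example the average FPS still looks acceptable even though one frame took three times as long as the others, which is why the maximal frame time and the variance are reported alongside FPS.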
One tricky thing in smoothness measurement is determining which frames are the scrolling frames. It is easy to determine the first frame. The difficulty is determining the last frame. When the finger is released from the screen, a few more frames are still drawn as a result of the scrolling momentum (unless the finger moves very slowly). We can take as the last frame either the frame right before the finger is released, or the real last frame when the browser re-renders the page. To simplify the design, we choose the former approach, based on the assumption that the smoothness before the finger is released adequately reflects the smoothness of the entire scrolling process.
To measure the lag distance between the fingertip and the page content, we log the coordinate values and the timestamps of all the raw events and the drawn frames. We can compute the maximal distance between a frame and those events happening before the frame is drawn. We denote it as Distance[k] for frame k. Then for the entire scrolling process, we compute the lag distance as the maximum of Distance[k] over all frames. The approach is illustrated in Figure 10.
Figure 10. Browser scrolling lag distance measurement
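One possible reading of the Distance[k] computation is sketched below for a vertical scroll. It rests on several assumptions that are not stated in the text: both logs use the same clock and the same screen coordinate, the frame log records the position of the content point that was under the finger at touch-down, and "distance" means how far the finger is ahead of the content in the scroll direction. The function name and log format are hypothetical.

```python
def lag_distance(events, frames):
    """Maximum over frames of Distance[k].

    events: list of (time_ms, finger_y) raw events in time order.
    frames: list of (time_ms, content_y) drawn frames in time order.
    Distance[k] is the largest lead of the finger over the content of frame k,
    taken over the events delivered no later than the frame was drawn.
    """
    if not events:
        return None
    # +1 for a downward scroll, -1 for an upward one.
    direction = 1.0 if events[-1][1] >= events[0][1] else -1.0
    per_frame = []
    for frame_time, content_y in frames:
        leads = [(finger_y - content_y) * direction
                 for event_time, finger_y in events
                 if event_time <= frame_time]
        if leads:
            per_frame.append(max(leads))  # Distance[k] for this frame
    return max(per_frame) if per_frame else None

# Illustrative logs: the finger moves up from y=400 to y=300.
sample_events = [(0.0, 400.0), (10.0, 380.0), (20.0, 350.0), (30.0, 300.0)]
sample_frames = [(25.0, 390.0), (35.0, 330.0)]
print(lag_distance(sample_events, sample_frames))  # prints 40.0
```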
In order to make the measurement repeatable, we use the Android UXtune* toolkit to generate a standard input gesture.
For different purposes, we have two different versions developed for browser scrolling measurement. One is a standalone workload that has a browser packaged in the workload together with the scenario driver to trigger the automatic execution and measurement. The other has only the scenario driver that triggers the device built-in browser to execute for the measurement. The former one can be used to compare the Android framework of different devices, while the latter one is mainly used to compare the device built-in browsers.
Android Workload Suite (AWS) and user experience optimization
Based on the methodology described in the preceding sections, we developed Android Workload Suite (AWS) to drive and validate our Android user experience optimizations.
Table 1 shows the AWS 2.0 workloads. The use cases were selected based on our extensive survey of the mobile device industry, market applications, and user feedback.
Table 1. Android workload suite (AWS) v2.0
We have established a systematic methodology for Android user experience optimization. It includes the following steps.
Step 1. Receive the user experience sightings/issues from users, or identify the interaction issues with manual operations or workload evaluations
Step 2. Define the software stack scenarios and metrics that transform the user experience issue into a software symptom
Step 3. Develop a software workload to reproduce the issue in a measurable and repeatable way. The workload reports the metric values that reflect the user experience issue
Step 4. Use the workload and related tools to analyze and optimize the software stack. The workload also verifies the optimization
Step 5. Get feedback from the users and try more applications with the optimization to confirm the user experience improvement
For step 3, we mainly rely on Android Workload Suite (AWS). For step 4, we have developed the Android UXtune toolkit to assist user interaction analysis in the software stack.
AWS is still evolving based on user feedback and Android platform changes.
Additional Resources
Some public websites have useful information on user interactions and experience.
- http://ux.stackexchange.com/
- http://www.useit.com/papers/responsetime.html
- http://www.measuringux.com/
Acknowledgements
The author thanks his colleagues Greg Zhu and Ke Chen for their great support in developing the methodology for Android user experience optimizations.
