Mobile applications can behave differently on an emulator than on a device, and as an app grows more complex, debugging performance bottlenecks can become extremely difficult. The GPA System Analyzer is a tool that can help diagnose a variety of performance issues.
Building Your App for Remote Debugging
To allow the GPA System Analyzer to connect to your app, make sure the app’s manifest requests the INTERNET permission and sets debuggable to “true.”
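As a rough illustration of those two settings, the sketch below embeds a small manifest and checks it for both of them using only the Java standard library. The package name, the class name ManifestCheck, and the helper isReadyForRemoteDebugging are hypothetical and not part of GPA or the Art Browser sample; only the android.permission.INTERNET permission and the android:debuggable attribute are standard Android manifest entries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ManifestCheck {
    static final String ANDROID_NS = "http://schemas.android.com/apk/res/android";

    // Hypothetical manifest for illustration; the package name is made up.
    static final String SAMPLE_MANIFEST =
        "<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\"\n"
      + "    package=\"com.example.artbrowser\">\n"
      + "  <uses-permission android:name=\"android.permission.INTERNET\" />\n"
      + "  <application android:debuggable=\"true\" />\n"
      + "</manifest>\n";

    // Returns true when the manifest requests INTERNET and marks the app debuggable.
    static boolean isReadyForRemoteDebugging(String manifestXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to read android:* attributes by namespace
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));

            boolean hasInternet = false;
            NodeList permissions = doc.getElementsByTagName("uses-permission");
            for (int i = 0; i < permissions.getLength(); i++) {
                Element permission = (Element) permissions.item(i);
                String name = permission.getAttributeNS(ANDROID_NS, "name");
                if ("android.permission.INTERNET".equals(name)) {
                    hasInternet = true;
                }
            }

            boolean debuggable = false;
            NodeList apps = doc.getElementsByTagName("application");
            if (apps.getLength() > 0) {
                Element app = (Element) apps.item(0);
                debuggable = "true".equals(app.getAttributeNS(ANDROID_NS, "debuggable"));
            }
            return hasInternet && debuggable;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse manifest", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isReadyForRemoteDebugging(SAMPLE_MANIFEST));
    }
}
```

A check like this could run as part of a debug build, so that a missing permission is caught before you try to connect.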
The Art Browser sample app is a bare-bones sample that encapsulates certain aspects of larger real-world applications. It’s an extremely simple image carousel, such as you might find in a media player or picture browsing app. It has two buttons that you would not find in a production application, which are useful for recreating common performance issues.
Build and install the Art Browser sample app on your device or emulator. Then make sure your Android development tool (Eclipse, etc.) is not running, as this will cause problems connecting to your app. Start the GPA System Analyzer tool. It will list your local machine, as well as any running emulators or attached devices.
Make sure your device or emulator is unlocked and USB Debugging is enabled. Click the “Connect” button to connect to it. GPA will show a list of all debugging-enabled apps.
GPA has a rich variety of data it can monitor, but for this app we’ll be most interested in the frame rate and CPU load. Drag CPU → Aggregated CPU Load from the left sidebar into the upper graph and drag OpenGL → FPS into the lower graph.
To demonstrate a CPU-bound situation, we’ll need to enable our Complex Math Calculation via the button in the app. We can immediately see that it is consuming 100% of the processor, which explains the frame rate of nearly 0 FPS. Clicking the button again to turn off the heavy-duty math improves the situation a bit, bringing the aggregated CPU load below 30% and raising our FPS to a marginally usable 10.
We’ve disabled the only CPU-intensive pieces of code, but our performance is still relatively poor. Our only option now is to optimize the OpenGL rendering. Graphics bottlenecks can be more difficult to untangle than CPU bottlenecks, since the OpenGL graphics pipeline is a complex process, and there is not always a single metric that will reveal a problem. Fortunately, GPA comes with a rich set of OpenGL optimizing tools, which consist of checkboxes that turn off or replace different parts of the OpenGL rendering pipeline.
The easiest way to determine if your app is GPU-bound with an OpenGL bottleneck is to use the Disable Draw Calls state override. This will turn off any operations that have been sent to the GPU. If using this override doesn’t improve performance, we know our problem is CPU-related. However, if FPS climbs significantly, we definitely have an OpenGL bottleneck.
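The decision rule above can be written down as a small helper. This is only a sketch: the class BottleneckCheck, the classify method, and the minSpeedup threshold are assumptions made for illustration, since the text does not define how large a "significant" FPS climb is, and the FPS readings in main are hypothetical values you would read off the GPA graph by hand.

```java
class BottleneckCheck {
    enum Verdict { CPU_BOUND, GPU_BOUND }

    // Assumption: a "significant" climb is modeled as FPS rising by at least minSpeedup times.
    static Verdict classify(double baselineFps, double fpsWithDrawCallsDisabled, double minSpeedup) {
        if (baselineFps <= 0 || minSpeedup <= 1.0) {
            throw new IllegalArgumentException("baselineFps must be > 0 and minSpeedup must be > 1");
        }
        double speedup = fpsWithDrawCallsDisabled / baselineFps;
        // If skipping the draw calls barely helps, the GPU was not the limiting factor.
        return speedup >= minSpeedup ? Verdict.GPU_BOUND : Verdict.CPU_BOUND;
    }

    public static void main(String[] args) {
        // Hypothetical readings: 10 FPS normally, 55 FPS with Disable Draw Calls checked.
        System.out.println(classify(10.0, 55.0, 1.5));
        // Hypothetical readings: FPS barely moves when draw calls are disabled.
        System.out.println(classify(10.0, 11.0, 1.5));
    }
}
```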
As you can see, the FPS graph shot up as a result, so we know our app is GPU-bound. We can see if perhaps our high-resolution textures are causing an issue by disabling all state overrides and then using the Texture 2x2 override.
This effected little change. We can then try using the Simple Fragment Shader override to see if our shader code is too complex.
Again, not the gains we were looking for. We can test for overly complex geometry by comparing the TA Load metric with the USSE Vertex Load metric. Drag the GPU → TA Load metric to the top graph, then hold CTRL and drag the GPU → USSE Vertex Load metric to the top graph as well, so that it is graphed beside the TA Load. Contrary to what you might expect, a high TA Load with a low Vertex Load indicates that too many vertices are being processed.
Clearly this is an issue, since TA Load is an order of magnitude higher. However, notice that it’s still hovering under 50%. It’s worth noting again that even severe graphics bottlenecks may not send any one metric to 100%.
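The comparison between the two metrics can be sketched as a ratio test. The class GeometryLoadCheck, the method suggestsTooManyVertices, and the minRatio parameter are hypothetical names, and the sample percentages are made-up values chosen only to resemble the situation described (TA Load under 50% but roughly ten times the vertex load); they are not measurements from the sample app.

```java
class GeometryLoadCheck {
    // Returns true when TA Load exceeds USSE Vertex Load by at least minRatio.
    // Assumption: minRatio = 10 stands in for "an order of magnitude higher".
    static boolean suggestsTooManyVertices(double taLoadPercent,
                                           double vertexLoadPercent,
                                           double minRatio) {
        if (taLoadPercent < 0 || vertexLoadPercent < 0 || minRatio <= 0) {
            throw new IllegalArgumentException("loads must be >= 0 and minRatio must be > 0");
        }
        if (vertexLoadPercent == 0) {
            // Avoid dividing by zero; any TA load at all dominates a zero vertex load.
            return taLoadPercent > 0;
        }
        return taLoadPercent / vertexLoadPercent >= minRatio;
    }

    public static void main(String[] args) {
        // Hypothetical readings: neither metric is near 100%, yet the imbalance is large.
        System.out.println(suggestsTooManyVertices(45.0, 4.0, 10.0));
        // Hypothetical readings after simplifying the geometry.
        System.out.println(suggestsTooManyVertices(12.0, 8.0, 10.0));
    }
}
```

Comparing the ratio rather than the absolute values reflects the point above: neither metric has to approach 100% for the imbalance to matter.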
Using the Geometry Complexity spinner in our app, we can simplify our geometry to a 2x2 grid.
This gives us an immediate FPS boost, and TA Load becomes more balanced with USSE Vertex Load. We can also try 8x8 and 32x32 if we want to find the sweet spot between performance and depth sorting. Now the app is ready for prime time!
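To get a feel for how quickly geometry grows across those spinner settings, the sketch below counts vertices and triangles for each grid size. It assumes that an NxN setting means N by N quad cells with shared vertices and two triangles per cell; the sample app may tessellate differently, so treat the numbers as illustrative. The class and method names are hypothetical.

```java
class GridComplexity {
    // Assumption: an N x N grid of quads with shared corners has (N + 1)^2 vertices.
    static long vertexCount(int cellsPerSide) {
        requirePositive(cellsPerSide);
        long perSide = cellsPerSide + 1L;
        return perSide * perSide;
    }

    // Assumption: each quad cell is split into two triangles.
    static long triangleCount(int cellsPerSide) {
        requirePositive(cellsPerSide);
        return 2L * cellsPerSide * cellsPerSide;
    }

    private static void requirePositive(int cellsPerSide) {
        if (cellsPerSide < 1) {
            throw new IllegalArgumentException("cellsPerSide must be at least 1");
        }
    }

    public static void main(String[] args) {
        // The three grid sizes mentioned in the text.
        for (int n : new int[] {2, 8, 32}) {
            System.out.printf("%dx%d grid: %d vertices, %d triangles%n",
                n, n, vertexCount(n), triangleCount(n));
        }
    }
}
```

Under these assumptions the work per image grows roughly with the square of the grid size, which is why stepping down from a fine grid can relieve the vertex-processing load so noticeably.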
Note: The application was tested and the results were analyzed on Intel Atom processor Z2760 tablets.
Although performance issues can be difficult to debug on commercial-scale apps, the GPA System Analyzer can be a huge asset in investigating complex performance bottlenecks. For more information and complete documentation, check out the GPA System Analyzer homepage on Intel.com.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804