by Carla Brossa
The revolution of mobile platforms
The earliest attempt I know of to port a 3D engine to a real phone was that of Superscape, back in the very early 2000s. They were working with a number of OEMs to try to make their Swerve engine run on an ARM7. Those phones’ CPUs ran at about 40 MHz and included no cache. The content they could run on those devices was a maximum of 40 polygons, flat-shaded, with no texture and no z-buffer. It was a challenge for any artist! By comparison, early smartphones like the Nokia 7650 were super-fast, with an ARM9 running at 100 MHz, and cache. But that was more than ten years ago.
The evolution of mobile platforms since then has been spectacular. The first 3D games on phones had very little in common with what we now see on Android devices. One of the triggers of this giant leap was certainly the integration of dedicated graphics hardware into mobile SoCs (System-on-Chip). Along with many other architecture improvements, it powered a huge boost in the triangle throughput capability, from a few hundred to hundreds of thousands, and an increase of two orders of magnitude in the pixel count. This has more recently allowed developers to finally create console-quality games for mobile devices.
Yet, game creators are hungry consumers of resources and have the bad habit of pushing the technology to its limits. That is why many challenges nowadays are very similar to those of the past. In many ways, mobile platforms are almost on par with the current generation of consoles, but they are still way behind modern gaming PCs, and they also have some particularities that one should know about before diving into developing mobile games.
Energy efficiency is still the main constraint that limits the overall processing power of mobile devices, and will continue to be so in the foreseeable future. Memory is also limited—although this has improved enormously in the past few years—and shared with other processes running in the background. Bandwidth is, as always, a very precious resource in a unified architecture and must be used wisely or it could lead to a dramatic drop in performance. In addition, the variety of devices, processing power, display sizes, input methods, and flavors in general is something that mobile developers have to deal with on a daily basis.
Here comes Anarchy!
At Havok we have been trying to make life a bit easier for Android developers by handling most of these challenges ourselves with Project Anarchy.
We have recently announced the release of this toolset made up of Havok’s Vision Engine, Physics, AI, and Animation Studio; components of which have been used to build multiple games like Modern Combat 4, Halo* 4, Skyrim*, Orcs Must Die, and Guild Wars 2 to name a few. Project Anarchy optimizes these technologies for mobile platforms, bundles them together along with exporters for Autodesk’s 3ds Max* and Maya* and a full WYSIWYG editor, and allows users to download a complete toolkit for development on iOS*, Android (ARM and x86), and Tizen*.
Figure 1. "A screenshot of the RPG demo included in Project Anarchy, an example of content that runs on current Android platforms."
Vision goes mobile
As one would expect, the tool that required the most work to be ported to Android was our 3D game engine. The Vision Engine is a scalable and efficient multi-platform runtime technology, suited for all types of games, and capable of rendering complex scenes at smooth frame rates on PCs and consoles. Now the Vision Engine had to perform at similar standards on mobile platforms. Just as importantly, we wanted to provide the same toolset as for any other platform, but streamlined specifically to address the challenges associated with development on mobile platforms.
Having worked with consoles such as Xbox 360*, PlayStation* 3, and PlayStation Vita*, we were already familiar with low memory environments, and we had optimized our engine and libraries for those kinds of constrained environments. But moving to mobile meant having to make other optimizations, and the specifics of mobile platforms required us to think of some new tricks to make things run nicely with limited resources. Several optimizations had to be made to reduce the number of drawcalls, the bandwidth usage, the shader complexity, etc.
A few rendering tricks
For example, additional rendering passes and translucency are expensive. That is why we had to simplify our dynamic lighting techniques. The optimization we used here was to collapse one dynamic light—the one that affects the scene the most and would thus have produced the highest overdraw—into one single pass with the static lights. As there is often one dominant dynamic light source in a scene, this greatly helped performance by reducing drawcall count and bandwidth requirements. In addition, we also offer vertex lighting as a cheap alternative, but pixel lighting will still be required for normal maps.
Vision also supports pre-baked local and global illumination, which is stored in lightmaps (for static geometry) and what we call a lightgrid (used for applying pre-computed lighting contributions to dynamic objects). In a lightgrid, you have a 3D grid laid out in the scene that stores the incoming light from six directions in each cell. On mobile devices, we can optionally use a simpler representation to improve performance. This representation only stores light from one primary direction along with an ambient value. The lighting results do not achieve the same visual fidelity, but they are usually good enough and very fast.
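To make the difference between the two lightgrid representations concrete, here is a minimal sketch of how a single cell might be evaluated for a surface normal. The class names, the squared-normal weighting for the six-direction cell, and the Lambert term for the simple cell are assumptions made for illustration; the article does not describe Vision's actual evaluation code.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FullCell:
    # Hypothetical layout: incoming light colour for +X, -X, +Y, -Y, +Z, -Z.
    faces: Tuple[Vec3, Vec3, Vec3, Vec3, Vec3, Vec3]


@dataclass
class SimpleCell:
    # Hypothetical layout: one ambient colour plus one primary light.
    ambient: Vec3
    direction: Vec3  # unit vector pointing towards the light
    color: Vec3


def shade_full(cell: FullCell, normal: Vec3) -> Vec3:
    """Blend the three faces the unit normal points at, weighted by normal^2."""
    result = [0.0, 0.0, 0.0]
    for axis in range(3):
        weight = normal[axis] * normal[axis]
        face = cell.faces[2 * axis + (0 if normal[axis] >= 0.0 else 1)]
        for channel in range(3):
            result[channel] += weight * face[channel]
    return (result[0], result[1], result[2])


def shade_simple(cell: SimpleCell, normal: Vec3) -> Vec3:
    """Ambient term plus a clamped N.L term for the single primary direction."""
    n_dot_l = max(0.0, sum(n * d for n, d in zip(normal, cell.direction)))
    r, g, b = (a + n_dot_l * c for a, c in zip(cell.ambient, cell.color))
    return (r, g, b)
```

The simple cell stores 9 floats instead of 18 and needs one dot product instead of a three-way blend, which is where the speed-up in this sketch comes from; the price is that light arriving from secondary directions is flattened into the ambient term.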
Figure 2. "The difference in the lighting results when using a normal lightgrid versus a simple lightgrid."
As mobile GPUs often have limited resources for complex arithmetic operations, evaluating exponential functions for specular lighting could also become a serious bottleneck in terms of frame rate. To avoid this, we pre-bake cubemaps in our scene editor that accumulate lighting information from all surrounding light sources. While diffuse lighting is computed as usual, we approximate specular highlights by sampling from the generated cubemap and adjusting the intensity to account for local occlusion. This allows us to approximate an arbitrary number of specular highlights at the cost of a single texture lookup, while still getting a very convincing effect.
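The lookup described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the cubemap is reduced to one colour per face (a real one stores a texture per face), and the function names and the scalar occlusion factor are hypothetical rather than part of the Vision API.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def reflect(incident: Vec3, normal: Vec3) -> Vec3:
    """Mirror the incident direction about the unit normal: I - 2(N.I)N."""
    d = sum(i * n for i, n in zip(incident, normal))
    x, y, z = (i - 2.0 * d * n for i, n in zip(incident, normal))
    return (x, y, z)


def cube_face(direction: Vec3) -> str:
    """Pick the cubemap face from the component with the largest magnitude."""
    axis = max(range(3), key=lambda a: abs(direction[a]))
    sign = "+" if direction[axis] >= 0.0 else "-"
    return sign + "XYZ"[axis]


def baked_specular(cubemap: Dict[str, Vec3], view_dir: Vec3,
                   normal: Vec3, occlusion: float) -> Vec3:
    """One lookup along the reflection vector, scaled by local occlusion.

    view_dir points from the camera towards the surface.
    """
    texel = cubemap[cube_face(reflect(view_dir, normal))]
    r, g, b = (occlusion * c for c in texel)
    return (r, g, b)
```

Note that no exponential is evaluated per light at run time: however many lights were accumulated when the cubemap was baked, the cost here stays at one reflection and one lookup.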
Shadow mapping was another feature that needed some tweaking. Instead of using a deferred shadow mask as we do on PCs (i.e., performing the depth comparison in a full-screen post-processing pass and then using the resulting texture to modulate the dynamic lighting), we fetch the shadow map directly during the lighting pass to save memory bandwidth. Furthermore, as texture sampling is relatively expensive on mobile devices, we limited our shadow maps to a single sample comparison instead of percentage-closer filtering. As a result, the shadows have hard edges, which is generally acceptable if shadow casting is restricted to a relatively small area. We currently support shadow maps for directional and spot lights, but we chose not to support shadow maps for point lights on mobile platforms for now, as the tetrahedron shadow mapping technique we use on PCs and consoles would be prohibitively expensive. We also recommend using shadow mapping on mobile only in small areas and with few shadow-casting objects, such as the players and perhaps a few enemies.
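The trade-off between a single comparison and percentage-closer filtering can be illustrated with a toy shadow map. The sketch below is not engine code: the shadow map is a plain 2D list of depths, and the function names, the 3x3 kernel, and the bias value are assumptions chosen for the example.

```python
from typing import List

ShadowMap = List[List[float]]  # depth of the nearest occluder per texel


def single_sample(shadow_map: ShadowMap, x: int, y: int,
                  depth: float, bias: float = 0.01) -> float:
    """One depth comparison: 1.0 if the fragment is lit, 0.0 if shadowed."""
    height, width = len(shadow_map), len(shadow_map[0])
    x = min(max(x, 0), width - 1)   # clamp to the map edges
    y = min(max(y, 0), height - 1)
    return 1.0 if depth - bias <= shadow_map[y][x] else 0.0


def pcf_3x3(shadow_map: ShadowMap, x: int, y: int,
            depth: float, bias: float = 0.01) -> float:
    """Percentage-closer filtering: average nine comparisons around (x, y)."""
    total = sum(single_sample(shadow_map, x + dx, y + dy, depth, bias)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return total / 9.0
```

On a shadow boundary the single comparison returns either 0.0 or 1.0, which is the hard edge mentioned above, while the filtered version returns intermediate values at nine times the sampling cost.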
We also spent some time making volumetric effects (volumetric lights, fog volumes, sun shafts) run smoothly on mobile. These techniques typically require rendering multiple transparent passes, performing multiple texture sampling operations per pixel, or computing integrals, each of which is prohibitively expensive on mobile devices. As a result, we ended up going down a different route. On mobile platforms, our volumes are actually made of a low-poly mesh consisting of a few layers, like an onion, which a shader will fade out as the camera approaches. The trick here consists of collapsing the geometry to lines as soon as the transparency is so low that you can’t actually see the geometry anymore. These degenerate triangles will not be rasterized, so the pixel fill-rate cost is significantly reduced and reasonable performance is achieved.
Figure 3. "An example of shadow maps and volumetric effects running on Android*"
Terrains also required some modifications for mobile. On PCs and consoles we use height-field based terrains with dynamic geometry mipmapping, along with runtime texture blending, and three-way mapping to avoid texture stretching on steep slopes. As a result, the vertex counts are relatively high, and the bandwidth requirements resulting from mixing multiple detail textures are substantial. To make Vision terrains work on mobile platforms, we allow generating optimized static meshes from heightmaps and baking down the textures into a single map per terrain sector. As a consequence, we can’t render truly huge worlds with runtime-modifiable terrain, but this limitation is typically acceptable on mobile.
Another convenient feature that we added to Vision to improve performance of pixel-heavy scenes on devices with very high resolution displays is an option for upscaling. This is done by rendering the scene into a low resolution off-screen target and upscaling it to the display resolution in a separate step. On the other hand, to maintain high visual quality, UI elements and text are still rendered at the display's full resolution. This works quite well on devices with resolutions higher than 300 dpi, and can yield substantial performance gains.
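A quick calculation shows why upscaling pays off so well: the number of shaded pixels falls with the square of the scale factor. The helper names and the example resolution below are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Tuple


def render_target_size(display_w: int, display_h: int,
                       scale: float) -> Tuple[int, int]:
    """Size of the off-screen scene target for a given scale factor."""
    return max(1, round(display_w * scale)), max(1, round(display_h * scale))


def shaded_fraction(display_w: int, display_h: int, scale: float) -> float:
    """Fraction of the display's pixel count that the scene pass shades."""
    w, h = render_target_size(display_w, display_h, scale)
    return (w * h) / (display_w * display_h)
```

For a 1920x1080 display, a scale of 0.5 gives a 960x540 target and only a quarter of the pixel work for the scene pass, while a scale of 0.75 still removes almost half of it. The UI pass is unaffected because it keeps rendering at full resolution.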
Shader authoring considering mobile GPU oddities
All our existing shaders in the Vision Engine are written in HLSL. So, the first obvious problem when targeting OpenGL* ES platforms is that shaders require GLSL. To make cross-platform development as easy as possible, we designed a system in which shaders only need to be written once, in HLSL/Cg, and they are automatically translated to GLSL by vForge, our scene editor, when they are compiled.
The second concern when writing shaders for mobile is how different the hardware architecture is from other more traditional platforms. For a start, to save space and power, all mobile SoCs have unified memory. System RAM is shared between the CPU and GPU; it is limited, and typically slower. Therefore, our aim is to avoid touching RAM as much as possible. For example, minimizing the vertex size and the number of texture fetches is generally a good idea.
Another big difference is that most mobile GPUs, such as the PowerVR* GPUs used in Intel® Atom™ systems, use tile-based deferred rendering. The GPU divides the framebuffer into tiles (16x16, 32x32), defers the rendering until the end, and then processes all drawcalls for each tile; one tile fits entirely inside one GPU core. This technique is very efficient because pixel values are computed using on-chip memory, requiring less memory bandwidth and less power than traditional rendering techniques, which is ideal for mobile devices. An additional benefit of this approach is that, as it just involves comparing some GPU registers, depth and stencil testing is very cheap. Also, as only the resolved data is copied to RAM, there is no bandwidth cost for alpha blending, and MSAA is cheap and uses less memory.
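To get a feel for the numbers involved in tiling, the sketch below computes the tile grid for a framebuffer, the tiles that a screen-space bounding box touches, and the on-chip storage one tile needs. The framebuffer size and the function names are illustrative assumptions; real GPUs perform this binning in hardware and their tile sizes vary by vendor.

```python
import math
from typing import List, Tuple


def tile_grid(width: int, height: int, tile: int) -> Tuple[int, int]:
    """Number of tile columns and rows needed to cover the framebuffer."""
    return math.ceil(width / tile), math.ceil(height / tile)


def tiles_overlapping(x0: int, y0: int, x1: int, y1: int,
                      tile: int) -> List[Tuple[int, int]]:
    """Tiles touched by an inclusive pixel bounding box (the binning step)."""
    return [(tx, ty)
            for ty in range(y0 // tile, y1 // tile + 1)
            for tx in range(x0 // tile, x1 // tile + 1)]


def tile_bytes(tile: int, bytes_per_pixel: int) -> int:
    """On-chip memory for one buffer of one tile."""
    return tile * tile * bytes_per_pixel
```

A 1280x720 target splits into 40x23 tiles of 32x32 pixels, and a 32-bit colour buffer for one such tile takes only 4 KB, which is small enough to stay in fast on-chip memory while every drawcall binned to that tile is processed.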
In a tile-based architecture, color/depth/stencil buffers are copied from RAM to tile memory at the beginning of the scene (restore) and copied back to RAM at the end of the scene (resolve). These buffers are kept in memory so that their contents can be used again in the future. In many applications, these buffers are cleared at the start of the rendering process. If so, the effort to load or store them is wasted. That is why in Vision we use the EXT_discard_framebuffer extension to discard buffer contents that will not be used in subsequent operations. For the same reason, it is also a good idea to minimize switching between render targets.
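The cost of unnecessary restore and resolve operations is easy to estimate. The following back-of-the-envelope sketch assumes a hypothetical 1280x720 target with a packed 32-bit depth/stencil buffer running at 60 frames per second; the function names are made up for the example and no GL calls are involved.

```python
def buffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one full-screen buffer in bytes."""
    return width * height * bytes_per_pixel


def traffic_per_second(width: int, height: int, bytes_per_pixel: int,
                       fps: int, restore: bool, resolve: bool) -> int:
    """Bytes per second moved between RAM and tile memory for one buffer.

    restore: the buffer is loaded from RAM at the start of each frame.
    resolve: the buffer is written back to RAM at the end of each frame.
    """
    transfers = int(restore) + int(resolve)
    return buffer_bytes(width, height, bytes_per_pixel) * transfers * fps
```

Under these assumptions the depth/stencil buffer alone accounts for about 3.7 MB per transfer, or roughly 442 MB/s if it is both restored and resolved every frame. Clearing it at the start of the frame and discarding it at the end brings that figure to zero, since the contents are never needed outside tile memory.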
We also want to avoid dependent texture reads in the pixel shader, as they make texture prefetching useless. When dependent texture reads are performed by the shader execution units, the thread will be suspended and a new texture fetch task will be issued. To prevent this, we do not do any mathematical operations on texture coordinates in the pixel shader.
Dynamic branching in our shaders is also something that we want to avoid, as it causes a pipeline flush that ruins performance. Our solution for this is a shader provider that will select the particular shader permutation for a specific material depending on its properties and thus avoid branching. Also, to reduce the runtime memory consumption we store these shaders in a compressed format and only decompress them when they are actually needed.
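The idea of a shader provider can be sketched as a lookup from material properties to a pre-built shader variant, with the variants kept compressed until first use. Everything here is hypothetical: the feature flags, the class and method names, the placeholder shader source, and the use of zlib stand in for whatever Vision does internally.

```python
import itertools
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MaterialFeatures:
    # Hypothetical feature flags that would otherwise be run-time branches.
    normal_map: bool = False
    specular_cubemap: bool = False
    emissive: bool = False


def build_source(features: MaterialFeatures) -> str:
    """Placeholder permutation: the flags become compile-time defines."""
    lines = []
    if features.normal_map:
        lines.append("#define NORMAL_MAP 1")
    if features.specular_cubemap:
        lines.append("#define SPECULAR_CUBEMAP 1")
    if features.emissive:
        lines.append("#define EMISSIVE 1")
    lines.append("// shared shader body would follow here")
    return "\n".join(lines)


class ShaderProvider:
    def __init__(self) -> None:
        self._compressed: Dict[MaterialFeatures, bytes] = {}
        self._ready: Dict[MaterialFeatures, str] = {}

    def register_all(self) -> None:
        """Store every permutation in compressed form."""
        for bits in itertools.product([False, True], repeat=3):
            features = MaterialFeatures(*bits)
            source = build_source(features).encode("utf-8")
            self._compressed[features] = zlib.compress(source)

    def select(self, features: MaterialFeatures) -> str:
        """Return the variant for a material, decompressing on first use."""
        if features not in self._ready:
            raw = zlib.decompress(self._compressed[features])
            self._ready[features] = raw.decode("utf-8")
        return self._ready[features]

    def decompressed_count(self) -> int:
        return len(self._ready)
```

The decision that a dynamic branch would make per pixel is made once per material on the CPU, and only the permutations that a scene actually uses are ever decompressed.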
It is also important to take into account the precision used in mathematical operations in shaders, as reducing the precision can substantially improve performance. Therefore, it is recommended to always use the minimum acceptable precision to achieve any particular effect.
Figure 4. "An example of usage of lightweight mobile shaders in Vision: a glowing emissive texture and a specular cubemap that gives a shiny effect to the rocks."
These are just general optimizations that should work on all Android platforms, but keep in mind that every device and every GPU has its oddities. So, a good piece of advice would be to always read the vendor-specific developer guidelines before targeting any platform.
A Lifetime headache
With incoming calls and messages and a thousand different events popping up at the most inappropriate time, application lifetime management on Android devices becomes a serious matter. The operating system can require applications to free up resources, for instance, when another application is launched and requires system resources. Similarly, the operating system can require your application to terminate at any time.
In Vision we handle unloading and restoring graphics resources (textures, GPU buffers, shaders) when the mobile app goes to the background. This is mandatory for Android because all OpenGL ES handles are invalidated as soon as the app goes to the background, but on other platforms it is also generally a good idea to free some memory to reduce the risk of the app being terminated by the operating system due to a low memory situation.
Also on Android, handling the OS events can be a tricky job, because the order in which they happen is not the same for different devices and/or manufacturers. So this requires implementing a robust internal state handler that depends on the exact order of events as little as possible. This means monitoring the running state of an app, checking if it has a window handle, and whether it is focused.
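One way to make such a state handler independent of event order is to record each condition separately and derive the "can render" decision from all of them, rather than encoding a fixed sequence of transitions. The sketch below illustrates that idea; the event names are hypothetical labels that loosely correspond to Android lifecycle, surface, and focus notifications, and this is not Vision's implementation.

```python
class LifecycleState:
    """Tracks three independent conditions; no event order is assumed."""

    # Hypothetical event names mapped to the flag they set.
    _EFFECTS = {
        "resume": ("resumed", True),
        "pause": ("resumed", False),
        "window_created": ("has_window", True),
        "window_destroyed": ("has_window", False),
        "focus_gained": ("focused", True),
        "focus_lost": ("focused", False),
    }

    def __init__(self) -> None:
        self.resumed = False
        self.has_window = False
        self.focused = False

    def can_render(self) -> bool:
        """Rendering is safe only while all three conditions hold."""
        return self.resumed and self.has_window and self.focused

    def handle(self, event: str) -> bool:
        """Apply one OS event and report whether rendering may proceed."""
        attribute, value = self._EFFECTS[event]
        setattr(self, attribute, value)
        return self.can_render()
```

Because each event only flips its own flag, a device that reports focus before the window is created ends up in the same state as one that reports them the other way round.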
Figure 5. "Application lifetime management on Android devices becomes a serious matter."
Havok Physics, AI, and Animation Studio
The other products included in Project Anarchy—Havok Physics, AI, and Animation Studio—do not have any graphical parts in them. So, when we ported them to mobile, it was purely about CPU and memory optimization.
We already supported Linux*-based systems, and mobile platforms have broadly similar compilers and system APIs to Linux environments, so getting the code to work on mobile was relatively straightforward. The main effort after that was to make it fast. We worked closely with Intel to make sure our code was optimized for Intel® Streaming SIMD Extensions (Intel® SSE). The compiler can make a large difference in some areas of code, and we see on-going increases in performance from newer compiler revisions as the platform SDKs mature.
The second prong of attack was multithreading. Given that most mobile CPUs are now multicore, we took our code, already well optimized for multithreaded environments on PCs and consoles, and thoroughly profiled it on mobile platforms to ensure that it was efficiently multithreaded on our target systems.
Finally, we had to make sure our code stayed cache efficient, given that memory speeds on mobile are relatively low. This is not a problem specific to mobile, so our existing optimizations to reduce cache misses ported over well.
From painful to painless workflow
The development workflow on mobile platforms has always been known to be somewhat painful, especially when developing for multiple platforms and having to port assets to different formats to match the requirements of each device (e.g., different texture sizes, file formats, compression methods). On top of this, files usually have to be bundled with the application package, which means that for each asset change (textures, sounds, models) the package has to be rebuilt and uploaded to the device. For larger projects the build time of the packages, and the upload and install times, can become prohibitively long and slow down development due to lengthy iteration cycles.
Figure 6. "Screenshot of the RPG demo content in the scene editor vForge during development"
Managing and previewing assets
To make this process easier and faster, we decided to implement a few custom tools. The first one is an asset management system that has an easy to use asset browser integrated with our scene editor vForge. The asset management system provides automatic asset transformation capabilities and can convert textures from their source format (e.g., PNG, TGA) to a platform-specific format (e.g., DDS, PVR, ETC). As a result, developers do not have to think about which texture formats are supported on which platform. The actual conversion is automatically performed in vForge, but developers can also configure each asset individually to allow precise tweaking if needed, or even hook in their own external tool to do custom transformations on any type of asset (e.g., reducing the number of vertices of models).
We also added a material template editor in vForge that allows specifying platform-dependent shader assignments. This makes it possible to set up different shaders, optimized for different platforms, configure them once, and reuse them on every material that should share the same configuration.
All scenes can be previewed in vForge using device-specific resources and shaders instead of the source assets, thus allowing the artists to quickly see how the scene will look on the target device without having to deploy it.
Figure 7. "The asset management system includes an easy to use asset browser integrated with the scene editor, with automatic asset transformation capabilities."
The magically mutating assets
The second tool we implemented to enable faster turnaround times is an HTTP-based file serving system that allows an application running on a mobile device to stream in data from a host PC. This is extremely useful during development cycles because—together with the vForge preview—it completely removes the need for re-packaging and re-deploying the application every time an asset is modified.
Behind the scenes, the file server will cache downloaded files on the device and only re-download them when they have changed on the host PC, allowing for very fast iteration times, as only changed scenes, textures, etc. are transferred. In most cases it isn't even necessary to restart the application on the device to update resources, as almost all resource types can be dynamically reloaded inside a running application.
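The caching behaviour described above can be sketched as follows. To keep the example self-contained, the host PC is replaced by an in-memory stand-in rather than a real HTTP server, and change detection uses a SHA-256 digest; the class names and the digest-based check are assumptions, since the article does not say how the file server detects changes.

```python
import hashlib
from typing import Dict, Tuple


class HostServer:
    """In-memory stand-in for the file server running on the host PC."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def digest(self, path: str) -> str:
        """Cheap request: fingerprint of the current file contents."""
        return hashlib.sha256(self.files[path]).hexdigest()

    def download(self, path: str) -> bytes:
        """Expensive request: the full file contents."""
        return self.files[path]


class DeviceCache:
    """Device-side cache that re-downloads a file only when it changed."""

    def __init__(self, host: HostServer) -> None:
        self.host = host
        self.downloads = 0
        self._cache: Dict[str, Tuple[str, bytes]] = {}

    def open(self, path: str) -> bytes:
        remote_digest = self.host.digest(path)
        cached = self._cache.get(path)
        if cached is None or cached[0] != remote_digest:
            self._cache[path] = (remote_digest, self.host.download(path))
            self.downloads += 1
        return self._cache[path][1]
```

After the first run only the small digest request goes over the network for unchanged assets, so editing one texture on the host results in exactly one new download on the device.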
As a side effect, creating and deploying application packages is usually much faster when using this tool, as packages will only have to contain the compiled executable code—even scripts can be transferred over the file server connection. This allows for much faster iteration times, given that executables are typically very small in comparison with the associated scene data.
Handling the input remotely
Another tool we created to shorten turnaround times is what we’ve called “Remote Input.” It is actually a very simple idea, consisting of an HTML5-based web app that forwards inputs from a mobile device to the game running on a PC. Touch input events, as well as device acceleration and orientation data, are simply forwarded from the web browser on your mobile to the PC version of your application, or even to a scene running inside vForge. It can be used to rapidly prototype and test multi-touch input in your game without having to deploy it to a mobile device.
OpenGL ES 3.0 and the future
Some of the limitations in the techniques explained in this article may not be necessary in the near future. As smartphones and tablets get more and more powerful, the restrictions will be relaxed. But game features will advance and continue to push mobile hardware to its limits, as they have been doing for the past fifteen years.
New devices will offer more CPU and GPU cores, making it even more necessary to exploit the wonders of multithreaded computing. Longer term, we will probably get closer in performance and capabilities to current generation PCs, but there will still be some gotchas and caveats to watch out for on mobile, like the limited memory bandwidth.
The new APIs that are right there on your doorstep also offer a broad range of new, exciting, and challenging possibilities. We already have a few devices out in the wild with cores and drivers fully conformant with OpenGL ES 3.0 (supported from Android 4.3 Jelly Bean). Some of the new features include occlusion queries (already in use on PCs and consoles), transform feedback (enabling features like GPU skinning with very high bone counts), instancing (extremely useful to reduce drawcall count and therefore CPU load), multiple render targets (to facilitate deferred rendering and post-processing effects), a bunch of new texture formats, and many other cool features. On the other hand, we will also be able to start moving some of the computational work over to the GPU thanks to OpenCL*, which is just emerging on mobile. We already have full GPU-driven physics simulations on the PlayStation 4, but this is an open R&D area for us in the mobile arena and will certainly be very exciting to explore.
About the author
Carla is a Developer Relations Engineer at Havok, responsible for helping developers to make better games with the Vision Engine. She has been working in the mobile 3D graphics arena since 2004. She started at RTZ interactive, a small company in Barcelona, developing 3D games for Java and Brew phones. A few years later, she moved over to developing games for the iPhone. Prior to joining Havok, she spent a couple of years at ARM working on the OpenGL ES drivers for the Mali-T600 series of GPUs.
Intel, the Intel logo, and Atom are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2013 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
OpenCL and the OpenCL logo are trademarks of Apple Inc and are used by permission by Khronos.