CPU particles in Unreal Engine* 4 offer several features that GPU particles do not, including accurate collision with level geometry, which lets them bounce off floors and walls. They can also light their surroundings, creating more realistic effects as they move around the level. CPU particles can also influence other objects, for example by using gravity to move things in the scene. Glorp was created to show how to make the best use of CPU particles in a game scene.
The latest release of Glorp can be found on GitHub* at GameTechDev Glorp. It is built for a special version of Unreal 4.21 that includes the code changes discussed below. The changes for 4.21 are published as a pull request on GitHub*. If the pull request cannot be viewed, be sure to get access to the Unreal Engine 4* source code at UE4 On GitHub*.
Figure 1. Glorp_Main Map
Glorp is a small, 1 to 4 player game that demonstrates how to get the most out of CPU particle systems with Unreal Engine 4. Within Glorp, there are three scenes to help with this. The first, Glorp_Main, features a small game that utilizes CPU particle systems to show off all of the optimizations covered in this guide. The other two maps, GlorpDynamicTest and GlorpStressTestLit, are for testing the optimizations and pushing the system as far as it will go.
Figure 2. GlorpStressTest Map
The GlorpStressTest Map provides a static representation of the particle system used for the players in the Glorp_Main map.
Figure 3. GlorpDynamicTest Map
The GlorpDynamicTest map provides a method of stress testing the system Glorp is running on and displays the number of particles it can support. This is done with several parameters, namely whether CPU or GPU particles are used and the percentage of lit particles.
Lit particles are one of the most GPU-heavy features of CPU particles, so it’s important not to use more of them than needed. The simplest and most effective way to reduce GPU time spent rendering lit particles is to lower the lit percentage. In many particle systems, this won’t change the appearance of the system by much and will speed up the scene considerably. In particle systems where a flickering or pulsating light is desired, such as a flame, a low lit percentage will help create that effect.
Usually a particle system that is 20% lit will appear full of moving lights, but experimentation is needed to find the right percentage of lit particles for the desired effect.
Figure 4. A Glorp particle system open in the Unreal* editor
To set or change the percentage of lit particles, open the particle system in the Cascade particle editor by double-clicking the asset in the content browser. In the center section of the Cascade editor, locate the emitter. In the list of modules for that emitter, find and click on the Light module (highlighted in orange in Figure 4 above). If the emitter doesn’t have a Light module, add one by right-clicking in the blank area under the list of modules and selecting “Light” in the “Light” section.
Figure 5. The Details panel of the "Light" module
Once the Light module is selected, find the "Spawn Fraction" option under "Light" in the details panel in the lower right of the screen. This controls the fraction of spawned particles that are lit, on a scale from zero (none) to one (all).
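To get a feel for how a given Spawn Fraction translates into a number of lit particles, the following is a minimal sketch under one stated assumption: each spawned particle independently receives a light with probability equal to the spawn fraction. The function name `CountLitParticles` and its parameters are hypothetical and are not taken from the engine source, which may select lit particles differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical model, not engine code: each spawned particle gets a light
// with probability spawnFraction (0.0 = none lit, 1.0 = all lit).
std::size_t CountLitParticles(std::size_t particleCount,
                              double spawnFraction,
                              unsigned seed)
{
    assert(spawnFraction >= 0.0 && spawnFraction <= 1.0);

    std::mt19937 rng(seed);
    std::bernoulli_distribution isLit(spawnFraction);

    std::size_t litCount = 0;
    for (std::size_t i = 0; i < particleCount; ++i) {
        if (isLit(rng)) {
            ++litCount;  // this particle would carry a light
        }
    }
    return litCount;
}
```

Under this model, a spawn fraction of 0.2 on a 100,000-particle system yields roughly 20,000 lights, which is the quantity the GPU has to render.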
Unreal Engine 4 provides the ability to detect the capabilities of the hardware the game is running on and scale the visual quality accordingly. Add this by calling three Blueprint nodes on setup: Get Game User Settings, Run Hardware Benchmark, and Apply Hardware Benchmark Results.
Figure 6. Running a benchmark in Blueprints
Once the function is created and settings adjusted, connect the three nodes together as pictured in Figure 6 above. The first of these nodes, Get Game User Settings, retrieves the user settings object that the other two nodes need to target. When the Run Hardware Benchmark node is called, Unreal Engine runs a quick benchmark to generate CPU and GPU benchmark data. Be careful not to run the benchmark while the game is active, as it will cause a hitch in rendering while the data is gathered. After this is completed, Unreal Engine will have generated the scalability settings it believes are appropriate for the hardware it’s running on.
The Apply Hardware Benchmark Results node will apply the scalability setting to the game, scaling both graphics quality settings and graphics effects such as particle systems. Note that for a particle emitter to be affected by this, the "Apply Global Spawn Rate Scale" option in the Spawn module of said emitter must remain checked.
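The effect of the "Apply Global Spawn Rate Scale" checkbox can be pictured with a small sketch. This is a simplified model and not engine code: it assumes the scalability settings produce a single multiplier that is applied to the emitter's authored spawn rate only when the option is checked. The function and parameter names are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of a global spawn rate scale, not engine code.
// baseSpawnRate:        particles per second authored on the emitter.
// globalSpawnRateScale: multiplier assumed to come from the scalability settings.
// applyGlobalScale:     mirrors the "Apply Global Spawn Rate Scale" checkbox.
double EffectiveSpawnRate(double baseSpawnRate,
                          double globalSpawnRateScale,
                          bool applyGlobalScale)
{
    assert(baseSpawnRate >= 0.0 && globalSpawnRateScale >= 0.0);
    // An emitter that opts out keeps its authored rate on every machine.
    return applyGlobalScale ? baseSpawnRate * globalSpawnRateScale
                            : baseSpawnRate;
}
```

In this model, an emitter authored at 100 particles per second spawns 50 per second under a scale of 0.5, while an emitter with the option unchecked still spawns 100.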
Many complex games run into a problem where the render thread becomes the bottleneck as the number of objects on screen rises, and Glorp is no exception. To lessen the load on the render thread and improve overall system throughput, a dedicated DirectX* 11 API submission thread (RHI thread) was added.
The PhysX* library suffers from a sub-optimal locking strategy and excessive flushing when Unreal Engine submits ray-triangle intersection tests from multiple threads during particle collision. Before optimization, all intersection tests were serialized through a single lock and a physics scene flush was performed before each read operation, which hurt performance in a well-threaded app like Glorp.
To solve the lock issue, the scene locks were separated into reads and writes. Writes wait for reads to finish and reads wait for writes. In most cases, Unreal Engine only queries the physics scene after writes are finished. This provides a big boost in performance and eliminates the serialization problem.
To reduce the flushing issue, a scene flush is added after each write is finished and the flush before each read is removed. As with the lock issue, this works because Unreal Engine typically only reads from the physics scene after writes are finished, and also because there are far fewer writes than reads.
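The locking and flushing changes can be illustrated with a minimal, self-contained sketch. It is not the PhysX* or Unreal Engine code: `SceneSketch` and `RunParallelQueries` are hypothetical stand-ins, and the standard `std::shared_mutex` is used here as one way to let reads share a lock while writes hold it exclusively. Each write flushes once when it finishes, so reads never flush.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a physics scene; not the PhysX API.
class SceneSketch {
public:
    // Writers hold the lock exclusively and flush once when they finish.
    void AddShape(float height) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        pending_.push_back(height);
        Flush();  // flush after the write so that reads never have to
    }

    // Readers share the lock, so many queries can run at the same time.
    // Note that no flush happens on the read path.
    std::size_t CountShapesBelow(float height) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        std::size_t count = 0;
        for (float h : committed_) {
            if (h < height) ++count;
        }
        return count;
    }

    std::size_t FlushCount() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return flushCount_;
    }

private:
    // Moves pending writes into the state that queries read from.
    void Flush() {
        committed_.insert(committed_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        ++flushCount_;
    }

    mutable std::shared_mutex mutex_;
    std::vector<float> pending_;
    std::vector<float> committed_;
    std::size_t flushCount_ = 0;
};

// Issues the same query from several threads and sums the results.
std::size_t RunParallelQueries(const SceneSketch& scene, int threadCount, float height) {
    assert(threadCount > 0);
    std::vector<std::size_t> results(static_cast<std::size_t>(threadCount));
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&scene, &results, i, height] {
            results[static_cast<std::size_t>(i)] = scene.CountShapesBelow(height);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(results.begin(), results.end(), std::size_t{0});
}
```

In this sketch the number of flushes grows with the number of writes only, so a workload with few writes and many particle collision queries pays for very few flushes.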
By default, Unreal Engine updates particle systems on the game thread, performing operations such as collision and spawning in serial. While this is a simpler approach than updating each particle system in parallel, it doesn’t utilize the multi-core functionality of modern CPUs very well. For this sample, the engine code was modified to allow the update, or tick, of each particle system to happen in parallel. This gives a big boost, especially in maps that contain a high number of particle systems.
This optimization can be toggled on or off by setting the console variable "FX.AllowAsyncTick" to zero or one.
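The idea of ticking each particle system in parallel can be sketched in standalone C++. This is a simplified illustration and not the engine modification itself: `ParticleSystemSketch`, `TickSystem`, and `TickAll` are hypothetical names, the physics is reduced to gravity on a height value, and the `allowAsyncTick` parameter only mimics the role of the console variable.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical, heavily simplified particle system: one height and one
// vertical velocity per particle.
struct ParticleSystemSketch {
    std::vector<float> heights;
    std::vector<float> velocities;
};

// Advances a single system by dt seconds under constant gravity.
void TickSystem(ParticleSystemSketch& system, float dt) {
    assert(system.heights.size() == system.velocities.size());
    constexpr float kGravity = 9.8f;
    for (std::size_t i = 0; i < system.heights.size(); ++i) {
        system.velocities[i] -= kGravity * dt;
        system.heights[i] += system.velocities[i] * dt;
    }
}

// Ticks every system, serially or with one asynchronous task per system.
// Systems do not share state, so the tasks need no synchronization.
void TickAll(std::vector<ParticleSystemSketch>& systems, float dt, bool allowAsyncTick) {
    if (!allowAsyncTick) {
        for (auto& system : systems) TickSystem(system, dt);
        return;
    }
    std::vector<std::future<void>> tasks;
    tasks.reserve(systems.size());
    for (auto& system : systems) {
        tasks.push_back(std::async(std::launch::async,
                                   [&system, dt] { TickSystem(system, dt); }));
    }
    for (auto& task : tasks) task.get();  // wait for every system to finish
}
```

A production implementation would more likely hand the work to an existing task pool than launch one task per system, but the structure is the same: independent systems, one unit of work each, and a single wait at the end of the frame's update.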
As with updating particle systems, Unreal Engine checks the collision of each particle in a system in serial. Because particles within a single system do not collide with each other, their collision tests are independent. An optimization was added that splits sufficiently large systems into smaller pieces so that the collision results for each piece are calculated in parallel. This can allow many times more colliding particles to be used at once in a single system.
This optimization can be toggled on or off by setting the console variable "FX.ParallelParticleCollision" to zero or one.
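The splitting approach can be sketched as follows. This is an illustrative model and not the engine patch: the `Particle` struct, the floor at height zero, the restitution of 0.5, and the `chunkSize` threshold are all assumptions made for the example. Since particles in one system do not collide with each other, each chunk only touches its own particles and no locking is needed between chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical particle: height above a floor at 0 and vertical velocity.
struct Particle {
    float height;
    float velocity;
};

// Bounces every particle in [begin, end) that has fallen through the floor.
void CollideRange(std::vector<Particle>& particles, std::size_t begin, std::size_t end) {
    constexpr float kRestitution = 0.5f;  // assumed energy kept per bounce
    for (std::size_t i = begin; i < end; ++i) {
        Particle& p = particles[i];
        if (p.height < 0.0f && p.velocity < 0.0f) {
            p.height = 0.0f;
            p.velocity = -p.velocity * kRestitution;
        }
    }
}

// Small systems are handled on the calling thread; larger ones are split
// into chunks of chunkSize particles that are processed in parallel.
void CollideParallel(std::vector<Particle>& particles, std::size_t chunkSize) {
    assert(chunkSize > 0);
    if (particles.size() <= chunkSize) {
        CollideRange(particles, 0, particles.size());
        return;
    }
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < particles.size(); begin += chunkSize) {
        const std::size_t end = std::min(begin + chunkSize, particles.size());
        // Each worker owns a disjoint range, so there are no data races.
        workers.emplace_back([&particles, begin, end] {
            CollideRange(particles, begin, end);
        });
    }
    for (auto& worker : workers) worker.join();
}
```

The threshold matters in practice: splitting a small system costs more in thread overhead than it saves, which is why only sufficiently large systems are split.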
Figure 7. Before Optimizations
Figure 8. After Optimizations
With the above optimizations in place, the sample can now run up to 3.6 times as many particles as before without dipping under 60 frames per second.
Table 1. Particle Improvements Measured
| CPU | Core Count | Clock Rate | Particles Before | Particles After | Improvement |
|---|---|---|---|---|---|
| i7-6950X | 10 | 3.0 GHz | 37,000 | 133,000 | 3.6 times |
| i7-7820HK | 4 | 2.9 GHz | 74,000 | 103,000 | 1.4 times |
| i7-8550U | 4 | 2.0 GHz | 22,000 | 35,000 | 1.5 times |
| i5-7300U | 2 | 2.7 GHz | 13,000 | 14,000 | 1.08 times |
Figure 9. Particle Improvements Visualized
Glorp was designed to showcase how to take advantage of the features in Unreal Engine 4 to manipulate particles for more realistic game effects. By understanding lit particles, scaling for different hardware, and using the provided performance patches, a developer can greatly enhance the look of their game with particles that collide, bounce, and interact with other objects.