| February 3, 2010 11:00 PM PST | |
By Alex J. Champandard
The demo discussed in this article was implemented with Radu Cristea based on work by Jeremy Tryba. Thanks to Björn Knafla for his help and Damian Isla for his insights.
Making the most of the computation you have available is always important, but when multiple threads working in parallel are available, improving efficiency pays off even more. For game characters, by designing your artificial intelligence (AI) system cleverly, you can get even more information to make better decisions and reduce reaction times to avoid embarrassing pauses.
In the previous article in this series, you saw how to leverage multi-threading at a mid-level (that is, relying on Intel® Threading Building Blocks [Intel® TBB]) using parallel line-of-sight (LoS) checks to speed up the process of acquiring information. This article looks into the design of sensory reasoning at a higher level to provide more reliable performance and reduce worst-case scenarios.
In this article, you'll learn how to:
Of course, there's certainly room for additional low-level optimizations:
To give you an idea of how much room for improvement there is, if you add low-level parallel queries to your system, you're still most likely facing the following problems. These statistics are based on the same Hide & Seek scenario from the AI Sandbox [1] used in the previous article (see Figure 1).

Figure 1. Hide & Seek mini-game in the AI Sandbox. A single patrolling agent (in red) is moving around the world in a fixed patrol route, while the hiders (in yellow) are fleeing to cover, because their previous hiding location was invalidated. While hiding, dozens of LoS calculations are evaluated for nearby cover points.
Such an architecture has multiple benefits:

Figure 2. The sensory and reasoning systems within the AI architecture, with the sensory system discussed in the first article [1] in the box on the left. The new reasoning system stands in the middle and acts as an intermediate between the behavior (or high-level AI) on the right.
Here's an overview of how the system shown in Figure 2 works.
Figure 3. The reasoning component acquires information via asynchronous requests from the sensory component, and stores the results internally.
Figure 4. When the behavior system requires information, direct synchronous calls are made to query the information that's already available.
The most robust approach is to use a round-robin solution, which goes through requesting LoS information for all known cover locations in turn. Each frame, however, would have a limit on the maximum number of queries that could be made. Here is an example of the main loop split into the three phases: pre-update (sequential), update (parallel), and postupdate (sequential):
This algorithm is by far the simplest you could implement here, and it's a great default implementation. There are many options for dynamically prioritizing the LoS calculations, but having such a predictable scheduling algorithm running in the background can help guarantee a maximum worst-case response time. Expect to be using round robin at the base of your implementation for a long time to come. The following code provides an example of how the behavior can use this information:
Going further, many options for dynamic prioritization can help make the most of your parallel calculations while maintaining determinism. For example, use distance to the player and/or camera to boost the priority of the cover calculations and to prevent delays when the AI actors are nearby and particularly visible. Another option is to increase the importance of LoS calculations for cover locations that AI actors are currently using to provide information about invalidation sooner. Also, you can locally boost the priority of cover locations around actors that could potentially look for cover-for example, using a radius based on what the AI is interested in.
The downside of prioritization is that it does take two extra steps: one to calculate priorities and a second to sort the options before even making requests. However, if done well, the cost is negligible compared to being able to leverage the underlying multi-threaded LoS calculations.
Figure 5 and Figure 6 show statistics taken from 20 seconds of simulation of the Hide & Seek mini-game in the AI Sandbox [3]. In this configuration, two enemy guards were patrolling, and 12 civilian actors were taking cover and hiding. In this setup, there were 104 different cover locations for taking cover and roughly two dozen obstacles to consider.
What can you take away from these results? First, capping the maximum number of queries each frame performs rather well! Assuming a reasonable cap, you'll be able to process the majority of the queries in one frame. The downside of using capping is that there's an unpredictable lag in the queries, and that lag becomes worse when you need the information most-during spikes. The managed queries (via the reasoning component) guarantee a certain baseline performance, so the data is available at the very least within three frames in this scenario. Also, in the best case, it means that the data is available immediately without any extra changes to the structure of the main game loop. This extra reasoning layer assures a minimum number of LoS queries during otherwise empty frames, which is useful for maximizing performance of the parallel computation.

Figure 5. Number of sensory queries processed each frame using three different policies for comparison: in red, all queries are executed; in blue, there's a maximum number of queries per frame (in this case, 30); and in grey, an extra reasoning layer manages the queries (20 per frame).
Digging into the details from the simulation in Figure 5, five common patterns are worth noting:

Figure 6. Comparative analysis of the delay incurred by different query-management policies. In this case, it takes four frames to calculate LoS to relevant objects near the actors. The first results can be available the very same frame when the reasoning layer is used. In contrast, the bulk of the queries made lazily are processed 1-3 frames afterwards (assuming a cap of ~10 queries per frame).
In practice, the reasoning system is often programmed explicitly alongside the AI behavior to make sure it prepares information that closely matches what's going to be requested. This behavior is particularly useful, because you can get up and running very quickly once you've put your architecture in place, and it takes little extra code to start seeing the bene-fits of this approach. Similarly, the overhead of implementing the reasoning logic is insigni-ficant compared to the amount of time spent on the high-level AI and behavior.
The alternative-that is, building a system that can deal with arbitrary requests from the AI (including when it changes throughout development) and pre-caching and pre-empting its queries-isn't an easy task, as it requires tight integration with the high-level AI and a good understanding of how behaviors execute.
Also, it's important to note that the AI can still be given access to the low-level sensory system to make direct requests for one-off LoS queries. Of course, it's best to rely on the middle layer of abstraction provided for the bulk of the information that the AI requires (that is, over 90%).
In particular, this article showed how you can create a separate system responsible for managing LoS queries and maintaining things like cover suitability and validity on a per-faction basis. This kind of architecture is not only useful for eliminating redundant computation but also provides an opportunity for further improving the effectiveness of the computation by prioritizing calculations better-according to what the game demands.
[2] Blackboard Architectures and Knowledge Representation with Damian Isla; AiGameDev.com Masterclass, 2009.
[3] The AI Sandbox [http://aisandbox.com/].
The demo discussed in this article was implemented with Radu Cristea based on work by Jeremy Tryba. Thanks to Björn Knafla for his help and Damian Isla for his insights.
Making the most of the computation you have available is always important, but when multiple threads working in parallel are available, improving efficiency pays off even more. For game characters, by designing your artificial intelligence (AI) system cleverly, you can get even more information to make better decisions and reduce reaction times to avoid embarrassing pauses.
In the previous article in this series, you saw how to leverage multi-threading at a mid-level (that is, relying on Intel® Threading Building Blocks [Intel® TBB]) using parallel line-of-sight (LoS) checks to speed up the process of acquiring information. This article looks into the design of sensory reasoning at a higher level to provide more reliable performance and reduce worst-case scenarios.
In this article, you'll learn how to:
- Set up persistent queries for information that can be run on demand when time is available.
- Reduce redundant computation by better understanding what information is required.
- Prioritize queries to ensure that more time is spent on the information that is most important to the AI.
The Existing System
The system described in the previous article is flexible [1]: You can create arbitrary sensory queries and implement jobs that can run in parallel to provide information to the AI. By leveraging multi-core hardware, you can boost performance anywhere from 1.6× to 5× the performance for 4-8 threads.Of course, there's certainly room for additional low-level optimizations:
- Further batching up all the sub-jobs that use identical data to increase cache coherence.
- Removing the overhead of indirection for each sub-job when those jobs can be handled in batches.
To give you an idea of how much room for improvement there is, if you add low-level parallel queries to your system, you're still most likely facing the following problems. These statistics are based on the same Hide & Seek scenario from the AI Sandbox [1] used in the previous article (see Figure 1).
- 80% of the queries run by the system in the worst-case frames are redundant. With each actor working independently to find cover, each actor often requests the same LoS calculation as another actor looking for similar cover from the same threat.
- 27% of the frames don't have enough processing to be worth parallelizing. The way an AI actor behaves in a world means it won't continuously require information, because most of the time is simply spent following a course of action.
- 8% of the frames have too much processing to do, so certain calculations must be delayed. When actors wait until the last moment to request information for decisions, you get spikes. You can spread these computations out, but doing so causes unpredictable lags in the behavior.
Figure 1. Hide & Seek mini-game in the AI Sandbox. A single patrolling agent (in red) is moving around the world in a fixed patrol route, while the hiders (in yellow) are fleeing to cover, because their previous hiding location was invalidated. While hiding, dozens of LoS calculations are evaluated for nearby cover points.
A Sensory Reasoning System
The key is to create an extra layer of logic within the AI (see Figure 2)-called a reasoning system-based around a blackboard [2]. This layer contains code that runs independently of the high-level AI to gather information as well as memory that acts as a buffer for information from the sensory system.Such an architecture has multiple benefits:
- It avoids redundant computation by centralizing all of the independent queries of all the actors.
- It can set up sensory jobs in anticipation of the information being required before it's even needed to reduce delays.
- It caches recent information so that there's no need to cap the number of queries made each frame, particularly when lots of queries are made at the same time!
Figure 2. The sensory and reasoning systems within the AI architecture, with the sensory system discussed in the first article [1] in the box on the left. The new reasoning system stands in the middle and acts as an intermediate between the behavior (or high-level AI) on the right.
Here's an overview of how the system shown in Figure 2 works.
- In each frame, the reasoning system makes sensory queries to acquire information about the cover locations for which it's trying to determine the validity. These queries are based on the known threats-for instance, the Red actors patrolling in the world. See Figure 3.
- The sensory system returns delayed results for the LoS queries as processed by the ray test jobs that were described in the previous article [1]. As this information becomes available, the reasoning system processes it.
- The reasoning stores information about each cover location based on the threats. In this case, the cover validity is calculated per "faction," although it could be done individually (at the cost of more computation).
- When the high-level AI needs new cover, it can look up the cover suitability for its faction and process the information further to pick a final cover point. The reasoning system tries to guarantee that the cover validity remains as up to date as possi-ble. See Figure 4.
- As the AI actors hide in cover, they can ascertain the validity of any cover point simply by looking up the value in the reasoning system. In fact, this particular example is best employed using an event-driven notification rather than polling.
Figure 3. The reasoning component acquires information via asynchronous requests from the sensory component, and stores the results internally.
Figure 4. When the behavior system requires information, direct synchronous calls are made to query the information that's already available.
Parallel Implementation
How does the reasoning system manage the validity or suitability of cover locations in practice? What part of the code is responsible for distributing the computation over multiple frames? Can the system predict the requests of the AI actors to pre-cache information before it's even requested?The most robust approach is to use a round-robin solution, which goes through requesting LoS information for all known cover locations in turn. Each frame, however, would have a limit on the maximum number of queries that could be made. Here is an example of the main loop split into the three phases: pre-update (sequential), update (parallel), and postupdate (sequential):
// This sequential phase is run before the parallel Sensory update.
void ReasoningSystem::preUpdate()
{
size_t coverCount = m_CoverLocations.size();
// Loop through the pre-allocated queries. This is the maximum allowed
// number of queries this frame.
for (size_t i=0; i<m_Queries.size(); ++i)
{
// Pick the next location via round-robin.
CoverLocation& cover = m_CoverLocations[m_iCurrentIndex++ % coverCount];
// Setup a query for known enemies; only one used here.
CoverLineOfSightQuery& q = m_Queries[i];
Actor& enemy = findNearestEnemy(cover);
m_Query.setTargetPosition(enemy.getPosition());
m_Query.setSourcePosition(cover.getPosition());
m_Query.setCover(cover);
// This LoS calculation will be processed in parallel later.
m_pSensorySystem->addQuery(q);
}
} // preUpdate()
/** NOTE: The parallel SensorySystem::update() will be called in the meantime. **/
// This phase is run sequentially after the sensory results are available.
void ReasoningSystem::postUpdate()
{
for (size_t i=0; i<m_Queries.size(); ++i)
{
CoverLocation& cover = m_Query.getCover();
cover.setVisibility(m_Query.getVisibility());
m_pSensorySystem->removeQuery(q);
}
} // postUpdate()
This algorithm is by far the simplest you could implement here, and it's a great default implementation. There are many options for dynamically prioritizing the LoS calculations, but having such a predictable scheduling algorithm running in the background can help guarantee a maximum worst-case response time. Expect to be using round robin at the base of your implementation for a long time to come. The following code provides an example of how the behavior can use this information:
CoverLocation* MyBehavior::selectCover()
{
CoverLocations locations;
findNearbyCoverLocations(locations);
for (size_t i=0; i<locations.size(); ++i)
{
CoverLocation& cover = locations[i];
if (m_pReasoningSystem->isCoverValid(cover))
{
return &cover;
}
}
return NULL;
} // selectCover()
Going further, many options for dynamic prioritization can help make the most of your parallel calculations while maintaining determinism. For example, use distance to the player and/or camera to boost the priority of the cover calculations and to prevent delays when the AI actors are nearby and particularly visible. Another option is to increase the importance of LoS calculations for cover locations that AI actors are currently using to provide information about invalidation sooner. Also, you can locally boost the priority of cover locations around actors that could potentially look for cover-for example, using a radius based on what the AI is interested in.
The downside of prioritization is that it does take two extra steps: one to calculate priorities and a second to sort the options before even making requests. However, if done well, the cost is negligible compared to being able to leverage the underlying multi-threaded LoS calculations.
Results
Both viable solutions for managing LoS queries are tested here: one using on-demand query requests, then capping the maximum number of calculations in each frame (as discussed in the previous article [1]), and another using an extra reasoning layer as described in this article. Comparing the two approaches purely in terms of lag and performance emphasizes their relative tradeoffs.Figure 5 and Figure 6 show statistics taken from 20 seconds of simulation of the Hide & Seek mini-game in the AI Sandbox [3]. In this configuration, two enemy guards were patrolling, and 12 civilian actors were taking cover and hiding. In this setup, there were 104 different cover locations for taking cover and roughly two dozen obstacles to consider.
What can you take away from these results? First, capping the maximum number of queries each frame performs rather well! Assuming a reasonable cap, you'll be able to process the majority of the queries in one frame. The downside of using capping is that there's an unpredictable lag in the queries, and that lag becomes worse when you need the information most-during spikes. The managed queries (via the reasoning component) guarantee a certain baseline performance, so the data is available at the very least within three frames in this scenario. Also, in the best case, it means that the data is available immediately without any extra changes to the structure of the main game loop. This extra reasoning layer assures a minimum number of LoS queries during otherwise empty frames, which is useful for maximizing performance of the parallel computation.
Figure 5. Number of sensory queries processed each frame using three different policies for comparison: in red, all queries are executed; in blue, there's a maximum number of queries per frame (in this case, 30); and in grey, an extra reasoning layer manages the queries (20 per frame).
Digging into the details from the simulation in Figure 5, five common patterns are worth noting:
- (a) Small and frequent spikes that cause minor delays in the queries when they are capped per frame
- (b) Sustained processing within the capped range, which does not cause any particular delays (These are ideal conditions for efficiency.)
- (c) Large spikes when actors request LoS queries at the exact same time, which must be dealt with if the frame rate is to be kept constant
- (d) An equally large lag in the results from the LoS calculations because of the capping of the queries per frame
- (e) Spikes on top of a sustained period of processing also cause multiple frames of lag-up to eight or more
Figure 6. Comparative analysis of the delay incurred by different query-management policies. In this case, it takes four frames to calculate LoS to relevant objects near the actors. The first results can be available the very same frame when the reasoning layer is used. In contrast, the bulk of the queries made lazily are processed 1-3 frames afterwards (assuming a cap of ~10 queries per frame).
Discussion
The concept of a reasoning layer has many benefits from an AI and architectural perspective, including decoupling and modularity. In this case, it also provides some performance benefits-particularly in its ability to schedule parallel workloads reliably.In practice, the reasoning system is often programmed explicitly alongside the AI behavior to make sure it prepares information that closely matches what's going to be requested. This behavior is particularly useful, because you can get up and running very quickly once you've put your architecture in place, and it takes little extra code to start seeing the bene-fits of this approach. Similarly, the overhead of implementing the reasoning logic is insigni-ficant compared to the amount of time spent on the high-level AI and behavior.
The alternative-that is, building a system that can deal with arbitrary requests from the AI (including when it changes throughout development) and pre-caching and pre-empting its queries-isn't an easy task, as it requires tight integration with the high-level AI and a good understanding of how behaviors execute.
Also, it's important to note that the AI can still be given access to the low-level sensory system to make direct requests for one-off LoS queries. Of course, it's best to rely on the middle layer of abstraction provided for the bulk of the information that the AI requires (that is, over 90%).
Summary
Making the most of multi-core hardware always requires a good understanding of the problem at hand. Of course, low-level details are important, too; but in the case of AI and processing information from the environment, you can leverage multiple threads to get orders of magnitude better results.In particular, this article showed how you can create a separate system responsible for managing LoS queries and maintaining things like cover suitability and validity on a per-faction basis. This kind of architecture is not only useful for eliminating redundant computation but also provides an opportunity for further improving the effectiveness of the computation by prioritizing calculations better-according to what the game demands.
References
[1] Multi-threading Line-of-Sight Calculations to Improve Sensory System Performance in Game AI by Alex J. Champandard, 2009.[2] Blackboard Architectures and Knowledge Representation with Damian Isla; AiGameDev.com Masterclass, 2009.
[3] The AI Sandbox [http://aisandbox.com/].
About the Author
Alex Champandard is the mastermind behind AiGameDev.com, the largest online hub and continuous training program for artificial intelligence in games. He has worked in the digital entertainment industry as a senior AI programmer for many years, most notably for Rockstar Games. He regularly consults with leading studios in central Europe, recently finishing a contract on the multiplayer bots for KILLZONE 2 at Guerrilla Games. Alex also organizes Paris Game AI Conference and is an Associate Editor for the IEEE Transactions on Computational Intelligence & AI in Games. He serves on the Program Committee for the AIIDE Conference.For more complete information about compiler optimizations, see our Optimization Notice.
Comments (0) 
Trackbacks (0)
Leave a comment 
To obtain technical support, please go to Software Support.
