My last blog post about data-oriented behavior trees was/is too long for many to find the time necessary to read it. I myself would have problems to block enough time and concentration to fully delve into it. Therefore here is the high level overview of it stripped of the description of my attempt to follow data-oriented design practices.
- July 17, 2011 – Posted bjoernknafla.com, too.
Articles in the data-oriented behavior tree series
- Introduction to Behavior Trees
- Shocker: Naive Object-Oriented Behavior Tree Isn’t Data-Oriented
- Data-Oriented Streams Spring Behavior Trees
- Data-oriented Behavior Tree Overview (this post)
- Behavior Tree Entrails
Memory accesses and data movement have larger costs (energy and clock-cycles) than computations working on data in registers in todays and especially future hardware. Memory bandwidth to main memory is limited. The gap, measured in processor cycles, between memory access speed and computational performance is a frightening abyss (I love writing non-scientific hyperbole…). Cache misses and/or the necessity to get data from main memory instead of from local memory are a bottleneck for (core local) computations and possibly steal memory bandwidth from calculations running on other cores.
Behavior tree (BT) implementations relying on classical hierarchies of nodes pointing to other nodes easily result in many random memory accesses when the tree is traversed. Each random memory access is a potential cache miss – which means to wait on data until it arrives and wasting cycles twiddling virtual thumbs.
Additionally, if the actions called by leaf nodes crunch through huge amounts of data, then even more cache misses happen – data to crunch is requested, on arrival it might evict the behavior tree data, which needs to be recovered from main memory once tree traversal goes on.
While many uses of behavior trees won’t see a blip of their traversal in the profiler I want to understand and learn how to build a more hardware-efficient and therefore data-oriented behavior tree to run many of them for a lot of entities – perhaps even on the SPUs of a PS3 (daydreaming in progress).
During development, short iteration times and support for monitoring and debugging of the game AI is an important factor to get the gameplay therefore the player experience right. Instead of re-compiling and re-starting a game due to changes of behavior trees I’d like to support live tuning of the running game.
Key attack vectors
To meet the needs for speedy traversal of behavior trees during a game as well as rapid modification, and game observing during development, I use two separate behavior tree representations for runtime and development-time.
Runtime – behavior streams
Three classes of actions are used:
- Immediate actions are called immediately during traversal of a behavior tree instance but only work on isolated and private data of the controlled entity – aka actor. Actor knowledge is aggregated in a blackboard – which should be a reasonable and self-contained blob of data. Isolated data enables parallel processing of different actors and, if the data is easily moveable and is small enough to fit into local storage, shouldn’t trash the cache too hard (trash me baby not one time).
- Actions digging into huge amount of data or which are best collected and batch processed are deferred. Instead of calling them immediately, requests to run them are collected. Cache misses should be minimal during behavior tree traversal and non-unified memory address spaces, like SPU local memory or graphics device memory, should work with deferred actions.
- As deferring introduces latency – at earliest the following behavior traversal after enqueueing an action request can receive and react to a state update – persistent actions are introduced. They run without an explicit invocation via behavior tree traversal and their last behavior result state is always accessible without any waiting involved.
Restricting immediate actions to only access an actor’s blackboard, deferring actions, and introducing persistent action execution can be used by node-and-pointer-based behavior tree implementations. But let’s push forward.
To get rid of the random memory accesses from chasing pointers up and down the node hierarchy, I flatten the graph via a pre-order traversal into an array of node representing items – a stream.
This stream describes the static shape of a behavior tree.
Traversal of this shape item stream is mainly forward oriented (in a later blog post I’ll show that I jump backwards to stop running actions if children aka sub-behaviors are abandoned when a parent bails out) though sub-streams might be skipped forward. As a consequence, streaming and possibly even prefetching of data is an option. Because of the branchy nature of behavior trees, the forward skipping might decrease performance benefits from the array storage approach though.
Each behavior tree instance – aka actor – has its own traversal state (e.g., which child of a sequence was running during last traversal and should be visited again during the next decision making update) and action state (ready to run, running, success completion, fail without side effects, error with side effects) based on the static shape stream. Only inner nodes – non actions – have a traversal state. As all inner nodes decide how to traverse an actor’s behavior tree, and therefore decide which actions to run, I call their state decider state instead of traversal state.
States (for actions and decider nodes) contain the shape item index they are associated with and are sorted according to it in increasing order. To update the decision making process of an actor an interpreter (a kind of virtual machine (VM) or processing kernel) starts traversing the actor’s behavior shape stream from the beginning. If no state is found for the currently visited node its default state is used. This way fewer state data needs to be stored which should result in lower bandwidth requirements to move state data compared to always storing a state for every shape node.
By analyzing the static shape of a behavior tree it should be possible to determine the max number of action and decider states active at once. The necessary storage buffers are therefore pre-allocated.
I’ll look into editing in a later blog post. My main thoughts about it are:
- Use a representation that makes it easy to create and change behavior trees. In an editor, they do not need to be traversed multiple times per frame by many actors.
- Establish a connection between the edit-/development-time representation and the runtime streams to enable monitoring of the runtime and stats collecting.
- While connected, collect edit-side changes, store them in delta-change commands, and send them to the runtime to adapt to modification live.
- Delta-change commands will adapt shape item index ranges stored in action and decider states, and in the mapping between the behavior tree runtime and the systems which execute deferred and persistent actions.
Execution – interpreters go, go, go
During each simulation tick a behavior tree system runs through four stages:
- Collect action state changes triggered by deferred and persistent actions. Bring actor blackboards up to date.
- Apply editing changes to the affected behavior shape streams, and their actors action and traversal states.
- Iterate over the actors associated with the different behavior tree shapes and update their decision making – possibly in parallel.
- Distribute the collected action launch and cancel requests to their respective systems.
Decision making updates are carried out by interpreters – you could also call them kernels or virtual machines (VMs) working on the streams of behavior tree shape items and decider and action states.
Each interpreter has its own private set of working data to:
- Collect the action requests of the actor it currently works on.
- Store action states for running immediate and deferred actions.
- Push- and pop inner behavior tree decider items onto a decider guard stack to keep track of the chain of parent nodes – or scope – of traversed nodes and to handle behavior result states according to their parent node type and decider state (e.g., last child traversed).
- Store the updated decider state which is emitted when leaving an inner node and therefore popping the decider guard stack.
On finishing an actor update, an interpreter:
- Replaces the last action and decider states stored in the actor with the newly generated ones.
- Emits action requests to the shared action request queue of the behavior tree system (to enable sorting and batching of actions for all actors).
- Adds a running state for each requested action to keep track of running deferred actions for which the actor expects action state changes.
- Cleans the volatile helper buffers and stacks to prepare working on the next actor to update.
EOP (end of post – nearly…)
Data-oriented behavior trees are already in heavy duty use throughout the industry as I learned at the insight-heavy Paris Shooter Symposium. I haven’t hear about a behavior tree design that allows live editing though – which seems to be the main point I should push with my work.
Additionally, I already have a concept how to eventually drive my data-oriented behavior trees via events instead of traversing them for each update but spare that topic for a later, more specialized article. I’m not convinced that the added complexity leads to performance benefits though.
Thanks to all at the Paris Game AI Conference 2011 who told me that my articles are too long to find the time to read them – you are absolutely right and I’ll try to write shorter but hopefully still useful articles.
Special thanks to Behave and my own approach. He also motivated me to draw more diagrams!
I am looking forward to your questions and feedback and would love to learn more about how you are using behavior trees – data-oriented or not – in the trenches or to discuss their use here or via email.