“Do you think that the input and output data structures would have a lot in common across different games? Obviously, differing gameplay requirements will impact the exact details, but I wonder whether the answer to questions like “What’s the impact of using the exact distance to the nearest threat as an input, versus a string of booleans that indicate whether there is a threat present at the 5m/10m/15m/20m range?” would have applicability across many games. I’m not sure whether it’s possible to spot patterns like that, or whether these systems make it too difficult to link any particular observed behavior back to any particular aspect of the feature selection.”
It’s a great question and one that deserved a lot of thought in order to frame a coherent, measured response. Feature Selection in general is still an open question in a lot of regards. For many techniques, too many features of the world being provided can lead to trouble – neural networks are a prime example of a system that confuses very easily when extraneous data is thrown at it. At the same time though, it’s far far too easy to eliminate something important when you start getting a bit particular about what you put in and what you leave out. There are of course other approaches you could take that that don’t get quite as messed up by the addition of extra values, and still others that select for themselves what they see as the important aspects. I’m going to dig into this a little bit in the remainder.
Wacka Wacka Wacka…
For better or worse I’ve spent a significant portion of my career to date thinking about Pac-Man. For a period I watched over around 100 computers playing games of Ms Pac-Man day in, day out while my experiment tried to evolve a good configuration of parameters for the AI system I had written to play the game. Even just talking about it now, I can feel the flashbacks starting up again…
Pac-Man itself is deterministic, meaning that the ghosts will behave in the same manner consistently. They will always react in the same way if you always act in the same way. The upshot of this is that there are known “optimal” solutions to Pac-Man, which is to say that some very lonely/bored individuals have figured out exactly moves you need to do in order to “win”. Ms. Pac-Man on the other hand is non-deterministic. The ghosts have behavioural patterns, but there are random elements introduced, so whilst at a high level you can characterise what the ghosts will do, you can’t predict with any certainty where they will be given a fixed set of inputs from the player. That makes it interesting as an AI challenge problem, and writing an agent to play Ms. Pac-Man has been part of the competition track for a number of conferences for around 5 years or so now. It’s telling as to the complexity of even a simple game such as Ms. Pac-Man that although the highest recorded score for a human player is 920,000, no AI has yet been able to consistently break 50,000.
Feature selection in this environment is key. It’s a fast paced world, so time spent on deliberative reasoning will likely result in death. At the same time though, there are some concepts that need to be reasoned about. For the controller I mentioned earlier that I was evolving parameters for, we took a few basic metrics such as distance to the nearest ghost, distance to the nearest power pill and so forth and tried to use these to trigger transitions within a Finite State Machine.
Of course, it was incredibly naive. We had a very minimal feature set that we were working with in an effort to reduce complexity, and we reduced it alright. We’re now working with much robust models of the game state, using weighted edges to know not just where the nearest pill is, but where all the pills are and their distribution across the maze. The more advanced agents have tactics up their sleeve such as setting ambushes for the ghosts by lurking near power pills. The choice of what to expose to the decision logic is making a big difference to their performance. (Although it’s worth noting that at least one year, the competition winner actually was a very poorly designed agent – it’s strength came from the Software Engineering approach the researchers had taken : they’d disabled garbage collection for their system, making it execute much faster. Sometimes efficiency is more important than feature selection.)
Knowing When to Hold ‘Em
The Pac-Man example shows why you might sacrifice input data in order to reduce the complexity of the decision making process. At the other end of the spectrum, I’d like to briefly summarise some work we’ve been doing in Poker, specifically Texas Hold ‘Em. Because it has a very stable probabilistic model, over time there are average trends that hold out as sound play strategies. Professional human players typically categorise their opponents based on three primary statistics – “Voluntarily Put in Pot” (VPiP), “Pre-Flop Raise” (PFr) and “Aggression Frequency” (AFq). VPiP is when a player chooses to put money in the game – it’s known that only around 13% of the hands you are dealt will be worth considering playing, Similarly, there’s only a set number that are worth raising while you can only see two cards in your own hand (PFr), and as the game progresses, it only makes sense (probabilistically speaking at least) to be a certain level of aggressive (AFq). So over time, it’s possible to build up a model of what a “rational” player will do. Human players categorise around nine different deviations from the norm using these three different measures. But what if you could go further? Using machine learning techniques and a massive dataset, we started trying to categorise players using a 28-dimension representation of a player. From that we were able to extract 16 distinct types of player.
In the case of the work we were doing at the time, a more accurate classification of the opponents meant better prediction of their actions, which meant we could build our agent in such a way as to exploit weaknesses in the opponent’s play and win more frequently. It’s not a big leap to take the idea a step further though – instead of trying to build an AI player that can maximise its winnings against an opponent, you can take these concepts and use them to maximise the challenge. Silent Hill – Shattered Memories showed that it’s possible to tailor a game to the player’s own personality using psychological profiling, and Mahlmann et al demonstrated the way machine learning could be applied to take this kind of analysis and predict a player’s behaviour in Tomb Raider Underworld, so as much as it applies in a theoretical book-learnin’ context like Poker, it also holds true in AAA titles. I’m a little bit off topic right now, but imagine a game that tuned into you specifically and could push the “one more turn” button people experience with Civilisation – tailored specifically to give you what you need, when you need it. It’s a cool, if somewhat terrifying, prospect.
Back to the Point
I’ve waffled along for a bit and given you two examples of games that seem to require very different data structures and approaches to feature selection. But I can’t say I’m in a position to answer Richard’s question yet. And as far as I’m aware, he’s right in thinking that nobody is doing a comparative analysis of what techniques work in general contexts. There are some accepted rules of thumb for what will and won’t work in certain genres of game, and problems tend to have standard “shapes”, as an example, driving games generally require an AI that will react to the current situation rapidly and typically won’t need to perform long term reasoning whereas strategy games require more stateful reasoning over time. With that said, humans don’t require different hardware or software to play different types of games (or maybe they do – has anyone taken an MRI scan to compare active neural areas?). I suspect that what we will find over time is that humans reason using abstract rather than concrete terms – that Richard’s example about absolute distance to enemies or truth values for a range to the enemies will actually turn out to be neither – instead being something more like concepts of “near” and “far” which can take arbitrary values that are context dependent, whilst the actual representation itself remains reasonably independent of context. But that’s purely speculation – this kind of general purpose system is far from being a twinkle in any researcher’s eye just now.
I don’t know if that really addresses Richard’s question properly, but that’s my take on it. Hit the comments to join in the discussion!