DeepMind Teaches Artificial Intelligence to Play Diplomacy

DeepMind, the Alphabet-backed machine learning lab whose systems learn by playing games, has tackled chess, Go, StarCraft II, Montezuma’s Revenge, and beyond. It now believes that the board game Diplomacy can motivate a promising new direction in reinforcement learning research. In a paper published on the preprint server arXiv.org, its researchers describe an artificial intelligence system that achieves high scores in Diplomacy while yielding consistent improvements.

Artificial intelligence systems have achieved strong competitive play in large-scale, complex games like poker, Hex, and shogi. Most of these, however, are two-player zero-sum games, in which one player can win only if the other loses. That does not necessarily reflect the real world. Tasks like interacting with customers, navigating congestion, and negotiating contracts all involve compromise and consideration of how the preferences of group members conflict and coincide. Even when artificial intelligence agents are self-interested, they may gain by cooperating and coordinating. Interacting among diverse groups therefore requires complex reasoning about others’ motivations and goals.

Diplomacy forces those interactions by tasking seven players with controlling multiple units on a province-level map of Europe. Each turn, all players move all their units simultaneously. Units have equal strength, and each can hold its province or move to an adjacent space. A unit may instead support another unit, owned by the same player or by another, to help it overcome resistance from other units. Thirty-four of the provinces are supply centers, and a unit captures a supply center by occupying its province. Owning more supply centers is beneficial because the player can then build more units, and the game is won by holding a majority of the supply centers.
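The support rule can be illustrated with a minimal Python sketch. It is a deliberate simplification rather than a full adjudicator: it assumes every unit has strength 1, each supporting unit adds 1, and an attack dislodges a holding unit only when it is strictly stronger. The function names are hypothetical, and further rules of the real game (such as cutting support) are left out.

```python
def unit_strength(supports: int) -> int:
    """Every unit has strength 1; each supporting unit adds 1 (simplified)."""
    return 1 + supports


def attack_succeeds(attacker_supports: int, defender_supports: int) -> bool:
    """An attack dislodges a holding unit only if it is strictly stronger."""
    return unit_strength(attacker_supports) > unit_strength(defender_supports)


def holds_majority(owned_centers: int, total_centers: int) -> bool:
    """True when a player owns more than half of all supply centers."""
    return 2 * owned_centers > total_centers


# An unsupported attack on an unsupported unit is a standoff.
print(attack_succeeds(0, 0))
# One support is enough to overcome an unsupported defender.
print(attack_succeeds(1, 0))
# Equal support on both sides is again a standoff.
print(attack_succeeds(1, 1))
# The standard board has 34 supply centers, so 18 is a majority.
print(holds_majority(18, 34))
```

Because a lone unit can never dislodge another lone unit under this rule, progress depends on support, which is what pushes players toward coordination.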

Because units depend on one another, players need to negotiate the moves of their own units. They can make gains by coordinating their movements with those of other players. Moreover, they need to anticipate how other players will act and reflect those expectations in their own actions.

The co-authors write that they propose using games like Diplomacy to study the emergence and detection of manipulative behaviors, so that such behaviors can be mitigated in the real world. Research on Diplomacy could also pave the way toward artificial agents that successfully cooperate with others, including by handling the difficult questions that arise around establishing and maintaining trust and alliances.

DeepMind focused on the “no-press” variant of the game, in which no explicit communication is allowed. It trained reinforcement learning agents, which take actions in order to maximize a reward, using an approach called Sampled Best Responses (SBR). SBR handles the very large number of actions available to players in Diplomacy. It is paired with a policy iteration technique that approximates best responses to players’ actions as well as fictitious play.
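The idea behind a sampled best response can be sketched in a few lines of Python. This is a toy illustration on rock-paper-scissors, not DeepMind’s implementation: the function `sampled_best_response`, the opponent model `mostly_rock`, and the payoff function are all hypothetical stand-ins. Instead of enumerating every opponent action, the sketch samples a limited number of them and picks the candidate action with the best average payoff against those samples.

```python
import random

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def rps_payoff(mine: str, theirs: str) -> float:
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0.0
    return 1.0 if BEATS[mine] == theirs else -1.0


def mostly_rock(rng: random.Random) -> str:
    """Hypothetical opponent policy that plays rock 80% of the time."""
    return rng.choices(["rock", "paper", "scissors"], weights=[0.8, 0.1, 0.1])[0]


def sampled_best_response(candidates, sample_opponent, payoff, num_samples, rng):
    """Pick the candidate with the best average payoff against sampled opponents."""
    # Sample opponent actions once and reuse them for every candidate,
    # so all candidates are compared against the same situations.
    opponents = [sample_opponent(rng) for _ in range(num_samples)]

    def estimate(action):
        return sum(payoff(action, other) for other in opponents) / num_samples

    return max(candidates, key=estimate)


rng = random.Random(0)
best = sampled_best_response(
    ["rock", "paper", "scissors"], mostly_rock, rps_payoff, 200, rng
)
print(best)
```

In Diplomacy the candidate set would itself have to be sampled, since the full action space is far too large to list; the toy game here has only three actions, so all of them are evaluated.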

At each iteration, the system creates a data set of games in which actions are chosen by a module called an improvement operator. The improvement operator uses the previous policy (strategy) and value function to find a policy that can defeat the previous policy. The system then trains the policy and value function to predict the actions that the improvement operator chooses.
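The loop described above can be sketched in Python on a toy game. The following is a simplified illustration under stated assumptions, not the paper’s system: the game is rock-paper-scissors, the value function is omitted, the improvement operator is a sampled best response against the previous policy, and “training” is replaced by matching the empirical frequencies of the operator’s choices. All function names are hypothetical.

```python
import random
from collections import Counter

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine: str, theirs: str) -> float:
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0.0
    return 1.0 if BEATS[mine] == theirs else -1.0


def improvement_operator(policy, rng, num_samples=50):
    """Choose the action that scores best against actions sampled from `policy`."""
    weights = [policy[a] for a in ACTIONS]
    opponents = rng.choices(ACTIONS, weights=weights, k=num_samples)
    return max(ACTIONS, key=lambda a: sum(payoff(a, o) for o in opponents))


def fit_policy(chosen_actions):
    """Stand-in for training: predict actions by their empirical frequency."""
    counts = Counter(chosen_actions)
    return {a: counts[a] / len(chosen_actions) for a in ACTIONS}


def policy_iteration(initial_policy, iterations, games_per_iteration, seed=0):
    """Repeat: generate data with the improvement operator, then refit the policy."""
    rng = random.Random(seed)
    policy = dict(initial_policy)
    history = [policy]
    for _ in range(iterations):
        dataset = [improvement_operator(policy, rng) for _ in range(games_per_iteration)]
        policy = fit_policy(dataset)
        history.append(policy)
    return history


history = policy_iteration(
    {"rock": 1.0, "paper": 0.0, "scissors": 0.0},
    iterations=3,
    games_per_iteration=100,
)
for step, policy in enumerate(history):
    print(step, policy)
```

Starting from an always-rock policy, each new policy beats the previous one (paper, then scissors, then rock), but the sequence cycles. That cycling is why best-response methods are often combined with some form of averaging over past policies, as in fictitious play.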