Add ‘Diplomacy’ to the list of games that AI can play as well as humans

Machine learning systems have been mopping the floor with their human opponents for over a decade (seriously, that first Watson Jeopardy win was all the way back in 2011), though the types of games they excel at are fairly limited: typically competitive board or video games with a constrained playing field, sequential turns, and at least one clearly defined opponent; in short, any game where raw number crunching works to their advantage. Diplomacy, however, demands very little processing power, instead requiring players to negotiate directly with their opponents and make their moves simultaneously, things modern ML systems are generally not built to do. But that hasn’t stopped Meta researchers from designing an AI agent that can negotiate global policy positions as well as any UN ambassador.

First released in 1959, Diplomacy plays like a more refined version of Risk, with two to seven players taking on the roles of European powers and trying to win by conquering their opponents’ territories. Unlike Risk, where the outcome of conflicts is determined simply by a roll of the dice, Diplomacy requires players to negotiate with one another first (forming alliances, double-dealing, all that good stuff) before everyone moves their pieces simultaneously during the following game phase. The ability to read and manipulate opponents, convince other players to form alliances, plan complex strategies, navigate delicate partnerships, and know when to switch sides is a huge part of the game, and a set of skills that machine learning systems generally lack.

On Wednesday, Meta AI researchers announced they had overcome those machine learning shortcomings with CICERO, the first AI to demonstrate human-level performance in Diplomacy. The team trained the 2.7-billion-parameter Cicero over the course of 50,000 rounds on webDiplomacy.net, an online version of the game, where it placed second (out of 19 participants) in a 5-game league tournament, all while doubling its opponents’ average score.

The AI agent proved so adept “at using natural language to negotiate with humans in Diplomacy that they often preferred working with CICERO over other human participants,” the Meta team noted in a press release Wednesday. “Diplomacy is a game about people and not pieces. If an agent can’t recognize that someone is probably bluffing or that another player would see a certain move as aggressive, it will quickly lose the game. Likewise, if it doesn’t talk like a real person, showing empathy, building relationships, and speaking knowledgeably about the game, it won’t find other players willing to work with it.”

Essentially, Cicero combines the strategic reasoning of Pluribus or AlphaGo with the natural language processing (NLP) capabilities of BlenderBot or GPT-3. The agent is even able to plan ahead. For example, Cicero can deduce that it will need the support of a particular player later in the game, and then devise a strategy to win that person’s favor, “even recognizing the risks and opportunities that player sees from their particular point of view,” the research team noted.

The agent does not train through a standard reinforcement learning scheme as similar systems do. The Meta team explains that this would lead to subpar performance, since “pure reliance on supervised learning to choose actions based on previous dialogues results in an agent that is relatively weak and highly exploitable.”

Instead, Cicero uses an “iterative planning algorithm that balances dialogue consistency with rationality.” It first predicts its opponents’ moves based on what happened during the negotiation round, as well as the moves it thinks its opponents expect it to make, before “iteratively improving these predictions by trying to adopt new policies that have a higher expected value given the other players’ predicted policies, while also trying to keep the new predictions close to the original policy predictions.” Easy, right?
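As a rough illustration (this is not Meta’s actual code), that balancing act can be sketched as a softmax best-response that is penalized for drifting away from a dialogue-derived “anchor” policy. All of the action names, payoffs, and probabilities below are invented for the example:

```python
import math

# Toy sketch of an iterative planning loop that balances expected value
# against staying close to dialogue-consistent "anchor" policies.
# Every name and number here is made up for illustration.

ACTIONS = ["hold", "attack"]

# Our payoff for (our_action, their_action); the opponent gets 1 - payoff.
PAYOFF = {
    "hold":   {"hold": 0.5, "attack": 0.2},
    "attack": {"hold": 0.8, "attack": 0.1},
}

# Policies implied by the negotiation dialogue (the "anchors").
our_anchor = {"hold": 0.7, "attack": 0.3}
their_anchor = {"hold": 0.6, "attack": 0.4}
LAMBDA = 1.0  # strength of the pull back toward the anchor

def softmax_step(values, anchor):
    # Prefer high expected value, but the lambda * log(anchor) term
    # penalizes predictions that stray far from the dialogue.
    logits = {a: values[a] + LAMBDA * math.log(anchor[a]) for a in ACTIONS}
    z = sum(math.exp(v) for v in logits.values())
    return {a: math.exp(v) / z for a, v in logits.items()}

ours, theirs = dict(our_anchor), dict(their_anchor)
for _ in range(50):
    # Improve our predicted policy against the current opponent prediction...
    our_ev = {a: sum(p * PAYOFF[a][b] for b, p in theirs.items()) for a in ACTIONS}
    ours = softmax_step(our_ev, our_anchor)
    # ...then improve the opponent prediction against ours, and repeat.
    their_ev = {b: sum(p * (1 - PAYOFF[a][b]) for a, p in ours.items()) for b in ACTIONS}
    theirs = softmax_step(their_ev, their_anchor)

print(ours, theirs)
```

With these made-up numbers the loop settles on policies that still favor the dialogue-consistent “hold,” even though “attack” has a higher raw payoff in some matchups; that tension between value and consistency is the core of the idea.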

The system is not yet foolproof; the agent occasionally gets too clever and undermines itself by taking contradictory negotiating positions. Still, its performance in these early trials is superior to that of many human politicians. Meta plans to develop the system further so that it can “serve as a secure sandbox to advance human-AI interaction research.”

