DeepMind and other researchers often turn to games to demonstrate how AI agents have progressed. The Alphabet division just finished a StarCraft II demonstration in which its AlphaStar AI agent defeated professional gamers 10-1.
In late 2017, DeepMind set out to master StarCraft II after conquering Go. Blizzard created the StarCraft II Learning Environment (SC2LE) with special hooks for researchers and developers. The video game is a “grand challenge” for AI because agents must balance multiple tasks concurrently and in real time. The challenges it poses include game theory, imperfect information, long-term planning, real-time play, and a large action space.
For example, while the objective of the game is to beat the opponent, the player must also carry out and balance a number of sub-goals, such as gathering resources or building structures. In addition, a game can take from a few minutes to one hour to complete, meaning actions taken early in the game may not pay off for a long time. Finally, the map is only partially observed, meaning agents must use a combination of memory and planning to succeed.
Last November, DeepMind demonstrated its progress at BlizzCon, but today’s demonstration showed AlphaStar competing and winning against two professional players. Each played a five-game series, with DeepMind’s AI sweeping all 10 games.
During these matches, AlphaStar had the advantage of being able to see the whole map at once, but DeepMind worked with the players to level the playing field. Mainly, AlphaStar could not react more quickly than a human, nor execute more actions per minute.
Those games took place in December, and DeepMind released the recordings today as part of the livestream. However, in a live exhibition match afterwards, a human was able to defeat AlphaStar after having more time to analyze the AI agent.
The demonstration was livestreamed on YouTube and Twitch and drew approximately 34,000 live viewers. Running over two hours, it featured commentators, the DeepMind team responsible, and the players discussing progress. Full match replays from DeepMind are now available for players to analyze.
In early 2018, DeepMind set out to “scale up and speed up” its StarCraft project. It did so by having different versions of AlphaStar compete against each other in an AlphaStar league. Training lasted two weeks and ran on Google’s third-generation Tensor Processing Units.
Agents learned how to beat one another, improved rapidly, and discovered new strategies in the process. DeepMind touted the equivalent of approximately 200 years of training for AlphaStar.
Agents are initially trained from human game replays, and then trained against other competitors in the league. At each iteration, new competitors are branched, original competitors are frozen, and the matchmaking probabilities and hyperparameters determining the learning objective for each agent may be adapted, increasing the difficulty while preserving diversity. The parameters of the agent are updated by reinforcement learning from the game outcomes against competitors. The final agent is sampled (without replacement) from the Nash distribution of the league.
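The league procedure described above can be illustrated with a toy sketch. This is not DeepMind's code: the `Agent` class, the scalar `skill` standing in for network parameters, the logistic `win_prob`, the fixed learning steps, and the use of fictitious play to approximate the Nash distribution are all assumptions made for illustration. The sketch also draws a single final agent rather than sampling several without replacement.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    skill: float          # toy scalar standing in for network parameters
    frozen: bool = False  # frozen agents remain as opponents but stop learning


def win_prob(a: Agent, b: Agent) -> float:
    """Probability that a beats b: a logistic curve in the skill gap (assumed)."""
    return 1.0 / (1.0 + 10 ** (b.skill - a.skill))


def train_league(iterations: int = 5, games: int = 200, seed: int = 0) -> List[Agent]:
    rng = random.Random(seed)
    # Stand-in for an agent initialised from human game replays.
    league = [Agent("gen0", skill=0.0)]
    for it in range(1, iterations + 1):
        parent = league[-1]
        parent.frozen = True                        # freeze the original
        learner = Agent(f"gen{it}", parent.skill)   # branch a new competitor
        opponents = [a for a in league if a.frozen]
        # Matchmaking: favour stronger opponents so difficulty rises over time.
        weights = [1.0 + o.skill for o in opponents]
        for _ in range(games):
            opp = rng.choices(opponents, weights=weights)[0]
            won = rng.random() < win_prob(learner, opp)
            # Toy update standing in for a reinforcement-learning step:
            # every game helps, and losses teach a little more than wins.
            learner.skill += 0.001 if won else 0.002
        league.append(learner)
    return league


def nash_by_fictitious_play(league: List[Agent], steps: int = 500) -> List[float]:
    """Approximate the Nash mixture of the league's symmetric zero-sum game."""
    n = len(league)
    # Antisymmetric payoff: positive when the row agent is favoured.
    payoff = [[win_prob(a, b) - 0.5 for b in league] for a in league]
    counts = [1] * n
    for _ in range(steps):
        total = sum(counts)
        # Expected payoff of each pure strategy against the empirical mixture.
        values = [sum(payoff[i][j] * counts[j] for j in range(n)) / total
                  for i in range(n)]
        best = max(range(n), key=values.__getitem__)
        counts[best] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    league = train_league()
    nash = nash_by_fictitious_play(league)
    final_agent = random.Random(1).choices(league, weights=nash)[0]
    print(final_agent.name, [round(p, 3) for p in nash])
```

Because skill in this toy model is a single number, the strongest agent beats every other and the approximate Nash distribution concentrates on it. In a game with rock-paper-scissors style strategy cycles, the mixture would spread across several agents, which is the reason for keeping a diverse league in the first place.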