A paper on mahjong
In recent years, artificial intelligence systems have successfully challenged human players at the ancient board game Go and the poker variant Texas Hold 'em, and even in complex video game environments such as Dota and StarCraft. Now a Microsoft Research Asia (MSRA) team has taken on mahjong, the traditional Chinese tile game of chance, bluff and strategy.

At the World Artificial Intelligence Conference (WAIC) held in Shanghai on August 29, Harry Shum, global executive vice president of Microsoft, officially announced MSRA's Suphx ("Super Phoenix") as "the strongest mahjong AI in history".

Synced previously reported on AI work in mahjong, which is an imperfect information game. From a game-theory perspective, it is fundamentally different from perfect information games such as chess and Go. Mahjong players cannot see everything that may affect the outcome of the game. When choosing a move, they must guess at the tiles that their opponents hold out of sight.

Suphx learned the complexities of mahjong by playing on Tenhou, a popular Japan-based global online mahjong platform with more than 300,000 members. From March to June this year, Suphx played more than 5,000 games against human opponents and reached the top rank of 10 dan. (The highest level, 11 dan, is open only to human players.) Suphx's stable rank on Tenhou is around 8.7, which is higher than the top human players' average of 7.4.

This year's celebrated AI breakthroughs in video games were the product of comprehensive gaming abilities, covering strategy as well as operational and execution skills. Pure intelligence and strategy games like mahjong pose a unique challenge. As Liu Tieyan, vice president of Microsoft Research Asia, said, "Games like Dota are more 'game', while games like mahjong are more 'AI'".

The related research papers have not yet been published, but MSRA has disclosed some properties of the Suphx model on its blog (in Chinese and Japanese), explaining how the team approached mahjong through deep reinforcement learning:

Adaptive decision: To cope with mahjong's huge state space, Suphx dynamically adjusts the diversity of its exploration process, so it can test the different possibilities of the game more effectively than traditional algorithms.
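
One common way to adjust exploration diversity dynamically is to tune the weight of an entropy bonus so that the policy's entropy tracks a target value. The sketch below illustrates that general idea only; MSRA has not published its method, so the function names, the target entropy and the step size here are all assumptions made for illustration.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adjust_entropy_weight(weight, entropy, target_entropy,
                          step_size=0.01, min_weight=0.0, max_weight=1.0):
    """Hypothetical update rule (an assumption, not Suphx's actual method).

    Raises the exploration weight when the policy is less diverse than the
    target, lowers it when the policy is more diverse, and clamps the result.
    """
    new_weight = weight + step_size * (target_entropy - entropy)
    return max(min_weight, min(max_weight, new_weight))


# Illustrative values: a nearly deterministic policy and a uniform one.
weight = 0.1
target = 1.0
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]

weight_for_peaked = adjust_entropy_weight(weight, policy_entropy(peaked), target)
weight_for_uniform = adjust_entropy_weight(weight, policy_entropy(uniform), target)
```

With these made-up numbers, the peaked policy pushes the weight up (more exploration) and the uniform policy pushes it down.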

Oracle coach: To address the challenge of imperfect information, Suphx uses an "oracle coach" technique to strengthen reinforcement learning. The basic idea is to use some of the hidden information to guide the direction of training during the self-play stage, so that the learning path stays closer to the optimal path available under perfect information. This forces the AI model to learn and understand the visible information more deeply, and so to form an effective basis for its decisions.
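
A minimal sketch of how hidden information could guide training and then be withdrawn: the hidden features are appended to the visible ones during training, and each is kept only with a probability that decays to zero, so the final model relies on visible information alone. The linear decay schedule, the feature layout and every name below are assumptions for illustration; the article does not describe how Suphx actually does this.

```python
import random


def oracle_keep_probability(step, total_steps):
    """Hypothetical schedule: decays linearly from 1.0 at step 0 to 0.0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)


def build_training_input(visible, hidden, keep_prob, rng):
    """Concatenate visible features with randomly masked hidden features.

    Each hidden feature is kept with probability keep_prob and zeroed
    otherwise, so the model sees less hidden information as keep_prob falls.
    """
    masked = [h if rng.random() < keep_prob else 0.0 for h in hidden]
    return list(visible) + masked


rng = random.Random(0)
visible_features = [1.0, 2.0]   # e.g. own hand and discards (illustrative)
hidden_features = [3.0, 4.0]    # e.g. opponents' tiles (illustrative)

early = build_training_input(visible_features, hidden_features,
                             oracle_keep_probability(0, 100), rng)
late = build_training_input(visible_features, hidden_features,
                            oracle_keep_probability(100, 100), rng)
```

At the start of training the model sees every hidden feature; by the end all of them are zeroed, which matches the situation it faces in real play.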

Global prediction: Because of mahjong's complex reward mechanism, the research team uses a global prediction technique to bridge the gap between each round and the final result of the game. The predictor estimates how much each round contributes to the final result, so that the final reward signal can be allocated back to the individual rounds in a reasonable way. This guides self-play more directly and effectively, and lets Suphx learn advanced tactics that take the whole game into account.
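
The allocation step can be sketched as follows, assuming a predictor that outputs an estimate of the final game reward after each completed round. Each round is then credited with the change it caused in that estimate. The predictor outputs, the baseline and the function name are hypothetical; this is one plausible reading of the description, not the team's published method.

```python
def allocate_round_rewards(predictions, baseline=0.0):
    """Split a final reward across rounds using successive predictions.

    predictions[k] is the (hypothetical) predicted final game reward after
    round k has finished. Each round's reward is the difference between the
    prediction after that round and the prediction before it.
    """
    rewards = []
    previous = baseline
    for current in predictions:
        rewards.append(current - previous)
        previous = current
    return rewards


# Made-up predictions after each of four rounds.
predicted_final_rewards = [10.0, -5.0, 20.0, 45.0]
round_rewards = allocate_round_rewards(predicted_final_rewards)
```

Because the differences telescope, the per-round rewards sum to the last prediction minus the baseline, so no reward is created or lost in the allocation. A round that loses points can still receive a positive reward if it improved the predicted final outcome.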

Microsoft said it believes that the AI algorithms developed in the Suphx project to handle the "uncertainty of mahjong" can also be applied to real-world problems that involve unknown factors and random events.