(1) Select
Starting from the root node, that is, the decision-making situation R, choose a node T that needs to be expanded most urgently; Case R is the first node to check. If the checked node has a move M that has not been evaluated, then the new situation obtained after the checked node executes M is that we need to expand T; If all feasible moves in the checking situation have been evaluated, a feasible move with the largest ucb value is obtained by using ucb formula, and the new situation generated by this move is checked again; If the checked situation is that the game has ended, directly execute step 4; Through repeated checking, we finally get the last checked situation C and the unevaluated movement M at the bottom of the tree, and execute step 2.
(2) expansion
Add a child node to Scenario C that exists in memory at this time. This child node is obtained by scenario c executing move M, which is t.
(3) Simulation
Starting from situation T, the two sides began to settle at random. Finally, a result (win/lose) is obtained to update the winning rate of T node.
(4) Back propagation
T After the simulation, its parent node C and all its ancestor nodes update the winning rate in turn. The winning rate of a node is the average winning rate of all its children. It starts from T and propagates back to the root node R, so the winning percentage of all nodes on the path will be updated.