How can the "prisoner's dilemma" model be used to explain game theory in economics?
First, how the optimal strategy emerges in the game

Before starting to study cooperation, Robert Axelrod set two premises: first, everyone is selfish; second, no authority interferes in individual decisions. In other words, individuals can decide entirely according to their own interests. Under these premises, the questions to be studied are: first, why people cooperate; second, when people cooperate and when they do not; third, how to get others to cooperate with you.

Problems of cooperation are common in social practice. Take tariff retaliation between countries: raising tariffs on another country's products helps protect the domestic economy, but when countries raise tariffs against each other, product prices rise, competitiveness is lost, and the complementary advantages of international trade are damaged. In such a game, because both sides pursue the maximization of their own interests, the interests of the group are damaged. Game theory describes this problem with the famous prisoner's dilemma.

A and B each represent one player, and the two are in completely symmetric positions. A player who cooperates chooses C, and a player who does not cooperate chooses D. If A and B both choose C, each gets 3 points. If one chooses C and the other chooses D, the one who chooses C gets 0 points and the one who chooses D gets 5 points. If A and B both choose D, each gets 1 point.

Obviously, the best result for the group is that both sides choose C: 3 points each, 6 points in total. If one chooses C and the other chooses D, the total is 5 points. If both choose D, the total is only 2 points.
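The scoring rules and group totals described above can be written down as a small lookup table. This is only a sketch: the names PAYOFF and group_total are illustrative and do not come from the original text.

```python
# Payoff table from the text: (A's move, B's move) -> (A's points, B's points).
# "C" = cooperate, "D" = do not cooperate.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def group_total(move_a, move_b):
    """Return the combined score of both players for one round."""
    score_a, score_b = PAYOFF[(move_a, move_b)]
    return score_a + score_b

# Print every outcome with its individual scores and the group total.
for moves, scores in PAYOFF.items():
    print(moves, scores, "total:", group_total(*moves))
```

The totals printed for mutual cooperation, mixed choices, and mutual non-cooperation are 6, 5, and 2, matching the paragraph above.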

Game theorists use this matrix to describe the conflict between individual rationality and group rationality: when everyone pursues the maximization of individual interests, the interests of the group are damaged. This is the prisoner's dilemma. In the matrix, when the opponent chooses C, A gets 5 points by choosing D and only 3 points by choosing C; when the opponent chooses D, A gets 1 point by choosing D and 0 points by choosing C. So whether the opponent chooses C or D, A scores more by choosing D. D is therefore A's dominant strategy. When two dominant strategies meet, that is, when A and B both choose D, each gets 1 point. This result is not the best one in the matrix. The dilemma is that when everyone adopts their own dominant strategy, the outcome is stable, but it is not Pareto optimal. This reflects the contradiction between individual rationality and group rationality. In this sense, the one-shot decision matrix has no solution that is best for both the individual and the group.
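The two claims in this argument (D is better for A whatever B does, and the resulting outcome is not Pareto optimal) can be checked mechanically. The sketch below assumes the 3/0/5/1 scoring rules given earlier; the function names are illustrative only.

```python
# (A's move, B's move) -> (A's points, B's points), as in the scoring rules.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")

def dominant_move_for_a():
    """Return a move that beats every other move of A against every move of B, or None."""
    for mine in MOVES:
        others = [m for m in MOVES if m != mine]
        if all(PAYOFF[(mine, theirs)][0] > PAYOFF[(other, theirs)][0]
               for theirs in MOVES for other in others):
            return mine
    return None

def pareto_dominated(outcome):
    """True if another outcome is at least as good for both players and better for one."""
    a, b = PAYOFF[outcome]
    return any(x >= a and y >= b and (x > a or y > b)
               for key, (x, y) in PAYOFF.items() if key != outcome)

print("Dominant move for A:", dominant_move_for_a())
print("(D, D) is Pareto dominated:", pareto_dominated(("D", "D")))
```

Because the game is symmetric, the same check applies to B. The outcome (D, D) is dominated by (C, C), which gives both players more.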

If the game is repeated and the players know the number of rounds, each will certainly betray the opponent in the last round, because no retaliation can follow. Given that, there is no reason to cooperate in the round before it either, and the same reasoning applies to every earlier round. So in a repeated game of known length, no one will cooperate.

If the game is played among many people and the number of rounds is unknown, players will realize that if they keep cooperating and reach a tacit understanding, each will get 3 points per round, whereas if they keep refusing to cooperate, each will get only 1 point per round. This reveals the motivation for cooperation. In a repeated game, future income is weighted relative to current income by a discount factor w: the greater w is, the more important future income is. When the multiplayer game continues and w is large, that is, when the future is important enough, the optimal strategy depends on the strategies adopted by others. Suppose someone's strategy is to cooperate at first and then, once the other party fails to cooperate even once, never to cooperate again. Against such a player, cooperating is of course the best policy. If someone always cooperates regardless of the other's strategy, then never cooperating with him gets the highest score. And against those who never cooperate, one can only refuse to cooperate as well.
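A short calculation shows how the discount factor w (the W of the text) decides whether cooperation pays against the "cooperate until the first betrayal, then never again" player. The sketch assumes that each round counts w times as much as the previous one and that the game goes on indefinitely; the function names are illustrative.

```python
def discounted_total(first, later, w):
    """Value of scoring `first` now and `later` in every following round,
    when each round is weighted w times the previous one (0 <= w < 1)."""
    return first + w * later / (1 - w)

def cooperate_forever(w):
    # Both sides cooperate in every round: 3 points each time.
    return discounted_total(3, 3, w)

def defect_forever(w):
    # 5 points once, then the opponent never cooperates again: 1 point per round.
    return discounted_total(5, 1, w)

for w in (0.3, 0.5, 0.9):
    print(w, round(cooperate_forever(w), 3), round(defect_forever(w), 3))
```

Under these assumptions the two totals are equal at w = 0.5. For larger w, cooperating scores more. For smaller w, the one-time gain of 5 points outweighs the lost future.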

Axelrod ran an experiment and invited many people to take part in the game. The scoring rules were the same as in the matrix above, and the players did not know when the game would end. He asked each contestant to write the strategy they expected to score highest as a computer program, and then let the programs play each other in a round-robin tournament to find the strategy with the highest score.

The first tournament involved 14 programs, plus Axelrod's own random program (which chooses cooperation or non-cooperation with 50% probability each), and it was run 300 times. The program with the highest score was "tit for tat", written by the Canadian scholar Anatol Rapoport. This program cooperates in the first game and afterwards copies the opponent's previous move: if you cooperated last time, I cooperate this time; if you did not cooperate last time, I do not cooperate this time. Axelrod also found that the highest-scoring programs share three characteristics: first, they never betray first, that is, they are "nice" (friendly); second, they retaliate against the other party's betrayal rather than cooperating unconditionally, that is, they are "provocable"; third, they do not take revenge endlessly for a single betrayal, and as soon as the other party returns to cooperation they cooperate too, that is, they are "forgiving" (tolerant).
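The rule of "tit for tat" and the round-robin format can be sketched in a few lines. This is not Axelrod's tournament code: the three opponent strategies, the 10-round match length, and all function names are assumptions chosen for illustration, and each program also plays a copy of itself.

```python
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"

def always_defect(mine, theirs):
    return "D"

def always_cooperate(mine, theirs):
    return "C"

def grim(mine, theirs):
    # Cooperate until the opponent defects once, then never cooperate again.
    return "D" if "D" in theirs else "C"

def play_match(strategy_a, strategy_b, rounds=10):
    """Play one repeated game and return both players' total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def round_robin(strategies, rounds=10):
    """Every strategy meets every other one and a copy of itself."""
    names = list(strategies)
    totals = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i:]:
            score_a, score_b = play_match(strategies[a], strategies[b], rounds)
            totals[a] += score_a
            if a != b:
                totals[b] += score_b
    return totals

field = {"TFT": tit_for_tat, "ALLD": always_defect,
         "ALLC": always_cooperate, "GRIM": grim}
print(round_robin(field))
```

In any single match tit for tat never scores more than its opponent, yet its total across the field is high. The ranking in such a toy field depends heavily on which strategies are entered, so it should not be read as a reproduction of the tournament result.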

To further verify these conclusions, Axelrod decided to invite more people to play the game again, after announcing the results of the first tournament. The second time he collected 62 programs, added his own random program, and held another competition. First place again went to "tit for tat". Axelrod's conclusions from this tournament were: first, "tit for tat" is still the best strategy. Second, the three characteristics above still hold: among the top 15 programs, only the eighth, the Harrington program, was not "nice", while among the bottom 15 only one was "nice". Provocability and forgiveness were also confirmed. In addition, a good strategy must be "clear", so that the other party can recognize it within three to five moves. Overly complicated strategies are not necessarily good. "Tit for tat" is very clear, so the other party quickly discovers its pattern and has to adopt a cooperative attitude.

Second, the process and laws of cooperation

The strategy of "tit for tat" scored well in a static group. But can such cooperators appear, develop, and survive in a dynamic, evolving group? Will the group evolve toward cooperation or toward non-cooperation? If no one cooperates at first, can cooperation still emerge in the course of evolution? To answer these questions, Axelrod analyzed the evolution of cooperation using ecological principles.

Assume that the group of strategies formed by the game evolves from generation to generation. The rules of evolution include: first, trial and error. People do not know at first how to deal with their surroundings, so they try one thing and another and keep whatever works well. Second, heredity. If a person is cooperative, his offspring will carry more cooperative genes. Third, learning. Competition is a process of learning from each other; if the "tit for tat" strategy does well, some people will be willing to learn it. Following this idea, Axelrod designed an experiment: among the 63 strategies, whichever scored higher in one round makes up a larger proportion of the group in the next round, the proportion being an increasing function of its score. In this way the structure of the population changes during evolution, and we can see in what direction the population evolves.

The experimental results are very interesting. "Tit for tat" initially accounted for 1/63 of the population. After 1000 generations of evolution the population structure stabilized, with "tit for tat" accounting for 24%. Some programs disappeared in the process of evolution. One program is worth studying: the Harrington program, the only one in the original top 15 that was not "nice". Its strategy is to cooperate at first and then, when the other party has been cooperating steadily, suddenly refuse to cooperate. If the other party retaliates immediately, it resumes cooperation; if the other party keeps cooperating, it keeps betraying. This program grew rapidly at first, but it began to decline once programs other than "tit for tat" started to disappear. Measured by the degree of cooperation, the group thus becomes more and more cooperative.
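The proportional rule (a strategy's share in the next generation grows with its score) can be illustrated with a much smaller population than the 63 programs. The sketch below uses only three hypothetical strategies: tit for tat (TFT), a player that never cooperates (ALLD), and one that always cooperates (ALLC). The score table holds assumed per-round averages from 10-round matches under the 3/0/5/1 rules; none of this is data from the actual experiment.

```python
# SCORE[a][b]: assumed average per-round score of strategy a against strategy b.
SCORE = {
    "TFT":  {"TFT": 3.0, "ALLD": 0.9, "ALLC": 3.0},
    "ALLD": {"TFT": 1.4, "ALLD": 1.0, "ALLC": 5.0},
    "ALLC": {"TFT": 3.0, "ALLD": 0.0, "ALLC": 3.0},
}

def next_generation(shares):
    """Rescale each strategy's share in proportion to its average score."""
    fitness = {a: sum(shares[b] * SCORE[a][b] for b in shares) for a in shares}
    mean = sum(shares[a] * fitness[a] for a in shares)
    return {a: shares[a] * fitness[a] / mean for a in shares}

def evolve(shares, generations):
    """Apply the proportional rule for the given number of generations."""
    for _ in range(generations):
        shares = next_generation(shares)
    return shares

start = {name: 1 / 3 for name in SCORE}
for generations in (0, 1, 5, 20, 200):
    snapshot = evolve(start, generations)
    print(generations, {k: round(v, 3) for k, v in snapshot.items()})
```

In this toy population the exploiter ALLD gains ground at first by feeding on ALLC, then shrinks as its victims become rare, which is the same pattern the text describes for the Harrington program.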

The evolutionary experiments reveal a principle: the success of a strategy should rest on the success of its partners. In a two-player match "tit for tat" can never score more than the other side; at best it ties. Yet its total score is the highest. The foundation it lives on is very solid, because it lets the other side score well too. The Harrington program is different. When it gets a high score, its opponent gets a low score. Its success is built on the failure of others, and losers are always eliminated. Once the losers are gone, the winners who took advantage of them are eliminated as well.

Then can "tit for tat" survive in a group of extremely selfish non-cooperators? Axelrod found that once the score matrix and the discount factor for the future are fixed, it can be calculated that as long as 5% or more of the group's members play "tit for tat", these cooperators can survive. Moreover, as long as their scores exceed the group's overall average, the cooperating group grows larger and larger and eventually spreads to the whole group. Conversely, in a group where cooperators are the majority, non-cooperators cannot invade from the bottom up, no matter in what proportion they enter. This shows that the ratchet of social evolution toward cooperation is irreversible and that cooperation within groups keeps growing. With this inspiring conclusion, Axelrod broke through the impasse in research on the prisoner's dilemma.
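The 5% figure can be reproduced with a short calculation under stated assumptions: the natives never cooperate, the newcomers play tit for tat, each round counts w times as much as the previous one, and the natives meet newcomers so rarely that a native's score is essentially what it earns against other natives. The value w = 0.9 and the function names are assumptions made for this sketch, not values given in the text.

```python
def discounted_total(first, later, w):
    """Score `first` now and `later` in every following round, discounted by w."""
    return first + w * later / (1 - w)

def min_cluster_share(w, reward=3, sucker=0, punishment=1):
    """Smallest share p of their games that newcomers must play with each other
    so that p * V(TFT vs TFT) + (1 - p) * V(TFT vs ALLD) exceeds V(ALLD vs ALLD)."""
    tft_vs_tft = discounted_total(reward, reward, w)
    tft_vs_alld = discounted_total(sucker, punishment, w)
    alld_vs_alld = discounted_total(punishment, punishment, w)
    return (alld_vs_alld - tft_vs_alld) / (tft_vs_tft - tft_vs_alld)

print(round(min_cluster_share(0.9), 4))
```

With w = 0.9 the three values are 30, 9, and 10, so the threshold is 1/21, a little under 5%. A smaller w raises the threshold, because the future matters less.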

The research finds that the necessary conditions for cooperation are: first, the relationship must be lasting, since there is no motivation to cooperate in a one-shot game or a game with a known limited number of rounds; second, the other party's behavior must be reciprocated, since no one will cooperate with a player who always cooperates unconditionally.

So how can cooperation be improved? First, establish a lasting relationship. Even love needs a marriage contract to maintain cooperation between the two sides. Why do vendors at railway stations cheat people? Why should work be organized in stable groups? When troops are about to be rotated out, one side always launches an attack, as happened on the front line between China and Vietnam. Second, improve the ability to recognize the other party's actions. If we do not know whether the other party has cooperated, we cannot repay him. Third, protect your reputation: if you say you will retaliate, you must do so, so that people know you are not easy to bully and will not dare to betray you. Fourth, a game that can be completed step by step should not be completed all at once, so that a long-term relationship is maintained. For example, trade and negotiations should proceed step by step, which pushes the other side to adopt a cooperative attitude. Fifth, do not be jealous of others' success. "Tit for tat" is such a model. Sixth, do not betray first, lest you bear the moral burden of being the one who started it. Seventh, repay not only betrayal but also cooperation. Eighth, do not be too clever and try to take advantage of others.

(The difference between playing bridge and playing mahjong)

Axelrod put forward several conclusions at the end of his book The Evolution of Cooperation. First, friendship is not a necessary condition for cooperation. Even enemies may cooperate as long as the relationship is continuing and behavior is reciprocated. For example, during World War I, German and British troops in the trenches were caught in a rainy season that lasted three months. During those three months the two sides reached a tacit understanding not to attack each other's food wagons and supplies, and to fight to the death only when the major counteroffensive came. This example shows that friendship is not a prerequisite for cooperation. Second, foresight is not a prerequisite for cooperation. Axelrod cited cooperation among lower animals and plants in the biological world to illustrate this point. However, when far-sighted human beings understand the laws of cooperation, the evolution of cooperation is accelerated. In that case foresight is useful, and so is learning.

When random interference is taken into account, that is, when players begin to betray each other because of misunderstandings, Dr. Wu Jianzhong found through his research that two revised versions of "tit for tat" perform well: "generous tit for tat", which with a certain probability does not retaliate against the other party's betrayal, and "contrite tit for tat", which with a certain probability stops its own betrayal on its own initiative. The stronger the ability of all members of the group to cope with the random environment, the better "contrite tit for tat" performs and the worse "generous tit for tat" performs.
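The effect of random interference on plain "tit for tat", and of forgiving a betrayal with some probability, can be sketched with a small simulation. Everything here is an assumption made for illustration: the 5% noise level, the 30% forgiveness probability, the match length, and the function names. It is not Dr. Wu's model, and the repentant variant is not covered.

```python
import random

PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def flip(move):
    return "D" if move == "C" else "C"

def forgiving_tft(their_last, forgiveness, rng):
    """Copy the opponent's last move, but let a betrayal pass with
    probability `forgiveness` (0 gives plain tit for tat)."""
    if their_last == "D" and rng.random() >= forgiveness:
        return "D"
    return "C"

def average_score(forgiveness, noise, rounds=20000, seed=0):
    """Average per-round score of one of two identical players when every
    chosen move is flipped by mistake with probability `noise`."""
    rng = random.Random(seed)
    last_a = last_b = "C"  # treating the past as cooperative makes the first move C
    total = 0
    for _ in range(rounds):
        move_a = forgiving_tft(last_b, forgiveness, rng)
        move_b = forgiving_tft(last_a, forgiveness, rng)
        if rng.random() < noise:
            move_a = flip(move_a)
        if rng.random() < noise:
            move_b = flip(move_b)
        total += PAYOFF[(move_a, move_b)][0]
        last_a, last_b = move_a, move_b
    return total / rounds

print("plain, no noise:    ", average_score(0.0, 0.0))
print("plain, 5% noise:    ", average_score(0.0, 0.05))
print("forgiving, 5% noise:", average_score(0.3, 0.05))
```

Without noise two tit-for-tat players score 3 points every round. With noise, a single mistaken move sets off a chain of mutual retaliation, and occasionally letting a betrayal pass helps the pair return to cooperation.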

Third, Axelrod's contribution and limitations

Axelrod studied how to break out of the prisoner's dilemma and achieve cooperation using mathematical methods and computer simulation, which took this research to a new level. His mathematical proofs are forceful and convincing, and some of the conclusions from his computer simulations are striking findings. For example, the player with the highest total score does not get the highest score in every single game. (The War between Liu Bang and Xiang Yu)

From a sociological point of view, the "tit for tat" strategy that Axelrod identified can be regarded as a kind of "reciprocal altruism". The motive of this behavior is personal self-interest, but the result is that both sides benefit, and reciprocal altruism may extend to the widest range of social life. People build an order of social life by giving and returning gifts, a practice that is easily understood even between peoples who have been isolated from each other for many years and share no language. For example, when Columbus landed on the American continent, his first contact with the Indians began with an exchange of gifts. Some seemingly purely altruistic behaviors, such as giving gifts with nothing asked in return, are also rewarded indirectly, for example with social reputation. Studying this behavior is of great significance for understanding social life.

When the prisoner's dilemma is extended to a multiplayer game, it embodies a broader problem, the "social paradox" or "resource paradox". The resources available to human beings are limited. When everyone tries to take more from limited resources, local interests and overall interests come into conflict. Population problems, resource crises, and traffic congestion can all be explained by the social paradox. In all of these problems, the key is to regulate everyone's behavior by studying and setting the rules of the game.

Some of Axelrod's conclusions can easily be found in China's classical culture and moral tradition. The idea of "tit for tat" is embodied in sayings such as "give a plum in return for a peach" and "I will not attack unless I am attacked". But these are not optimal, because "tit for tat" is flawed in real social life, which is full of randomness. On this point, Confucius long ago put forward a wonderful corrective strategy: "repay injury with straightness, and repay kindness with kindness". The so-called "straightness" is justice. It is a revised "tit for tat" that corrects the degree of revenge: you were supposed to be fined 5 points, but now you are fined only 3, so that a fair judgment ends revenge that would otherwise pass from generation to generation.

However, some of Axelrod's assumptions and conclusions about the players inevitably separate his research from reality. First, the book The Evolution of Cooperation implies an important assumption, namely that the individuals in the game are completely symmetric. In a real game the players cannot be absolutely equal. On the one hand, the players differ in actual strength. If both sides betray each other, they may not each get 1 point; instead the strong one may get 5 points and the weak one 0 points. In that case the revenge of the weak is meaningless. On the other hand, even if the two sides are truly evenly matched, one side may have a gambler's mentality, believe it is the stronger, and choose betrayal to gain an advantage. Axelrod's score matrix ignores this situation, yet it is exactly this gambler's mentality that triggers a large number of zero-sum games in society. His model could therefore be improved on this point.

Second, Axelrod holds that cooperation requires neither expectation nor trust. This is where he is often questioned. Players choose their moves according to their opponents' previous moves, and cooperation requires individuals to recognize those they have met before and remember the history of their interactions in order to respond, which itself implies "expecting" behavior. In a complex game environment, trust may be an essential part of cooperation between the two sides. How to represent expectation and trust in computer programs, however, still needs to be studied.

Finally, repeated games are difficult to realize fully in reality. The existence of a large number of one-shot games leads to much uncooperative behavior. Moreover, after being betrayed, the injured party often has no opportunity or strength to retaliate. Examples include defaults during the stage of capital accumulation and nuclear deterrence between countries. In such cases, if society wants to make transactions possible and prevent uncooperative behavior, it must resort to legal means, replacing "tit for tat" between individuals with legal punishment and regulating social behavior. This is an important lesson that Axelrod's research offers to the institutional school.