How to scientifically explain "correlation ≠ causality"
By W xiansen

I have said more than once in my articles that correlation does not equal causality, but I have never explained the underlying principle properly. Today I will use this article to go through it in detail.

This article is divided into the following four parts:

01 Our cognitive habits, seen through some everyday phenomena

02 The difference between correlation and causality

03 The linear regression model: the most reliable way to demonstrate correlation

04 Methodology: what is the use of knowing this?

Parts 02 and 03 are somewhat demanding, and you can skip them if you find them hard to follow. What the statistical method for finding correlation shows is that proving a causal relationship between two events is quite difficult, and that even proving a mere correlation is not easy.

So the real question is: knowing this, how should we think?

When you catch a cold, what do you usually blame it on?

Starting from the direct cause: maybe you got chilled somewhere?

Starting from living habits: Maybe you have been sleeping too late recently, or your diet is unhealthy?

Starting from the surrounding environment: Which family member/friend/colleague may have been ill recently?

These are all fairly reasonable guesses. There is also an unreasonable kind: blaming whatever you happen to dislike. My mother, for example, likes to pin anything unpleasant on my habits: "You drink so much coke, no wonder you caught a cold!" But I only drink one or two cans of coke a week, so of course I don't believe that it caused my cold.

To sum up, in daily life, people are used to attributing things intuitively, which is often unfounded or even completely wrong. Let's look at a few more examples:

These propositions look like "very reasonable" cause and effect, but in fact they are very easy to refute:

Please note that refuting the claimed cause and effect does not by itself give us the correct conclusion.

These examples are all meant to show:

So what kind of attribution is right?

Let's first look at the concepts of correlation and causality in statistics and find out the relevant methods.

In statistics, correlation and causality mean the following:

Correlation means that event A is related to event B. Many kinds of relationship are possible here: A may cause B, or B may cause A, or A may be only one of the causes, with other events C, D and E also needed to bring about B.

Causality is a kind of correlation, but its requirements are stricter. It has one more attribute than correlation, inevitability: every cause has its effect, and every A is followed by its B.

If there is a causal relationship between event A and event B, then they must be correlated; however, if there is only a correlation between event A and event B, there may be no causal relationship between them.
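The last point can be checked with a small simulation. This is only an illustrative sketch under invented assumptions: a hypothetical hidden event C drives both A and B, while A and B never influence each other. The data are randomly generated, not taken from any study, and `pearson` is a helper written for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumption for illustration: C is a hidden common cause of A and B.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 0.5) for ci in c]  # A depends on C only
b = [ci + rng.gauss(0, 0.5) for ci in c]  # B depends on C only

r_ab = pearson(a, b)

# Subtract C's contribution; what remains of A and B is independent noise.
a_rest = [ai - ci for ai, ci in zip(a, c)]
b_rest = [bi - ci for bi, ci in zip(b, c)]
r_rest = pearson(a_rest, b_rest)

print(f"corr(A, B) = {r_ab:.2f}")
print(f"corr(A, B) with C removed = {r_rest:.2f}")
```

A and B come out strongly correlated even though neither causes the other, and the correlation all but disappears once C is accounted for.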

For example, water will boil when heated to 100 degrees Celsius.

A = "Heat water to 100℃"

B = "Water will boil"

We say that A is the cause and B is the effect. If we run the experiment at low altitude, this conclusion always holds. But if we take the water up to a plateau more than 3000 meters above sea level, the boiling point is lower. A no longer leads to B, so we have to change A to A1:

A1 = "Heat water to its boiling point (the boiling point decreases with altitude)"

B = "Water will boil"

Science is like this: even something already accepted as a law of cause and effect may keep being revised. The history of physics shows the same process of exploration, from Newton's three laws of motion to Einstein's theory of relativity, to quantum mechanics and then to string theory. Truth is always being overturned and then rebuilt, so:

So how do statisticians find the correlation and gradually deduce the cause and effect? Next, let's look at the regression model of statistics.

Statistics has a tool called the regression model, which can demonstrate the correlation between two sets of data. Put visually, we plot the two sets of data as points in an XY coordinate system and then fit a straight line or a curve to them, so that the line passes as close to the points as possible. If a straight line fits the points well, we say the two sets of data are linearly related.

There is a famous historical study of linear regression, known as Galton's law after the British statistician Francis Galton who proposed it. Its conclusion is this:

Plotted on a graph, the results look like this:

The abscissa is the height of parents, and the ordinate is the height of children.

The data in the purple box represents the part where the parents are short but the children are taller than the parents.

The data in the small green box represents the part where the parents are tall but the children are shorter than their parents.

The red line is the line that fits the points best; its equation is called the regression equation.

Taking the above data as an example, the mathematical steps for solving this equation are as follows:

01 For each x value, find the difference between the observed y value and the value y' on the straight line. This difference is called the residual.

02 Add up the squares of all the residuals; the goal is to make this sum of squared residuals as small as possible.

03 Take the partial derivatives of that sum with respect to the slope and the intercept, and set each to zero.

04 Solve the resulting system of two linear equations in two unknowns.
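The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own calculation: the function names `fit_line` and `sum_squared_residuals` are made up for this example, and the five data points are invented, not Galton's measurements.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Steps 01-02: residual = observed y minus y' on the line; sum the squares."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Steps 03-04: solve the two equations obtained by setting the partial
    derivatives of the sum of squared residuals to zero.

        slope * sum(x^2) + intercept * sum(x) = sum(x*y)
        slope * sum(x)   + intercept * n      = sum(y)
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; no unique line exists")

    slope = (n * sum_xy - sum_x * sum_y) / det
    intercept = (sum_xx * sum_y - sum_x * sum_xy) / det
    return slope, intercept


# Invented sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
print(f"sum of squared residuals = {best:.3f}")
```

Nudging the slope or the intercept away from the fitted values always makes the sum of squared residuals larger, which is exactly what "least squares" means.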

We can also do the same thing in Excel, where the method is relatively simple:

01 Paste the data

02 Insert a scatter plot

03 Add Element - Trend Line

04 Trend Line Settings - Display Formula

The following chart comes from my sleep data analysis. Because it is not the focus of this article, I will not elaborate on it for now.

Finally, to summarize the steps for finding a correlation through statistics:

First, collect a large amount of sample data;

Then, fit a line or curve to it (for example, by linear regression);

Finally, determine the type of correlation (positive/negative, straight-line/curve, complete/strong/weak, and so on).
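For the last step, one common way to put a number on a straight-line correlation is the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below is illustrative only: the cut-offs of 0.7 and 0.3 used to label a correlation "strong" or "weak" are assumptions chosen for this example, not a standard, and the sample data are invented. Note that r measures only straight-line correlation; a curved relationship needs a different measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def describe(r, strong=0.7, weak=0.3):
    """Label r by direction and strength.

    The `strong` and `weak` thresholds are illustrative assumptions.
    """
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    size = abs(r)
    if size >= 1 - 1e-9:
        strength = "complete"
    elif size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "weak"
    else:
        strength = "negligible"
    return direction, strength


# Invented data: y falls by exactly 2 for every step in x.
xs = [1, 2, 3, 4]
ys = [8, 6, 4, 2]

r = pearson(xs, ys)
print(r, describe(r))
```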

Now, we have come to three conclusions:

How does knowing this help our daily life?

Pay attention to the words that you or others use to express cause and effect: because, so, therefore, that is, only, definitely and so on. Doing so is an effective way to notice a person's habits of speech and even of thought.

I found that when I respond to others I habitually start with "so", even though there is no causal relationship at all. It is just a pattern in my behavior: I am used to summarizing and guessing at other people's ideas.

When we think we have found some laws, we can try to think about the cause and effect in reverse.

On racial discrimination, Sapiens: A Brief History of Humankind says that people long assumed some biological difference was what made black people appear "inferior". In fact the cause and effect ran the other way: black people at that time behaved as they did because they had never received a good education and had always lived in poor conditions.

Kai-fu Lee also said in Born to Die that if illness is not the "effect" of bad behavior but the "cause" of something God wants him to learn, then illness is not a curse but a blessing.

As in the earlier example, people on the plains always thought that water boils at 100 degrees Celsius, and only when they went up to the plateau did they learn that the boiling point of water decreases with altitude.

Cross-border, interdisciplinary, multi-perspective and multi-dimensional have all become buzzwords lately, not only because this way of studying things is comprehensive enough to see the truth better, but also because it makes innovation easier.

Listening to audiobooks online and making friends in communities are both good ways to gain an interdisciplinary perspective.

Even when conclusions are drawn through research, practice and investigation, those conclusions are still only hypotheses.

The world is always changing, in both people and the environment, so these conclusions hold only for a given stage. I used to believe that there is no truth in the world, and now that looks like a wrong judgment. Once we add the dimension of time to our thinking, I can only put it this way: there is no truth in the world that is always correct, but there can be a truth that is the best and most useful one for now.

How can we constantly update ourselves?

This is why today we all say that we want to be lifelong learners.

Finally, let me tell a story I read in the newspaper a long time ago:

The story ends here. If you have read this far, I believe you can offer a different view of this story. Please discuss it with me in the comments section.