Error bars show the error (or uncertainty) range of each data point. They indicate the potential error or degree of uncertainty associated with each data marker in a series, so that the data are presented more accurately.
Standard deviation (SD), standard error (SE), and confidence interval (CI) are all acceptable in published scientific papers. The key point is to state clearly in the paper which one the error bars represent.
So how do these representations differ, and how is each one calculated?
Standard deviation
The standard deviation is the square root of the arithmetic mean of the squared deviations from the mean, and is denoted by σ. The formula for the population standard deviation is as follows:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
In the formula, μ is the arithmetic mean of x_1, x_2, x_3, ..., x_N (all real numbers), and σ is the standard deviation.
Note:
In some formulas, the sum under the root sign is divided by the degrees of freedom (N - 1) instead of N. The distinction is:
For a population (that is, the population standard deviation), divide by N under the root sign and denote the result by σ (corresponding Excel function: STDEVP);
For a sample (that is, the sample standard deviation), divide by (N - 1) under the root sign and denote the result by s (corresponding Excel function: STDEV);
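Because in practice we almost always work with samples rather than whole populations, we usually divide by (N - 1) under the root sign. In sampling statistics, the formula is therefore:
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$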
In practice the population standard deviation is unknown, so the sample standard deviation is usually used to estimate it. The error bar range can therefore be expressed in either of two ways:
$$\bar{x} \pm \sigma \quad \text{or} \quad \bar{x} \pm s$$
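The two standard deviations described in this section can be sketched in Python with the standard library. This is only an illustration: the `data` values are made up, and `statistics.pstdev` and `statistics.stdev` are used here as the counterparts of dividing by N and by N - 1.

```python
import statistics

# Hypothetical measurements, made up purely for illustration.
data = [4.8, 5.1, 5.0, 4.7, 5.4]

mean = statistics.mean(data)

# Population SD: divide by N under the root sign.
sd_population = statistics.pstdev(data)

# Sample SD: divide by N - 1 under the root sign.
sd_sample = statistics.stdev(data)

# Error bar expressed as mean ± s.
lower, upper = mean - sd_sample, mean + sd_sample

print(f"mean = {mean:.3f}")
print(f"population SD = {sd_population:.4f}, sample SD = {sd_sample:.4f}")
print(f"error bar: {lower:.3f} to {upper:.3f}")
```

For the same data the sample SD is always slightly larger than the population SD, and the gap shrinks as N grows.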
Standard error
The standard error (here, the standard error of the mean) is calculated as follows:
$$SE = \frac{\sigma}{\sqrt{n}}$$
σ is the population standard deviation, and n is the sample size. When the population standard deviation is unknown, the sample standard deviation s is used as an estimate:
$$SE \approx \frac{s}{\sqrt{n}}$$
The error bar range can also be expressed as:
$$\bar{x} \pm SE$$
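As a rough sketch of the standard error calculation, assuming the population standard deviation is unknown and is estimated by the sample standard deviation (the `data` values are hypothetical):

```python
import math
import statistics

# Hypothetical measurements, made up purely for illustration.
data = [4.8, 5.1, 5.0, 4.7, 5.4]

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)   # sample SD (divides by n - 1)

# Standard error of the mean: s / sqrt(n).
se = s / math.sqrt(n)

# Error bar expressed as mean ± SE.
lower, upper = mean - se, mean + se

print(f"SE = {se:.4f}")
print(f"error bar: {lower:.3f} to {upper:.3f}")
```

Because the standard error divides by the square root of n, SE error bars are always narrower than SD error bars for the same data when n > 1, which is one reason the paper must say which of the two is plotted.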
Confidence interval
A confidence interval is an estimated interval for a population parameter, constructed from sample statistics. It is an interval estimate, as opposed to a point estimate. How it is calculated depends on whether σ is known and on the sample size.
1. σ (population standard deviation) is known, or σ is unknown but the sample is large (generally, a sample size of 30 or more). In this case the sample mean is taken to approximately follow a normal distribution:
$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Interval estimation then gives:
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
If σ (population standard deviation) is known, the interval can be calculated directly. If it is unknown, the population standard deviation is first estimated by the sample standard deviation s:
$$\bar{x} \pm z^* \frac{s}{\sqrt{n}}$$
z* is looked up in a standard normal distribution table, and its value depends on the confidence level. Common two-sided values for confidence level C are as follows:
C = 0.90: z* = 1.645
C = 0.95: z* = 1.960
C = 0.99: z* = 2.576
For example, at the commonly used confidence level of 0.95, the interval extends 1.96 standard errors on either side of the mean (the error bar range is as follows):
$$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$$
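The large-sample case can be sketched as follows. The helper names `z_star` and `z_interval` and the generated `data` are hypothetical and exist only for illustration; `statistics.NormalDist().inv_cdf` stands in for the table lookup.

```python
import math
import statistics


def z_star(confidence):
    """Two-sided critical value z* of the standard normal distribution."""
    return statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(data, confidence=0.95):
    """Normal-based confidence interval for the mean (sigma estimated by s)."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    margin = z_star(confidence) * se
    return mean - margin, mean + margin


# Hypothetical large sample (n = 40), generated deterministically.
data = [5.0 + 0.1 * ((7 * i) % 11 - 5) for i in range(40)]

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}: z* = {z_star(c):.3f}")

lower, upper = z_interval(data, 0.95)
print(f"0.95 confidence interval: {lower:.3f} to {upper:.3f}")
```

Computing z* from the inverse cumulative distribution gives the same values as a printed table, without being limited to the confidence levels the table happens to list.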
2. σ (population standard deviation) is unknown and the sample is small (generally, a sample size below 30; many biological experiments have fewer than 30 observations and an unknown σ). In this case, use the t distribution:
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
t* is likewise looked up in a t-distribution table. Its value depends on the confidence level and on the degrees of freedom (n - 1); two-sided values are used here.
For example, an experiment with three biological replicates has n = 3 (2 degrees of freedom). A confidence level of 0.95 then corresponds to t* = 4.303, so the error bar extends 4.303 standard errors on either side of the mean:
$$\bar{x} \pm 4.303 \frac{s}{\sqrt{3}}$$
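A minimal sketch of the small-sample case, mirroring the table lookup with a hard-coded dictionary. `T_STAR_95` holds standard two-sided t-table values for a confidence level of 0.95, rounded to three decimals; the name `t_interval_95` and the `replicates` values are hypothetical.

```python
import math
import statistics

# Two-sided critical values t* at confidence level 0.95,
# keyed by degrees of freedom (n - 1); standard table values.
T_STAR_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def t_interval_95(data):
    """0.95 confidence interval for the mean using the t distribution."""
    n = len(data)
    t_star = T_STAR_95[n - 1]          # look up by degrees of freedom
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    margin = t_star * se
    return mean - margin, mean + margin


# Hypothetical triplicate measurement (n = 3, 2 degrees of freedom).
replicates = [10.2, 9.8, 10.6]

lower, upper = t_interval_95(replicates)
print(f"t* = {T_STAR_95[len(replicates) - 1]}")
print(f"0.95 confidence interval: {lower:.3f} to {upper:.3f}")
```

With only three replicates, t* is more than twice the large-sample value of 1.96, so the resulting error bars are much wider than a normal-based interval would suggest.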