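Let the variable y have a linear regression relationship with the variables X1, X2, …, Xm, and let the observed values of n samples be Yj, Xj1, Xj2, …, Xjm (j = 1, 2, …, n). The mathematical model of multiple linear regression can then be written as:

$$ Y_j = \beta_0 + \beta_1 X_{j1} + \beta_2 X_{j2} + \cdots + \beta_m X_{jm} + \varepsilon_j, \qquad j = 1, 2, \ldots, n $$

where \varepsilon_j is the random error of the j-th observation.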
The regression coefficients β0, β1, …, βm in the above formula can be estimated by the least squares method. Once the estimates are obtained, the multiple linear regression model can be used for prediction.
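The least squares step can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the function names `fit_ols` and `predict` and the sample data are made up for this example and do not come from the original text.

```python
import numpy as np

def fit_ols(X, y):
    """Return the least squares estimates [b0, b1, ..., bm]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate the fitted regression equation at new rows of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

coef = fit_ols(X, y)             # estimates of b0, b1, b2
y_new = predict(coef, [[6, 7]])  # prediction for x1 = 6, x2 = 7
```

Because the sample data contain no noise, the fit recovers the generating coefficients up to rounding; real observations would give estimates that only approximate the underlying values.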
After the multiple linear regression equation has been calculated, it must pass statistical tests before it is used to solve practical prediction problems. These tests comprise the significance test of the regression equation and the significance test of each regression coefficient.
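The significance test of the regression equation uses the F statistic:

$$ F = \frac{U/m}{Q/(n-m-1)} $$

where $U = \sum_{j=1}^{n} (\hat{Y}_j - \bar{Y})^2$ is the regression sum of squares, with m degrees of freedom, and $Q = \sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2$ is the residual sum of squares, with (n-m-1) degrees of freedom. Here $\hat{Y}_j$ is the value fitted by the regression equation and $\bar{Y}$ is the mean of the observed values.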
After the F value has been calculated from the above formula, it is compared with the F distribution table. For a given significance level α, the critical value Fα with degrees of freedom m and (n-m-1) is looked up in the F distribution table. If F ≥ Fα, the linear relationship between y and X1, X2, …, Xm is significant; otherwise, the linear relationship between them is not significant.
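The test of the whole equation can be sketched in a few lines. This is an illustration under stated assumptions: NumPy is available, the names `regression_f_statistic`, `U`, `Q` and `F_ALPHA` are chosen for this example, and the data are invented. The critical value 5.79 is the tabulated F value for α = 0.05 with 2 and 5 degrees of freedom, which matches the sample sizes used here (n = 8, m = 2).

```python
import numpy as np

def regression_f_statistic(X, y):
    """Return (F, U, Q) for a least squares fit that includes an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    U = np.sum((fitted - y.mean()) ** 2)  # regression sum of squares, m d.o.f.
    Q = np.sum((y - fitted) ** 2)         # residual sum of squares, n-m-1 d.o.f.
    F = (U / m) / (Q / (n - m - 1))
    return F, U, Q

# Hypothetical observations: a linear signal plus small fixed disturbances.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 4], [7, 8], [8, 6]])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.2])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + noise

F, U, Q = regression_f_statistic(X, y)
F_ALPHA = 5.79  # tabulated F_0.05(2, 5); look up the value for your own m and n
equation_is_significant = F >= F_ALPHA
```

If SciPy is installed, the table lookup could instead be done with `scipy.stats.f.ppf(1 - alpha, m, n - m - 1)`; the hard-coded constant is used here only to keep the sketch dependency-light.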
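The significance test of each regression coefficient uses the F statistic:

$$ F_i = \frac{\hat{\beta}_i^{2} / c_{ii}}{Q/(n-m-1)}, \qquad i = 1, 2, \ldots, m $$

where $\hat{\beta}_i$ is the least squares estimate of βi, Q is the residual sum of squares, and $c_{ii}$ is the i-th diagonal element of the correlation matrix $C = A^{-1}$.
For a given significance level α, look up the critical value Fα(1, n-m-1) in the F distribution table. If the calculated value Fi ≥ Fα, the null hypothesis βi = 0 is rejected, that is, Xi is considered an important variable; otherwise, the variable Xi can be eliminated.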
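A sketch of the per-coefficient test follows. It rests on an assumption the text does not spell out: A is taken to be the coefficient matrix of the normal equations written in centered form (sums of squares and cross-products of the deviations of the X variables from their means). The function name `coefficient_f_statistics`, the sample data and the constant `F_ALPHA` are illustrative; 6.61 is the tabulated F value for α = 0.05 with 1 and 5 degrees of freedom.

```python
import numpy as np

def coefficient_f_statistics(X, y):
    """Return (F_i, b): one F statistic and one slope estimate per variable."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    Xc = X - X.mean(axis=0)   # deviations of each variable from its mean
    yc = y - y.mean()
    A = Xc.T @ Xc             # assumed meaning of A: centered normal-equation matrix
    C = np.linalg.inv(A)      # C = A^{-1}
    b = C @ (Xc.T @ yc)       # slope estimates b1, ..., bm
    Q = np.sum((yc - Xc @ b) ** 2)  # residual sum of squares
    F_i = b ** 2 / (np.diag(C) * Q / (n - m - 1))
    return F_i, b

# Hypothetical observations: a linear signal plus small fixed disturbances.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 4], [7, 8], [8, 6]])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.2])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + noise

F_i, b = coefficient_f_statistics(X, y)
F_ALPHA = 6.61          # tabulated F_0.05(1, 5) for n = 8, m = 2
keep = F_i >= F_ALPHA   # True where the variable is judged important
```

Variables whose entry in `keep` is False would be candidates for elimination, after which the equation is normally refitted with the remaining variables.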
The accuracy of the multiple linear regression model can be measured by the residual standard deviation:

$$ S = \sqrt{\frac{Q}{n-m-1}} $$

where Q is the residual sum of squares. The smaller S is, the more accurately the regression equation predicts y; the larger S is, the less accurate the prediction.
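The residual standard deviation can be computed directly from the fitted values. The sketch below assumes NumPy; the function name `residual_std` and the three-point data set are invented for illustration.

```python
import numpy as np

def residual_std(X, y):
    """Residual standard deviation sqrt(Q / (n - m - 1)) of a least squares fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    Q = np.sum((y - design @ coef) ** 2)  # residual sum of squares
    return np.sqrt(Q / (n - m - 1))

# Hypothetical data with one explanatory variable (n = 3, m = 1).
s = residual_std([[0], [1], [2]], [0, 1, 0])
```

For this small data set the fitted slope is zero, the residual sum of squares is 2/3, and there is one residual degree of freedom, so `s` equals the square root of 2/3.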