Seeking a Foreign-Language Translation on Linear Regression or Stepwise Regression
Linear regression

From Wikipedia, the free encyclopedia.


In statistics, linear regression is used for two things:

to construct a simple formula that predicts the value of one variable from the value of another, and

to test whether and how a given variable is related to another variable or variables.

Note: correlation does not imply causation.

Linear regression is a form of regression analysis in which the relationship between one or more independent variables and another variable, called the dependent variable, is modeled by a least squares function, called the linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients. A linear regression equation with one independent variable represents a straight line when the predicted value (that is, the dependent variable of the regression equation) is plotted against the independent variable: this is called simple linear regression. Note, however, that "linear" refers not to this straight line but to the way the regression coefficients occur in the regression equation. The results are subject to statistical analysis.
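As a concrete illustration of simple linear regression, here is a minimal sketch in Python; the use of NumPy and the data values are assumptions made purely for illustration, not part of the original text:

```python
# Minimal sketch of simple linear regression with NumPy; the data values
# here are invented purely for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable

# Fitting a degree-1 polynomial is a least squares fit of a straight line;
# np.polyfit returns the coefficients highest degree first.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x              # predicted values on the line

print("fitted line: y = %.2f + %.2f * x" % (intercept, slope))
```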

[Figure: an example of linear regression with one independent variable.]

Contents

1 Introduction
1.1 Theoretical model
1.2 Data and estimation
1.3 Classical assumptions
2 Least squares analysis
2.1 Least squares estimation
2.2 Regression inference
2.2.1 Univariate linear case
2.3 Analysis of variance
3 Examples
4 Checking the results of a regression model
4.1 Checking model assumptions
4.2 Assessing goodness of fit
5 Other procedures
5.1 Generalized least squares
5.2 Errors-in-variables models
5.3 Generalized linear models
5.4 Robust regression
5.5 Instrumental variables and related methods
6 Applications of linear regression
6.1 Trend line
6.2 Epidemiology
6.3 Finance
6.4 Environmental science
7 See also
8 Notes
9 References
10 External links

Introduction

Theoretical model

The linear regression model assumes that, given a random sample $(Y_i, X_{i1}, \dots, X_{ip}),\ i = 1, \dots, n$, the relationship between the regressand $Y_i$ and the regressors $X_{i1}, \dots, X_{ip}$ may be imperfect. A disturbance term $\varepsilon_i$, itself a random variable, is added to this assumed relationship to capture the influence on $Y_i$ of all factors other than the regressors. The multiple linear regression model therefore takes the form

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, \dots, n.$

Note that the regressors $X_i$ are also called independent variables, exogenous variables, covariates, input variables, or predictor variables. Similarly, the regressand $Y_i$ is also called the dependent variable, response variable, measured variable, or predicted variable.

Models that do not conform to this specification may be treated by nonlinear regression. A linear regression model need not be a linear function of the independent variables: "linear" here means that the conditional mean of $Y_i$ is linear in the parameters $\beta$. For example, the model $Y_i = \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$ is linear in the parameters $\beta_1$ and $\beta_2$, but not in $X_i^2$, which is a nonlinear function of $X_i$. The sketch below illustrates such a model.
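The following Python/NumPy sketch fits such a model by ordinary least squares; the data and the true coefficient values 1.5 and 0.8 are invented for illustration. The fit is linear in the parameters even though it is quadratic in x:

```python
# Sketch of a model that is nonlinear in x but linear in the parameters:
# y_i = b1 * x_i + b2 * x_i**2 + eps_i. Data are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 1.5 * x + 0.8 * x**2 + rng.normal(0.0, 0.1, size=x.size)

# The design matrix has columns x and x**2; ordinary least squares still
# applies because the coefficients b1, b2 enter linearly.
X = np.column_stack([x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated b1, b2:", beta)  # close to the true values 1.5, 0.8
```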

Data and estimation

It is important to distinguish the model, expressed in terms of random variables, from the observed values of those random variables. Typically, the observed values, or data, denoted by lowercase letters, consist of n values $(y_i, x_{i1}, \dots, x_{ip}),\ i = 1, \dots, n$.

In general, there are p + 1 parameters to be determined. To estimate the parameters, it is often useful to use matrix notation and write the model as

$y = X\beta + \varepsilon$,

where $y$ is a column vector containing the observed values $y_1, \dots, y_n$, $\varepsilon$ is a column vector containing the unobserved stochastic components $\varepsilon_1, \dots, \varepsilon_n$, and $X$ is the matrix of observed values of the regressors.

X usually includes a constant column, that is, a column that does not vary across observations, which is used to represent the intercept term $\beta_0$.
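A short Python/NumPy sketch of this matrix form, with an explicit column of ones for the intercept; the data and true coefficients here are synthetic choices made for illustration:

```python
# Sketch of the matrix form y = X * beta + eps. The design matrix X gets a
# leading column of ones representing the intercept beta_0; data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = 2.0 + 1.0 * x1 - 3.0 * x2 + rng.normal(0.0, 0.2, n)

X = np.column_stack([np.ones(n), x1, x2])       # constant column for beta_0
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimates of beta_0, beta_1, beta_2:", beta_hat)
```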

If there is any linear dependence among the columns of $X$, then the vector of parameters $\beta$ cannot be estimated by least squares unless $\beta$ is constrained, for example by requiring the sum of some of its components to be 0. Even in that case, however, some linear combinations of the components of $\beta$ remain estimable. For example, the model $Y_i = \beta_1 X_i + \beta_2 (2X_i) + \varepsilon_i$ cannot be solved for $\beta_1$ and $\beta_2$ independently, because the matrix of observations does not have full column rank. The model can then be rewritten as $Y_i = (\beta_1 + 2\beta_2) X_i + \varepsilon_i$ and solved to give a value for the composite quantity $\beta_1 + 2\beta_2$, as the sketch below demonstrates.
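An illustrative Python/NumPy sketch of this rank-deficient case; the synthetic data and the true values $\beta_1 = 1$, $\beta_2 = 2$ are arbitrary choices. The individual coefficients are not identifiable, but the composite $\beta_1 + 2\beta_2$ is recovered:

```python
# Sketch of the rank-deficient case: with regressor columns x and 2x, the
# individual coefficients are not identifiable, but beta_1 + 2*beta_2 is.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 30)
y = 1.0 * x + 2.0 * (2.0 * x) + rng.normal(0.0, 0.05, 30)  # beta_1=1, beta_2=2

X = np.column_stack([x, 2.0 * x])
print("column rank of X:", np.linalg.matrix_rank(X))  # 1: columns are dependent

# lstsq returns the minimum-norm solution; the individual entries depend on
# that convention, but the composite is determined by the data.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("composite beta_1 + 2*beta_2:", beta[0] + 2.0 * beta[1])  # about 5
```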

Note that carrying out the least squares estimation of $\beta$ does not require treating the sample as random variables: as done so far, it may be conceptually simpler to regard the sample as fixed, observable values. In the context of hypothesis testing and confidence intervals, however, the sample must be interpreted as random variables, which yields estimators that are themselves random variables. It then becomes possible to study the distribution of the estimators and to draw inferences.

Classical assumptions

The classical assumptions of linear regression include: the sample is selected at random from the population of interest; the dependent variable is continuous on the real line; and the error terms are independently and identically normally distributed, that is, the errors are i.i.d. and Gaussian. Note that these assumptions imply that the error term is statistically independent of the values of the independent variables, that is, statistically independent of the predictor variables. Unless otherwise specified, these assumptions are used throughout. All of them can be relaxed, depending on the nature of the true probabilistic model of the problem at hand. The question of which assumptions to relax, which functional form to adopt, and other choices related to the underlying probabilistic model is known as a specification search. In particular, assuming that the error term is normally distributed is usually unnecessary unless the sample is very small: the central limit theorem implies that, as long as the error terms have finite variance and are not too strongly correlated, the parameter estimates will be approximately normally distributed even when the underlying errors are not, as the simulation sketch below illustrates.
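To illustrate this central limit point, here is a small Python/NumPy simulation; the sample size, error distribution, and true coefficients are arbitrary choices for illustration. With uniform rather than Gaussian errors, the distribution of the least squares slope across repeated samples is still approximately normal:

```python
# Simulation sketch of the central limit point: with uniform (non-Gaussian)
# errors of finite variance, the least squares slope estimate is still
# approximately normally distributed across repeated samples.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0.0, 1.0, n)
    eps = rng.uniform(-1.0, 1.0, n)       # non-Gaussian, finite variance
    y = 1.0 + 2.0 * x + eps               # true intercept 1, slope 2
    slopes[r] = np.polyfit(x, y, 1)[0]    # least squares slope estimate

print("mean of slope estimates:", slopes.mean())  # close to 2
print("std  of slope estimates:", slopes.std())   # spread is roughly normal
```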

Under these assumptions, an equivalent formulation of simple linear regression, one that explicitly exhibits linear regression as a model of conditional expectation, can be given as

$E(Y_i \mid X_i = x_i) = \alpha + \beta x_i.$

That is, the conditional expectation of $Y_i$ given $X_i = x_i$ is an affine function of $x_i$. Note that this expression rests on the assumption that the mean of $\varepsilon_i$ is zero conditional on $X_i$.