The method of evaluating the applicability of regression model
Life is usually simple when you only know one or two techniques. If the outcome is continuous, use linear regression; if the outcome is binary, use logistic regression! However, the more options there are, the harder it is to pick the right one. The same thing happens when choosing a regression model.

1. How to choose the right regression model? Among the many types of regression models, it is important to choose the most appropriate technique according to the types of the independent and dependent variables, the dimensionality of the data, and other essential characteristics of the data. The following are some suggestions on how to choose an appropriate regression model:

(1) Data exploration is an indispensable part of building a predictive model. It should be the first step before choosing a model, for example to identify the relationships between variables and how strongly they influence the outcome.

(2) To compare the goodness of fit of different models, we can analyze several metrics, such as the statistical significance of the parameters, R-squared, adjusted R-squared, AIC, BIC and the error term. Another is Mallows' Cp criterion, which checks for possible bias in a model by comparing it with all possible sub-models (or a carefully selected set of them).

(3) Cross-validation is one of the best ways to evaluate a predictive model. You divide the data set into two groups (a training set and a validation set), fit the model on the training set, and compute the mean squared error between the observed and predicted values on the validation set. This gives a measure of prediction accuracy.

(4) If the data set contains several confounding variables, do not use an automatic model selection method, because you do not want to put all of these variables into the model at the same time.

(5) The choice also depends on your goal. A less powerful, simpler model may be easier to implement than a model with high statistical significance.

(6) Regression regularization methods (Lasso, Ridge and ElasticNet) work well when the data set is high-dimensional and the independent variables are multicollinear.
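
Suggestions (2), (3) and (6) above can be sketched together in a few lines of NumPy. This is only an illustration under stated assumptions, not a reference implementation: the data are simulated, the helper names fit_ridge, fit_indices and kfold_mse are hypothetical, AIC and BIC use the Gaussian formula up to an additive constant, and the ridge penalty lam=10 is an arbitrary value rather than a tuned one.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_ridge(X, y, lam=0.0):
    """Closed-form ridge fit; lam=0 reduces to ordinary least squares.
    The intercept (first coefficient) is left unpenalized."""
    A = add_intercept(X)
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def predict(beta, X):
    return add_intercept(X) @ beta


def fit_indices(y, y_hat, n_params):
    """R^2, adjusted R^2, and Gaussian AIC/BIC (up to an additive constant)."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    p = n_params - 1  # number of predictors, excluding the intercept
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    aic = n * np.log(rss / n) + 2 * n_params
    bic = n * np.log(rss / n) + n_params * np.log(n)
    return {"r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}


def kfold_mse(X, y, lam, k=5, seed=0):
    """Average validation mean squared error over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every row not in the validation fold
        beta = fit_ridge(X[train], y[train], lam)
        errors.append(np.mean((y[fold] - predict(beta, X[fold])) ** 2))
    return float(np.mean(errors))


# Simulated data (an assumption for illustration): x2 is nearly collinear with x1.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

beta_ols = fit_ridge(X, y, lam=0.0)
beta_ridge = fit_ridge(X, y, lam=10.0)

# In-sample indices are reported for OLS only: with a penalty, the plain
# parameter count no longer describes the model's effective complexity.
indices_ols = fit_indices(y, predict(beta_ols, X), n_params=3)

cv_ols = kfold_mse(X, y, lam=0.0)
cv_ridge = kfold_mse(X, y, lam=10.0)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
print("OLS fit indices:   ", indices_ols)
print("5-fold CV MSE  OLS:", cv_ols, " ridge:", cv_ridge)
```

When candidate models are compared this way, lower AIC, BIC and cross-validated MSE are preferred. The ridge slopes are shrunk toward zero relative to the OLS slopes, which is what stabilizes the estimates when predictors are strongly correlated.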

2. What is regression analysis? Regression analysis is a predictive modeling technique that studies the relationship between a dependent variable (the target) and one or more independent variables (the predictors). It is used for forecasting, time series modeling and investigating causal relationships between variables.

3. What are the regression types?

(1) linear regression

Linear regression is the most widely known modeling technique, and it is usually one of the first that people learn when studying predictive modeling. In this technique, the dependent variable is continuous, the independent variables can be continuous or discrete, and the regression line is linear in nature.

Linear regression establishes the relationship between the dependent variable (y) and one or more independent variables (x) by using the best-fit straight line (also called regression line).

The expression is: Y=a+b*X+e, where a is the intercept of the straight line, b is the slope of the straight line, and e is the error term. Given a value of the independent variable X, the predicted value of the dependent variable Y can be calculated from this expression.
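
As a minimal sketch of how a and b in Y=a+b*X+e can be estimated, the example below uses the standard least-squares formulas for a single predictor. The function name fit_line and the five data points are made up for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares estimates of intercept a and slope b in Y = a + b*X + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope: covariance of x and y divided by the variance of x.
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept: the fitted line passes through the point of means.
    a = y.mean() - b * x.mean()
    return a, b


# Made-up observations that roughly follow y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

a, b = fit_line(x, y)
y_new = a + b * 6.0  # predicted Y for a new observation X = 6

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")  # a = 1.09, b = 1.97
print(f"prediction at X = 6: {y_new:.2f}")          # 12.91
```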

(2) logistic regression

Logistic regression is used to calculate the probability of success or failure of an event. When the dependent variable is binary (0/1, true/false, yes/no), logistic regression should be used. The predicted value here lies in the range [0, 1], and the model can be expressed by the following equation:

logit(p) = ln(p/(1-p)) = a+b*X

where p is the probability of the event occurring. You may ask, "Why use a logarithm in the equation?"

Because the dependent variable follows a binomial distribution here, we need a link function that connects the linear predictor to a probability in [0, 1], and the logit function meets this requirement. The parameters in the above equation are chosen by maximum likelihood estimation, rather than by minimizing the sum of squared errors as in ordinary linear regression.
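
The idea of estimating the parameters by maximum likelihood can be illustrated with a short sketch. It assumes a single predictor, simulated data, and plain gradient ascent on the average log-likelihood. The names fit_logistic and log_likelihood, the step size and the iteration count are illustrative choices, and statistical packages typically use faster solvers.

```python
import numpy as np


def sigmoid(z):
    """Inverse of the logit function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_likelihood(a, b, x, y):
    """Bernoulli log-likelihood of the model logit(p) = a + b*x."""
    p = sigmoid(a + b * x)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_logistic(x, y, lr=0.5, n_iter=2000):
    """Maximize the average log-likelihood by gradient ascent."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(a + b * x)
        # Gradient of the average log-likelihood with respect to a and b.
        a += lr * np.mean(y - p)
        b += lr * np.mean((y - p) * x)
    return a, b


# Simulated binary outcomes (an assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
true_p = sigmoid(-0.5 + 2.0 * x)
y = (rng.random(500) < true_p).astype(float)

a_hat, b_hat = fit_logistic(x, y)
p_hat = sigmoid(a_hat + b_hat * x)  # fitted probabilities, all inside (0, 1)

print("estimated a, b:", a_hat, b_hat)
print("log-likelihood at start: ", log_likelihood(0.0, 0.0, x, y))
print("log-likelihood after fit:", log_likelihood(a_hat, b_hat, x, y))
```

The fitted log-likelihood is higher than the starting value at a = b = 0, which is the quantity being maximized. No squared error is minimized anywhere in the fit.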