How to control industry and year variables
Generally, this means running a multiple regression in which year variables and industry variables are included as regressors.

The X variables in a multiple regression are generally divided into two types: explanatory variables and control variables. Explanatory variables are the variables the author actually wants to study. Control variables can also affect both the Y variable and the X variables; they are not the focus of the study, but they must be included for the research to be rigorous.

For example, suppose my Y variable is "salary" and my research asks how the "gender variable" affects "salary". There are two problems here. First, salary changes over time: salaries were lower in the past, when the economy was less developed, and the proportions of men and women in the workforce have also changed over time, with more women working now. Second, the distribution of wages between men and women often differs across industries. Therefore, if we do not control for the two variables "year" and "industry", many conclusions cannot be defended. Suppose I do not control for industry and year and conclude that "women's wages are low, so women are discriminated against". Someone may retort that women prefer stability, that the industries women tend to work in are stable but pay less, and so the gap reflects differences between industries rather than wage discrimination against women. So to really find out whether women are discriminated against, I need to control for industry. For example, is there a statistically significant gap between men's and women's wages within the teaching profession? Within the financial industry? ……

So how do we control for industry? If there were only one industry this would be easy, but there are many, and running a separate univariate regression on each industry's subsample, one by one, is very inefficient. Instead, we use a property of multiple regression. The meaning of each coefficient is: "holding the other explanatory variables in the regression constant, when this explanatory variable changes by 1 unit, Y changes by the value of the coefficient". Therefore, we add the industry variables directly to the regression (industry is usually entered as a set of dummy variables, one per industry, with one industry left out as the baseline). Year can be handled in the same way.
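The idea above can be sketched in a few lines of Python. This is only a minimal illustration on simulated data, not a real wage study: the sample size, the three industries, the four years, the effect sizes, and the helper names `dummies` and `ols` are all assumptions made up for the example. Women are deliberately placed more often in the lower-paying industry, so the regression without controls overstates the gender gap, while the regression with industry and year dummies recovers a value close to the gap built into the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6000

# Hypothetical simulated sample: 3 industries, 4 years.
industry = rng.integers(0, 3, size=n)
year = rng.integers(0, 4, size=n)

# Assumption: women are more likely to work in industry 0,
# which is also the lowest-paying industry below.
p_female = np.array([0.7, 0.5, 0.3])[industry]
female = (rng.random(n) < p_female).astype(float)

true_gap = -1.0  # gender gap built into the simulated salaries
industry_effect = np.array([0.0, 2.0, 4.0])[industry]
year_effect = 0.5 * year  # salaries rise over time
salary = (10.0 + industry_effect + year_effect
          + true_gap * female + rng.normal(0.0, 1.0, size=n))


def dummies(codes):
    """Return one 0/1 column per category, dropping the first as baseline."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)


def ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


const = np.ones((n, 1))

# Regression 1: salary on gender only (no controls).
X_naive = np.column_stack([const, female])
gap_naive = ols(X_naive, salary)[1]

# Regression 2: salary on gender plus industry and year dummies.
X_ctrl = np.column_stack([const, female, dummies(industry), dummies(year)])
gap_ctrl = ols(X_ctrl, salary)[1]

print(f"gender coefficient without controls: {gap_naive:.2f}")
print(f"gender coefficient with industry and year dummies: {gap_ctrl:.2f}")
```

In statistical packages the same thing is usually done by declaring industry and year as categorical variables, so that the dummy columns are generated automatically. The manual construction here is only meant to show what "controlling for industry and year" adds to the regression.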