Econometric problems
Somewhere along the way, answering econometrics questions became part of my daily routine. Readers from all over send in all sorts of econometrics questions. Below are some typical ones, which I hope will be helpful to friends doing empirical research.

1. Under what circumstances should variables be log-transformed before running a regression?

A: Taking logarithms is worth considering in the following situations.

First, if a variable enters the theoretical model in logarithms, take its logarithm. For example, in labor economics, studies of the return to education usually use the logarithm of wages as the explained variable, because that is the form derived from the Mincer model.

Second, if a variable has an exponential growth trend, such as GDP, it is generally logged, so that lnGDP follows a linear trend.

Third, if taking logarithms improves the fit of the regression model (for example, R² or the significance of the coefficients), consider taking logarithms.

Fourth, if you want to interpret the regression coefficient as an elasticity or semi-elasticity (that is, in terms of percentage changes), take the logarithm of the variable.

Fifth, if you are unsure whether to take logarithms, estimate both specifications as a robustness check. If the regression results are similar, the findings are robust.
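The elasticity interpretation in the fourth point can be illustrated with a small simulation. This is only a sketch on made-up data: the data-generating process, the true elasticity of 0.8, and the helper name `ols` are all assumptions chosen for illustration, and it uses NumPy rather than a dedicated econometrics package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: y = 3 * x^0.8 * exp(u), so the true elasticity of y
# with respect to x is 0.8.
x = rng.uniform(1.0, 100.0, size=n)
u = rng.normal(0.0, 0.1, size=n)
y = 3.0 * x**0.8 * np.exp(u)


def ols(X, y):
    """Return OLS coefficients for a design matrix X that already has an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Log-log regression: ln y = a + b * ln x + u, where b is the elasticity.
X_log = np.column_stack([np.ones(n), np.log(x)])
b_loglog = ols(X_log, np.log(y))

print("estimated elasticity:", round(float(b_loglog[1]), 3))
```

In the log-log form, the slope reads directly as "a 1% increase in x is associated with roughly a b% increase in y", which is why the transformation is convenient when elasticities are the object of interest.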

2. How should the economic meaning of the interaction-term coefficient in a linear regression model be understood?

A: In a linear regression model with no nonlinear terms such as interaction terms or squared terms, the regression coefficient of a variable is that variable's marginal effect. For example, consider the regression equation

y = 1 + 2x + u

where u is a random disturbance term. Here the marginal effect of x on y is 2; that is, when x increases by one unit, y increases by two units on average. Now consider adding an interaction term to the model, such as

y = α + βx + γz + δxz + u

where x and z are explanatory variables and xz is their interaction term (cross term). Because of the interaction term, the marginal effect of x on y is β + δz, so it is no longer constant but depends on the value of the other variable z. If the interaction coefficient δ is positive, the marginal effect of x on y increases as z increases (for example, the marginal product of labor depends on capital). Conversely, if δ is negative, the marginal effect of x on y decreases as z increases.
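The marginal effect β + δz can be made concrete with a short simulation. This is a sketch under assumed values: the coefficients (α = 1, β = 2, γ = 0.5, δ = 1.5), the simulated data, and the helper `marginal_effect_of_x` are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data generated from y = alpha + beta*x + gamma*z + delta*x*z + u.
alpha, beta, gamma, delta = 1.0, 2.0, 0.5, 1.5
x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = alpha + beta * x + gamma * z + delta * x * z + u

# OLS with an intercept, both main effects, and the interaction term.
X = np.column_stack([np.ones(n), x, z, x * z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat, g_hat, d_hat = coef


def marginal_effect_of_x(z_value):
    """Estimated marginal effect of x on y at a given z: beta_hat + delta_hat * z."""
    return b_hat + d_hat * z_value


for z_value in (-1.0, 0.0, 1.0):
    print(f"z = {z_value:+.1f}: marginal effect of x = {marginal_effect_of_x(z_value):.3f}")
```

Because δ is positive in this setup, the printed marginal effect rises with z. Note that β alone is the marginal effect of x only at z = 0, which is why papers often report the effect at the sample mean or at several representative values of z.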

3. Some journal articles include control variables in the regression model. What role do control variables play, and how should they be chosen?

A: A study usually has a few variables of primary interest, whose coefficients are called "parameters of interest". However, if y is regressed only on the variables of primary interest (in the extreme case, a simple one-regressor regression), omitted variable bias arises easily; that is, the omitted variables are correlated with the explanatory variables. The main purpose of adding control variables is to avoid omitted variable bias as far as possible, so the controls should include the main factors that affect the explained variable y (although omitting variables that are uncorrelated with the explanatory variables does not cause bias).
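A small simulation shows how an omitted variable distorts the parameter of interest and how a control variable repairs it. Everything here is assumed for illustration: the confounder `w`, the true coefficient of 2 on `x`, and the helper `ols` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical setup: w affects y and is correlated with x, so leaving it
# out of the regression biases the coefficient on x.
w = rng.normal(size=n)
x = 0.8 * w + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * w + rng.normal(size=n)  # true coefficient on x is 2


def ols(X, y):
    """Return OLS coefficients for a design matrix X that already has an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols(np.column_stack([np.ones(n), x]), y)     # w omitted
b_long = ols(np.column_stack([np.ones(n), x, w]), y)   # w included as a control

print("coefficient on x without control:", round(float(b_short[1]), 3))
print("coefficient on x with control:   ", round(float(b_long[1]), 3))
```

In this design the population bias of the short regression is 3 × Cov(x, w) / Var(x) = 3 × 0.8 / 1.64 ≈ 1.46, so the estimate without the control lands well above 2, while the regression with the control recovers a value close to 2.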

4. Many papers contain a "Robustness Check" section. Does every empirical study have to include one? How is it done in practice?

A: If your paper reports only a single regression result, others will find it hard to believe you. That is why additional regressions, that is, robustness checks, are needed. Papers without robustness checks are unconvincing and therefore hard to publish in good journals. Ways of checking robustness include changing the functional form, splitting the data into sub-samples, using alternative measures of the variables, and so on. Please refer to my textbook. More importantly, study the classic papers in your field and imitate their robustness checks.

5. With panel data, do I have to consider fixed effects and time effects, or can I just run the regression directly? I have read many papers; some explain why they use the fixed effects model, while others simply report regression results. What is the correct approach?

A: Standard practice calls for a Hausman test to choose between fixed effects and random effects. However, because fixed effects are so common in practice and the fixed effects estimator remains consistent either way (whereas the random effects estimator may be inconsistent), some researchers estimate the fixed effects model directly.

Time effects should be considered as well, for example by adding time dummy variables or a time trend term, unless a test shows that no time effects are present. If time effects are ignored, the results may not be credible (the correlation between x and y may arise only because both grow over time).
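The point about individual and time effects can be sketched with a simulated balanced panel. The setup is hypothetical: unit effects `a` and a common trend drive both x and y, the true coefficient on x is 1, and the helpers `slope` and `two_way_demean` are names introduced only for this illustration. For a balanced panel, subtracting unit means and period means (and adding back the grand mean) gives the same slope as including unit and time dummies.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 10

# Hypothetical balanced panel: unit effects and a common time trend enter
# both x and y, so pooled OLS overstates the true coefficient of 1.0.
a = rng.normal(size=(N, 1))              # individual effects
t = np.arange(T).reshape(1, T)           # common time trend
x = 0.5 * a + 0.3 * t + rng.normal(size=(N, T))
y = a + 0.5 * t + 1.0 * x + rng.normal(size=(N, T))


def slope(xv, yv):
    """OLS slope of yv on xv with an intercept."""
    xc = xv - xv.mean()
    return float((xc * (yv - yv.mean())).sum() / (xc**2).sum())


def two_way_demean(m):
    """Remove unit means and period means from an N x T array (balanced panel)."""
    return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()


b_pooled = slope(x.ravel(), y.ravel())
b_twfe = slope(two_way_demean(x).ravel(), two_way_demean(y).ravel())

print("pooled OLS slope:           ", round(b_pooled, 3))
print("two-way fixed effects slope:", round(b_twfe, 3))
```

The pooled slope picks up the shared trend and the unit effects, while the two-way fixed effects slope stays close to the true value. Standard errors are omitted here; in applied work they would normally be clustered at the unit level.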

6. How do I decide whether to use two-stage least squares (2SLS) or the generalized method of moments (GMM)?

A: If the model is exactly identified (that is, the number of instrumental variables equals the number of endogenous variables), GMM is exactly equivalent to 2SLS, so 2SLS is enough. In the over-identified case (more instrumental variables than endogenous variables), GMM has the advantage of being more efficient than 2SLS under heteroskedasticity. Because real data almost always exhibit some heteroskedasticity, GMM is generally used in the over-identified case.
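The equivalence in the exactly identified case can be checked numerically. The sketch below is an illustration on simulated data, not a full estimation routine: the data-generating process, the single instrument `z`, and the helper `linear_gmm` are assumptions. It uses the linear GMM formula b = (X'Z W Z'X)⁻¹ X'Z W Z'y, where W = (Z'Z)⁻¹ gives 2SLS and W = S⁻¹ (with S estimated from first-step residuals) gives two-step efficient GMM.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Hypothetical data: x is endogenous (it shares the shock v with the error),
# z is a valid instrument, and the error is heteroskedastic in z.
z = rng.normal(size=n)
v = rng.normal(size=n)
e = 0.8 * v + rng.normal(size=n)
x = z + v
y = 1.0 + 2.0 * x + e * (1.0 + 0.5 * np.abs(z))

X = np.column_stack([np.ones(n), x])   # regressors (constant, endogenous x)
Z = np.column_stack([np.ones(n), z])   # instruments (constant, z): exactly identified


def linear_gmm(X, Z, y, W):
    """Linear GMM estimator (X'Z W Z'X)^(-1) X'Z W Z'y for weighting matrix W."""
    A = X.T @ Z @ W
    return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)


# 2SLS is GMM with W = (Z'Z)^(-1).
b_2sls = linear_gmm(X, Z, y, np.linalg.inv(Z.T @ Z))

# Two-step efficient GMM: heteroskedasticity-robust weighting matrix
# built from the 2SLS residuals.
resid = y - X @ b_2sls
S = (Z * resid[:, None] ** 2).T @ Z / n
b_gmm = linear_gmm(X, Z, y, np.linalg.inv(S))

print("2SLS:", np.round(b_2sls, 4))
print("GMM: ", np.round(b_gmm, 4))
```

With as many instruments as regressors, Z'X is square, so the weighting matrix cancels out of the formula and both estimators reduce to (Z'X)⁻¹Z'y. The two printed vectors therefore agree up to floating-point error even though the errors are heteroskedastic; the choice of W only matters once there are more instruments than endogenous variables.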

7. In my panel data, the variable of interest x does not change over time. Does that mean I can only estimate a random effects model (since under fixed effects the time-invariant key variable x drops out)?

A: It is usually still better to use the fixed effects model (of course, a formal Hausman test can be run to decide between fixed effects and random effects). If fixed effects are used, there are two possible solutions:

(1) If system GMM is used to estimate a dynamic panel model, the coefficient of the time-invariant variable x can be estimated.

(2) With a static panel fixed effects model, introduce the interaction between the time-invariant variable x and a time-varying variable z, and take the interaction term xz (which is time-varying) as the key explanatory variable.
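Solution (2) can be sketched on a simulated panel. All specifics below are assumptions for illustration: the data-generating process, the true interaction coefficient of 2, and the helper `within`. The sketch shows that the within transformation turns a time-invariant x into zeros, while the interaction xz survives and its coefficient can be estimated.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 300, 8

# Hypothetical panel: x is time-invariant and correlated with the unit effect a,
# z varies over time, and y depends on x, z, and their interaction.
x = rng.normal(size=(N, 1))
a = 0.7 * x + rng.normal(size=(N, 1))
z = rng.normal(size=(N, T))
y = a + 1.0 * x + 0.5 * z + 2.0 * x * z + rng.normal(size=(N, T))


def within(m):
    """Within (fixed effects) transformation: subtract each unit's time mean."""
    return m - m.mean(axis=1, keepdims=True)


# The time-invariant regressor is wiped out by the within transformation.
x_within = within(np.broadcast_to(x, (N, T)))
print("largest |within-transformed x|:", float(np.abs(x_within).max()))

# The interaction x*z varies over time, so its coefficient is identified.
D = np.column_stack([within(z).ravel(), within(x * z).ravel()])
coef, *_ = np.linalg.lstsq(D, within(y).ravel(), rcond=None)
gamma_hat, delta_hat = coef

print("coefficient on z:  ", round(float(gamma_hat), 3))
print("coefficient on x*z:", round(float(delta_hat), 3))
```

The coefficient on xz answers a different question from the coefficient on x itself: it measures how the effect of z on y varies with x, not the level effect of x, so the interpretation of the key result changes accordingly.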