Put simply, a dummy variable works as a "data classifier": it represents a specific state using variables and linear combinations of variables.
The easiest part to understand is the "dummy variable trap". First, the definition: if a qualitative factor in the model has m mutually exclusive types and the model includes an intercept term, only m - 1 dummy variables can be introduced for that factor; otherwise there will be perfect multicollinearity. This is the so-called dummy variable trap.
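To make the definition concrete, here is a minimal sketch in plain Python. The factor with types "A", "B", "C", the sample data, and the helper `matrix_rank` are all made up for illustration. It only checks that the design matrix with an intercept plus all m dummies loses one rank, while the one with m - 1 dummies keeps full column rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a nonzero pivot at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical qualitative factor with m = 3 mutually exclusive types.
levels = ["A", "B", "C"]
observations = ["A", "B", "C", "A", "B", "C"]

# Intercept column plus one 0/1 dummy per type (all m dummies).
X_trap = [[1] + [int(obs == lv) for lv in levels] for obs in observations]

# Intercept column plus m - 1 dummies, with type "A" as the baseline.
X_ok = [[1] + [int(obs == lv) for lv in levels[1:]] for obs in observations]

rank_trap = matrix_rank(X_trap)
rank_ok = matrix_rank(X_ok)

print(len(X_trap[0]), rank_trap)  # 4 columns but rank 3: perfect multicollinearity
print(len(X_ok[0]), rank_ok)      # 3 columns and rank 3: full column rank
```

In every row of `X_trap` the three dummy columns add up to the intercept column, and that exact linear dependence is what the trap refers to.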
We can understand this problem from two angles:
1. Rational perspective: focus on why "multicollinearity" arises, which is easy to understand with some knowledge of linear algebra. Here is a netizen's answer from NPC Economic Forum:
One more thing to note: if the model contains several qualitative variables, each with many categories, the dummy variables introduced into the model will use up a lot of degrees of freedom. So the number of dummy variables entering the model should be weighed carefully, to keep the total number of parameters below the number of sample observations. That said, in practice it should be rare for the number of dummy variables to exceed the number of sample observations.
That's all for now; I'll add more here as I keep learning ~
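As a rough sketch of this counting, the snippet below tallies the dummies and the remaining degrees of freedom. The function name `residual_df`, the sample size of 50, and the category counts 4, 12 and 31 are all assumed for illustration, and the model is assumed to have an intercept and m - 1 dummies per factor.

```python
def residual_df(n_obs, levels_per_factor, n_other_regressors=0):
    """Return (number of dummies, residual degrees of freedom).

    Assumes an intercept and m - 1 dummies for each factor with m types.
    """
    n_dummies = sum(m - 1 for m in levels_per_factor)
    n_params = 1 + n_dummies + n_other_regressors  # intercept + dummies + others
    return n_dummies, n_obs - n_params


# Hypothetical case: 50 observations, three factors with 4, 12 and 31 types.
n_dummies, df_left = residual_df(n_obs=50, levels_per_factor=[4, 12, 31])

print(n_dummies)  # 3 + 11 + 30 = 44 dummy variables
print(df_left)    # 50 - (1 + 44) = 5 degrees of freedom left
```

With only 5 degrees of freedom left the estimates would be very imprecise, even though the number of dummies is still below the number of observations.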