How to report SPSS regression analysis results
How to report the results of regression analysis

The results of a regression analysis can be divided into four parts: (1) the regression model; (2) the regression coefficients; (3) the characteristics of the dependent and independent variables; (4) the relationships among the independent variables. Parts 1 and 2 are basic information that must be reported in detail. Parts 3 and 4 are auxiliary information, included or omitted depending on the specific situation. Each is discussed below.

How to describe regression model and regression coefficient

Let me first briefly discuss simple (one-variable) regression, which involves only one independent variable (say, X). This model is rare in the social sciences and easy to report; a common exception is estimating the long-term trend of a dependent variable in time-series analysis. Generally you do not need a table. A sentence (such as "b = ?, SE = ?, Beta = ?") or an equation (such as "Y = ? + ?X, where SE = ?, Beta = ?") is enough. If a study contains several simple regression analyses, it is better to report them together in one table so that readers can compare the models.
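The sketch below shows how the one-sentence report for a simple regression can be computed from raw data. It is written in Python and is only an illustration of ordinary least squares; it is not SPSS's own code. The function names and the toy numbers are hypothetical.

```python
import math

def simple_regression(x, y):
    """Ordinary least squares for Y = a + bX with a single independent variable.

    Returns the constant a, the slope b, the standard error of b, and Beta.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual variance uses n - 2 degrees of freedom (constant and slope estimated)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    # Standardized coefficient: b rescaled by SD(x) / SD(y)
    beta = b * math.sqrt(sxx / syy)
    return a, b, se_b, beta

def report_sentence(x, y):
    """Format the result as one sentence, as suggested for simple regression."""
    a, b, se_b, beta = simple_regression(x, y)
    sign = "+" if b >= 0 else "-"
    return f"Y = {a:.2f} {sign} {abs(b):.2f}X, where SE = {se_b:.2f}, Beta = {beta:.2f}"

print(report_sentence([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

With only one independent variable, Beta equals Pearson's r between X and Y. This gives a quick way to check the output.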

Next, multiple regression. It involves many parameters: some must be reported, some are optional, and some are completely unnecessary. For convenience, I have drawn up a checklist (Table 1) of how to report the regression model and regression coefficients, based on SPSS regression output (other statistical software is similar). As the table shows, I divide the parameters into four categories: must report, recommended, generally unnecessary, and completely unnecessary. My criterion comes from the four aspects involved in hypothesis testing: the significance, strength, direction, and form of the relationship between variables (see the article "Four Questions That Must Be Considered When Explaining Relationships between Variables" for details). In other words, whether a parameter is reported should depend on whether it provides non-redundant information about significance (Sig.), strength (the value of b or β), direction (the sign of b or β), or form (transformations of the independent variables).

Table 1. Checklist for reporting regression models and regression coefficients

SPSS output | Meaning | Source table in SPSS output | Report? | Where reported

Regression model part
R | Multiple correlation coefficient between the dependent variable and all independent variables | Model Summary | Completely unnecessary | -
R Square | Square of R | Model Summary | Generally unnecessary | -
Adjusted R Square | R² adjusted for the number of independent variables | Model Summary | Must report | Table 2
Std. Error of the Estimate | Standard error of the estimated dependent variable (Note 1) | Model Summary | Recommended | Table 2
Sum of Squares | Sums of squared deviations | ANOVA | Completely unnecessary | -
df | Degrees of freedom | ANOVA | Completely unnecessary | -
Mean Square | Mean squares | ANOVA | Completely unnecessary | -
F | F value of the model | ANOVA | Generally unnecessary | -
Sig. | Significance level of F | ANOVA | Must report | Table 2
N | Number of cases in the model (Note 2) | ANOVA | Must report | Table 2

Regression coefficient part
B | Unstandardized coefficient | Coefficients | Must report | Table 2
Std. Error | Standard error of B | Coefficients | Must report | Table 2
Beta | Standardized coefficient (β) | Coefficients | Must report | Table 2
t | B / Std. Error | Coefficients | ? | ?
Sig. | Significance level of t | Coefficients | Must report | Table 2
Lower Bound | Lower limit of the 95% confidence interval for B (Note 3) | Coefficients | Recommended | Table 2
Upper Bound | Upper limit of the 95% confidence interval for B (Note 3) | Coefficients | Recommended | Table 2

Note 1: The standard error of the estimate measures the typical size of the model's prediction errors for the dependent variable, that is, the accuracy of the model. For example, suppose the dependent variable is current annual salary and its forecast error is ?. Then, if we used this model (with three independent variables: starting salary, length of service, and gender) to predict the annual salary of employees in firms under similar conditions, we would know that ?. This information cannot be obtained from any other model parameter, such as R² or adjusted R², the significance level, or the B or Beta of each variable.
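The standard error of the estimate follows from the residuals and the residual degrees of freedom. The Python sketch below assumes you already have the observed and predicted values; the function name and numbers are hypothetical.

```python
import math

def std_error_of_estimate(y, y_hat, k):
    """Standard error of the estimate for a model with k independent variables.

    y: observed values of the dependent variable
    y_hat: values predicted by the fitted model
    k: number of independent variables (the constant is not counted)
    """
    n = len(y)
    sse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    # Residual degrees of freedom: n cases minus k slopes minus the constant
    return math.sqrt(sse / (n - k - 1))

print(std_error_of_estimate([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2], k=1))
```

If the residuals are roughly normal, about 95% of predictions should fall within about two standard errors of the estimate of the actual values. That is the practical reading suggested in Note 1.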

Note 2: If neither the dependent variable nor any independent variable has missing values, the number of cases in the model equals the sample size. Variables often have missing values, however. The number of cases in the model is then smaller than the sample size, sometimes by a large margin (which is itself a serious problem), so the former must be reported. SPSS does not display this number directly, but it is easy to calculate: it equals the total df in the ANOVA table plus 1.
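The following Python sketch shows the two ways of obtaining the model N. It assumes listwise exclusion of cases with missing values, which is the default option in the SPSS Linear Regression procedure. Missing values are represented as `None`, and both helper names are hypothetical.

```python
def cases_in_model(rows):
    """Count cases with no missing value (None) in any variable of the model.

    Under listwise exclusion, only these complete cases enter the regression.
    """
    return sum(1 for row in rows if all(v is not None for v in row))

def n_from_anova(df_total):
    """The total df in the ANOVA table is N - 1, so N = total df + 1."""
    return df_total + 1

rows = [(1, 2, 3), (None, 2, 3), (4, 5, 6), (7, None, 9)]
print(cases_in_model(rows), n_from_anova(73))
```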

Note 3: The confidence interval of B is another way to test the significance of B, and it compensates for the shortcomings of the t test and its Sig. value. If the interval between the lower and upper limits contains 0, B is not significant at the 0.05 level. This is a classic and complicated issue known as null hypothesis significance testing (NHST). This article will not discuss it in detail; interested readers can consult the relevant web pages (R. C. Fraley; Dennis …). SPSS does not produce the confidence interval of B by default, so it must be requested in the "Statistics" dialog of the regression procedure. By default, SPSS regression output shows only "Estimates" and "Model fit", which produce all the parameters in Table 1 except the confidence interval. I suggest also checking "Confidence intervals".
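The confidence interval for B is b ± t × SE, where t is the two-sided critical value of Student's t distribution with the residual df (N − k − 1). The following Python sketch gets t with only the standard library: it integrates the t density numerically with Simpson's rule and inverts it by bisection. This numerical approach is my own choice for a self-contained example, not how SPSS computes it, and all function names are hypothetical.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom, for x >= 0.

    Integrates the density from 0 to x with Simpson's rule (steps must be even)
    and adds 0.5, using the symmetry of the distribution.
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(df, level=0.95):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(b, se, df, level=0.95):
    """Interval b +/- t * SE; df is the residual df of the model (N - k - 1)."""
    t = t_critical(df, level)
    return b - t * se, b + t * se

def is_significant(b, se, df, level=0.95):
    """B is not significant when its confidence interval contains 0."""
    lower, upper = confidence_interval(b, se, df, level)
    return not (lower <= 0 <= upper)
```

For example, `confidence_interval(0.5, 0.1, 10)` gives roughly (0.28, 0.72). The interval excludes 0, so the coefficient would be significant at the 0.05 level.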

Now let me use an example to demonstrate how to report the results of a regression analysis. So that you can easily replicate it, I use world95.sav, a data file that comes with SPSS. It contains 1995 data on the "national conditions" of 109 countries or regions, published by UNESCO (or the World Bank and other institutions), with 26 indicators covering population, geography, economy, society, and culture. I take birth_rt (births per 1,000 people) as the dependent variable. The independent variables are gdp_cap (per capita GDP), urban (urbanization, i.e., the percentage of the population living in cities), literacy (the percentage of the population who can read), and calories (daily calorie intake). Following the principles in Table 1, I report the results of this regression analysis in Table 2:


Because of limited space and the purpose of this article, I will not explain the parameters in Table 2. I do, however, want to add some remarks about the table's format.

How to title a table: generally, the title only needs to describe the table's contents. What is this table about? It shows the results of regressing birth rate on four independent variables. The four independent variables are described in detail in the table itself, so there is no need to list them again in the title.

How to describe variables (both dependent and independent): I first give the theoretical concept name of each variable (in English if necessary). In parentheses I then give its SPSS variable name and its operational definition. The SPSS name is not necessary; it only makes it easier to check against the SPSS data at hand. The operational definition is necessary and highly recommended, because it shows readers whether a variable has been transformed, and hence the form of the relationship, i.e., linear or nonlinear. Why describe variables in such detail? The APA manual's basic principle for tables and figures of quantitative results is that they should be self-contained. Each table or figure should include enough basic information for readers to understand it without consulting the text. Simply pasting SPSS output is therefore a common practice, but a bad habit.

Should the constant be reported? Yes. The constant plays an important role in explaining the practical social meaning of a regression model. For example, the constant in this table is 65.444. Strictly speaking, this is the predicted birth rate (65.4‰) for the 74 countries or regions in the model when all four independent variables equal zero. Note that SPSS places the constant in the first row of its output; in a report it should be moved below the other independent variables.
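The constant is the predicted value of the dependent variable when every independent variable is 0. This explains why it becomes the mean of the dependent variable once the independent variables are centered. The Python sketch below shows this with a single made-up independent variable; the same holds in multiple regression when all predictors are centered. The function name and data are hypothetical.

```python
def fit_line(x, y):
    """OLS constant and slope for one independent variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Raw predictor: the constant is the predicted y at x = 0
a_raw, b_raw = fit_line(x, y)

# Centered predictor: the constant becomes the predicted y at the mean of x,
# which equals the mean of y; the slope is unchanged
x_centered = [xi - sum(x) / len(x) for xi in x]
a_c, b_c = fit_line(x_centered, y)
```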

Which regression coefficient to report (standardized or unstandardized): this is the most common question. There used to be a dispute between the "prediction" camp and the "explanation" camp: the former held that reporting B is enough, the latter that reporting Beta is enough. In fact, the two convey different information. B is not affected by the variability of the dependent variable. The B of the same independent variable can therefore be compared across regression models, a comparison that many theoretical hypotheses require. Beta depends on the variability of the dependent variable, so it cannot be compared across models. Because it is standardized, however, it can be compared with the other Betas in the same model, which many theoretical hypotheses also require. The APA manual therefore recommends reporting both (5th English edition, pp. 160-161).
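Beta is B rescaled by the ratio of the sample standard deviations, β = b × SD(x) / SD(y). That is why Beta depends on the spread of the dependent variable in each sample, while B does not. A minimal Python sketch follows; the function name and toy data are hypothetical.

```python
import statistics

def standardized_coefficient(b, x, y):
    """Convert an unstandardized coefficient b into Beta = b * SD(x) / SD(y)."""
    return b * statistics.stdev(x) / statistics.stdev(y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta = standardized_coefficient(0.6, x, y)

# Measuring y in different units (here y * 10) multiplies b by 10,
# but Beta is unit-free and stays the same
beta_rescaled = standardized_coefficient(6.0, x, [10 * v for v in y])
```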

How many decimal places: the APA manual holds that two decimal places are generally sufficient for quantitative results. For regression results, two decimals are most appropriate for standardized quantities such as β, R², and significance levels, whose values lie between 0 and 1. B and its related statistics (standard error, confidence interval) are unstandardized, and their values can be arbitrarily large or small. The number of decimals for them (more, fewer, or none) should therefore depend on the scale, i.e., the range of values, of the variables. Generally, when an independent variable's scale is larger than the dependent variable's, its b will be small, so one or more extra decimal places are needed. Conversely, when the independent variable's scale is smaller, its b will be large, so fewer decimals or none are needed. In this example, GDP and calories are on much larger scales than the birth rate, so their B values look small (which does not necessarily mean their effects are small). I therefore did not mechanically keep only two decimal places. A close look at Table 2 shows my rule of thumb: keep two digits after the leading zeros, e.g., -0.00042, -0.033, -0.034, -0.0041. This is consistent with the basic spirit of the APA manual's two-decimal rule. The most common problem in practice is keeping too many decimal places, usually because SPSS results are pasted directly without editing.
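The "two digits after the leading zeros" rule amounts to rounding to two significant digits. The Python sketch below is one way to do this; it is meant for small coefficients such as the B values of GDP and calories. The function name is hypothetical, and the inputs are made-up values that round to the examples quoted above.

```python
import math

def round_sig(value, digits=2):
    """Round to `digits` significant digits, i.e. keep two digits after the leading zeros."""
    if value == 0:
        return 0.0
    # Position of the leading non-zero digit, e.g. -0.000423 -> -4
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)

for b in (-0.000423, -0.0334, -0.00413):
    print(round_sig(b))
```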

Whether to use horizontal and vertical rules in a table: under APA style, no rules are used except three horizontal lines: one at the top of the table, one below the column headings, and one at the bottom. Many people simply keep Word's default table gridlines without changing them. Reviewers can tell at a glance that the author is either a novice or careless.

What is p? It is "Sig." in SPSS output. p is the standard symbol in all statistics textbooks, whereas Sig. is specific to SPSS, so p is more widely recognized.

How to report several regression models: the above shows how to report the results of one regression model. In practice, a study (a paper) often involves several regression models. Some authors make a separate results table like Table 2 for each regression. This has two problems: it takes up too much space, and it makes the models hard to compare. Generally, the results of parallel regression models (with exactly the same independent variables) or overlapping ones (sharing some independent variables) should be placed in the same table. Using the world95 data again, I also regressed the death rate and the AIDS rate separately, then put the results of all three models in Table 3:

The main difference between the two tables is their layout. Table 2 is laid out horizontally, with each column holding one parameter. Table 3 is laid out vertically, with each column holding one model. The six parameters arranged across the columns of Table 2 become four rows per variable in Table 3: the p value is replaced by asterisks, and the lower and upper limits of the confidence interval are combined in one row. This lets readers compare the models horizontally, a basic principle for all tables of quantitative results. In an English-language report, Table 3 would be much simpler once the Chinese labels are removed.
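The Python sketch below shows how the six columns of Table 2 collapse into four cells per variable for a multi-model table. Some conventions here are assumed rather than taken from Table 3: the asterisk thresholds (* p < .05, ** p < .01, *** p < .001), the SE in parentheses, and the interval in brackets. The function names are hypothetical.

```python
def stars(p):
    """Significance asterisks under the assumed thresholds .05 / .01 / .001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

def model_column_cells(b, se, beta, p, lower, upper, fmt="{:.2f}"):
    """Return the four cells reported for one variable in one model column:
    b with asterisks (replacing p), SE, Beta, and the 95% CI in a single cell."""
    return [
        fmt.format(b) + stars(p),
        "(" + fmt.format(se) + ")",
        fmt.format(beta),
        "[" + fmt.format(lower) + ", " + fmt.format(upper) + "]",
    ]

print(model_column_cells(0.6, 0.28, 0.77, 0.004, 0.02, 1.18))
```

The fixed two-decimal format is only a default. For small B values, the decimal rule discussed earlier would apply instead.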

How to report variable characteristics and relationships among independent variables

As mentioned above, the characteristics of the dependent and independent variables and the correlations among the independent variables are auxiliary information, included as appropriate. Since this article is already long, I will discuss them only briefly. Variable characteristics mainly include:

Operational definition of each variable (e.g., the original questionnaire item)

Value range (e.g., 0-100, 0-1, 0 or 1, 1-5, 1-7); if a variable has been transformed (logarithm, square, square root, reciprocal), this is also the best place to report it

Descriptive statistics (mean, standard deviation, skewness, kurtosis, etc.)

A recommended approach is to list these characteristics for all variables in one table (Table 4) and put it in the appendix of the paper for interested readers. Similar technical details can generally go in the appendix as well.
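Each row of such an appendix table can be computed with the Python standard library, as sketched below. The skewness and kurtosis use the sample-adjusted formulas (often written G1 and G2) which, as far as I know, are what SPSS and Excel report. Missing values are represented as `None`, and the function name is hypothetical.

```python
import math
import statistics

def describe(values):
    """Range, mean, SD, skewness and excess kurtosis for one variable.

    Missing values (None) are dropped; at least 4 valid cases are required.
    """
    x = [v for v in values if v is not None]
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)
    # Population central moments
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Small-sample adjustments (G1, G2)
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return {"n": n, "min": min(x), "max": max(x), "mean": mean,
            "sd": sd, "skewness": skewness, "kurtosis": kurtosis}

print(describe([1, 2, 3, 4, 5, None]))
```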

-

Analysis of SPSS regression results

How should I interpret the regression results in my thesis?

Answer:

First, look at the coefficient of determination, R². In this example R² = 0.202, so the goodness of fit is very poor. Generally it is better for R² to be above 0.6, or at least above 0.4.
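R² is the share of the variance of the dependent variable explained by the model. Adjusted R², which Table 1 above lists as a must-report parameter, penalizes it for the number of independent variables. The Python sketch below computes both from observed and predicted values; the function names and numbers are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST."""
    my = sum(y) / len(y)
    sse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    sst = sum((obs - my) ** 2 for obs in y)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n cases and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = r_squared([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2])
print(r2, adjusted_r_squared(r2, n=5, k=1))
```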

Second, look at the Sig. value of each coefficient estimate. The number of independent directors has Sig. = 0.007, which is less than 0.05, indicating that this variable has a significant effect on the dependent variable. The general manager's shareholding is not significant, because its Sig. value is greater than 0.05.

The model fits poorly because you have left out important explanatory factors.

However, if you only care about the effects of these two independent variables on the dependent variable, the conclusion is already clear. The research goal has been achieved, so the analysis is still meaningful.

Statistician Liu Deyi

Answer:

Yes. When such a variable is used as an independent variable, this is a dummy-variable model. As long as a Sig. value is below 0.05, the model can be said to be effective.
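A 0/1 dummy can be entered into a regression like any other independent variable. With a single dummy, the OLS constant equals the mean of the group coded 0, and b equals the difference between the two group means. The Python sketch below illustrates this with made-up data; the variable names (such as a CEO-duality dummy) are hypothetical.

```python
def fit_line(x, y):
    """OLS constant and slope for one independent variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical data: 1 = chairman is also general manager, 0 = otherwise
duality = [0, 0, 0, 1, 1, 1]
outcome = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

a, b = fit_line(duality, outcome)
mean_0 = sum(v for d, v in zip(duality, outcome) if d == 0) / duality.count(0)
mean_1 = sum(v for d, v in zip(duality, outcome) if d == 1) / duality.count(1)
# a equals mean_0, and b equals mean_1 - mean_0
```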

Q:

For example, take a variable coded 1 if the chairman is also the general manager and 0 otherwise. Can such data be used in a regression? Which value shows whether the model is effective? P.S. R seems to be only 0.041?

Answer:

Generally speaking, yes. A linear regression model is appropriate only when the variables are linearly correlated.