What are the statistical methods commonly used in medical scientific research?
The cool autumn wind brings us another statistics class from Professor Liu Ling, the fifth in the series. This issue covers how to choose a statistical method. Study it carefully; you may need it soon.

Editor's note

Among the basic statistical methods, the t-test, one-way ANOVA and the chi-square test are the ones people meet most often when writing and reading papers (almost every article involves one or more of them). So which method should be used in which situation? Let's talk about it today.

First, before you start analyzing the data, you must classify it by type (Figure 1), because different data types call for different descriptions and different statistical methods.

Figure 1 Types of statistical data

For example (table 1):

Table 1 Health examination records of 735 elderly people over 65 years old in a certain place in 2002

Second, the statistical analysis of each type of data (description and statistical inference)

1. Measurement data

Features: the observed values differ in magnitude from one observation unit to another, and they carry units of measurement.

Description form: "x̄ ± s" is most commonly used (common in general literature); the average level is described by the arithmetic mean and the dispersion by the standard deviation. If the data are markedly non-normal (in particular, if the standard deviation is greater than the arithmetic mean), Md (P25, P75) is used instead, where Md is the median and P25 and P75 are the quartiles (Table 2). For the test of normal distribution, please review Statistics of medical research (3): what you should know about tests of normality and homogeneity of variance.
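
The two description forms above can be sketched in a few lines. This is a minimal illustration using only Python's standard library; the function name `describe` and the sample values are made up for the example, and the "standard deviation greater than mean" check is just the rule of thumb mentioned above, not a substitute for a normality test.

```python
import statistics

def describe(values):
    """Summarise measurement data as mean and SD, and as median (P25, P75)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # quantiles(n=4) returns the three quartile cut points: P25, P50, P75
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": mean, "sd": sd, "median": median, "p25": p25, "p75": p75}

# Hypothetical values with one extreme observation (strongly right-skewed)
skewed = [1, 2, 2, 3, 3, 4, 5, 60]
summary = describe(skewed)

# Rule of thumb from the text: SD larger than the mean suggests Md (P25, P75)
use_median = summary["sd"] > summary["mean"]
```

Here the single extreme value pulls the mean up to 10 while the median stays at 3, which is why the median and quartiles describe such data better.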

Table 2 Characteristics and applications of commonly used statistical indicators for measurement data

Statistical inference methods: generally divided into single factor and multi-factor.

The main points of single factor analysis are: first, define the data type (measurement data); second, define the type of experimental design (completely random design? how many groups of samples?); third, check the application conditions of the method used; fourth, use the t-test (note that there are three forms of t-test!) or one-way ANOVA when normality and homogeneity of variance are satisfied, and the rank sum test when they are not (Figure 2).

Figure 2 Correct selection of statistical methods for measurement data
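
The selection rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in the text, not a reproduction of Figure 2. The function and parameter names are hypothetical, and two points are assumptions added here: the three t-test forms are taken to be one-sample, paired and two-sample, and homogeneity of variance is only checked when independent groups are compared.

```python
def choose_measurement_test(n_groups, normal, equal_variance=True, paired=False):
    """Suggest a single factor test for measurement data (illustrative only)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if paired and n_groups != 2:
        raise ValueError("paired=True only applies to two groups in this sketch")

    # Assumption: variance homogeneity matters only for independent groups
    if n_groups == 1 or paired:
        conditions_met = normal
    else:
        conditions_met = normal and equal_variance

    if not conditions_met:
        return "rank sum test"
    if n_groups == 1:
        return "one-sample t-test"
    if n_groups == 2:
        return "paired t-test" if paired else "two-sample t-test"
    return "one-way ANOVA"

print(choose_measurement_test(2, normal=True, equal_variance=True))
print(choose_measurement_test(3, normal=True, equal_variance=True))
print(choose_measurement_test(3, normal=False))
```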

Two reminders:

① If the sample data do not follow a normal distribution, only a nonparametric test (rank sum test) can be used, but its power is lower than that of a parametric test (t-test or analysis of variance). Lower power means that a real difference exists but the test fails to detect it.

② If there are more than two groups of samples, the t-test cannot be used (it increases the probability of a false positive error), and analysis of variance should be used. If p
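
Why repeated t-tests raise the false positive probability can be seen with a little arithmetic. The sketch below assumes every pairwise comparison is tested at the same alpha and, as a simplification, that the comparisons are independent (pairwise tests that share a group are not strictly independent, so the figure is only an approximation). The function name is made up for illustration.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Approximate chance of at least one false positive over all pairwise tests."""
    n_comparisons = comb(n_groups, 2)  # number of distinct pairs of groups
    # Simplifying assumption: comparisons are treated as independent
    error = 1 - (1 - alpha) ** n_comparisons
    return n_comparisons, error

for k in (2, 3, 4, 5):
    pairs, error = familywise_error(k)
    print(f"{k} groups: {pairs} pairwise tests, error probability about {error:.3f}")
```

With three groups there are three pairwise comparisons, and under these assumptions the chance of at least one false positive rises from 5% to roughly 14%.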

In previous classes we studied the t-test (Statistics of medical research (2): did you do your t-test correctly?) and analysis of variance (Statistics of medical research (4): the soul of statistical methods, analysis of variance). The rank sum test will be introduced step by step in future classes.

Multi-factor analysis generally uses regression analysis, mainly linear regression, which I will introduce later.

2. Counting data

Features: unordered categories. There is no quantitative difference between observation units within the same category, but the categories differ in kind and are mutually exclusive. Two-category data are always counting data (for example, sex is only male or female, and whether a disease is present is only yes or no). Multi-category data are counting data when the categories have no difference in degree or grade (for example, marital status includes unmarried, married, divorced and widowed). By contrast, qualitative urine sugar results (-, +, ++ and so on) are ordered, so they belong to the grade data described in part 3.

Description form: "Number of cases (%)" is most commonly used (common in general literature). The key is to distinguish the composition ratio (a relative number describing structure) from the rate (a relative number describing intensity) (Table 3). In addition, the denominator (that is, the sample size) must not be too small: with a small denominator the result is unstable and does not reflect the data objectively.

Table 3 Characteristics and Application of Common Statistical Indicators of Counting Data

For example:

1. If a certain place has A male and B female lung cancer patients, the sex ratio of local lung cancer patients is A/B.

2. A study detected three kinds of pathogenic bacteria in total, with A+B+C strains altogether. If A strains of one pathogen were detected, then A/(A+B+C) is the composition ratio, that is, the proportion of this pathogen among all pathogens detected.

3. A study treats patients (B cases in total) and A of them are cured; then A/B is the rate (here, the cure rate).
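
The three examples above differ only in what goes into the denominator. The following sketch makes that explicit; all counts are hypothetical numbers chosen for illustration, and the function names are not from the original.

```python
def ratio(a, b):
    """Ratio of two separate counts, e.g. male to female patients (A/B)."""
    return a / b

def composition_ratios(counts):
    """Share of each component in the whole, in percent; the shares sum to 100."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

def rate(events, total):
    """Events among all units at risk, in percent, e.g. cured / treated."""
    return 100 * events / total

sex_ratio = ratio(60, 40)                                   # 60 men, 40 women
strain_shares = composition_ratios({"A": 50, "B": 30, "C": 20})
cure_rate = rate(45, 60)                                    # 45 cured of 60 treated

print(sex_ratio, strain_shares, cure_rate)
```

Note that the ratio compares two groups with each other, the composition ratio divides a part by the whole it belongs to, and the rate divides the number of events by the number of units in which the event could have occurred.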

Statistical inference methods: generally divided into single factor and multi-factor.

The main points of single factor analysis are: first, define the data type (counting data); second, define the type of experimental design (completely random design? how many groups of samples?); third, check the application conditions of the method used; fourth, multi-sample rate comparison, such as chi-square test of P.

Figure 3 Correct selection of statistical methods for counting data
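
As a concrete case of comparing two rates, here is a minimal sketch of the Pearson chi-square test for a 2×2 table, written with the standard library only. It uses no continuity correction, the counts are invented, and the function name is hypothetical. The smallest expected count is returned as well because application conditions for the chi-square test are usually stated in terms of expected counts; the exact thresholds are not given in this article, so none are hard-coded.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    min_expected = min(row1, row2) * min(col1, col2) / n
    return chi2, p_value, min_expected

# Hypothetical data: treatment 1 cures 30 of 50, treatment 2 cures 20 of 50
chi2, p_value, min_expected = chi_square_2x2(30, 20, 20, 30)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}, smallest expected count = {min_expected}")
```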

Two reminders:

① The composition ratio is expressed on a base of 100, and the proportions of all components must add up to 100%, so an increase or decrease in the proportion of one component changes the proportions of the others.

② The composition ratio and the rate are easily confused in practice. The main difference lies in the denominator, so the denominator must be chosen correctly.

Multi-factor analysis generally uses regression analysis, mainly logistic regression, which will be introduced later.

3. Grade data

Features: grade data are multi-category data whose categories differ in degree or grade. The categories are arranged in a definite order (they are ordered), and this ordering is what makes them grade data.

Description form: "Number of cases (%)" is most commonly used (common in general literature), essentially as for counting data. The main difference is that the categories must be listed in order (from small to large or from weak to strong).

Statistical inference method: in single factor analysis, grade data are analyzed with a nonparametric test (rank sum test). For bidirectional ordered R×C tables, that is, when both the grouping variable and the outcome variable are ordered, the chi-square test is used to compare composition, the rank sum test to compare degree, and rank correlation (also called grade correlation) to examine a trend. Multi-factor analysis uses ordinal logistic regression.
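
To show what the rank sum test works with, the sketch below computes the rank sums of two groups of grade data, giving tied grades the average of the ranks they occupy. It stops at the rank sums: turning them into a P value (by table or normal approximation) is left out. The grades are hypothetical scores coded 0 to 3, and the function names are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sums(group_a, group_b):
    """Rank the pooled data, then add up the ranks within each group."""
    ranks = average_ranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])

# Hypothetical efficacy grades (0 = none, 1 = slight, 2 = marked, 3 = cured)
sum_a, sum_b = rank_sums([0, 0, 1, 1, 2], [1, 2, 2, 3, 3])
print(sum_a, sum_b)
```

The two rank sums always add up to n(n+1)/2 for n pooled observations, which is a handy check on the ranking.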

Note: categorical variables (counting data and grade data) must be coded (assigned numeric values) appropriately before analysis in software, and the coding directly affects how the results are interpreted.
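
A small sketch of what such coding can look like: grade data receive ordered scores, while unordered categories are turned into 0/1 indicator variables against a reference category. The scores, category lists and function names are all hypothetical choices for illustration; other assignments are possible, and they change how the coefficients or group differences are read.

```python
# Hypothetical ordered scores for a grade variable
URINE_SUGAR_SCORES = {"-": 0, "+": 1, "++": 2}

# Hypothetical unordered categories for a counting variable
MARITAL_STATUS = ["unmarried", "married", "divorced", "widowed"]

def encode_ordinal(values, scores):
    """Replace each grade by its ordered score."""
    return [scores[v] for v in values]

def encode_dummies(values, categories, reference):
    """One 0/1 indicator per category except the reference category."""
    levels = [c for c in categories if c != reference]
    encoded = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        encoded.append({level: int(v == level) for level in levels})
    return encoded

grades = encode_ordinal(["+", "-", "++"], URINE_SUGAR_SCORES)
dummies = encode_dummies(["married", "unmarried"], MARITAL_STATUS, reference="unmarried")
print(grades)
print(dummies)
```

With this coding every indicator is compared with the reference category "unmarried"; choosing a different reference changes the meaning of each comparison, which is the sense in which the assignment affects interpretation.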

Finally, the following figure is used to summarize the choice of basic statistical methods (Figure 4).

Figure 4 Correct selection of commonly used basic statistical methods

That's all for today. Please review the material, and if you have questions or something is unclear, leave a message below. We will ask Professor Liu Ling to answer them one by one. See you in the next issue!

Author: Liu Ling Contract Editor: Liu Qin

Typesetting: Bi Li Audit: Dong Wang

Expert introduction

Liu Ling: Associate Professor, Department of Health Statistics, Army Military Medical University, mainly engaged in teaching and research in health statistics. Has served as a member of the 8th Statistical Theory and Method Professional Committee of the China Health Information Society, as vice chairman of the Chongqing Preventive Medicine and Health Statistics Professional Committee, and as an editorial board member and peer reviewer for many journals, including the Journal of Third Military Medical University.

Previous articles in this series

Statistics of medical research (4): the soul of statistical methods, analysis of variance

Statistics of medical research (3): what you should know about tests of normality and homogeneity of variance

Statistics of medical research (2): did you do your t-test correctly?

Statistics of medical research (1): what is sample size estimation?