Statistical modeling is the process of exploring and processing large volumes of data with statistical analysis software and a range of statistical methods, in order to reveal the factors behind the data, interpret social and economic phenomena, and make predictions or judgments about economic and social development. With the rapid spread of computer and network technology, we face an explosion of data and information. How to turn data quickly and effectively into information, knowledge and insight is an important question for statisticians. Statistical modeling combines statistical methods with computer technology, promotes statistical thinking oriented toward data analysis, uncovers the regularities behind the data, and supplies more and better statistical information for economic and social development.
Competition topics generally come from suitably simplified practical problems in the social, economic and management sciences. Students are not required to master in-depth specialist knowledge in advance; they only need to know the basics of statistics, be proficient in statistical analysis methods, and have some practical experience in statistical work. The topics are flexible and leave room for participants to be creative. Participants complete a paper (that is, an answer sheet) covering the model assumptions, the construction and solution of the model, the design and computer implementation of the calculation method, the analysis and testing of the results, and the improvement of the model. Awards are based on the reasonableness of the assumptions, the creativity of the modeling, the correctness of the results and the clarity of the writing.
The following example shows what statistical modeling involves.
Case: What conclusions can be drawn from the traffic accident data?
Basic data: traffic accident data for every province, municipality and autonomous region since the reform and opening up. The data should cover the parties involved: motor vehicles (freight vehicles, buses, cars, agricultural vehicles, tractors, motorcycles, construction vehicles, etc.), non-motor vehicles (bicycles, tricycles), other vehicles (such as electric bicycles and motor tricycles, even where these may be illegal), vehicles for the disabled, animal-drawn vehicles, pedestrians, etc. The data should also include the accident level, the number of accidents, the number of deaths, the number of injuries and property losses, together with the occupation, age, driving experience and education level of the person involved, whether alcohol was involved (very important!), whether the driver was fatigued, whether a mobile phone was in use, the speed, the road type (streets, ordinary highways, graded highways, expressways) and the time of day of the accident. These are standard records of the traffic control department. The data should cover at least 10 years (monthly data is preferred).
Supplementary data: the economic data of each province, city and autonomous region in the corresponding year, including various road mileage and the number of various motor vehicles.
Questions:
1. Find the probability of each type of accident for each type of vehicle, and identify the variables that influence these probabilities and the number of accidents (such as age, whether alcohol was involved, mountainous or downtown area, time period, road type, vehicle type, etc.).
2. Find out which factors (variables) are most likely to cause accidents, which factors (variables) are most likely to cause major personal injuries, and which factors (variables) cause the greatest property losses.
3. Find out the accident characteristics of all provinces, municipalities and autonomous regions, classify them according to the accident mode, and compare them according to the economic classification. Explain the relationship between traffic accidents and economic development.
4. Find the trends in local and national accidents and the relationship between these trends and the economy (including road mileage, number of motor vehicles, etc.), and forecast future accidents.
5. Rank the provinces, municipalities and autonomous regions according to various variables related to traffic accidents.
Requirements: Everything must be based on data. For any statistical method adopted, state its conditions and assumptions. Every output must be interpreted.
From the above case it is not difficult to conclude that, in a sense, statistical modeling is an essay on an assigned topic, with the following characteristics:
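As a rough illustration of question 4, the sketch below fits a straight-line trend to yearly accident counts by ordinary least squares and extrapolates one year ahead. It is only one possible approach, not the method the case prescribes. The function names `fit_linear_trend` and `r_squared` are hypothetical, and the counts are made-up placeholders rather than real accident statistics. In line with the requirements, the assumption here is that the trend is roughly linear over the observed period, which should be checked before any forecast is trusted.

```python
from typing import Sequence, Tuple


def fit_linear_trend(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x: Sequence[float], y: Sequence[float],
              intercept: float, slope: float) -> float:
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    if ss_tot == 0:
        return 1.0  # y is constant, so the fitted line reproduces it exactly
    return 1.0 - ss_res / ss_tot


# Hypothetical placeholder data: year index and yearly accident counts.
years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
accidents = [120, 131, 138, 152, 160, 171, 178, 193, 199, 212]

intercept, slope = fit_linear_trend(years, accidents)
fit_quality = r_squared(years, accidents, intercept, slope)
forecast_year_11 = intercept + slope * 11  # extrapolation one year ahead

print(f"slope per year: {slope:.2f}, R^2: {fit_quality:.3f}")
print(f"forecast for year 11: {forecast_year_11:.1f}")
```

With real monthly data, seasonality and the economic variables listed in the supplementary data would also need to enter the model. A straight line is only a starting point.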
First, statistical modeling starts from the actual situation of economic and social development and finds out the development trend and law of things. Without this point, statistical modeling will lose its meaning.
Second, statistical modeling starts from data, finds out the relationship between data and speaks with data, which is the biggest feature of statistical modeling.
Third, statistical modeling effectively combines statistical analysis methods with computer technology, including collecting data and analyzing it with statistical analysis software.
Fourth, statistical modeling involves many tasks, such as data collection, collation and analysis, and so demands all-round ability from the modeler.
II. The process of statistical modeling
(1) Clarify the problem: statistical modeling is problem-oriented, so the problem to be solved must be made clear first.
(2) Collect information: once the problem is clear, collect and organize all the necessary information from the available databases, according to the requirements of the topic.
(3) Model hypothesis: Make necessary and reasonable assumptions about the problem by using statistical analysis methods, so that the main features of the problem are highlighted and the secondary aspects of the problem are ignored.
(4) Model building: based on the assumptions made and the relationships between things, establish the relationships between the various quantities and turn the problem into a statistical analysis problem, taking care to choose appropriate statistical models and methods.
(5) Model solution: use the established model to compute results that bear on the problem. If necessary, simplify the problem further or make additional assumptions.
(6) Model analysis: analyze the obtained information to form a judgment, and pay special attention to whether the obtained results are stable when the data changes.
(7) Result test: examine the practical meaning of the results and compare them with the actual situation to see whether they agree with reality. If they do not, revise or supplement the assumptions, or rebuild the model.
(8) Write the paper: the paper built on the steps above should include a statement of the problem, a description of the assumptions, the model construction process, the solution of the model, the main conclusions and an evaluation of those conclusions.
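Step (6) asks whether the results stay stable when the data changes. One simple way to probe this, sketched below as an assumption rather than a prescribed procedure, is a leave-one-out check: refit the model with each observation removed in turn and see how much the estimate moves. The helper names `ols_slope` and `leave_one_out_slopes` are hypothetical, and the data are made-up placeholders chosen so that the last point is an obvious outlier.

```python
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def leave_one_out_slopes(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Refit the slope once per observation, each time omitting that observation."""
    if len(x) < 3:
        raise ValueError("need at least 3 observations to leave one out")
    slopes = []
    for i in range(len(x)):
        x_rest = [v for j, v in enumerate(x) if j != i]
        y_rest = [v for j, v in enumerate(y) if j != i]
        slopes.append(ols_slope(x_rest, y_rest))
    return slopes


# Hypothetical placeholder data; the last observation is a deliberate outlier.
x_obs = [0, 1, 2, 3]
y_obs = [0, 1, 2, 30]

full_slope = ols_slope(x_obs, y_obs)
loo = leave_one_out_slopes(x_obs, y_obs)
spread = max(loo) - min(loo)

print(f"slope on all data: {full_slope:.2f}")
print(f"leave-one-out slopes range from {min(loo):.2f} to {max(loo):.2f}")
print(f"spread: {spread:.2f}")
```

A large spread relative to the full-data estimate suggests the conclusion hinges on a few observations and deserves a closer look in the result test of step (7). Bootstrap resampling is a common alternative when the data set is larger.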
III. The basic content of statistical modeling papers
The paper submitted should include three parts:
(1) Title and summary section
Title: write a precise title.
Summary: 200-300 words, covering the main features of the model, the modeling methods and the main results.
(2) The main part
1. Statement and analysis of the problem.
2. Model construction:
(1) Put forward assumptions, define concepts and introduce parameters;
(2) Model building;
(3) Model solving.
3. Design of the calculation method and its computer implementation.
4. Main conclusions or findings.
5. Result analysis and testing.
6. Discussion: the advantages and disadvantages of the model and the significance of the results.
7. References.
(3) Appendix
Computer programs and block diagrams.
Detailed derivations and intermediate calculation results.
Various graphs and tables.
As the saying goes, what is hard for those who have not learned it is easy for those who have, and there is no exact standard for judging difficulty. What is certain is that a thing becomes easy once you study it and stays hard if you do not. Let this be an encouragement to us all.