Time series analysis based on SPSS (reposted from an expert)
Application background:

Analyzing a series to make a reasonable forecast, grasping the future development trend in advance, and providing a basis for business decisions is a prerequisite for scientific decision-making.

Time series analysis:

Time series is a set of data series arranged in chronological order.

Time series analysis is a statistical technique to discover the changing law of this group of data and use it for prediction.

Analytical tools:

SPSS (Statistical Product and Service Solutions)

Practical case: forecasting future data from historical data. The example is deliberately simple and the emphasis is on the method. No matter how complicated the data is, the method is the same.

Given the monthly sales of previous years, forecast the future sales.

First, an introduction to time series analysis

Time series analysis has three basic characteristics:

It assumes that the development trend of things will extend into the future.

The data on which the forecast is based contain irregular fluctuations.

It does not consider the causal relationships behind the development of things.

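A time series is usually described by four factors: long-term trend (T), seasonal change (S), periodic (cyclical) change (C) and irregular change (I). Not every time series contains all four factors. For example, a series recorded once per year contains no seasonal factor.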

The four factors are usually combined in one of two ways:

The four factors are independent of each other, so the time series is their direct sum. This is the additive model: Y = T + S + C + I

The four factors influence each other, so the time series is their product. This is the multiplicative model: Y = T × S × C × I

In the multiplicative model, the original series value and the long-term trend are expressed as absolute numbers, while seasonal change, periodic change and irregular change are expressed as relative numbers (percentage changes).
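The two ways of combining the factors can be sketched in a few lines of Python. This is only an illustration with made-up numbers: the function name `compose`, the linear trend and the sine-shaped seasonal pattern are all assumptions, and the periodic and irregular factors are set to their "no effect" values (0 when adding, 1 when multiplying).

```python
import math

def compose(trend, seasonal, cyclical, irregular, model):
    """Combine the four factors element-wise into one series."""
    parts = zip(trend, seasonal, cyclical, irregular)
    if model == "additive":
        return [t + s + c + i for t, s, c, i in parts]
    if model == "multiplicative":
        return [t * s * c * i for t, s, c, i in parts]
    raise ValueError("model must be 'additive' or 'multiplicative'")

n = 24                                                     # two years of months
trend = [100 + 5 * k for k in range(n)]                    # T, absolute units
wave = [math.sin(2 * math.pi * k / 12) for k in range(n)]  # 12-month pattern

# Additive: S, C and I are absolute amounts; 0 means "no effect".
y_add = compose(trend, [20 * w for w in wave], [0.0] * n, [0.0] * n,
                "additive")

# Multiplicative: S, C and I are ratios; 1 means "no effect".
y_mul = compose(trend, [1 + 0.2 * w for w in wave], [1.0] * n, [1.0] * n,
                "multiplicative")

# Seasonal peak above the trend in year 1 and in year 2.
print(y_add[3] - trend[3], y_add[15] - trend[15])
print(y_mul[3] - trend[3], y_mul[15] - trend[15])
```

In the additive series the seasonal peak stays the same size in both years, while in the multiplicative series it grows together with the trend.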

Second, the seasonal decomposition method

When we predict a time series, we should consider decomposing the above four factors from the time series.

Why break down these four factors?

After decomposition, we can overcome the influence of other factors and only consider the influence of one factor on time series.

After decomposition, we can also analyze their interaction and their comprehensive influence on time series.

With these factors removed, time series become easier to compare with one another, so the changing and developing law of things is reflected more objectively.

The decomposed series can be used to build a regression model, which improves the prediction accuracy.

Should all time series decompose these four factors?

Usually, we consider decomposing the seasonal factors, that is, removing the seasonal variation factors from the original time series and generating a series composed of the remaining three factors to meet the needs of subsequent analysis.

Why only decompose seasonal factors?

The long-term trend in time series reflects the development law of things and is the key research object;

Because of its long period, periodic change can be regarded as a reflection of long-term trend;

Irregular changes are usually not analyzed separately because they are not easy to measure.

The prediction model sometimes misjudges seasonal changes as irregular changes, which reduces its prediction accuracy.

To sum up, when a time series has the characteristics of seasonal change, the seasonal factors will be decomposed first when predicting the value.

Steps:

Define the date label variable: that is, define the time of the series before analyzing its time characteristics.

Understand the development trend of the series: draw a sequence diagram and decide between the multiplicative and the additive model.

Carry out the seasonal factor decomposition.

Build the model.

Interpret the analysis results.

Forecast.

1. Define the date label variable

The characteristic of time series is that the data are arranged in the order of time points, so SPSS needs to know the time definition of the series before analyzing it, and then it can analyze the time characteristics.

Select the date format that matches the source data, then enter the date value of the first case.

SPSS then adds three new variables to the source file.
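What this step produces can be imitated outside SPSS. The sketch below assumes monthly data and assumes that the three new variables are a year, a month and a text label (for a years-and-months format SPSS typically names them YEAR_, MONTH_ and DATE_). The function name `define_dates` and the starting date are hypothetical; the source does not state the first case.

```python
MONTH_NAMES = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def define_dates(n_cases, first_year, first_month):
    """Return one (year, month, label) tuple per case, in time order."""
    rows = []
    year, month = first_year, first_month
    for _ in range(n_cases):
        rows.append((year, month, f"{MONTH_NAMES[month - 1]} {year}"))
        month += 1
        if month > 12:          # roll over into the next year
            month = 1
            year += 1
    return rows

# Hypothetical start: November 2010, 14 monthly cases.
for row in define_dates(14, 2010, 11)[:3]:
    print(row)
```

Only the first case needs a date; every later case is labeled by counting forward, which is why the data must already be sorted in time order with no missing months.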

2. Understand the development trend of the series

After the definition of date variables is completed, we should first understand the changing trend of time series in order to choose the appropriate model. That is, whether the model is multiplicative or additive is determined by the sequence diagram.

The variable is "sales data" and the time axis label is "DATE_", the date variable we have just defined.

Figure: sequence diagram of the sales data

How do we decide between the multiplicative and the additive model from the sequence diagram?

If the seasonal fluctuation of the series grows larger over time, the multiplicative model is recommended.

If the seasonal fluctuation of the series stays roughly constant, the additive model is recommended.

This example is clear: the seasonal fluctuation of the sales data grows larger over time, so the multiplicative model will be more accurate.
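The visual rule above can be turned into a rough numeric check. The sketch below is an assumption, not something SPSS does: it measures the spread (maximum minus minimum) inside each complete 12-month cycle and suggests the multiplicative model when the last cycle's spread is at least `growth` times the first one. The function names and the threshold of 1.5 are arbitrary choices for illustration.

```python
def cycle_ranges(series, period=12):
    """Max minus min within each complete seasonal cycle."""
    full = len(series) - len(series) % period
    return [max(series[i:i + period]) - min(series[i:i + period])
            for i in range(0, full, period)]

def suggest_model(series, period=12, growth=1.5):
    """Crude rule of thumb: growing seasonal spread -> multiplicative."""
    ranges = cycle_ranges(series, period)
    if len(ranges) < 2 or ranges[0] == 0:
        return "additive"       # not enough evidence of growing spread
    if ranges[-1] >= growth * ranges[0]:
        return "multiplicative"
    return "additive"

# Made-up data: three years with the same seasonal shape.
shape = [-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2]
steady = [100 * (y + 1) + 10 * s for y in range(3) for s in shape]
growing = [100 * (y + 1) * (1 + 0.1 * s) for y in range(3) for s in shape]

print(suggest_model(steady))
print(suggest_model(growing))
```

A check like this is no substitute for looking at the diagram, because a steep trend inside one cycle also widens the spread.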

3. Seasonal factor decomposition

The variable is "sales data". According to the sequence diagram, we know that the time series model is multiplicative.

SPSS reports that four new variables will be created:

ERR (error series): the series left after the seasonal factor, long-term trend and periodic change are removed from the time series, that is, the irregular change in the original series.

SAS (seasonally adjusted series): the original series with the seasonal factor removed.

SAF (seasonal factor): the seasonal factor separated from the series. Its values repeat with the seasonal cycle. In this case the seasonal cycle is 12 months, so the seasonal factors repeat every 12 months.

STC (long-term trend and periodic change series): the series made up of the long-term trend and periodic change in the original series.

As shown in the figure, the period is 12 months, so there are 12 seasonal factors, one for each month.
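To make the four variables concrete, here is a simplified ratio-to-moving-average decomposition for the multiplicative model. It is a sketch under stated assumptions, not SPSS's exact procedure: it needs an even period and at least two full cycles, it averages the ratios with a plain mean, and it uses a 3-point moving average as a stand-in for the trend-cycle smoothing. The function name and the sample data are made up.

```python
def decompose_multiplicative(y, period=12):
    """Return (saf, sas, stc, err) lists; needs len(y) >= 2 * period."""
    n, h = len(y), period // 2
    # Centered moving average: trend-cycle estimate without seasonality.
    cma = {}
    for t in range(h, n - h):
        window = 0.5 * y[t - h] + sum(y[t - h + 1:t + h]) + 0.5 * y[t + h]
        cma[t] = window / period
    # Average the ratios y / cma for each position in the cycle.
    raw = []
    for m in range(period):
        ratios = [y[t] / cma[t] for t in cma if t % period == m]
        raw.append(sum(ratios) / len(ratios))
    # Rescale so the factors average to 1 over one cycle.
    scale = period / sum(raw)
    factors = [r * scale for r in raw]
    saf = [factors[t % period] for t in range(n)]
    sas = [y[t] / saf[t] for t in range(n)]          # seasonally adjusted
    # Simplification: 3-point smoothing of SAS, end points left as they are.
    stc = ([sas[0]]
           + [sum(sas[t - 1:t + 2]) / 3 for t in range(1, n - 1)]
           + [sas[-1]])
    err = [sas[t] / stc[t] for t in range(n)]        # irregular part
    return saf, sas, stc, err

# Made-up monthly sales: rising level times a fixed 12-month pattern.
pattern = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8]
sales = [(200 + 2 * t) * pattern[t % 12] for t in range(36)]
saf, sas, stc, err = decompose_multiplicative(sales)
print([round(f, 3) for f in saf[:12]])
```

In the multiplicative model the pieces multiply back together: the original value equals SAF × STC × ERR at every time point, and ERR stays close to 1 when the irregular change is small.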

How do the series obtained from the seasonal decomposition differ from the original series?

Use a sequence diagram to compare the original series with the three series other than the seasonal factor (the error series, the seasonally adjusted series, and the long-term trend and periodic change series).

To draw the four series in one diagram, select four variables:

Original series: use the variable "sales data";

Error series: use the variable "ERR";

Seasonally adjusted series: use the variable "SAS";

Long-term trend and periodic change series: use the variable "STC".

Blue line: original sequence

Purple line: long-term trend and periodic change sequence

Light brown line: seasonally adjusted series

Green line: error sequence (irregular change)

Because the contribution of the error series is very small, the long-term trend and periodic change series (long-term trend + periodic change) almost coincides with the seasonally adjusted series (long-term trend + periodic change + irregular change, that is, error).

Draw a separate sequence diagram for the seasonal factor SAF:

Because this diagram shows only the seasonal factor, it has a single variable, "seasonal factor SAF".

We can see that the seasonal factor has a period of 12 months. Within each cycle it first decreases, rises to a first peak, dips slightly, then climbs clearly to its highest point in the seventh month, and falls steadily until the last month, after which the second cycle begins.

Through the seasonal decomposition of the original sequence, we can better grasp the time characteristics contained in the original sequence, so as to choose the appropriate model for forecasting.

Third, expert modeling method.

There are four steps in forecasting time series:

Draw a time series diagram to observe the trend.

Analyze the stationarity of the sequence and carry out stationarity processing.

Time series modeling and analysis

Model evaluation and prediction

Stationarity mainly means that all statistical properties of time series will not change with time.

A stationary time series has the following characteristics:

The mean and variance do not change with time.

The autocorrelation coefficient depends only on the time interval (lag), not on the time point itself.

The autocorrelation coefficient is the correlation between values of the series in different periods: for a given series, one coefficient is computed between the current values and the values at each lag.
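As a minimal sketch of that definition, the function below computes the sample autocorrelation coefficient at one lag, using the common estimator that divides by the total sum of squares around the mean. The function name and the toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation between the series and itself `lag` steps back."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

# Toy series that flips sign every period.
alternating = [1, -1, 1, -1, 1, -1]
for k in range(3):
    print(k, autocorrelation(alternating, k))
```

Computing the coefficient for lags 1, 2, 3 and so on gives the autocorrelation function that is inspected when judging stationarity.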

Making a series stationary: the difference method.

Differencing takes the difference between the values of two adjacent periods in a series.

First difference = Y_t - Y_{t-1}

Second difference = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})
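The two formulas can be checked with a short sketch. The helper name `difference` and the toy series are assumptions; the optional `lag` argument is an extra, so that the same helper can also take a seasonal difference (for example lag 12 for monthly data).

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Toy series with a curved (quadratic) trend: 0, 1, 4, 9, 16, 25.
curved = [t * t for t in range(6)]

first = difference(curved)    # first difference, still trending upward
second = difference(first)    # second difference, now constant
print(first)
print(second)
```

Each round of differencing shortens the series by `lag` values, and here two rounds are enough to remove the trend completely.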

The expert modeler carries out the stationarity processing automatically; we only need to read the order of differencing from the model results.

Time series analysis operation:

As the variable to analyze, select the sales data.

Under Expert Modeler > Criteria, check "Expert Modeler considers seasonal models".

Select Predicted Values to generate the predicted values, and save the model.

Interpretation of time series analysis results

The table shows the best-fitting time series model found by the analysis and its parameters. The best model here is ARIMA(0,1,1)(0,1,1).

The seasonal autoregressive integrated moving average model is written ARIMA(p, d, q)(P, D, Q).

p: the autoregressive order, that is, the number of lags used for the series with seasonal changes removed; usually 0 or 1, and rarely greater than 1.

d: the order of differencing applied to the series with seasonal changes removed; usually 0, 1 or 2.

q: the moving average order for the series with seasonal changes removed; usually 0 or 1, and rarely greater than 2.

P, D and Q play the same roles for the seasonal part of the series.

Therefore, the model in this example combines a first-order difference and a first-order moving average on the non-seasonal part with a first-order difference and a first-order moving average on the seasonal part.
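Written out as an equation, a model of this form looks as follows. This is a sketch under assumptions: the seasonal period is taken as 12 months as in this case, no constant term and no transformation of the series are assumed, and the sign convention for the moving average coefficients may differ from the one SPSS reports. B is the backshift operator (B y_t = y_{t-1}), \theta and \Theta are the non-seasonal and seasonal moving average coefficients, and \varepsilon_t is the random error.

```latex
(1 - B)(1 - B^{12})\, y_t = (1 - \theta B)(1 - \Theta B^{12})\, \varepsilon_t

% Expanded form:
y_t - y_{t-1} - y_{t-12} + y_{t-13}
  = \varepsilon_t - \theta\, \varepsilon_{t-1}
    - \Theta\, \varepsilon_{t-12} + \theta \Theta\, \varepsilon_{t-13}
```

The left-hand side is the series after one ordinary difference and one seasonal difference; the right-hand side is a first-order moving average at lag 1 combined with a first-order moving average at lag 12.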

This table mainly uses R-square or stationary R-square to evaluate the fitting degree of models. When using multiple models, the optimal model is found through comparative statistics.

Because the original variable has seasonal variation factors, the stationary R-square is more meaningful. It equals 32.1%, so the fitting effect is average.

The table provides more statistical data, which can be used to evaluate the fitting effect of time series model.

Although the stationary R-square is only 32.1%, the significance of the Ljung-Box Q(18) statistic is p = 0.706, which is greater than 0.05 (here p > 0.05 is the desired result). We therefore do not reject the null hypothesis that the residuals of this series are randomly distributed, and the model reports no outliers.
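For readers who want to see what stands behind the Q(18) figure, here is a minimal sketch of the Ljung-Box statistic, Q = n(n+2) Σ r_k² / (n-k) summed over lags k = 1..18, where r_k is the residual autocorrelation at lag k. The function name and the test residuals are assumptions. The sketch returns only Q; the p-value comes from a chi-square distribution whose degrees of freedom are the number of lags minus the number of estimated model parameters, and that step is left out here.

```python
def ljung_box_q(residuals, max_lag=18):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        num = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                  for t in range(k, n))
        r_k = num / denom                  # autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Strongly patterned "residuals": sign flips every period.
patterned = [1, -1] * 20
print(ljung_box_q(patterned))
```

Residuals with a clear pattern, like the alternating ones above, give a very large Q and a tiny p-value; residuals that behave like random noise give a small Q and a large p-value, which is the situation reported for this model.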

Time series application prediction:

The coming year runs to December 2016 (year 2016, month 12); this end date can be entered manually.

This is the sales trend in the coming year.

To view the forecast in the context of the whole series, plot the forecast year together with the previous data.

The variables to select this time are "original sales quantity" and "2016 forecast sales quantity".

The results are as follows:

You can also view the specific values in the table: