(Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080)
Abrupt changes are common in natural, social and economic systems. If the output sequence of a system changes suddenly at an unknown moment, that moment is called a change point. The purpose of statistical change point analysis is to test whether change points exist and to determine their positions and number. Existing change point analysis covers mean change points, probability change points and model-parameter change points. This paper proposes a new concept, the slope change point, defined as the point where the acceleration (or deceleration) of the curve slope is greatest. Using several examples of different types, it also proposes a second-order difference method of regression coefficients for finding a single slope change point. The method locates the "turning point" of curves that are monotonic and of constant concavity. The examples show that the method is simple, intuitive and effective.
Keywords: change point analysis; slope; change point; regression coefficient
1 Introduction
Abrupt changes are common and important in natural, social and economic fields. Studying whether and when they occur helps in understanding how events or processes evolve, especially how disasters arise and develop, and thus provides a basis for disaster prediction, prevention and management.
If the output sequence of a system changes suddenly at an unknown moment, that moment is called a change point. The purpose of statistical change point analysis is to test for the existence of change points, determine their positions and number, and estimate the size of the jump at each one. It is a powerful tool for the quantitative analysis of monitoring data and for studying the behavior of geological disasters.
Change point analysis is divided into mean change point analysis, probability change point analysis and model-parameter change point analysis [1]. If the mean (or the probability distribution, or the model parameters) of the data changes significantly before and after a certain moment, that moment is called a mean (or probability, or model-parameter) change point. In geological problems, however, one often meets a curve that rises gradually and then suddenly accelerates after a certain moment. Sometimes a curve changes rapidly at first, then suddenly slows down after a certain point and gradually levels off. Other curves decline slowly at first and, after reaching a certain point, suddenly turn to a rapid decline along an inclined straight line. Because the "turning point" of such an abrupt change is an important feature point, often tied to a specific physical meaning in the problem at hand, determining it accurately matters. In this paper this "turning point" is called the slope change point of the curve.
This paper discusses the slope change point through three problems: finding the boundary between the second and third creep stages from the surface creep curve of rock and soil, finding the "turning point" of the variance reduction function when determining the correlation distance δ (or θ) of a soil property index, and finding the pre-consolidation pressure Pc from the e-lgP curve.
2 Basic principle of finding a single slope change point
We first study the simplest slope change point problem. Assuming that a curve has only one slope change point, the task is to find a simple, quantitative and accurate method of determining it.
Measured data are mostly discrete pairs of data points. When the measurement interval is long, they do not form a continuous curve that truly reflects the process. Most observation sequences therefore cannot easily be expressed by a curve equation, and the slope at each point cannot be obtained by differentiation. Instead, the slope of the curve on each side of a given time point (or distance point) is approximated by the linear regression coefficient of a few consecutive data points on that side, because over a short time interval (or distance) the curve is approximately straight. The difference between the slopes on the two sides of a point reflects the size of the slope change at that point; this is a first-order difference. The "turning point" sought here, however, is not the point with the largest slope change but the point with the largest local acceleration (or deceleration) of the slope. When the monitoring data are equally spaced in time (or distance), the difference between successive slope changes, a second-order difference, reflects this acceleration (or deceleration). Finding its local maximum identifies the unit interval that contains the slope change point. A quantitative estimate of the slope change point is then obtained by a method similar to finding the mode of grouped data.
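The quantities described in this section can be sketched in symbols as follows. The notation ($b$, $\bar{x}$, $\bar{y}$, $b^{-}$, $b^{+}$) is assumed here for illustration only; the ideas themselves (a regression slope on each side of a point, then first-order and second-order differences) come from the paragraph above.

```latex
% Least-squares slope (linear regression coefficient) of n points (x_k, y_k)
b = \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}
         {\sum_{k=1}^{n} (x_k - \bar{x})^2}

% First-order difference: slope after the point minus slope before it
\Delta S(t_i) = b^{+}(t_i) - b^{-}(t_i)

% Second-order difference along an equally spaced sequence of points
\Delta^2 S(t_i) = \Delta S(t_i) - \Delta S(t_{i-1})
```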
3 Methods and steps for finding a single slope change point
The methods and steps for finding a single slope change point are introduced through three examples of different types.
Example 1. Table 1 gives the cumulative displacement monitoring data of the Asamushi landslide [2] on the Tohoku railway line in Japan (the data were measured from the figure; see Figure 1).
Table 1 Cumulative displacement of the Asamushi landslide on the Tohoku railway line, Japan
Figure 1 Change point analysis results for the time-cumulative displacement curve of the Asamushi landslide on the Tohoku railway line, Japan
This displacement-time curve was measured at the landslide surface and clearly contains data from the second and third creep stages. In the second stage the curve is approximately a straight line, reflecting steady (constant-rate) creep. The third stage is the accelerating creep stage, in which the curve clearly accelerates (Figure 1). In this example there is therefore known to be only one slope change point, and finding the point where the local acceleration of the slope is greatest amounts to finding it. The second-order difference method of regression coefficients for finding this change point is introduced step by step below.
(1) Selection of exploration points: since the monitoring data are equally spaced in time, the midpoint of each pair of adjacent observation times is taken as an exploration point, giving a sequence of exploration points. In Example 1 the sequence is $t_i$ = 21.25, 21.75, 22.25, 22.75, 23.25, …, 26.75.
(2) Construct a sliding window centered on each exploration point in order to calculate the slope of the curve on either side of it (that is, the linear regression coefficients of several data points before and after the exploration point). Because the number n of data points used in the regression affects the value of the regression coefficient, the same number n of points is taken before and after the exploration point, so that the two slopes are compared under the same conditions. Moreover, because the curve is approximately straight only over a short time (or distance), n must not be too large. Here n = 2, 3 and 4 are used, giving three groups of sliding windows.
(3) In the sliding window centered on $t_i$, linear regression on the n data points before (to the left of) $t_i$ gives a regression coefficient, denoted $b_n^{-}(t_i)$; similarly, linear regression on the n data points after (to the right of) $t_i$ gives a regression coefficient, denoted $b_n^{+}(t_i)$. With n = 2 the regression coefficient reflects the local slope well but the overall slope poorly, and because two points always lie on a straight line it is strongly affected by random error and carries little statistical weight. With n = 4 the coefficient reflects the overall slope well but the local slope less well; with more points it is statistically more reliable and less random. n = 3 lies in between. More weight should therefore be given to the results for larger n, so the values calculated for n = 2, 3 and 4 are combined in a weighted average with weights $n^2$. When the values for all three n exist at a given $t_i$, the weighted average is:
$$\bar{b}^{\pm}(t_i) = \frac{2^2\, b_2^{\pm}(t_i) + 3^2\, b_3^{\pm}(t_i) + 4^2\, b_4^{\pm}(t_i)}{2^2 + 3^2 + 4^2} \qquad (1)$$
Essays on Geological Disaster Investigation and Monitoring Techniques and Methods
(4) For each exploration point $t_i$, calculate the difference between the weighted-average regression coefficients after and before $t_i$, denoted $\Delta S(t_i)$, that is
$$\Delta S(t_i) = \bar{b}^{+}(t_i) - \bar{b}^{-}(t_i) \qquad (2)$$
$\Delta S(t_i)$ is the increment (or variation) of the slope of the curve from before $t_i$ to after $t_i$. It can be understood as the first-order difference of the slope at $t_i$, and its magnitude reflects how much the slope increases there.
(5) For the $\Delta S(t_i)$ sequence, calculate the second-order difference, namely:
$$\Delta^2 S(t_i) = \Delta S(t_i) - \Delta S(t_{i-1}) \qquad (3)$$
The magnitude of this second-order difference reflects the acceleration of the curve slope within the interval $(t_{i-1}, t_i)$. The values $\Delta^2 S(t_i)$ also form a sequence.
(6) Scanning the $\Delta^2 S(t_i)$ sequence in order of increasing $t_i$, find the maximum value (one that is greater than the two values before it and the two values after it). Its interval $(t_{i-1}, t_i)$ should be the interval containing the slope change point. Then, using the two adjacent intervals $(t_{i-2}, t_{i-1})$ and $(t_i, t_{i+1})$ with their values $\Delta^2 S(t_{i-1})$ and $\Delta^2 S(t_{i+1})$, the slope change point $t^*$ is estimated in the same way as the mode of grouped data:
$$t^{*} = t_{i-1} + \frac{\Delta^2 S(t_i) - \Delta^2 S(t_{i-1})}{\left[\Delta^2 S(t_i) - \Delta^2 S(t_{i-1})\right] + \left[\Delta^2 S(t_i) - \Delta^2 S(t_{i+1})\right]}\,(t_i - t_{i-1}) \qquad (4)$$
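Steps (1) to (6) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `ols_slope` and `slope_change_point` are hypothetical, the curve is assumed to be increasing and concave as in Example 1, only exploration points where all three window sizes fit are used, the global interior maximum stands in for the local-maximum search of step (6), and the final interpolation assumes the standard grouped-data mode form. The demonstration data are synthetic, not the landslide data of Table 1.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (the linear regression coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_change_point(t: Sequence[float], y: Sequence[float],
                       windows: Sequence[int] = (2, 3, 4)) -> float:
    """Estimate one slope change point of an increasing, concave curve.

    Assumes equally spaced t. Exploration points near the edges, where
    not every window size fits, are skipped (a simplification).
    """
    h = t[1] - t[0]
    n_max = max(windows)
    weights = [n * n for n in windows]      # weights n^2, as in step (3)
    total = sum(weights)

    probes, ds = [], []
    for k in range(n_max, len(t) - n_max + 1):
        # Steps (1)-(2): exploration point midway between t[k-1] and t[k].
        probes.append((t[k - 1] + t[k]) / 2)
        # Step (3): weighted-average slopes before and after the point.
        before = sum(w * ols_slope(t[k - n:k], y[k - n:k])
                     for n, w in zip(windows, weights)) / total
        after = sum(w * ols_slope(t[k:k + n], y[k:k + n])
                    for n, w in zip(windows, weights)) / total
        # Step (4): first-order difference of the slope.
        ds.append(after - before)

    # Step (5): d2[j] belongs to the interval (probes[j], probes[j + 1]).
    d2 = [ds[j + 1] - ds[j] for j in range(len(ds) - 1)]

    # Step (6): take the largest interior value, then interpolate inside
    # its interval in the same way as the mode of grouped data.
    j = max(range(1, len(d2) - 1), key=lambda i: d2[i])
    d_left = d2[j] - d2[j - 1]
    d_right = d2[j] - d2[j + 1]
    denom = d_left + d_right
    frac = d_left / denom if denom > 0 else 0.5
    return probes[j] + frac * h


# Synthetic check: a straight line that starts accelerating at t = 10.
t_demo = [float(k) for k in range(21)]
y_demo = [tk + max(0.0, tk - 10.0) ** 2 for tk in t_demo]
t_star = slope_change_point(t_demo, y_demo)
```

On this synthetic curve the estimate lands at the point where the acceleration begins, which is the behavior the method is designed to capture. For a decreasing or convex curve the signs of the two differences would need to be adapted.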
The data of Example 1 were processed by the second-order difference method of regression coefficients; the intermediate and final results are shown in Figure 2.
As Figure 2 shows, the maximum second-order difference is $\Delta^2 S(t_i) = 23.63$ and its interval is (23.25, 23.75). The two adjacent intervals are (22.75, 23.25) and (23.75, 24.25), with second-order differences $\Delta^2 S(t_{i-1}) = 18.80$ and $\Delta^2 S(t_{i+1}) = 1.47$.
Substituting these values into formula (4) gives the estimate $t^*$ of the slope change point.
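As a hedged worked example, assume that formula (4) has the standard grouped-data mode form (lower bound of the interval, plus the interval width times the ratio of the left excess to the sum of both excesses). With the values read from Figure 2 and an interval width of 0.5, the arithmetic would be:

```latex
t^{*} \approx 23.25 + \frac{23.63 - 18.80}{(23.63 - 18.80) + (23.63 - 1.47)} \times 0.5
      = 23.25 + \frac{4.83}{26.99} \times 0.5
      \approx 23.34
```

This value follows from the stated assumption about the form of formula (4) and is given only as an illustration of the calculation.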
Intuitively, $t^*$ is a suitable slope change point for the curve and can serve as the boundary between the second and third stages of the creep curve. Finding this boundary is of practical importance: it tells observers when to begin intensive monitoring of the landslide, and it lets field engineers predict the time of failure more accurately using only the data points of the third stage. Determining the slope change point therefore helps solve the problem of separating the second and third stages of a landslide creep curve. The same approach also helps find the "turning point" of a curve that changes from rapid decline to sudden slowdown, and of a curve that changes from gentle decline to rapid decline along an inclined straight line.
Fig. 2 Change point analysis of the time-cumulative displacement curve of the Asamushi landslide on the Tohoku railway line, Japan
4 Finding the "turning point" where a curve changes from rapid decline to sudden slowdown and leveling off
Many such problems arise in geology. In the past the "turning point" was fixed subjectively by eye, with no objective quantitative method. Solving this problem is useful for theoretical research and also brings practical economic benefits, for example in determining an economical and reasonable sampling spacing or exploration grid density. It is also commonly known as the problem of finding a "balance point". The application of slope change point analysis is illustrated here by finding the correlation distance of a soil.
Example 2. The lag distance T and the variance reduction function Γ²(T) calculated from static cone penetration test data at Haney, Vancouver, Canada, are given in Table 2 and Figure 3.
Table 2 T and Γ²(T) data for Haney
Fig. 3 Variance reduction function Γ²(T) for Haney
The curve in Example 1 first rises at a constant rate and then accelerates after the slope change point. The curve in Example 2 first declines rapidly, then suddenly slows down at the change point and gradually flattens toward a horizontal asymptote. What is sought here is the "turning point" at which the decline suddenly decelerates. Although the curve in Example 2 looks quite different from that in Example 1, both are concave, so the slope of the curve always increases as $t_i$ (or $T_i$) increases. Formula (2) is therefore still used for the first-order difference. The second-order difference $\Delta^2 S(T_i)$, however, is calculated differently. In Example 1 the sequence $\Delta S(t_i)$ generally increases with $t_i$, so $\Delta^2 S(t_i) = \Delta S(t_i) - \Delta S(t_{i-1})$. In Example 2 the sequence $\Delta S(T_i)$ generally decreases as $T_i$ increases, so the second-order difference is calculated in reverse:
$$\Delta^2 S(T_i) = \Delta S(T_{i-1}) - \Delta S(T_i) \qquad (5)$$
The remaining steps, including the formula for finding $T^*$, are the same as in formula (4).
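The two directions of the second-order difference can be illustrated with a short Python sketch. The helper name `second_difference` and the sample numbers are made up for illustration; the reverse form is assumed to be the previous value minus the current one, as the description of Example 2 suggests.

```python
from typing import List, Sequence


def second_difference(ds: Sequence[float], reverse: bool = False) -> List[float]:
    """Second-order difference of a first-order difference sequence.

    reverse=False: ds[i] - ds[i-1]  (increasing sequence, Example 1)
    reverse=True:  ds[i-1] - ds[i]  (decreasing sequence, Example 2)
    """
    if reverse:
        return [ds[i - 1] - ds[i] for i in range(1, len(ds))]
    return [ds[i] - ds[i - 1] for i in range(1, len(ds))]


# Made-up decreasing sequence of slope increments.
ds_demo = [9, 5, 3, 2]
forward = second_difference(ds_demo)
backward = second_difference(ds_demo, reverse=True)
```

For a decreasing sequence the forward difference is negative everywhere, so searching it for a maximum would miss the sharpest deceleration; the reversed difference makes that point the largest positive value.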
The results for the data of Example 2 are shown in Figure 4. As Figure 4 shows, the maximum second-order difference is $\Delta^2 S(T_i) = 0.1822$ and its interval is (1.1, 1.3). The two adjacent intervals are (0.9, 1.1) and (1.3, 1.5), with second-order differences $\Delta^2 S(T_{i-1})$ and $\Delta^2 S(T_{i+1})$ given in Figure 4. Applying formula (4) gives
$$T^{*} = 1.1439$$
Fig. 4 Change point analysis results for the Haney variance reduction function Γ²(T)
This is the estimate of T at the "turning point". Γ²(T*) is then obtained by linear interpolation between the neighboring values of T and Γ²(T) in Table 2:
$$\Gamma^2(T^{*}) = \Gamma^2(1.0) - \left[\Gamma^2(1.0) - \Gamma^2(1.2)\right] \cdot \frac{T^{*} - 1.0}{1.2 - 1.0}$$
$$\therefore\ \Gamma^2(T^{*}) = 0.3378 - 0.054 \times 0.7195 = 0.2989$$
Therefore, the correlation distance of the soil at Haney is δ ≈ T*·Γ²(T*) = 1.1439 × 0.2989 = 0.3419 m. This is very close to the original result δ = 0.324 m; the relative error is only 5.5%. The original text takes T* = 1.2. With the data measured for this paper, Γ²(1.2) = 0.2838, so δ ≈ 1.2 × 0.2838 = 0.34056 m, which is closer still to the result of this paper, with a relative error below 1%. In the original text, Γ²(T*) = 0.27 at T* = 1.2, which gives δ ≈ 1.2 × 0.27 = 0.324 m. The discrepancy therefore comes mainly from the data measured for this paper and not from the method itself. The slope change point method thus provides another effective way to calculate the correlation distance.
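The arithmetic of this example can be re-traced with a few lines of Python. The variable names are illustrative, and the lower interpolation node T = 1.0 is an inference from the factor 0.7195 = (1.1439 − 1.0)/0.2 rather than a value stated explicitly in the text.

```python
# Values quoted in the text (Table 2 itself is not reproduced here).
t_star = 1.1439               # estimated slope change point T*
t_lo, t_hi = 1.0, 1.2         # neighboring lag distances (t_lo is inferred)
g_lo, g_hi = 0.3378, 0.2838   # variance reduction function at t_lo, t_hi

# Linear interpolation of the variance reduction function at T*.
frac = (t_star - t_lo) / (t_hi - t_lo)
gamma2_star = g_lo - (g_lo - g_hi) * frac

# Correlation distance and its relative error against the original 0.324 m.
delta = t_star * gamma2_star
rel_err = abs(delta - 0.324) / 0.324

print(f"frac={frac:.4f}  gamma2={gamma2_star:.4f}  "
      f"delta={delta:.4f} m  rel_err={rel_err:.1%}")
```

Because the text rounds Γ²(T*) to 0.2989 before multiplying, the unrounded product can differ from 0.3419 in the fourth decimal place.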
5 Finding the "turning point" where a curve changes from slow, gentle decline to sudden decline along an inclined straight line
A typical problem of this kind is determining the pre-consolidation pressure Pc from the e-lgP curve of a high-pressure consolidation test in soil mechanics.
Example 3. The test data for the logarithm of vertical pressure, lgP, and the void ratio e [4] are given in Table 3 and Figure 5. The test data are first made equidistant by linear interpolation. The change point analysis results are shown in Figure 6.
Table 3 e-lgP data sheet
Fig. 5 e-lgP curve of high pressure consolidation test
Fig. 6 Analysis results of change points of e-lgP curve in high pressure consolidation test.
Because the curve is convex, the regression coefficient calculated before each exploration point $L_i$ is greater than the one calculated after it, so formula (2) is replaced by:
$$\Delta S(L_i) = \bar{b}^{-}(L_i) - \bar{b}^{+}(L_i) \qquad (6)$$
Because the calculated $\Delta S(L_i)$ sequence is basically increasing, $\Delta^2 S(L_i)$ can still be calculated by formula (3), namely:
$$\Delta^2 S(L_i) = \Delta S(L_i) - \Delta S(L_{i-1})$$
As the last column of Figure 6 shows, the maximum $\Delta^2 S(L_i)$ is 0.0413 and its interval is (2.45, 2.55). The two adjacent intervals are (2.35, 2.45) and (2.55, 2.65), with second-order differences $\Delta^2 S(L_{i-1}) = 0.0361$ and $\Delta^2 S(L_{i+1}) = 0.0655$.
Applying formula (4), the estimate of the slope change point is $(\lg P)^{*} = 2.4658$. The estimated pre-consolidation pressure is therefore $P_c = 10^{2.4658} = 292.28$ kPa. The original result is $P_c = 313.91$ kPa; the two are very close, with a relative error of only 6.89%.
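The conversion from the estimated change point on the lgP axis to a pressure, and the quoted relative error, can be checked with a short Python snippet. The variable names are illustrative; the numbers are those given in the text.

```python
lg_p_star = 2.4658        # estimated slope change point on the lgP axis
pc_reference = 313.91     # original result for Pc, in kPa

# Undo the base-10 logarithm to recover the pressure in kPa.
pc_estimate = 10 ** lg_p_star

# Relative error of the estimate against the original result.
rel_err = abs(pc_reference - pc_estimate) / pc_reference

print(f"Pc = {pc_estimate:.2f} kPa, relative error = {rel_err:.2%}")
```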
6 Conclusion
In scientific research on nature, society and the economy, one often needs to find the "turning point" of a curve. When the curve is monotonically increasing (or decreasing) and concave (or convex) throughout, it has only one slope change point, which can be determined by the second-order difference method of regression coefficients.
When the curve is not monotonic or its concavity changes, it can be divided into monotonic segments of constant concavity, and the slope change point can then be found segment by segment.
The slope change point obtained by the second-order difference method of regression coefficients is the point with the largest slope acceleration, not the point with the largest slope change. It is typically the "turning point" at which a curve that has been following an approximately straight line suddenly accelerates onto another straight line or a curve, and it usually lies before the point of greatest slope change (or of greatest curvature).
The second-order difference method of regression coefficients requires a fairly dense, equally spaced sequence, that is, enough data pairs: preferably more than 20 or 30, and at least 13. This is because the sliding window suffers from edge effects, where the calculated regression coefficients do not represent the slope of the curve well. If the original data are not equally spaced, linear interpolation can be used to convert them into equally spaced, denser data.
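The conversion to equally spaced data by linear interpolation might look like the following Python sketch. The function name `resample_equidistant` and the sample data are hypothetical, and the input abscissas are assumed to be strictly increasing.

```python
from typing import List, Sequence, Tuple


def resample_equidistant(x: Sequence[float], y: Sequence[float],
                         step: float) -> Tuple[List[float], List[float]]:
    """Linearly interpolate (x, y) onto a grid x[0], x[0] + step, ...

    Assumes x is strictly increasing; the grid stops at or before x[-1].
    """
    # Small tolerance so that an exact multiple of step is not lost.
    n_steps = int((x[-1] - x[0]) / step + 1e-9)
    grid = [x[0] + i * step for i in range(n_steps + 1)]

    values = []
    j = 0  # index of the left end of the current source segment
    for g in grid:
        # Advance to the segment [x[j], x[j + 1]] that contains g.
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        w = (g - x[j]) / (x[j + 1] - x[j])
        values.append(y[j] + w * (y[j + 1] - y[j]))
    return grid, values


# Made-up, unevenly spaced data resampled at unit spacing.
grid_demo, values_demo = resample_equidistant(
    [0.0, 1.0, 3.0], [0.0, 2.0, 6.0], step=1.0)
```

A smaller `step` gives the denser sequence mentioned above, although interpolation adds no new information between the measured points.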
The formulas for calculating the slope change point differ slightly according to whether the curve is increasing or decreasing and whether it is concave or convex: ① for a monotonically increasing, concave curve, use formulas (2) and (3), as in Example 1; ② for a monotonically decreasing, concave curve, use formulas (2) and (5), as in Example 2; ③ for a monotonically decreasing, convex curve, use formulas (6) and (3), as in Example 3; ④ for a monotonically increasing, convex curve, use formulas (6) and (5).
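The four cases can be condensed into a small lookup, sketched here in Python. The function name `formula_choice` is hypothetical, "concave" is used in the paper's sense of a curve whose slope increases, and the returned numbers are the formula numbers used in the text.

```python
from typing import Tuple


def formula_choice(increasing: bool, concave: bool) -> Tuple[int, int]:
    """Return (first-order formula, second-order formula) for a curve shape.

    First-order difference: formula (2) for concave curves, (6) for convex.
    Second-order difference: formula (3) when monotonicity and concavity
    "agree" (increasing and concave, or decreasing and convex), else (5).
    """
    first = 2 if concave else 6
    second = 5 if increasing != concave else 3
    return first, second


case_1 = formula_choice(increasing=True, concave=True)     # Example 1
case_2 = formula_choice(increasing=False, concave=True)    # Example 2
case_3 = formula_choice(increasing=False, concave=False)   # Example 3
case_4 = formula_choice(increasing=True, concave=False)
```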
References
[1] Xiang Jingtian, Shi Jiuen. Statistical Methods of Data Processing in Nonlinear Systems. Beijing: Science Press, 1997: 1-43.
[2] Saito M. Forecasting time of slope failure by tertiary creep. Proceedings of the 7th International Conference on Soil Mechanics and Foundation Engineering, Mexico, 1969, Vol. 2: 677-683.
[3] Wickremesinghe D, Campanella R G. Scale of fluctuation as a descriptor of soil variability. In: Li, Lo (eds). Probabilistic Methods in Geotechnical Engineering. Balkema, Rotterdam, 1983: 233-239.
[4] Bardet J-P. Experimental Soil Mechanics. Prentice Hall, 1997: 297-306.
[5] Chen Xiru. Introduction to statistical analysis of change points. Mathematical Statistics and Management, Nos. 1-4: 1-43.
[6] Chen X R. Inference in a simple change point model. Science in China, Series A, 1988, (6).
[7] D. Lower confidence limit for the change point after a sequential CUSUM test. Journal of Statistical Planning and Inference, 2003, 115: 311-326.
[8] Hawkins D M. Fitting multiple change-point models to data. Computational Statistics and Data Analysis, 2001, 37: 323-341.
[9] Taylor W. Change-point analysis: a powerful new tool for detecting changes. 2000, /CPA /cpa/tech/changepoint.html
[10] Taylor W. A pattern test for distinguishing between autoregressive and mean-shift data. 2000, /CPA /cpa/tech/changepoint.html