Statistical Model Paper 1: A Theoretical Summary and Application Analysis of the Statistical Arbitrage Model
The statistical arbitrage model is grounded in econometrics and statistics. By analyzing historical data, it estimates the probability distributions of the relevant variables and, combined with fundamental data, forecasts future returns in order to find arbitrage opportunities for trading. Because of its statistical treatment of time series, statistical arbitrage has considerable theoretical and practical significance: in practice it is widely used by hedge funds to generate returns, while in theory it is applied mainly to testing capital market efficiency and to rating open-end funds. This paper introduces the basic principles, trading strategies and application directions of statistical arbitrage.
Analysis of the Application of Statistical Arbitrage Pairs Trading
1. A brief introduction to the principle of the statistical arbitrage model
The statistical arbitrage model is based on two or more stocks or other securities with high correlation. Using appropriate methods, one verifies that the price movements of the securities maintain this good correlation over a period of time; once a deviation opens up between them and that price deviation is expected to be corrected in the future, an arbitrage opportunity arises. In practice, when such a deviation appears, one buys the relatively undervalued security and sells short the relatively overvalued one, and when the deviation is corrected, the positions are closed with the opposite trades. The premise of statistical arbitrage is mean reversion: there is a mean interval (in practice the time series of asset prices is taken to be stationary, with its plot fluctuating within a certain range), price deviations are short-lived, and with the passage of time asset prices revert to that mean interval. If the time series is stationary, a signal discovery mechanism for statistical arbitrage trading can be built, showing whether the asset price has deviated from its long-term mean and whether an arbitrage opportunity exists. In a sense, when two securities share common features (for example, stocks in the same industry), their market prices are well correlated and tend to move in the same direction, so their price difference or price ratio tends to fluctuate around a fixed value.
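As an illustration of the mean-reversion signal described above, the following minimal Python sketch builds a log-price spread for a hypothetical pair, standardizes it with a rolling mean and standard deviation, and opens a position when the spread deviates strongly, closing it when the spread reverts. The simulated series, window length and thresholds are illustrative assumptions, not values from the text.

```python
# Minimal sketch of the mean-reversion pairs signal (hypothetical data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
common = np.cumsum(rng.normal(0, 1, 500))             # shared price driver
price_a = pd.Series(100 + common + rng.normal(0, 1, 500))
price_b = pd.Series(50 + 0.5 * common + rng.normal(0, 1, 500))

spread = np.log(price_a) - np.log(price_b)             # log-price spread
z = (spread - spread.rolling(60).mean()) / spread.rolling(60).std()

# Open when the spread strays far from its mean, flatten when it reverts.
signal = pd.Series(np.nan, index=z.index)
signal[z > 2.0] = -1      # spread too wide: short A, long B
signal[z < -2.0] = 1      # spread too narrow: long A, short B
signal[z.abs() < 0.5] = 0  # spread has reverted: close the position
position = signal.ffill().fillna(0)   # hold each position until an exit signal
print(position.value_counts())
```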
2. Trading strategies and data processing of the statistical arbitrage model
There are many strategies for statistical arbitrage, such as pairs/basket trading and multi-factor models. The most widely used at present is the pairs trading strategy: a long position and a short position are paired in two stocks of the same industry whose prices have a long-term stable equilibrium relationship, so that the trader maintains a market-neutral position. This strategy is better suited to actively managed funds.
Implementing the pairs trading strategy involves two main steps. The first is the selection of stock pairs. Zhou Jian, an analyst at Haitong Securities, pointed out in "Research on Absolute Return Strategies: Statistical Arbitrage" that stock selection should combine fundamentals and industry membership, for example within the banking, real estate, coal or electric power industries, so as to secure the strategy's returns and effectively reduce risk. In theory, cluster analysis from statistics can be used to group the stocks before carrying out the cointegration test, which raises the probability of success. The second step is to test the correlation of the stock price series, both individually and with each other; cointegration theory and the random walk model are the methods commonly used at present.
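The grouping step can be illustrated with a small sketch: stocks are clustered by the correlation of their returns, and candidate pairs for the subsequent cointegration test are drawn from within each cluster. The data, the number of clusters and the use of k-means here are assumptions for illustration; the article only calls for "cluster analysis" in general.

```python
# Illustrative sketch: cluster stocks by return correlation before
# running pairwise cointegration tests within each cluster.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# `returns`: hypothetical daily log returns, one column per stock.
returns = pd.DataFrame(rng.normal(0, 0.01, (250, 8)),
                       columns=[f"stock_{i}" for i in range(8)])

corr = returns.corr()
features = 1.0 - corr.values   # use 1 - correlation as a crude distance
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

for k in range(3):
    group = corr.columns[labels == k].tolist()
    print(f"cluster {k}: {group}")   # candidate pairs come from one cluster
```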
To judge the correlation of stock price series with cointegration theory, the stationarity of each price series must be tested first. The commonly used methods are the graphical method and the unit root test. With the graphical method, the selected time series and its first difference are both plotted as time-series charts: if the level series shows a clear trend while the first-differenced series looks random, the differenced series may be stationary. Judging stationarity from graphs is, however, subjective. Formally, the stationarity and order of integration of a series are determined by the unit root test. There are many unit root tests, such as the DF test, the ADF test and the Phillips-Perron nonparametric test (PP test); the ADF test is the most commonly used.
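A minimal sketch of the ADF unit root test on a price series and its first difference, using statsmodels; the simulated random-walk series stands in for real data.

```python
# ADF test on the level series and on its first difference.
# If the level has a unit root but the difference does not, the series is I(1).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
price = pd.Series(100 + np.cumsum(rng.normal(0, 1, 500)))   # simulated random walk

for name, series in [("level", price),
                     ("first difference", price.diff().dropna())]:
    stat, pvalue, *_ = adfuller(series, autolag="AIC")
    print(f"{name}: ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```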
After testing, if the series itself or its first difference is stationary, cointegration tests can be carried out on different stock series. The main cointegration test is the Engle-Granger (EG) two-step method: the residuals are obtained from an ordinary least squares regression of one variable on the other, and a unit root test is then applied to the residual series. If the residuals contain a unit root, the variables have no cointegration relationship; if they do not, the residual series is stationary and the variables are cointegrated. The EG test is better suited to testing cointegration between two series. Besides the EG test, there are the Johansen test, the Gregory-Hansen test and the autoregressive distributed lag approach; the Johansen test is better suited to testing cointegration among three or more series. Through the cointegration test we can judge the correlation between stock price series and then trade in pairs.
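The EG two-step procedure can be sketched as follows: an OLS regression of one simulated series on the other, followed by a unit root test on the residuals. Strictly, the residual-based test should use Engle-Granger critical values (statsmodels also offers `coint` for this), so the plain ADF p-value below is only indicative; the data are simulated to contain a cointegrating relation.

```python
# Engle-Granger two-step sketch on simulated cointegrated series.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
common = np.cumsum(rng.normal(0, 1, 500))
y = pd.Series(common + rng.normal(0, 0.5, 500))
x = pd.Series(0.8 * common + rng.normal(0, 0.5, 500))

# Step 1: static OLS regression of y on x.
ols = sm.OLS(y, sm.add_constant(x)).fit()
residuals = ols.resid

# Step 2: unit root test on the residuals; no unit root => cointegrated.
stat, pvalue, *_ = adfuller(residuals, autolag="AIC")
print(f"residual ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```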
Christian L. Dunis and Gianluigi Giorgioni (2010) used high-frequency data rather than daily trading data for arbitrage and compared the real-time returns of arbitrage between stock pairs with a cointegration relationship and stock pairs without one. The results show that the stronger the price cointegration between stocks, the more statistical arbitrage opportunities there are and the higher the potential rate of return.
According to the random walk model, we can test whether stock price movements have "memory", that is, whether they contain predictable components. The analysis is generally divided into short-term and long-term predictability. Short-term predictability analysis mainly targets the third form of the random walk process, the case of uncorrelated increments; the available tools are the autocorrelation test and the variance ratio test. In the autocorrelation test, the commonly used statistics are the autocorrelation coefficient and the Box-Pierce Q statistic; when these statistics are significantly greater than their critical values at a given confidence level, the series exhibits autocorrelation and is predictable. The variance ratio test relies on the fact that, under a random walk, the variance of log returns grows linearly with the holding period, so the variance of the K-period return should be approximately K times the variance of the single-period return. If price movements are random, the variance ratio is close to 1; with positive autocorrelation it is greater than 1; with negative autocorrelation it is less than 1. For long-term predictability analysis, because the variance ratio test loses power over long horizons, the Hurst exponent from R/S analysis can be used instead; it is estimated from the regression coefficient of the following equation:
ln[(R/S)_N] = c + H * ln N
where (R/S)_N is the rescaled range, N is the number of observations, H is the Hurst exponent, and c is a constant. When H > 0.5, the stocks may exhibit long-term memory, but whether the series is a random walk or a persistent fractal time series remains uncertain and requires a significance test.
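The two "memory" diagnostics above can be sketched as follows: a simple variance ratio VR(K) and a Hurst exponent estimated by regressing ln(R/S) on ln N over several window sizes. The simulated returns, the window sizes and the omission of significance tests are simplifications for illustration.

```python
# Sketch of the variance ratio statistic and the R/S Hurst regression.
import numpy as np

rng = np.random.default_rng(4)
returns = rng.normal(0, 0.01, 2000)          # one-period log returns (simulated)

def variance_ratio(r, k):
    """Variance of non-overlapping k-period returns over k times the 1-period variance."""
    r = np.asarray(r)
    starts = np.arange(0, len(r) - len(r) % k, k)
    k_period = np.add.reduceat(r, starts)     # non-overlapping k-period sums
    return k_period.var(ddof=1) / (k * r.var(ddof=1))

def hurst_rs(r, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(R/S) on log(N)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        for start in range(0, len(r) - n + 1, n):
            chunk = r[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())
            R = dev.max() - dev.min()          # range of cumulative deviations
            S = chunk.std(ddof=1)              # standard deviation of the chunk
            if S > 0:
                rs_values.append(R / S)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_values)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

print("VR(5) =", round(variance_ratio(returns, 5), 3))   # ~1 for a random walk
print("Hurst =", round(hurst_rs(returns), 3))            # ~0.5 for a random walk
```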
Whether judged by the cointegration test or by the random walk model, the purpose is to find a short-term or long-term equilibrium relationship so that the statistical arbitrage strategy can be implemented effectively.
Generally, the data used for statistical arbitrage are the closing prices of each trading day, but recent research has found that high-frequency data (such as 5-minute, 10-minute, 15-minute and 20-minute closing prices) offer more statistical arbitrage opportunities. For daily data, the adjusted closing price is used, and if the price levels of the two stocks differ widely, a logarithmic transformation is applied first. Christian L. Dunis and Gianluigi Giorgioni (2010) used 15-minute, 20-minute, 30-minute and 1-hour closing prices as samples for statistical arbitrage analysis, and the results show that high-frequency data yield higher returns. Moreover, in their series of studies on absolute return strategies, analysts at Haitong Securities conducted statistical arbitrage pairs trades using the Shanghai and Shenzhen 300 Index constituents as the candidate stock pool, and the cumulative return computed with high-frequency data was nearly 5 percentage points higher than that computed with daily trading data.
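A small data-preparation sketch in the spirit of this paragraph: minute-level closes (simulated here) are resampled into several high-frequency bar sizes and log-transformed. The timestamps, bar sizes and series are illustrative assumptions.

```python
# Resample minute closes into several high-frequency bar sizes and take logs.
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-02 09:30", periods=240, freq="min")
rng = np.random.default_rng(7)
minute_close = pd.Series(100 + np.cumsum(rng.normal(0, 0.05, 240)), index=idx)

bars = {f"{m}min": minute_close.resample(f"{m}min").last().dropna()
        for m in (5, 10, 15, 20)}
log_bars = {k: np.log(v) for k, v in bars.items()}   # log prices for the spread
print({k: len(v) for k, v in log_bars.items()})
```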
3. An extended application of the statistical arbitrage model: testing capital market efficiency
The efficient market hypothesis proposed by Fama (1969) holds that the market responds to information quickly and rationally, so that market prices fully reflect all available information; asset prices therefore cannot be predicted from current information and no one can persistently earn excess profits. By testing for the existence of statistical arbitrage opportunities, we can verify whether the capital market is efficient, weakly efficient or inefficient. Xu (2005) was the first to study the efficiency of China's capital market empirically with statistical arbitrage, starting from the conclusion that the existence of statistical arbitrage opportunities is incompatible with capital market efficiency. On that basis, the study examined whether price-momentum, price-reversal and value-reversal investment strategies in the Chinese stock market offer statistical arbitrage opportunities, and found that the Chinese stock market has not yet reached weak-form efficiency. Wu (2007) used this method to test the weak-form efficiency of China's A-share market and found that momentum and contrarian investment strategies were ineffective there. In addition, the Chinese scholar Wu Hewei revised Hogan's statistical arbitrage model and proposed an open-end fund rating method based on the statistical arbitrage model.
4. Conclusion
At present, the statistical arbitrage model is applied mainly in two ways: (1) as an effective arbitrage trading strategy; (2) to verify the efficiency of the capital market by testing for the existence of statistical arbitrage opportunities. Since implementing a statistical arbitrage strategy depends on the availability of a short-selling mechanism, the introduction and improvement of stock index futures and margin trading should allow it to be widely used and developed in China.
References
[1] A.N. Burgess: Computational Methodology for Simulating Statistical Arbitrage Dynamics, London Business School, Doctoral Dissertation, 1999.
[2] Fang Hao. Theoretical model and application analysis of statistical arbitrage: based on a test of China's closed-end fund market. Statistics and Decision, 2005(6), part 2.
[3] Mary, Lu. Feasibility study on spot-futures arbitrage of the Shanghai and Shenzhen 300 stock index futures: an empirical study based on a statistical arbitrage model. Finance and Trade Research, 2011(1).
[4] Wu. Research on arbitrage strategies based on the Shanghai and Shenzhen 300 stock index futures [D]. Master's thesis, 2009.
[5] Wu. Statistical arbitrage test of weak-form efficiency in the China stock market [J]. Systems Engineering Theory and Practice, 2007(2).
Statistical Model Paper 2: A Study on the Estimation of Semiparametric Statistical Models
With the rapid development of data model technology, existing data models can no longer handle some of the measurement problems encountered in practice, which seriously limits their application and development in modern science and technology. Against this background, scholars have put forward new theories and methods for measurement experiments and developed semiparametric data models. The semiparametric data model is a new measurement data model built on the parametric model and the nonparametric model, so it shares many features with both. Combining data model techniques, this paper explores and discusses the semiparametric statistical model in detail.
Keywords: semiparametric model; improvement; longitudinal data; error; measurement
Taking the semiparametric model as an example, the estimates and observed values of the parametric and nonparametric components are discussed, and an estimation expression for the nonparametric component is obtained by cubic spline interpolation. In addition, to estimate the parametric and nonparametric parts of the semiparametric model under longitudinal data, the semiparametric data model and its asymptotic normality and strong consistency are studied under the condition that the errors form a martingale difference sequence. The paper also discusses the selection of the balance (smoothing) parameter, fully explains the pan-least-squares estimation method and related conclusions, and studies the iterative method for the semiparametric model.
I. Introduction
In daily life, the parametric data models people use are relatively simple and therefore easy to operate. In actual use, however, the measured data contain large errors, for example when measuring relatively small objects or dynamic objects. Establishing a semiparametric data model can resolve or alleviate this problem: it can not only eliminate or reduce measurement errors but also detect systematic errors that cannot be parameterized. Systematic errors strongly affect the information carried by the observations; if they can be handled, error identification and extraction become faster, more timely and more accurate. This not only improves the accuracy of parameter estimation but also effectively supplements related scientific research.
For example, the model has proved successful and practical in simulation examples and in applications such as coordinate transformation, GPS positioning and gravity measurement, mainly because the semiparametric data model is consistent with the data models currently in use and meets present practical needs. The newly established semiparametric model, with its estimation of the parametric and nonparametric parts, can also solve some estimation problems for contaminated data. This semiparametric model not only supports T-type estimation under longitudinal data, but also describes in detail some semiparametric data models with smooth terms. In addition, on the basis of symmetry and asymmetry, parameter estimation and hypotheses can be tested under linear constraints, mainly because the factors affecting the observations are not all contained in the linear relationship; some observations are disturbed by specific factors that cannot simply be classified as errors. Moreover, there are errors in the measurements of the independent variables themselves, which often leads to the loss of much important information in the course of calculation.
II. The semiparametric regression model and its estimation methods
This model was put forward by the Western scholar Stone in the 1970s and gradually matured in the 1980s. At present, the semiparametric model is widely used in medicine, biology, economics and many other fields.
The semiparametric regression model lies between the nonparametric regression model and the parametric regression model: it contains both a linear part and a nonparametric part, and thus successfully combines the advantages of the two. The parametric part expresses a known functional relationship and effectively captures and explains the general trend of the variables; the nonparametric part handles the portion whose functional form is unclear, in other words it adjusts the variables locally. The model can therefore make good use of the information in the data in a way that neither a purely parametric nor a purely nonparametric regression model can match, so the semiparametric model often has stronger and more accurate explanatory power.
In practice, this regression model is now one of the most commonly used statistical models. Its form is commonly written as

y_i = x_i'β + g(t_i) + e_i,

where x_i'β is the parametric (linear) part, g(·) is an unknown smooth function forming the nonparametric part, and e_i is the random error.
III. Longitudinal data, linear functions and smoothing functions
The advantage of longitudinal data is that they provide much information, which has attracted great attention, and examples of longitudinal data are plentiful. In essence, longitudinal data are a series of observations obtained by repeatedly observing the same individuals at different times and places. Because of differences among individuals, there is some bias when computing the variance of longitudinal data. The observations on different individuals are relatively independent, and the characteristic of longitudinal data is that they effectively combine two quite different kinds of data, cross-sectional data and time series: one can analyze the trend of each individual over time and at the same time see the overall changes. Much current research on longitudinal data not only retains these advantages but also develops and applies local linear fitting to longitudinal data, mainly in order to establish the relationship among the response variable, the covariates and the time effect; because the time effect is relatively complex, however, parametric modelling of it is difficult.
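A minimal sketch of local linear fitting of a time effect on pooled longitudinal observations; the simulated data, Gaussian kernel and bandwidth are illustrative choices and not taken from the works discussed.

```python
# Local linear fit of a time effect at a point t0 (illustrative data).
import numpy as np

rng = np.random.default_rng(5)
times = np.sort(rng.uniform(0, 1, 300))                 # pooled observation times
y = np.sin(2 * np.pi * times) + rng.normal(0, 0.3, 300)  # response with a time effect

def local_linear(t0, t, y, h=0.1):
    """Kernel-weighted least squares line around t0; returns the fit at t0."""
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)   # Gaussian kernel weights
    X = np.column_stack([np.ones_like(t), t - t0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]                            # intercept = fitted value at t0

grid = np.linspace(0, 1, 5)
print([round(local_linear(t0, times, y), 3) for t0 in grid])
```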
In addition, although the estimation of linear models has produced many results, the estimation of semiparametric models is still largely undeveloped. Work on linear model estimation not only solves the problems of rank deficiency and ill-conditioning but also provides methods for handling linear, nonlinear and semiparametric models when the design matrix is ill-conditioned. First, by comparing two observations made under similar observation conditions, the nonparametric influence can be weakened, so the semiparametric model is converted into a linear model and the parameters are then estimated from that linear model. In most cases the linear coefficient changes with another variable, here with time, so samples from the same model cannot be found in all time periods and the relationship is difficult to describe with one or a few fixed real-valued functions. When processing measurement data, if the quantity is treated merely as a random variable, only estimation can be achieved; if a nonlinear function of another variable must be introduced into the classical linear model, that is, if the model contains an essentially nonlinear part, the semiparametric linear model must be used.
In addition, a fractal refers to a body composed of parts that resemble the whole; its research object is the non-smooth, non-differentiable geometric shapes produced by nonlinear systems, and the corresponding quantitative parameter is the dimension. Research on fractal statistical models is one of the main frontier topics in international nonlinear research. The first estimation method is the parametric estimation of the nonparametric component, also called the parametric estimation method, which represents the early work on semiparametric models: the function space is restricted to some extent, mainly with respect to smoothness. Some researchers believe that the nonparametric component of a semiparametric model is also nonlinear and in most cases neither smooth nor differentiable; therefore cubic smoothing spline functions can also be used to study semiparametric models with the same data and the same testing method.
IV. Robustness of the pan-least-squares method and the least squares method for linear models
(1) The least squares method appeared in the late 18th century.
In the scientific research of that time, one question was often raised: how to obtain the best estimates of unknown parameters from multiple groups of observations. Although the norm used by the general least squares method was not ideal, least squares was the most widely used method of the day, and its purpose was parameter estimation. After a period of research and application, the least squares method gradually developed into a relatively complete theoretical system. At this stage we can not only know clearly the model the data obey, but also use iteratively weighted methods to assist the semiparametric modelling of longitudinal data. This is very effective for estimating nonparametric components by the compensated least squares method, and the estimate of the nonparametric component is relatively reliable as long as the observations are accurate. For example, in physical geodesy the least squares collocation method has long been used to obtain the best estimate of gravity anomalies. When using the compensated least squares method to study gravity anomalies, however, one should consider the truthfulness of the parameter estimators as well as keeping the overall error small. Comparisons with iteratively weighted partial splines have revealed shortcomings of the least squares method currently in use: it only emphasizes minimizing the overall error and ignores the error in the parameter-component estimates. Special attention is therefore needed in actual operation.
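A rough sketch of the compensated (here interpreted as penalized) least-squares idea for a partial linear model y = Xβ + g(t) + e: the nonparametric part g is carried by one coefficient per observation time and a second-difference roughness penalty is added before solving the normal equations. The data, penalty form and smoothing parameter are assumptions for illustration, not the formulation of any cited author.

```python
# Penalized least squares for a partial linear model (illustrative sketch).
import numpy as np

rng = np.random.default_rng(6)
n = 200
t = np.sort(rng.uniform(0, 1, n))
X = rng.normal(size=(n, 2))
beta_true = np.array([1.5, -2.0])
y = X @ beta_true + np.sin(2 * np.pi * t) + rng.normal(0, 0.2, n)

# Design matrix [X | I]: the identity block carries g evaluated at each sorted t.
Z = np.hstack([X, np.eye(n)])

# Second-difference roughness penalty acting only on the g block.
D = np.diff(np.eye(n), n=2, axis=0)
P = np.zeros((Z.shape[1], Z.shape[1]))
P[2:, 2:] = D.T @ D

lam = 1e-2   # smoothing parameter (the "balance parameter" in the text)
theta = np.linalg.solve(Z.T @ Z + lam * P, Z.T @ y)
beta_hat, g_hat = theta[:2], theta[2:]
print("estimated beta:", np.round(beta_hat, 3))   # should be close to [1.5, -2.0]
```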
(2) Application of the semiparametric model in GPS positioning and the differences involved
In GPS phase observations, systematic error is the main factor affecting high-precision positioning in the semiparametric model. Because the model contains errors before the solution, gross errors must be watched for in time. In GPS use, the specific coordinates of the target point in the geodetic coordinate system are computed from the satellites' broadcast signals; in this way the integer (whole-cycle) ambiguity can be found and recovered during the computation. Because the observations are made between satellites and receiving stations, double differencing is used to weaken or reduce the influence of satellite and receiver systematic errors that are difficult to express with parameters. In the adjustment computation, the differencing method clearly reduces the number of observation equations, but for various reasons it still cannot produce satisfactory results. If, however, the systematic error is represented by a component of the semiparametric model, better results can be obtained, mainly because the semiparametric model is a generalized linear regression model. For semiparametric models with smooth terms, an estimation method for the linear function can be provided under given additional conditions, thereby eliminating gross errors in the measurements.
In addition, this method can be used not only for GPS measurement but also for parametric models in areas such as optical rangefinder measurement and deformation monitoring. In many cases, especially in theoretical mathematical research, S is always assumed to be a random variable, and this assumption is in fact reasonable. In recent years good results have been achieved in research on this linear model, and because of its relatively simple form and wide applicability the model has played an important role in many fields.
Simulation examples and practical applications such as coordinate transformation, GPS positioning and gravity measurement illustrate the success and practicality of this method, and it has been shown theoretically that the popular natural spline estimation method is essentially a special case of the compensated least squares method, which will have broad room for development in the future. In addition, the research object of the fractal theory mentioned in this paper is the non-smooth, non-differentiable geometric bodies produced by nonlinear systems; fractals have been widely used in fracture mechanics, seismology and other fields, so extending them to the study of semiparametric models would allow errors to be identified and extracted more promptly and accurately, improve the accuracy of parameter estimation, and powerfully supplement current research on semiparametric models.
V. Summary
The semiparametric model discussed in this paper involves the estimates and observed values of the parametric and nonparametric components, and the estimation expression for the nonparametric component is obtained by cubic spline interpolation. In addition, to solve the estimation problem for the parametric and nonparametric parts of the semiparametric model under longitudinal data, the semiparametric data model and its asymptotic normality and strong consistency are studied under the condition that the errors form a martingale difference sequence, and the least squares estimation method is introduced. The selection of the balance (smoothing) parameter is discussed in a preliminary way, the pan-least-squares estimation method and related conclusions are fully explained, and the discussion of the iterative method for the semiparametric model provides a detailed theoretical explanation of the iteration and a theoretical basis for practical application.
References
[1] Hu Hongchang. Existence of quasi-maximum likelihood estimation for semiparametric regression models with AR(1) errors [J]. Journal of Hubei Normal University (Natural Science Edition), 2009(03).
[2] Qian Weimin, Li Jingru. Strong consistency estimation of semiparametric regression models for contaminated longitudinal data [J]. Journal of Tongji University (Natural Science Edition), 2009(08).
[3] Fan, Wang Fenling. Least squares local linear estimation of semiparametric regression models for longitudinal data [J]. Mathematical Statistics and Management, 2009(02).
[4] Cui Hengjian, Wang Qiang. Parameter estimation of EV models with variable coefficients [J]. Journal of Beijing Normal University (Natural Science Edition), 2005(06).
[5] Qian Weimin, Chai Genxiang. Statistical analysis of mixed effect models for longitudinal data [J]. Chinese Annals of Mathematics, Series A (Chinese edition), 2009(04).
[6] Sun Xiaoqian, Eugene Red. Iteratively weighted partial spline least squares estimation in semiparametric modelling of longitudinal data [J]. Science in China (Series A: Mathematics), 2009(05).
[7] Zhang Sanguo, Chen Xiru. Estimation of EV polynomial models [J]. Science in China (Series A), 2009(10).
[8] Ren Zhe, Chen Minghua. Least squares estimation of parameters in regression analysis of contaminated data [J]. Applied Probability and Statistics, 2009(03).
[9] Zhang Sanguo, Chen Xiru. Consistency of modified maximum likelihood estimation of EV models under repeated observation [J]. Science in China (Series A), 2009(06).
[10] Cui Hengjian, Li Yong, Qin Huaizhen. Estimation theory of the nonlinear semiparametric EV quadruple model [J]. Science Bulletin, 2009(23).
[11] Luo. Statistical inference for variable coefficient models with randomly missing response variables [D]. Central South University, 2011.
[12] Liu Chaonan. Bayesian parameter estimation and reliability analysis of the two-parameter exponential Weibull distribution [D]. Central South University, 2008.
[13] Guo Yan. A tax forecasting model for Hunan Province with empirical testing and economic analysis [D]. Central South University, 2009.
[14] Sang Hongfang. Bayesian inference of loss functions and risk functions for several distribution parameter estimates [D]. Central South University, 2009.
[15] Zhu Lin. Bayesian analysis of zero-failure data under several reliability distributions [D]. Central South University, 2009.
[16] Huang Furong. Statistical analysis of exponential nonlinear models and linear models with AR(1) errors [D]. Nanjing University of Science and Technology, 2009.