Machine Learning Series (36) —— Regression Decision Trees and a Review of Decision Trees
A regression decision tree is a decision tree model used for regression. It mainly refers to the CART algorithm and, like the classification tree, has a binary tree structure. Taking a regression problem that predicts an output from two features as an example, the principle of the regression tree is to divide the feature plane into several units, each of which corresponds to a specific output value. Because every node tests a single yes/no condition on one feature, the resulting boundaries are parallel to the coordinate axes. For a test sample, we only need to follow the decision process to assign its features to one of the units, and we then obtain the corresponding regression output value.

According to the partition shown in the figure above and the corresponding regression tree, a new sample with features (6, 7.5) is routed by the tree to the unit whose output is c5, so its predicted regression value is c5. The process of splitting nodes is also the process of building the tree: each split immediately determines the outputs of the newly created units and adds one more node. When splitting terminates according to the stopping criteria, the output of every unit is fixed, and each unit becomes a leaf node. This looks similar to a classification tree, but there are important differences: finding the optimal split point and determining the output value of each unit are the two core problems of the regression decision tree.

The splitting error on the input space is measured by the squared error between the actual values and the predicted value of each region:

$$\sum_{x_i \in R_m} \left( y_i - f(x_i) \right)^2$$

where $f(x_i) = c_m$ is the predicted value of the unit $R_m$ containing $x_i$. The predicted value is some summary of the target values of the samples in that unit, typically their mean:

$$\hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m} y_i$$

(Here the input feature space has been divided into units $R_1, R_2, \ldots, R_M$, and $N_m$ is the number of samples in unit $R_m$.)

Then finding the optimal split amounts to solving the optimization problem

$$\min_{j,\,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$$

where $R_1(j,s) = \{ x \mid x^{(j)} \le s \}$ and $R_2(j,s) = \{ x \mid x^{(j)} > s \}$ are the two regions formed by splitting on feature $j$ at split point $s$.
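To make this search concrete, here is a minimal sketch (not from the original article) of an exhaustive search for the best split of a single node, scoring each candidate $(j, s)$ by the total squared error when each region predicts its own mean:

```python
import numpy as np

def best_split(X, y):
    """Exhaustively search for the split (feature j, threshold s) that
    minimizes the total squared error of the two resulting regions,
    predicting each region by its mean."""
    best_j, best_s, best_err = None, None, np.inf
    n_features = X.shape[1]
    for j in range(n_features):
        # candidate thresholds: midpoints between consecutive sorted unique values
        values = np.unique(X[:, j])
        for s in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            # squared error of each region when predicting its mean
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_j, best_s, best_err = j, s, err
    return best_j, best_s, best_err
```

CART applies this kind of search recursively to each resulting region until a stopping criterion is met.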

The solution of this optimization problem will not be covered here. Instead, we directly use the decision tree regressor in scikit-learn to see how decision trees perform on regression. The dataset is the Boston house price data:
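A minimal sketch of this setup (an assumed reconstruction rather than the article's original code; note that `load_boston` was removed in scikit-learn 1.2, so an older version is assumed, and the `random_state` is arbitrary):

```python
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Boston house price data (available in scikit-learn < 1.2)
boston = load_boston()
X, y = boston.data, boston.target

# arbitrary random_state, used here only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# default parameters: depth and number of leaf nodes are not limited
dt_reg = DecisionTreeRegressor()
dt_reg.fit(X_train, y_train)

print(dt_reg.score(X_test, y_test))    # R^2 on the test set
print(dt_reg.score(X_train, y_train))  # R^2 on the training set
```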

Without any parameter tuning, the R-squared on the test set is about 0.59, which is clearly not a good result. An interesting phenomenon appears, however, on the training set:

The R-squared value there is 1.0, which means that the decision tree's predictions on the training set match the training labels exactly, with zero error, a clear sign of overfitting. This example also shows that decision trees are very prone to overfitting. Of course, we can alleviate overfitting by tuning the parameters.

Let's draw learning curves to see the performance of the decision tree regression model directly. First, draw a learning curve based on MSE:
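One way to do this, sketched here under the assumption that the learning curve is drawn by training on incrementally larger prefixes of the training set (the article's own code is not shown), is:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def plot_learning_curve_mse(model, X_train, X_test, y_train, y_test):
    """Train on the first i training samples and record the MSE on both sets."""
    train_err, test_err = [], []
    for i in range(1, len(X_train) + 1):
        model.fit(X_train[:i], y_train[:i])
        train_err.append(mean_squared_error(y_train[:i], model.predict(X_train[:i])))
        test_err.append(mean_squared_error(y_test, model.predict(X_test)))
    plt.plot(train_err, label="train MSE")
    plt.plot(test_err, label="test MSE")
    plt.legend()
    plt.show()

# X_train, X_test, y_train, y_test come from the train/test split above
plot_learning_curve_mse(DecisionTreeRegressor(), X_train, X_test, y_train, y_test)
```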

The learning curve is as follows:

Then draw a learning curve based on R-squared:
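The same procedure with R-squared as the score might look like this (again an assumed sketch; `plot_learning_curve_r2` is a name introduced here):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

def plot_learning_curve_r2(model, X_train, X_test, y_train, y_test):
    """Same incremental procedure, but scored with R^2 instead of MSE."""
    train_score, test_score = [], []
    # start from 2 samples: R^2 is not well-defined for a single sample
    for i in range(2, len(X_train) + 1):
        model.fit(X_train[:i], y_train[:i])
        train_score.append(r2_score(y_train[:i], model.predict(X_train[:i])))
        test_score.append(r2_score(y_test, model.predict(X_test)))
    plt.plot(train_score, label="train R^2")
    plt.plot(test_score, label="test R^2")
    plt.ylim(-0.1, 1.1)   # R^2 is at most 1
    plt.legend()
    plt.show()

plot_learning_curve_r2(DecisionTreeRegressor(), X_train, X_test, y_train, y_test)
```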

The two results above are obtained with the default parameters, that is, without limiting the depth of the tree or the number of leaf nodes. We can see that on the training set, with no restriction, zero training error is achieved, which is obvious overfitting. Next we tune the parameters and redraw the learning curves. To save space, only the tree depth (max_depth) is adjusted, and only the R-squared learning curve is drawn:
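For example, reusing the hypothetical helper sketched above:

```python
# restrict the tree depth and redraw the R^2 learning curve for each setting
for depth in (1, 3, 5):
    plot_learning_curve_r2(DecisionTreeRegressor(max_depth=depth),
                           X_train, X_test, y_train, y_test)
```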

max_depth = 1

max_depth = 3

max_depth = 5

As the depth increases, the model becomes more and more complex and the overfitting becomes more and more obvious. You can check that when max_depth = 20, the training-set curve is again the flat line y = 1, i.e. zero training error. Interested readers can modify the other parameters and draw the corresponding learning curves.

Limitations of decision trees:

Using the iris data from the previous article in this series, let's look at how sensitive decision trees are to individual data points. As in the previous article, information entropy is used as the splitting criterion, and the other parameters are left at their defaults; the resulting decision boundary is:

Then we delete the sample with index 138 and draw the decision boundary again:
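A minimal sketch of this experiment (an assumed reconstruction; `plot_decision_boundary` stands in for the plotting helper from the previous article, and the choice of the two petal features is an assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]    # assumed: the two petal features, as in the previous article
y = iris.target

# fit on the full data, using information entropy as the splitting criterion
dt_clf = DecisionTreeClassifier(criterion="entropy")
dt_clf.fit(X, y)
# plot_decision_boundary(dt_clf, X, y)        # hypothetical plotting helper

# remove the single sample with index 138 and refit
X_new = np.delete(X, 138, axis=0)
y_new = np.delete(y, 138)
dt_clf2 = DecisionTreeClassifier(criterion="entropy")
dt_clf2.fit(X_new, y_new)
# plot_decision_boundary(dt_clf2, X_new, y_new)
```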

We find that the decision boundary is now completely different, and that is the effect of just a single data point.

To sum up, the decision tree is actually an unstable algorithm whose performance depends heavily on parameter tuning and on the data. However, although a single decision tree is not a powerful machine learning algorithm by itself, the ensemble method built on top of it, Random Forest (RF), is a very robust machine learning algorithm, which will be introduced in the next article.