Tree Series 1: Decision Tree, Random Forest

Trees are among the most widely used models, with a large family (regression trees, classification trees, bagging: RF, boosting: GBDT) and many implementations (classification: ID3, C4.5, CART; regression: CART). This series of posts starts with a brief introduction to the basic principles of DT, RF and GBDT, then goes into the details of GBDT and other boosting techniques (e.g. lightgbm, xgboost, catboost), and dives deeper by making comparisons.

Decision Tree

There are two main stages in building a decision tree: constructing a full tree by greedily finding the best split at each node, and then pruning the tree to avoid overfitting. Details are summarized in the figure below. One extra point is the comparison of different classification splitting measures: the misclassification rate is sometimes not sensitive enough for growing a tree, e.g. in a binary classification problem (limited number of classes) where the majority class happens to be the same in every partitioned region, so a split reduces no error even when it produces purer nodes. Gini impurity and information gain are more sensitive than the error rate, but both are biased towards splits with a larger number of branches k, making more splits and partitioning the data into smaller sub-regions, which increases sensitivity to individual training examples. In other words, overfitting may result.
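A minimal sketch of that comparison (the helper functions and the example counts are my own, not taken from any library): when both child regions keep the parent's majority class, the weighted misclassification rate does not decrease at all, while Gini and entropy still register the gain in purity.

```python
import numpy as np

# Hypothetical helpers, written only for this illustration.
def gini(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def error_rate(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - p.max()

def impurity_decrease(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = float(np.sum(parent))
    weighted = sum(np.sum(c) / n * measure(c) for c in children)
    return measure(parent) - weighted

# Parent node: 600 samples of class A, 200 of class B.
parent = [600, 200]
# Candidate split where both children still have class A as the majority:
# the total number of misclassified samples stays at 200, so the error rate
# shows no improvement, while Gini and entropy reward the purer left child.
children = [[300, 40], [300, 160]]

for name, measure in [("error", error_rate), ("gini", gini), ("entropy", entropy)]:
    print(f"{name:8s} decrease: {impurity_decrease(parent, children, measure):+.3f}")
```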

Figure 1. Summary of the Decision Tree Building Strategy
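To make the two stages concrete, here is a small sketch assuming scikit-learn (the dataset and the simple hold-out selection of the pruning strength are my own choices): grow an unrestricted tree, then use cost-complexity pruning to cut it back.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Stage 1: grow a full tree by greedy splitting, with no size limit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Stage 2: cost-complexity pruning. ccp_alphas are the penalty values at which
# subtrees get collapsed; pick the value that does best on held-out data
# (cross-validation would be the more careful choice).
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
    .fit(X_train, y_train)
    .score(X_val, y_val),
)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)

print(f"full tree:   {full_tree.get_n_leaves()} leaves, val acc {full_tree.score(X_val, y_val):.3f}")
print(f"pruned tree: {pruned_tree.get_n_leaves()} leaves, val acc {pruned_tree.score(X_val, y_val):.3f}")
```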

Table 1. Pros & Cons of DT

| Pros | Cons |
| --- | --- |
| Easy to interpret and understand | Unstable: small changes in the data can produce a very different tree |
| Handles both numeric and categorical features | Often not as accurate as other models, e.g. linear regression |
| Can be modified to handle missing values | Prone to high variance |
| Provides feature selection | |
| Insensitive to monotone feature transformations and to outliers (splits are based on value ranking) | |

Random Forest

Random Forest uses a bagging strategy: it grows a set of decision trees on bootstrapped samples of the training data and averages their predictions. In addition, each split considers only a random subset of the features, which decorrelates the trees so that averaging reduces the variance more effectively.
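A minimal sketch of that recipe, assuming scikit-learn (the synthetic dataset and the choice of 100 trees are arbitrary): bootstrap the rows, fit one tree per sample with a random feature subset per split, and average the predicted probabilities. The library's RandomForestClassifier is included only as a sanity check that the hand-rolled version behaves similarly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 100
trees = []
for _ in range(n_trees):
    # Bootstrap: draw n rows with replacement from the training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" is the random feature subset per split that turns
    # plain bagging of trees into a random forest.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1 << 31)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Average the per-tree class probabilities, then take the most likely class.
avg_proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
bagged_acc = np.mean(avg_proba.argmax(axis=1) == y_test)

rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_train, y_train)
print(f"hand-rolled forest: {bagged_acc:.3f}  "
      f"sklearn RandomForestClassifier: {rf.score(X_test, y_test):.3f}")
```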

