Tree Series 1: Decision Tree, Random Forest

Trees are among the most widely used models, with a large family (regression trees, classification trees, bagging: RF, boosting: GBDT) and many implementations (classification: ID3, C4.5, CART; regression: CART). This series of posts starts with a brief introduction to the basic principles of DT, RF and GBDT, then goes into the details of GBDT and other boosting techniques (e.g. lightgbm, xgboost, catboost), and dives deeper by making comparisons.

Decision Tree

There are two main stages in building a decision tree: constructing a full tree by greedily finding the best split at each node, and then pruning the tree to avoid overfitting. Details are summarized in the figure below. One extra point is the comparison of different classification splitting measures: the misclassification rate is sometimes not sensitive enough for growing a tree, e.g. in a binary classification problem (limited number of classes) where the majority class happens to be the same in every partitioned region, so a split reduces no error even when it produces purer nodes. Gini impurity and information gain are more sensitive than the error rate, but both are biased towards splits with a larger number of branches k, making more splits and partitioning the data into smaller sub-regions, which increases sensitivity to individual training examples. In other words, overfitting may result.
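A minimal sketch of that comparison (the helper functions and the example counts are my own, not taken from any library): when both child regions keep the parent's majority class, the weighted misclassification rate does not decrease at all, while Gini and entropy still register the gain in purity.

```python
import numpy as np

# Hypothetical helpers, written only for this illustration.
def gini(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def error_rate(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - p.max()

def impurity_decrease(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = float(np.sum(parent))
    weighted = sum(np.sum(c) / n * measure(c) for c in children)
    return measure(parent) - weighted

# Parent node: 600 samples of class A, 200 of class B.
parent = [600, 200]
# Candidate split where both children still have class A as the majority:
# the total number of misclassified samples stays at 200, so the error rate
# shows no improvement, while Gini and entropy reward the purer left child.
children = [[300, 40], [300, 160]]

for name, measure in [("error", error_rate), ("gini", gini), ("entropy", entropy)]:
    print(f"{name:8s} decrease: {impurity_decrease(parent, children, measure):+.3f}")
```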

Figure 1. Summary of the Decision Tree Building Strategy
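To make the two stages concrete, here is a small sketch assuming scikit-learn (the dataset and the simple hold-out selection of the pruning strength are my own choices): grow an unrestricted tree, then use cost-complexity pruning to cut it back.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Stage 1: grow a full tree by greedy splitting, with no size limit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Stage 2: cost-complexity pruning. ccp_alphas are the penalty values at which
# subtrees get collapsed; pick the value that does best on held-out data
# (cross-validation would be the more careful choice).
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
    .fit(X_train, y_train)
    .score(X_val, y_val),
)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)

print(f"full tree:   {full_tree.get_n_leaves()} leaves, val acc {full_tree.score(X_val, y_val):.3f}")
print(f"pruned tree: {pruned_tree.get_n_leaves()} leaves, val acc {pruned_tree.score(X_val, y_val):.3f}")
```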

Table 1. Pros & Cons of DT

| Pros | Cons |
| --- | --- |
| Easy to interpret and understand | Unstable: small changes in the data can produce a very different tree |
| Handles both numeric and categorical features | Often not as accurate as other models, e.g. linear regression |
| Can be modified to handle missing values | Prone to high variance |
| Provides feature selection | |
| Insensitive to monotone feature transformations and to outliers (splits are based on value ranking) | |

Random Forest

Random Forest uses a bagging strategy: it grows a set of decision trees on bootstrapped samples of the training data and averages their predictions. In addition, each split considers only a random subset of the features, which decorrelates the trees so that averaging reduces the variance more effectively.
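A minimal sketch of that recipe, assuming scikit-learn (the synthetic dataset and the choice of 100 trees are arbitrary): bootstrap the rows, fit one tree per sample with a random feature subset per split, and average the predicted probabilities. The library's RandomForestClassifier is included only as a sanity check that the hand-rolled version behaves similarly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 100
trees = []
for _ in range(n_trees):
    # Bootstrap: draw n rows with replacement from the training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" is the random feature subset per split that turns
    # plain bagging of trees into a random forest.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1 << 31)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Average the per-tree class probabilities, then take the most likely class.
avg_proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
bagged_acc = np.mean(avg_proba.argmax(axis=1) == y_test)

rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_train, y_train)
print(f"hand-rolled forest: {bagged_acc:.3f}  "
      f"sklearn RandomForestClassifier: {rf.score(X_test, y_test):.3f}")
```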

