This post is an introduction to ensemble learning.
Methods
Ensemble learning methods:
- Voting classifier
- Bagging
- Pasting
- Random forest
- Boosting
- Stacking
Voting classifier
A voting classifier trains several different algorithms on the same data and then predicts the result that gets the most votes (the majority class).
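A minimal sketch of a hard-voting classifier using scikit-learn (the library used in HOML); the moons toy dataset and the particular base estimators are just illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, just for illustration.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different training algorithms vote on each prediction.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("svc", SVC(random_state=42)),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",  # predict the class that gets the most votes
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```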
Pasting
Pasting trains the same algorithm on random subsets of the training set, sampled without replacement.
Replacement, in this context, means that a sampled subset may include the same training instance more than once.
Because it can be parallelized, it scales well.
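A sketch of pasting, assuming scikit-learn, where BaggingClassifier with bootstrap=False samples the subsets without replacement; the subset size and number of trees are arbitrary choices:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # toy data

# Pasting: same algorithm, random subsets sampled WITHOUT replacement.
pasting_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    max_samples=0.5,   # each subset holds 50% of the training instances
    bootstrap=False,   # no replacement -> pasting
    n_jobs=-1,         # train the predictors in parallel
    random_state=42,
)
pasting_clf.fit(X, y)
```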
Bagging
Bagging (bootstrap aggregating) trains the same algorithm on random subsets of the training set, sampled with replacement.
Bagging introduces a bit more diversity in the training subsets, so these subsets differ more from each other than the ones produced by pasting.
Because each subset is somewhat distorted (some instances are duplicated and others are left out), bagging ends up with a slightly higher bias than pasting.
On the other hand, the extra diversity also means that the predictors end up less correlated, i.e. there is more disagreement between them. Averaging their predictions therefore tends to improve the results, so the ensemble’s variance is reduced.
Bagging can be explained intuitively with the sentence “slightly worse trees, but much better forest”.
Because it can be parallelized, it scales well.
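The same sketch with bootstrap=True gives bagging (sampling with replacement); again scikit-learn is assumed and the hyperparameters are arbitrary:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # toy data

# Bagging: same algorithm, random subsets sampled WITH replacement.
bagging_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    max_samples=0.5,   # each bootstrap sample holds 50% of the instances
    bootstrap=True,    # with replacement -> bagging
    n_jobs=-1,         # parallel training
    random_state=42,
)
bagging_clf.fit(X, y)
```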
Out-of-bag evaluation
Out-of-bag (OOB) evaluation is a way to estimate a bagged ensemble’s performance without needing a separate validation set.
On average, each bootstrap sample contains about 63% of the unique training instances, so about 37% are left out (out of bag). This follows because the probability that a given instance is never drawn in m samples with replacement is (1 − 1/m)^m, which tends to exp(−1) ≈ 37% as the training set size m grows, so the sampled fraction approaches 1 − exp(−1) ≈ 63%.
If there are enough estimators, each instance in the training set will likely be an OOB instance for several of them.
Each instance can therefore be evaluated using only the predictors that never saw it during training, comparing their averaged prediction against the true value. The per-instance errors are then aggregated into an overall OOB error.
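A sketch of OOB evaluation, assuming scikit-learn: with oob_score=True, BaggingClassifier scores each instance using only the predictors that left it out of their bootstrap sample:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # toy data

bagging_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    bootstrap=True,
    oob_score=True,   # evaluate each instance on the predictors that left it out
    n_jobs=-1,
    random_state=42,
)
bagging_clf.fit(X, y)
print(bagging_clf.oob_score_)  # accuracy estimate without a separate validation set
```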
The random patches method samples both training instances and features.
The random subspaces method keeps all training instances and samples only the features.
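Both techniques can be sketched with scikit-learn’s BaggingClassifier by toggling the instance and feature sampling parameters; the 0.5 ratios below are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)  # toy data

# Random patches: sample both training instances and features.
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=200,
    max_samples=0.5, bootstrap=True,             # sample instances
    max_features=0.5, bootstrap_features=True,   # sample features
    random_state=42,
).fit(X, y)

# Random subspaces: keep all instances, sample only features.
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=200,
    max_samples=1.0, bootstrap=False,            # keep every instance
    max_features=0.5, bootstrap_features=True,
    random_state=42,
).fit(X, y)
```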
Random forest
You can read this post about random forest.
Boosting
Hypothesis boosting (boosting) refers to any ensemble method that combines several weak learners into a strong learner.
Boosting methods:
- AdaBoost
- Gradient boosting
Adaptive boosting (AdaBoost) is a boosting method in which the instances that the previous predictor got most wrong (underfitted) get more weight when training the next predictor.
Once all predictors are trained, a weight is assigned to each of them depending on its overall accuracy, and the final prediction is a weighted vote of their outputs.
The training is sequential, so it doesn’t scale as well as bagging or pasting.
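A sketch of AdaBoost with scikit-learn, using decision stumps as the weak learners; n_estimators and learning_rate are illustrative values:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # toy data

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200,                     # predictors are trained sequentially
    learning_rate=0.5,
    random_state=42,
)
ada_clf.fit(X, y)
```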
Gradient boosting is a boosting method in which each new predictor is fitted to the residual errors made by the previous ones.
Histogram-based gradient boosting (HGB) bins the input features into histograms, which greatly speeds up training on large datasets.
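A sketch of the residual-fitting idea on a toy regression problem, assuming scikit-learn: the three hand-chained trees mimic what GradientBoostingRegressor does, and HistGradientBoostingRegressor (available in recent scikit-learn versions) is the binned HGB variant:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(100, 1))             # toy data
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=100)

# Each new tree is fitted to the residual errors left by the previous ones.
tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, y - tree1.predict(X))
tree3 = DecisionTreeRegressor(max_depth=2).fit(X, y - tree1.predict(X) - tree2.predict(X))
y_pred = sum(tree.predict([[0.4]]) for tree in (tree1, tree2, tree3))

# The prepackaged equivalents:
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0).fit(X, y)
hgb = HistGradientBoostingRegressor().fit(X, y)       # bins the features into histograms
```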
Reference:
- BREIMAN, Leo. Arcing the Edge [online]. 1997
- FRIEDMAN, Jerome H. Greedy Function Approximation: A Gradient Boosting Machine [online]. 1999
Reference:
- Chapter 7 “Ensemble Learning and Random Forests”, section “Boosting”. In: HOML.
Stacking
Stacked generalization, known as stacking, uses a blending predictor (blender or meta-learner) to aggregate the predictions made by the base predictors.
The blender takes as its input features the predictions of the base predictors (produced out-of-fold by applying k-fold cross-validation) and as its target the labels from the original data set. It is then fitted to learn which base predictor to trust in which situation.
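A sketch of stacking with scikit-learn’s StackingClassifier; the base estimators and the logistic-regression blender are illustrative choices:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # toy data

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender / meta-learner
    cv=5,  # the blender trains on out-of-fold predictions of the base predictors
)
stacking_clf.fit(X, y)
```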
References:
- Stacked generalization [online].
Bibliography
References:
- Ensemble learning [online]. Wikipedia
- GÉRON, Aurélien. Chapter 7 “Ensemble Learning and Random Forests”. In: HOML. 3rd ed. O’Reilly.
Related entries
- Machine learning algorithms
- Statistical learning