An ensemble learning method is one that combines several models and aggregates their results (for example, by averaging them).
A tree ensemble learning method is an ensemble method based on decision trees.
Most of these methods are based on some kind of optimization of the bias-variance trade-off.
When an ensemble learning method uses the same algorithm for all of its models (e.g., bagging, boosting), that shared algorithm is called the base estimator.
Diversity
Diversity implies that the learning models are independent and uncorrelated.
An ensemble requires some diversity among its models; otherwise combining them adds nothing.
Ways to achieve diversity:
- Use different training sets
- Use different explanatory variables
- Use different learning parameters (such as hyperparameters) or learning algorithms
- Use different output representations
Bibliography:
- ZHOU, Zhi-Hua, Chapter 4, “Ensemble Methods: Foundations and Algorithms”.
A second taxonomy of ways to achieve diversity:
- Modify the inducers (hyperparameters, starting point, optimization algorithm, etc.)
- Modify training sample (resampling)
- Modify output representation
- Modify the characteristics / explanatory variables
- Hybridization (algorithm combination)
Bibliography:
- ROKACH, Lior. Chapter 4. “Pattern classification using ensemble methods”
Error-correcting codes (ECC) are an example of modifying the output representation.
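A minimal sketch of the error-correcting-codes idea in plain Python: each class is assigned a (made-up) binary codeword, one binary classifier predicts each bit, and the final class is the one whose codeword is closest in Hamming distance to the predicted bits. The codebook and class names here are hypothetical.

```python
# Hypothetical 4-class codebook: 6 binary classifiers, one per bit.
CODEBOOK = {
    "A": (0, 0, 0, 1, 1, 1),
    "B": (0, 1, 1, 0, 0, 1),
    "C": (1, 0, 1, 0, 1, 0),
    "D": (1, 1, 0, 1, 0, 0),
}

def hamming(a, b):
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Pick the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda c: hamming(CODEBOOK[c], bits))

# Even with one flipped bit, decoding recovers the intended class:
decoded = decode((0, 0, 0, 1, 1, 0))  # one bit away from A's codeword -> "A"
```

Because the codewords are spaced apart, a few wrong bits from individual classifiers can still decode to the right class, which is the error-correcting property.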
Prediction combination
Prediction combination in regression:
- Average (mean)
- Median
Prediction combination in classification:
- Crisp label
- Probability
Crisp labels are related to one-hot encoding; probabilities are related to the soft-max activation function.
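A small sketch of both combination modes using only the standard library; the prediction values are made up for illustration.

```python
import statistics

# Hypothetical predictions from three base models.
reg_preds = [2.9, 3.1, 3.6]   # regression outputs
proba_preds = [               # classification: [P(class 0), P(class 1)]
    [0.7, 0.3],
    [0.6, 0.4],
    [0.4, 0.6],
]

# Regression: combine with the mean or the median.
mean_pred = statistics.fmean(reg_preds)     # 3.2
median_pred = statistics.median(reg_preds)  # 3.1

# Classification with probabilities: average per class, then take the
# class with the highest averaged probability.
avg_proba = [statistics.fmean(col) for col in zip(*proba_preds)]
predicted_class = max(range(len(avg_proba)), key=avg_proba.__getitem__)
```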
Voting
Voting uses different training algorithms and then returns the result that has been obtained the most times.
Class label voting types for classification:
- Plurality
- Majority
- Unanimity
- Weighting
Plurality looks for the mode: the class with the most votes wins, even without an absolute majority.
Voting types for regression are mean and median.
Voting uses the average in regression; in classification it can be hard (crisp labels) or soft (probabilities).
Hard voting, which covers both majority voting and plurality voting, is based on crisp labels.
Soft voting is based on probabilities.
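The two classification modes can be sketched in plain Python; the three classifiers and their outputs below are hypothetical.

```python
from collections import Counter

# Hypothetical outputs of three classifiers for one sample.
crisp_labels = ["cat", "dog", "cat"]
probabilities = [
    {"cat": 0.9, "dog": 0.1},
    {"cat": 0.4, "dog": 0.6},
    {"cat": 0.8, "dog": 0.2},
]

def hard_vote(labels):
    """Plurality vote over crisp labels (the mode)."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas):
    """Average the class probabilities and pick the highest."""
    classes = probas[0].keys()
    return max(classes, key=lambda c: sum(p[c] for p in probas) / len(probas))

hard_vote(crisp_labels)   # "cat" (2 votes to 1)
soft_vote(probabilities)  # "cat" (average 0.7 vs 0.3)
```

Note that the two modes can disagree: soft voting lets a very confident model outweigh two lukewarm ones.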
Methods
Ensemble learning methods:
- Bagging
- Boosting
- Stacking
| Method | Primary goal | Composition | Typical base model | Example |
|---|---|---|---|---|
| Bagging | Reduce variance | Homogeneous | High-variance models (Deep Trees) | Random forest |
| Boosting | Reduce bias | Homogeneous | Low-variance/Weak models (Stumps) | AdaBoost |
| Stacking | Improve predictions | Heterogeneous | High-performance (High-variance) models | Voting |
Bagging
You can read this post about bagging.
Bagging is parallel.
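The core of bagging can be sketched in a few lines of plain Python: each model is trained on an independent bootstrap resample (so the models could be fitted in parallel) and the predictions are averaged. The "model" here is a toy slope estimator on made-up data, not a real learner.

```python
import random

random.seed(0)
data = [(x, 2 * x) for x in range(1, 11)]  # toy dataset: y = 2x

def bootstrap_sample(dataset):
    """Sample with replacement, same size as the original (a bootstrap)."""
    return [random.choice(dataset) for _ in dataset]

def fit_mean_slope(sample):
    """Stand-in 'model': slope estimated as the mean of y/x."""
    ratios = [y / x for x, y in sample]
    return sum(ratios) / len(ratios)

# Each model sees a different resample; the predictions are averaged.
models = [fit_mean_slope(bootstrap_sample(data)) for _ in range(25)]
bagged_slope = sum(models) / len(models)  # ≈ 2.0 on this noiseless data
```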
Boosting
You can read this post about boosting.
Boosting is sequential.
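The sequential nature of boosting comes from the sample reweighting between rounds. A minimal AdaBoost-style sketch of one round, on made-up labels and a hypothetical weak learner's predictions:

```python
import math

# Labels in {-1, +1} and uniform initial sample weights.
labels = [1, 1, -1, -1]
weights = [0.25, 0.25, 0.25, 0.25]
weak_preds = [1, -1, -1, -1]  # hypothetical weak learner: wrong on sample 2

# Weighted error of the weak learner and its vote weight (alpha).
err = sum(w for w, y, p in zip(weights, labels, weak_preds) if y != p)
alpha = 0.5 * math.log((1 - err) / err)

# Reweight: misclassified samples go up, correct ones go down, then normalize
# so the next weak learner focuses on the mistakes.
weights = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, labels, weak_preds)]
total = sum(weights)
weights = [w / total for w in weights]
```

After this round the misclassified sample carries half the total weight, which is what forces the next learner in the sequence to attend to it.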
Stacking
You can read this post about stacking.
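A toy sketch of the stacking idea: the base models' predictions become the inputs of a meta-model. Both base models and the meta-model's blend weights here are hypothetical stand-ins; in practice the meta-model is trained on held-out base-model predictions.

```python
def base_model_a(x):
    return 2 * x          # pretend regressor A

def base_model_b(x):
    return 2 * x + 1      # pretend regressor B (biased upward)

def meta_model(preds):
    """Meta-learner: a fixed weighted blend of the base predictions.
    In practice these weights would be learned, not hard-coded."""
    return 0.8 * preds[0] + 0.2 * preds[1]

def stacked_predict(x):
    return meta_model([base_model_a(x), base_model_b(x)])

stacked = stacked_predict(3)  # 0.8 * 6 + 0.2 * 7 = 6.2
```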
Bibliography
References:
- Ensemble learning [online]. Wikipedia
- Ensemble Learning: Bagging & Boosting [online]. Towards Data Science
- GÉRON, Aurélien. Chapter 7, “Ensemble Learning and Random Forests”. In: Hands-On Machine Learning (HOML). 3rd ed. O’Reilly, p. 211.
- ZHOU, Zhi-Hua. Ensemble Methods: Foundations and Algorithms.
- ROKACH, Lior. Pattern classification using ensemble methods. 2nd ed. 2019.
Related entries
- Machine learning algorithms
- Statistical learning