Ensemble learning

An ensemble learning method combines the predictions of several models, for example by averaging or voting over their results.

A tree ensemble learning method is an ensemble method based on decision trees.

Most of these methods rely on optimizing the bias-variance trade-off: combining models reduces variance, bias, or both.

When an ensemble method applies the same algorithm to all of its models (e.g., bagging, boosting), that algorithm is called the base estimator.
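
A minimal sketch, assuming scikit-learn (the parameter is named estimator in recent versions, base_estimator in older ones): a bagging ensemble that reuses a decision tree as the base estimator for every model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# The same algorithm (a decision tree) is the base estimator
# for every one of the 50 models in the ensemble.
ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    random_state=0,
).fit(X, y)
print(ensemble.score(X, y))
```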

Diversity

Diversity implies that the learning models are independent and their errors uncorrelated.

An ensemble learning method requires some diversity among its base models; otherwise, combining them adds nothing over a single model.

Ways to achieve diversity:

  • Use different training sets (sketched below)
  • Use different explanatory variables (sketched below)
  • Use different learning parameters (hyperparameters) or learning algorithms
  • Use different output representations
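
A minimal sketch, assuming scikit-learn and NumPy, of the first two items: each model gets its own bootstrap sample (a different training set) and its own random subset of features (different explanatory variables).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models, feature_sets = [], []
for _ in range(10):
    # Different training set: bootstrap sample (drawn with replacement).
    rows = rng.integers(0, len(X), size=len(X))
    # Different explanatory variables: random subset of 5 features.
    cols = rng.choice(X.shape[1], size=5, replace=False)
    models.append(DecisionTreeClassifier().fit(X[rows][:, cols], y[rows]))
    feature_sets.append(cols)

# Combine the diverse models by majority vote (binary labels 0/1).
votes = np.array([m.predict(X[:, c]) for m, c in zip(models, feature_sets)])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print((majority == y).mean())
```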

Bibliography:

  • ZHOU, Zhi-Hua. Chapter 4. In: Ensemble Methods: Foundations and Algorithms.

Ways to achieve diversity

  • Modify inducers (hyperparameters, starting point, optimization algorithm, etc.)
  • Modify training sample (resampling)
  • Modify output representation
  • Modify features / explanatory variables
  • Hybridization (algorithm combination)

Bibliography:

  • ROKACH, Lior. Chapter 4. In: Pattern Classification Using Ensemble Methods.

Error-correcting output codes (ECOC) are an example of modifying the output representation.
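
A minimal sketch, assuming scikit-learn, whose OutputCodeClassifier implements ECOC: each class is mapped to a binary codeword, and one binary classifier is trained per codeword bit.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)

# Prediction picks the class whose codeword is closest to the
# vector of bit predictions made by the binary classifiers.
ecoc = OutputCodeClassifier(
    estimator=LogisticRegression(max_iter=1000),
    code_size=2.0,  # codeword length = code_size * n_classes
    random_state=0,
).fit(X, y)
print(ecoc.score(X, y))
```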

Prediction combination

Prediction combination in regression:

  • Average
  • Median

The mean is the usual default; the median is more robust to outlying predictions (see the sketch below).
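
A minimal NumPy sketch with made-up predictions from three hypothetical regressors, showing why the median is more robust:

```python
import numpy as np

# Predictions from three hypothetical regressors on the same inputs.
preds = np.array([
    [10.0, 2.1, 5.0],   # model 1
    [11.0, 1.9, 5.2],   # model 2
    [10.5, 2.0, 9.0],   # model 3 (an outlier on the last input)
])

print(np.mean(preds, axis=0))    # average combination
print(np.median(preds, axis=0))  # median combination, robust to the outlier
```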

Classification uses labels or probabilities.

Prediction combination in classification:

  • Crisp label
  • Probability

Crisp labels are related to one-hot encoding.

Probabilities are related to the softmax activation function (see the sketch below).
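
A minimal NumPy sketch of both representations for a single 3-class prediction:

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])  # raw scores for 3 classes

# Softmax turns the scores into a probability distribution (soft output).
probs = np.exp(logits) / np.exp(logits).sum()

# The crisp label is the argmax; as a one-hot vector it is:
one_hot = np.eye(len(logits))[np.argmax(probs)]

print(probs)    # ~[0.79 0.18 0.04] -- probabilities
print(one_hot)  # [1. 0. 0.]        -- crisp label
```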

Voting

Voting trains several different models and then returns the prediction that occurs most often.

Class label voting types for classification:

  • Plurality
  • Majority
  • Unanimity
  • Weighting

Plurality voting picks the mode: the class with the most votes. Majority voting additionally requires that the winner receive more than half of the votes; unanimity requires all of them.

Voting types for regression are mean and median.

Voting uses the average in regression; in classification it can be hard (crisp labels) or soft (probabilities).

Hard voting, which corresponds to both majority voting and plurality voting, is based on crisp labels.

Soft voting is based on probabilities.
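
A minimal sketch, assuming scikit-learn, of hard and soft voting over three heterogeneous classifiers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
estimators = [
    ("lr", LogisticRegression()),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
]

# Hard voting: each model casts a crisp label; the mode wins.
hard = VotingClassifier(estimators, voting="hard").fit(X, y)

# Soft voting: the predicted class probabilities are averaged.
soft = VotingClassifier(estimators, voting="soft").fit(X, y)

print(hard.score(X, y), soft.score(X, y))
```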

Methods

Ensemble learning methods:

  • Bagging
  • Boosting
  • Stacking
Algorithm | Primary goal        | Composition   | Typical base models                      | Example
----------|---------------------|---------------|------------------------------------------|--------------
Bagging   | Reduce variance     | Homogeneous   | High-variance models (deep trees)        | Random forest
Boosting  | Reduce bias         | Homogeneous   | Weak, low-variance models (stumps)       | AdaBoost
Stacking  | Improve predictions | Heterogeneous | High-performance (high-variance) models  | Voting

Bagging

You can read this post about bagging.

Bagging is parallel: each model is trained independently on its own bootstrap sample, so all of them can be fit at the same time.
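
A minimal sketch, assuming scikit-learn: a random forest (bagging over deep trees, fit in parallel via n_jobs) compared with a single high-variance tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# A single deep tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0)

# Random forest: bagging over deep trees (plus random feature subsets),
# trained in parallel; averaging their votes reduces the variance.
forest = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)

print(cross_val_score(tree, X, y).mean())
print(cross_val_score(forest, X, y).mean())
```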

Boosting

You can read this post about boosting.

Boosting is sequential: each new model is trained to correct the errors of the ones before it.
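
A minimal sketch, assuming scikit-learn (the parameter is estimator in recent versions, base_estimator in older ones): AdaBoost over decision stumps.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# AdaBoost fits weak learners (depth-1 stumps) one after another;
# each round reweights the training points the previous models got wrong.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a stump
    n_estimators=200,
    random_state=0,
)
print(cross_val_score(ada, X, y).mean())
```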

Stacking

You can read this post about stacking.
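
A minimal sketch, assuming scikit-learn: heterogeneous base models combined by a logistic-regression meta-model trained on their out-of-fold predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

# Heterogeneous base models; a meta-model (logistic regression here)
# learns how to combine their cross-validated predictions.
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("svc", SVC(random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
).fit(X, y)
print(stack.score(X, y))
```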

Bibliography

References:

  • Ensemble learning [online]. Wikipedia.
  • Ensemble Learning: Bagging & Boosting [online]. Towards Data Science.
  • GÉRON, Aurélien. Chapter 7, “Ensemble Learning and Random Forests”. In: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (HOML). 3rd ed. O’Reilly. p. 211.
  • ZHOU, Zhi-Hua. Ensemble Methods: Foundations and Algorithms.
  • ROKACH, Lior. Pattern Classification Using Ensemble Methods. 2nd ed. 2019.

Related entries

  • Machine learning algorithms
  • Statistical learning
