Boosting is an ensemble learning technique that combines many weak learners to form a strong learner. The base model is typically a decision tree. Unlike algorithms that rely on a single machine learning model, ensemble methods train multiple decision trees and combine their predictions. The other main ensemble learning techniques are **stacking** and **bagging**.

## Types of Boosting Algorithms

Three of the most widely used Boosting algorithms are:

- AdaBoost (Adaptive Boosting)
- Gradient Boosting
- XGBoost (Extreme Gradient Boosting)
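As a minimal sketch of how these algorithms are used in practice, the snippet below trains the first two on a synthetic dataset using scikit-learn (assumed to be installed; the dataset and parameter values are illustrative, not from the original article). XGBoost lives in the separate `xgboost` package and follows a similar fit/score interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost: re-weights misclassified samples after each round
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Gradient Boosting: fits each new tree to the residual errors
gbm = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print(ada.score(X_te, y_te), gbm.score(X_te, y_te))
```

Both estimators default to decision-tree base learners, matching the description above.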

## Working of Boosting Algorithms

The idea behind boosting is to add ensemble members sequentially: each new member corrects the errors made by the previous ones, and the final prediction is a weighted combination of all members' predictions.

- Training data is passed to model 1, which makes its predictions.
- Model 2 tries to correct the examples that model 1 predicted incorrectly.
- This process continues until model n, and the weighted combination of all models' predictions is returned as the output (n is a hyperparameter).
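The steps above can be sketched by hand. The toy loop below implements a gradient-boosting-style version of the idea with squared loss: start from a constant baseline, then repeatedly fit a small tree to the residuals (the errors left by the ensemble so far). All names and parameter values here are illustrative assumptions, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

n_rounds, lr = 50, 0.1            # n_rounds plays the role of the hyperparameter n
pred = np.full_like(y, y.mean())  # model 1: a constant baseline prediction
trees = []
for _ in range(n_rounds):
    residual = y - pred                                   # errors left by previous members
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)                          # new member corrects the ensemble
    trees.append(tree)
```

After the loop, the ensemble's squared error on the training data is far below that of the constant baseline, illustrating how sequential correction turns weak trees into a strong model.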

## Difference between Bagging and Boosting Algorithms

| Bagging Algorithms | Boosting Algorithms |
| --- | --- |
| All models are given equal weightage. | Models are weighted according to their performance. |
| Models are built independently. | Each model builds on the previous models. |
| Training data is selected randomly (with replacement). | Training focuses on data points that were misclassified. |
| Reduces variance, so it suits models with high variance and low bias. | Reduces bias, so it suits models with high bias and low variance. |
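The contrast in the table can be seen directly in scikit-learn, which provides both families side by side. The snippet below is a sketch under the assumption that scikit-learn is installed; the dataset and estimator counts are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

X, y = make_classification(n_samples=400, random_state=1)

# Bagging: trees are trained independently on random bootstrap samples
bag = BaggingClassifier(n_estimators=25, random_state=1).fit(X, y)

# Boosting: trees are trained sequentially, re-weighting misclassified points
boost = AdaBoostClassifier(n_estimators=25, random_state=1).fit(X, y)
```

Both classifiers default to decision-tree base learners, so the only difference exercised here is independent versus sequential training.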


## Advantages

- Combines many weak learners into a strong learner: when a single decision tree performs poorly, boosting combines many such trees so that the combined model performs well.

## Disadvantages

- It is sensitive to outliers: every new model tries to fix the errors of the previous one, so the ensemble can end up chasing outliers.
- Boosting is hard to parallelize because each new model depends on the output of the previous one, which makes the training procedure difficult to scale up.