Bagging, also known as Bootstrap Aggregation, is an ensemble technique that combines multiple base models (most commonly Decision Trees) to improve overall performance. An example of a bagging algorithm is Random Forest.
Decision Trees suffer from a major problem: overfitting. A bagging algorithm like Random Forest addresses this by training multiple Decision Trees on the same problem and aggregating their outputs, which reduces variance and thereby mitigates overfitting.
- Every Decision Tree in a bagging algorithm is given equal importance.
- Each model is built independently, i.e. one model has no effect on another.
- Bagging reduces variance, not bias, so it is suitable for high-variance, low-bias problems.
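To see the variance reduction in practice, here is a minimal sketch comparing a single fully grown Decision Tree with a Random Forest on a synthetic dataset. The dataset shape and hyperparameters are illustrative assumptions, not prescriptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (sizes chosen arbitrarily for illustration).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# A single fully grown tree tends to overfit (high variance).
tree = DecisionTreeClassifier(random_state=42)

# Random Forest bags many such trees and aggregates their votes.
forest = RandomForestClassifier(n_estimators=100, random_state=42)

print("Single tree CV accuracy: ", cross_val_score(tree, X, y, cv=5).mean())
print("Random Forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```

On data like this, the forest's cross-validated accuracy is typically noticeably higher than the single tree's, reflecting the variance reduction from aggregation.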
Bootstrapping
It is the process of randomly selecting data points with replacement, i.e. a data point that has already been selected can be selected again.
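A minimal sketch of bootstrapping with NumPy: drawing a sample the same size as the data, with replacement, so duplicates can appear and some points are left out. The toy data is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # ten toy data points: 0..9

# Sample with replacement: the same point may be drawn more than once.
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)  # some points repeat, others do not appear at all
```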
Steps to Implement Bagging
- Decide the number of Decision Trees.
- For every Decision Tree,
- Select samples with replacement (a bootstrap sample).
- Select a subset of features.
- Train all the models independently of each other.
- Aggregate the output from all the models.
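A from-scratch sketch of the steps above, aggregating by majority vote. Names such as `n_trees` and `max_features` are illustrative assumptions; in practice, scikit-learn's `BaggingClassifier` or `RandomForestClassifier` implement the same idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)

n_trees = 25                       # step 1: decide the number of trees
n_samples, n_features = X.shape
max_features = int(np.sqrt(n_features))

models = []
for _ in range(n_trees):
    # step 2a: bootstrap sample (with replacement)
    rows = rng.choice(n_samples, size=n_samples, replace=True)
    # step 2b: random subset of features
    cols = rng.choice(n_features, size=max_features, replace=False)
    # step 3: each tree is trained independently of the others
    tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
    models.append((tree, cols))

# step 4: aggregate the predictions by majority vote
votes = np.array([tree.predict(X[:, cols]) for tree, cols in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Training accuracy of the bagged ensemble:", (y_pred == y).mean())
```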
Advantages of Bagging
- Many weak learners combine to form a strong learner.
- Reduces overfitting by lowering variance.
Disadvantages of Bagging
- Computationally expensive, since many models must be trained.
- Bagging does not reduce bias, so if the base models underfit, the final output will be biased too.