Pruning is a technique that reduces the size of a decision tree by removing sections that are redundant or contribute little to the classification or regression task.
Methods of Pruning
- Early Stopping or Pre-Pruning
- Post-Pruning or Backward Pruning
1. Early Stopping or Pre-Pruning
Pre-pruning stops the tree from splitting further before it is fully grown. One way to do this is to fix the maximum depth of the tree in advance.
Problems
- Deciding the depth of the tree in advance.
- In some cases, splitting on one side of the tree yields more information than splitting on the other.
In the above tree, most of the splits are on the left side. A depth fixed in advance applies to both sides equally, so important splits on the deeper side may be cut off.
To address the second problem, we can instead stop splitting a node when a further split brings no significant improvement in a score such as Information Gain.
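The stopping rule based on Information Gain can be sketched in a few lines of plain Python. This is a minimal illustration, not a full tree builder: the function names, the example labels, and the threshold min_gain=0.05 are assumptions chosen for the example and do not come from any particular library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def should_split(parent, left, right, min_gain=0.05):
    """Pre-pruning rule: accept a split only if its gain reaches min_gain.

    min_gain is a hypothetical threshold; in practice it would be tuned.
    """
    if not left or not right:
        return False  # a split that leaves one side empty is never useful
    return information_gain(parent, left, right) >= min_gain


# Toy data: 5 "yes" and 5 "no" labels at the node being considered.
parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes perfectly.
good_split = (["yes"] * 5, ["no"] * 5)

# A split that barely changes the class mix on either side.
weak_split = (
    ["yes", "yes", "yes", "no", "no"],
    ["yes", "yes", "no", "no", "no"],
)

print(should_split(parent, *good_split))
print(should_split(parent, *weak_split))
```

With this rule the first split is accepted and the second is rejected, because its gain falls below the threshold. Unlike a fixed depth, the rule is applied to each node separately, so one side of the tree can keep growing while the other side stops.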
2. Post-Pruning or Backward Pruning
In this method, a complete decision tree is first grown using the default stopping condition, and the non-significant branches are then pruned. Non-significant branches are those whose splits do not reduce the randomness (impurity) of the data by much.
Post-pruning is the more commonly used of the two pruning methods.
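The idea of removing branches whose splits reduce randomness only slightly can be sketched as a bottom-up pass over an already grown tree. The sketch below rests on several assumptions: the tree is a hypothetical nested dict holding class counts, a split is collapsed only when both of its children are leaves, and the threshold min_gain=0.05 is arbitrary. Established post-pruning methods, such as reduced-error pruning or cost-complexity pruning, decide with validation error or a complexity penalty instead of a raw gain threshold.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) from a dict mapping class -> count."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def is_leaf(node):
    """A node without children is a leaf."""
    return "left" not in node


def split_gain(node):
    """Information gain of the split stored at an internal node."""
    n = sum(node["counts"].values())
    weighted = 0.0
    for child in (node["left"], node["right"]):
        weighted += sum(child["counts"].values()) / n * entropy(child["counts"])
    return entropy(node["counts"]) - weighted


def prune(node, min_gain=0.05):
    """Return a pruned copy of the tree; the input tree is not modified."""
    if is_leaf(node):
        return node
    # Prune the children first so the pass runs bottom-up.
    left = prune(node["left"], min_gain)
    right = prune(node["right"], min_gain)
    candidate = {"counts": node["counts"], "left": left, "right": right}
    # Collapse the split into a leaf if it barely reduces entropy.
    if is_leaf(left) and is_leaf(right) and split_gain(candidate) < min_gain:
        return {"counts": node["counts"]}
    return candidate


def count_leaves(node):
    """Number of leaves in the tree rooted at node."""
    if is_leaf(node):
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


# Hypothetical fully grown tree: the left split does not change the class
# mix at all, while the right split separates the classes perfectly.
tree = {
    "counts": {"yes": 10, "no": 10},
    "left": {
        "counts": {"yes": 8, "no": 2},
        "left": {"counts": {"yes": 4, "no": 1}},
        "right": {"counts": {"yes": 4, "no": 1}},
    },
    "right": {
        "counts": {"yes": 2, "no": 8},
        "left": {"counts": {"yes": 2, "no": 0}},
        "right": {"counts": {"yes": 0, "no": 8}},
    },
}

pruned = prune(tree)
print(count_leaves(tree), count_leaves(pruned))
```

Here the left subtree is collapsed into a single leaf because its split leaves the class proportions unchanged, while the right subtree survives because its split produces two pure leaves.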