CatBoost is an open-source library developed by Yandex, based on the gradient boosting algorithm.
The name CatBoost comes from two words: Category and Boosting.
- Cat – Short for Category. It works natively with categorical features alongside numerical ones, and recent versions also accept text features.
- Boost – It is based on the gradient boosting algorithm.
It has become very popular due to its powerful features.
Features of CatBoost
Below are a few features that make the CatBoost algorithm stand out:
- No need to Handle Categorical Features
Converting categorical features into numerical ones is one of the more tedious data preprocessing tasks, and CatBoost lets you skip it. It encodes categorical features itself, using logic discussed later in this article.
- Works well with default parameters
CatBoost has been observed to work well in most cases with its default parameters, so extensive hyperparameter tuning is often unnecessary.
- Works relatively well even with a small amount of data.
- High speed. CatBoost can run on both CPU and GPU.
- CPU Speed – CatBoost compares best against other boosting libraries on large datasets and worst on small ones.
- GPU Speed – CatBoost is among the fastest boosting implementations on GPU. Speedups of up to about 50x have been reported, and up to about 100x for multiclass classification. The GPU advantage grows with the size of the dataset.
- Handles missing values for numeric features.
Filling missing values with some other relevant value is another tedious preprocessing task that can be skipped when using CatBoost, since it processes missing values in numeric features on its own.
- Time is saved.
- Little or no hyperparameter tuning is required in most cases.
- Prediction time is typically low compared with other machine learning algorithms.
- Easy to use.
- Builds accurate models with minimal data preparation.
Installation
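For Python, you can run the command below in the terminal.
pip install catboost shap scikit-learn ipywidgets
shap, scikit-learn (imported as sklearn), and ipywidgets are optional supporting libraries. Note that the pip package is named scikit-learn, since the old sklearn alias on PyPI is deprecated. See the CatBoost installation documentation for more detail.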
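After installing, a quick check like the one below can confirm which of the packages are actually importable. It is a small sketch that uses only the standard library, and the helper name `check_installed` is made up for this example. Note that the scikit-learn package is imported under the name `sklearn`.

```python
import importlib.util

# Map each pip package name to the module name used in import statements.
PACKAGES = {
    "catboost": "catboost",
    "shap": "shap",
    "scikit-learn": "sklearn",
    "ipywidgets": "ipywidgets",
}

def check_installed(packages):
    """Return {pip_name: bool} telling whether each module can be located."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }

status = check_installed(PACKAGES)
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Any package reported as missing can be installed on its own with pip.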