This tutorial explains how to generate feature importance plots from XGBoost using tree-based feature importance, permutation importance, and SHAP.
During this tutorial you will build and evaluate a model to predict arrival delay for flights in and out of NYC in 2013.
This tutorial uses:
Open a new Jupyter notebook and import the following:
The data comes from Rdatasets and is imported using the Python package statsmodels.
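As a minimal sketch of the loading step, statsmodels exposes `get_rdataset` for fetching tables from Rdatasets. The dataset name `"flights"` and package name `"nycflights13"` are assumptions based on the description of the data above, and `load_flights` is a hypothetical helper name, not part of statsmodels.

```python
def load_flights():
    """Download a flights table from Rdatasets and return it as a DataFrame.

    Assumption: the data is the `flights` table of the `nycflights13`
    package in Rdatasets; adjust the names if your source differs.
    """
    # Imported inside the function so statsmodels is only needed on call.
    import statsmodels.api as sm

    # get_rdataset returns a Dataset object; its .data attribute
    # holds the table as a pandas DataFrame.
    dataset = sm.datasets.get_rdataset("flights", "nycflights13")
    return dataset.data
```

Calling `load_flights()` needs network access, because the table is downloaded on demand.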
This model will predict arrival delay. The null values are caused by flights that were cancelled or diverted, so these rows can be excluded from this analysis.
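One way to exclude those rows is sketched below with pandas. The small DataFrame is made-up toy data standing in for the flights table, and dropping only on the `arr_delay` column is an assumption; the same idea applies if you drop every row containing any null.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the flights table (values are illustrative only).
flights = pd.DataFrame({
    "carrier": ["AA", "UA", "DL", "AA"],
    "dep_delay": [5.0, np.nan, 12.0, -3.0],
    "arr_delay": [10.0, np.nan, np.nan, -8.0],
})

# Count the nulls per column before deciding what to drop.
null_counts = flights.isna().sum()

# Keep only flights that have a recorded arrival delay (the target).
flights_clean = flights.dropna(subset=["arr_delay"]).reset_index(drop=True)
```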
We use a leave-one-out encoder, as it creates a single column for each categorical variable instead of creating a column for each level of the categorical variable, as one-hot encoding does. This makes it easier to interpret the impact of categorical variables with feature importance.
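To make the idea concrete, here is a hand-rolled illustration of leave-one-out encoding in pandas. It is a sketch of the technique, not the encoder used in this tutorial: each row's category is replaced by the mean target of the other rows in the same category. The function name, the toy data, and the fallback to the global mean for categories with a single row are all assumptions made for illustration.

```python
import pandas as pd


def leave_one_out_encode(categories: pd.Series, target: pd.Series) -> pd.Series:
    """Replace each category with the mean target of the *other* rows in it."""
    target = target.astype(float)
    grouped = target.groupby(categories)
    sums = grouped.transform("sum")      # per-row sum of the row's category
    counts = grouped.transform("count")  # per-row size of the row's category
    # Remove the row's own target from the category sum and count.
    encoded = (sums - target) / (counts - 1)
    # Single-row categories give 0/0 = NaN; fall back to the global mean.
    return encoded.fillna(target.mean())


# Toy data: one categorical column becomes one numeric column.
toy = pd.DataFrame({
    "carrier": ["AA", "AA", "AA", "UA", "UA", "DL"],
    "arr_delay": [10, 20, 30, 0, 40, 5],
})
toy["carrier_encoded"] = leave_one_out_encode(toy["carrier"], toy["arr_delay"])
```

Because the output is a single numeric column per categorical variable, an importance score attaches to `carrier` as a whole rather than being split across one column per airline.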
SHAP contains a function to plot this directly.
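A minimal sketch of such a plot follows, assuming the plot in question is a global feature importance chart (mean absolute SHAP value per feature) and that `model` is an already fitted XGBoost model with `X` as the matching feature DataFrame. `plot_shap_importance` is a hypothetical helper name; the calls inside it are from the `shap` package.

```python
def plot_shap_importance(model, X):
    """Draw a bar chart of mean |SHAP value| per feature for a fitted tree model."""
    # Imported inside the function so shap is only needed when plotting.
    import shap

    # TreeExplainer computes SHAP values for tree ensembles such as XGBoost.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # plot_type="bar" summarises each feature by its mean absolute SHAP value.
    shap.summary_plot(shap_values, X, plot_type="bar")
```

Leaving out `plot_type="bar"` gives the default summary plot, which shows the distribution of SHAP values for each feature rather than a single bar.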