This tutorial explains how to use leave one out encoding from category_encoders. Leave one out encoding is just target encoding where the average or expected value is calculated ignoring the value in the current row.
This tutorial will data for flights in and out of NYC in 2013.
This tutorial uses:
The data is from rdatasets imported using the Python package statsmodels.
As this model will predict arrival delay, the Null values are caused by flights did were cancelled or diverted. These can be excluded from this analysis.
You should get something back like this:
We use a leave-one-out encoder as it creates a single column for each categorical variable instead of creating a column for each level of the categorical variable like one-hot-encoding. This makes interpreting the impact of categorical variables with feature impact easier. Models can now be trained with any modeling algorithm with the feature set contained in X_train_loo
Encode the test set. This can now be passed into the predict or predict_proba functions of a trained model.