This tutorial explains how to use the robust scaler from scikit-learn. The robust scaler normalizes the data by subtracting the median and dividing by the interquartile range. Because the median and interquartile range are barely affected by extreme values, it is robust to outliers, unlike the standard scaler, which uses the mean and standard deviation.
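To make the difference concrete, here is a minimal NumPy sketch of both scaling rules applied to a small made-up sample, with and without one extreme value. The helper names `robust_scale` and `standard_scale` are hypothetical and are not part of scikit-learn.

```python
import numpy as np


def robust_scale(values: np.ndarray) -> np.ndarray:
    """Subtract the median and divide by the interquartile range."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return (values - median) / (q3 - q1)


def standard_scale(values: np.ndarray) -> np.ndarray:
    """Subtract the mean and divide by the standard deviation."""
    return (values - values.mean()) / values.std()


# Made-up data: the second sample swaps the last value for an outlier.
clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

robust_clean = robust_scale(clean)
robust_outlier = robust_scale(with_outlier)
standard_clean = standard_scale(clean)
standard_outlier = standard_scale(with_outlier)

# The four ordinary points keep the same robust-scaled values,
# while their standard-scaled values shift because the mean and
# standard deviation are pulled by the outlier.
print(robust_clean[:4], robust_outlier[:4])
print(standard_clean[:4], standard_outlier[:4])
```

In this sample the median and interquartile range are identical with and without the outlier, so only the outlier itself ends up with a different scaled value.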
For this tutorial you'll be using data on flights departing from NYC airports in 2013.
This tutorial uses the Python packages scikit-learn, statsmodels, and category_encoders.
Open up a new Jupyter notebook and import the following:
The data comes from Rdatasets and is loaded with the Python package statsmodels.
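A possible way to load the data is sketched below. It assumes the table in question is `flights` from the `nycflights13` package on Rdatasets, which is not stated explicitly above, and the wrapper `load_flights` is a hypothetical helper. The call downloads the data, so it needs network access.

```python
import pandas as pd


def load_flights() -> pd.DataFrame:
    """Download the flights table from Rdatasets (needs network access)."""
    # Imported inside the function so nothing is fetched or loaded
    # until the data is actually requested.
    import statsmodels.api as sm

    # Assumption: the data is the "flights" table of the nycflights13 package.
    dataset = sm.datasets.get_rdataset("flights", "nycflights13")
    return dataset.data
```

In a notebook cell, `df = load_flights()` followed by `df.head()` would show the first few rows.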
This model will predict arrival delay. The null values are caused by flights that were cancelled or diverted, so these rows can be excluded from this analysis.
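Dropping those rows could look like the sketch below. It runs on a tiny made-up frame so that it is self-contained; the column names `carrier`, `dep_delay`, and `arr_delay` are assumptions about the flights table.

```python
import numpy as np
import pandas as pd

# Made-up rows: the second flight was cancelled (no delays recorded)
# and the third was diverted (it departed but has no arrival delay).
flights = pd.DataFrame(
    {
        "carrier": ["AA", "UA", "DL", "UA"],
        "dep_delay": [5.0, np.nan, 12.0, -3.0],
        "arr_delay": [10.0, np.nan, np.nan, -8.0],
    }
)

# Keep only rows where the target (arrival delay) is present.
flights_clean = flights.dropna(subset=["arr_delay"]).reset_index(drop=True)
print(flights_clean)
```

Restricting `dropna` to the target column avoids discarding rows that are only missing some unrelated feature.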
We convert the categorical features to numerical ones with the leave-one-out encoder in category_encoders. This leaves a single numeric feature in the place of each existing categorical feature. This is needed so that the scaler can be applied to all features in the training data.
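To illustrate the idea behind leave-one-out encoding, here is a hand-rolled pandas sketch. It is not the category_encoders implementation: the function `leave_one_out_encode` is hypothetical, and falling back to the global mean for categories that appear only once is an assumption of this sketch. Each training row is replaced by the mean target of the other rows in the same category.

```python
import pandas as pd


def leave_one_out_encode(categories: pd.Series, target: pd.Series) -> pd.Series:
    """Encode each row as the mean target of the other rows in its category."""
    grouped = target.groupby(categories)
    sums = grouped.transform("sum")
    counts = grouped.transform("count")
    # Remove the row's own target from its category total before averaging.
    encoded = (sums - target) / (counts - 1)
    # Assumption: categories seen only once fall back to the global mean.
    return encoded.where(counts > 1, target.mean())


# Made-up data: carrier codes and arrival delays in minutes.
carrier = pd.Series(["AA", "AA", "UA", "UA", "UA", "DL"])
arr_delay = pd.Series([10, 20, 0, 30, 60, 5])

carrier_encoded = leave_one_out_encode(carrier, arr_delay)
print(carrier_encoded)
```

Excluding the row's own target keeps the encoded feature from leaking that row's label directly into its input.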
We apply the robust scaler from scikit-learn.
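A minimal sketch of this step on a made-up, single-column training set, using `RobustScaler` from `sklearn.preprocessing` with its default settings:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up training feature with one extreme value.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()
# fit_transform learns the median and interquartile range of each
# column, then scales the training data with them.
X_train_scaled = scaler.fit_transform(X_train)

print(scaler.center_)  # per-column median learned from X_train
print(scaler.scale_)   # per-column interquartile range learned from X_train
print(X_train_scaled.ravel())
```

The fitted `scaler` object keeps the learned statistics, so it can be reused on other data later.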
Scale the test set with the scaler that was fitted on the training data. The scaled test set can now be passed into the predict or predict_proba functions of a trained model.
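The sketch below shows this on made-up data. The `LinearRegression` model is only a stand-in for whatever model was trained and is an assumption of this example. Note that the test set goes through `transform`, not `fit_transform`, so it is scaled with the statistics learned from the training data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import RobustScaler

# Made-up training data where the target is exactly twice the feature.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
y_train = 2.0 * X_train.ravel()
X_test = np.array([[5.0], [10.0]])

# Fit the scaler on the training data only.
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Stand-in model (assumption): any fitted estimator would work here.
model = LinearRegression().fit(X_train_scaled, y_train)

# Reuse the training median and interquartile range for the test set.
X_test_scaled = scaler.transform(X_test)
predictions = model.predict(X_test_scaled)
print(X_test_scaled.ravel(), predictions)
```

Refitting the scaler on the test set would give the test features a different scale from the one the model was trained on.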