This tutorial explains how to use the standard scaler from scikit-learn. The scaler standardizes each feature by subtracting its mean and dividing by its standard deviation.
This tutorial will use data for flights in and out of NYC in 2013.
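As a minimal sketch of that arithmetic, the standardization can be reproduced by hand with NumPy. The array values are made up for illustration. The population standard deviation is used (NumPy's default, `ddof=0`), which is also what scikit-learn's `StandardScaler` computes.

```python
import numpy as np

# Hypothetical feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Per-column mean and population standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Subtract the mean and divide by the standard deviation, column by column.
X_scaled = (X - mean) / std
```

After this step each column of `X_scaled` has mean 0 and standard deviation 1, regardless of its original units.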
This tutorial uses:
Open up a new Jupyter notebook and import the following:
The data is from Rdatasets, imported using the Python package statsmodels.
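One way the load could look is sketched below. The dataset name `flights` and the R package name `nycflights13` are assumptions, since the text does not name them, and `load_flights` is a hypothetical helper. Calling it requires network access.

```python
def load_flights():
    """Fetch a flights table from Rdatasets and return it as a DataFrame."""
    # Imported here so statsmodels is only needed when data is fetched.
    import statsmodels.api as sm

    # Assumed dataset and package names; adjust if your source differs.
    dataset = sm.datasets.get_rdataset("flights", "nycflights13")
    return dataset.data
```

In a notebook, `flights = load_flights()` followed by `flights.head()` gives a quick look at the columns.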
As this model will predict arrival delay, the null values come from flights that were cancelled or diverted. These rows can be excluded from this analysis.
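Dropping those rows might look like the sketch below. It uses a small made-up frame, and the column name `arr_delay` is an assumption about how the target is labelled.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two flights have no recorded arrival delay.
flights = pd.DataFrame({
    "carrier": ["AA", "UA", "DL", "AA"],
    "arr_delay": [12.0, np.nan, -3.0, np.nan],
})

# Keep only rows where the target is present, then renumber the index.
flights_clean = flights.dropna(subset=["arr_delay"]).reset_index(drop=True)
```

Restricting `dropna` to the target column avoids discarding rows that are merely missing some other field.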
We convert the categorical features to numerical ones with the leave-one-out encoder in category_encoders. This leaves a single numeric feature in place of each existing categorical feature, which is needed so that the scaler can be applied to all features in the training data.
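To show the idea behind leave-one-out encoding, here is a hand-rolled pandas sketch. It is an illustration only, not the library's implementation: `leave_one_out_encode` is a hypothetical helper, the data is made up, and falling back to the global mean for categories seen only once is an assumption of this sketch.

```python
import pandas as pd

def leave_one_out_encode(categories: pd.Series, target: pd.Series) -> pd.Series:
    """Replace each category with the mean target of the *other* rows in it."""
    grouped = target.groupby(categories)
    total = grouped.transform("sum")
    count = grouped.transform("count")
    # Exclude the row's own target value from its category mean.
    encoded = (total - target) / (count - 1)
    # A category seen once has no other rows; use the global mean instead.
    return encoded.where(count > 1, target.mean())

# Hypothetical training data.
carrier = pd.Series(["AA", "AA", "AA", "UA", "UA", "DL"])
arr_delay = pd.Series([10.0, 20.0, 30.0, 0.0, 40.0, 5.0])

carrier_encoded = leave_one_out_encode(carrier, arr_delay)
```

Each row's own target is left out of its encoding, which limits how much the encoded feature leaks the value the model is trying to predict.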
We fit the standard scaler from scikit-learn on the training data and use it to scale that data.
Scale the test set with the same fitted scaler. The scaled data can now be passed into the predict or predict_proba functions of a trained model.
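The two scaling steps can be sketched together as follows. The arrays are hypothetical stand-ins for the encoded training and test features. The point to notice is that the scaler is fitted on the training data only and then reused unchanged on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical, already-numeric training and test features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()

# Learn the per-column mean and standard deviation from the training set.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training statistics on the test set; do not refit here.
X_test_scaled = scaler.transform(X_test)
```

Calling `transform` rather than `fit_transform` on the test set keeps test-set statistics out of the preprocessing, so the model sees test data on the same scale it was trained on.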