This tutorial explains how to generate a train-test split with scikit-learn so that machine learning models can be validated on out-of-sample data.
You'll use hourly weather data from 2013, recorded at several weather stations (the origin column) that serve flights from New York airports.
This tutorial uses:
Open a new Jupyter notebook and import the following:
The data comes from Rdatasets and is loaded with the Python package statsmodels.
The time_hour column contains the hour of each observation as a string. Convert it to a datetime and store it in a new column, observation_time. The year, month, day and hour columns duplicate this information and can be dropped from the dataframe.
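The conversion can be sketched roughly as follows. To keep the example self-contained it runs on a tiny stand-in dataframe with made-up values, and the timestamp string format is an assumption; with the real data you would apply the same two steps to the loaded weather dataframe.

```python
import pandas as pd

# Tiny stand-in for the weather dataframe; the values are made up for illustration.
weather = pd.DataFrame(
    {
        "origin": ["EWR", "EWR", "JFK"],
        "year": [2013, 2013, 2013],
        "month": [1, 1, 1],
        "day": [1, 1, 1],
        "hour": [1, 2, 1],
        "temp": [39.0, 39.5, 38.2],
        # Assumed string format; pd.to_datetime also handles other common layouts.
        "time_hour": [
            "2013-01-01 01:00:00",
            "2013-01-01 02:00:00",
            "2013-01-01 01:00:00",
        ],
    }
)

# Parse the string timestamps into a proper datetime column.
weather["observation_time"] = pd.to_datetime(weather["time_hour"])

# year, month, day and hour are now redundant, so drop them.
# time_hour is left in place here; it could be dropped as well once converted.
weather = weather.drop(columns=["year", "month", "day", "hour"])
```

Assigning the result of drop back to weather, rather than using inplace=True, keeps the step easy to re-run in a notebook.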
Print out the result: