This tutorial explains how to generate a time series split with pyrasgo, which enables out-of-time validation of machine learning models.
You'll use hourly weather data from 2013 for multiple weather stations (the origin column), located at the New York airports that flights departed from.
This tutorial uses:
Open up a Jupyter notebook and import the following:
If you haven't done so already, head over to https://docs.rasgoml.com/rasgo-docs/onboarding/initial-setup and follow the steps there to create your free account. This account gives you free access to the Rasgo API, which calculates dataframe profiles, generates feature importance scores, and produces feature explainability for your analysis. The account also lets you keep access to your analysis and share it with colleagues.
The data comes from Rdatasets and is imported with the Python package statsmodels.
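A minimal sketch of the download step, assuming the table is published in Rdatasets as weather in the nycflights13 package. statsmodels exposes Rdatasets through sm.datasets.get_rdataset, whose .data attribute is a pandas dataframe. The wrapper name load_weather is hypothetical, and calling it needs network access.

```python
def load_weather():
    """Download the nycflights13 weather table from Rdatasets (requires network access)."""
    # Imported inside the function so this file can be loaded even where
    # statsmodels is not installed; the dependency is only needed at call time.
    import statsmodels.api as sm

    # get_rdataset(dataname, package) fetches an Rdatasets table;
    # .data holds it as a pandas DataFrame.
    return sm.datasets.get_rdataset("weather", "nycflights13").data
```

Calling weather = load_weather() inside the notebook returns the raw dataframe used in the rest of the tutorial.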
time_hour contains the hour of the observation as a string. Convert it to a datetime column named observation_time. The year, month, day, and hour columns duplicate this information and can be dropped from the dataframe.
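One way to do this conversion with pandas is sketched below. The helper name prepare_weather is hypothetical, and the two sample rows are made up for illustration. The sketch also drops time_hour once it has been parsed; that is our assumption, since the text only lists year, month, day, and hour as duplicates.

```python
import pandas as pd


def prepare_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Parse time_hour into observation_time and drop the redundant date-part columns."""
    out = df.copy()
    # Parse timestamp strings (assumed to look like "2013-01-01 01:00:00") into datetime64 values
    out["observation_time"] = pd.to_datetime(out["time_hour"])
    # year/month/day/hour duplicate observation_time; dropping time_hour as well is our assumption
    return out.drop(columns=["year", "month", "day", "hour", "time_hour"])


# Made-up rows with the same column layout, for illustration only
sample = pd.DataFrame({
    "origin": ["EWR", "JFK"],
    "year": [2013, 2013],
    "month": [1, 1],
    "day": [1, 1],
    "hour": [1, 2],
    "temp": [39.0, 40.0],
    "time_hour": ["2013-01-01 01:00:00", "2013-01-01 02:00:00"],
})
weather = prepare_weather(sample)
```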
The function evaluate.train_test_split splits a dataframe into a train dataframe and a test dataframe.
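To make the idea of an out-of-time split concrete, here is a pandas-only sketch. It is not pyrasgo's implementation and does not reproduce its signature; time_series_split and its test_size parameter are hypothetical. The earliest timestamps go to the train set and the latest to the test set. Cutting on unique timestamps rather than on rows keeps every station's observation for a given hour on the same side of the split.

```python
import pandas as pd


def time_series_split(df: pd.DataFrame, time_col: str = "observation_time", test_size: float = 0.2):
    """Hypothetical out-of-time split: earlier timestamps train, later timestamps test."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be strictly between 0 and 1")
    # Unique timestamps in chronological order, so rows sharing an hour stay together
    times = df[time_col].drop_duplicates().sort_values()
    # Position of the first timestamp assigned to the test set (always < len(times))
    cut = int(len(times) * (1 - test_size))
    cutoff = times.iloc[cut]
    train = df[df[time_col] < cutoff]
    test = df[df[time_col] >= cutoff]
    return train, test


# Made-up data: 10 hourly observations for each of two stations
hours = pd.date_range("2013-01-01", periods=10, freq=pd.Timedelta(hours=1))
frame = pd.DataFrame({
    "origin": ["EWR", "JFK"] * 10,
    "observation_time": hours.repeat(2),
    "temp": range(20),
})
train, test = time_series_split(frame, test_size=0.2)
```

With 10 distinct hours and test_size=0.2, the last 2 hours (4 rows) land in test. Every training row precedes every test row, which is what out-of-time validation requires.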
The observation_time column has become the datetime index of the dataframe. For ease of use, reset the index and rename the resulting column observation_time.
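With plain pandas, one way to do this is to name the index first and then reset it. rename_axis and reset_index are standard pandas methods; the helper name index_to_column and the sample rows are hypothetical.

```python
import pandas as pd


def index_to_column(df: pd.DataFrame, name: str = "observation_time") -> pd.DataFrame:
    """Move the (datetime) index back into a regular column called `name`."""
    # rename_axis gives the index the desired name, so reset_index emits a column
    # with that name whether the index was previously unnamed or named differently
    return df.rename_axis(name).reset_index()


# Made-up dataframe indexed by observation time, for illustration only
indexed = pd.DataFrame(
    {"origin": ["EWR", "JFK"], "temp": [39.0, 40.0]},
    index=pd.to_datetime(["2013-01-01 01:00:00", "2013-01-01 02:00:00"]),
)
flat = index_to_column(indexed)
```

Naming the index before resetting avoids having to guess whether reset_index will produce a column called index or one named after the existing index.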