This tutorial explains how to aggregate time series data in pandas using the gold and silver price data from rdatasets.
Packages
The documentation for each package used in this tutorial is linked below:
Open up a Jupyter notebook and import the following:
import statsmodels.api as sm
import pandas as pd
Create Initial Dataset
The data is from rdatasets imported using the Python package statsmodels.
df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()
df['date'] = pd.to_datetime(df.date)
Time series aggregation
The pandas function resample can be used to create aggregations on specified windows. Here, a weekly aggregate of the daily gold and silver price data will be created.
First, a datetime index needs to be created from the date column.
df.set_index('date', inplace=True)
The parameter origin is used to specify the day of week to start the aggregation on. January 1, 1978 is a Sunday, so each of these aggregations will start on Sunday and aggregate over the subsequent week.
weekly_resample = df.resample('7D', origin=pd.to_datetime('1978-01-01'))
aggregated_df = weekly_resample.agg(['min', 'mean', 'max', 'std'])
aggregated_df.columns = ['_'.join(col).strip() + '_week' for col in aggregated_df.columns.values]
aggregated_df.head(20)
gold_min_week gold_mean_week gold_max_week gold_std_week silver_min_week silver_mean_week silver_max_week silver_std_week
date
1977-12-25 100.00 100.000 100.00 NaN 223.42 223.420 223.42 NaN
1978-01-01 100.00 100.000 100.00 0.000000 223.42 226.604 229.84 2.596657
1978-01-08 100.88 101.586 102.62 0.796825 227.89 230.050 232.55 1.846334
1978-01-15 100.87 101.832 103.09 0.796976 227.96 229.686 232.49 1.755130
1978-01-22 102.49 103.306 103.62 0.475742 230.37 232.798 234.50 1.777630
1978-01-29 101.96 102.578 103.49 0.636530 226.91 229.116 231.61 1.936835
1978-02-05 101.90 102.572 103.15 0.505193 224.98 227.506 229.90 1.875361
1978-02-12 103.92 104.936 105.87 0.850312 228.16 231.506 234.01 2.412370
1978-02-19 105.14 105.956 107.23 0.822028 229.81 232.392 234.55 1.784550
1978-02-26 106.20 107.076 107.60 0.538451 229.83 231.092 232.22 1.043681
1978-03-05 108.89 109.916 111.03 0.791789 235.33 244.862 248.98 5.619695
1978-03-12 106.05 108.042 110.39 1.638740 248.09 251.940 254.95 2.875239
1978-03-19 102.65 104.458 105.53 1.355865 240.71 243.672 246.05 2.578317
1978-03-26 104.66 106.180 107.29 1.119710 246.81 249.628 253.55 2.764710
1978-04-02 103.33 103.968 105.27 0.771894 241.89 243.780 246.38 1.809627
1978-04-09 102.99 104.120 105.07 0.806908 241.23 243.816 247.27 2.410940
1978-04-16 99.11 99.996 100.93 0.852397 230.53 232.784 235.15 2.045527
1978-04-23 97.45 98.234 99.41 0.856756 225.99 228.588 230.49 1.848194
1978-04-30 98.38 99.778 100.92 1.143119 228.87 232.358 234.71 2.439912
1978-05-07 100.11 100.874 101.51 0.644306 232.08 233.124 234.55 0.905003