Rasgo can be configured to your data and dbt/git environments in under 20 minutes. Book time with your personal onboarding concierge and we'll get you all setup!
This tutorial explains how to create lag variables in pandas using gold and silver price data from rdatasets.
The documentation for each package used in this tutorial is linked below:
import statsmodels.api as sm
import pandas as pd
The data is from rdatasets imported using the Python package statsmodels.
df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()
df['date'] = pd.to_datetime(df.date)
Create lag variables, using the shift function. shift(1) creates a lag of a single record, while shift(5) creates a lag of five records.
df.sort_values('date', inplace=True)
df['silver_lag_1'] = df['silver'].shift(1)
df['silver_lag_5'] = df['silver'].shift(5)
df.head(20)
date gold silver silver_lag_1 silver_lag_5
0 1977-12-30 100.00 223.42 NaN NaN
1 1978-01-02 100.00 223.42 223.42 NaN
2 1978-01-03 100.00 229.84 223.42 NaN
3 1978-01-04 100.00 224.58 229.84 NaN
4 1978-01-05 100.00 227.99 224.58 NaN
5 1978-01-06 100.00 227.19 227.99 223.42
6 1978-01-09 101.23 229.62 227.19 223.42
7 1978-01-10 100.95 228.97 229.62 229.84
8 1978-01-11 102.25 231.22 228.97 224.58
9 1978-01-12 100.88 227.89 231.22 227.99
10 1978-01-13 102.62 232.55 227.89 227.19
11 1978-01-16 103.09 232.49 232.55 229.62
12 1978-01-17 101.78 230.18 232.49 228.97
13 1978-01-18 100.87 227.96 230.18 231.22
14 1978-01-19 101.70 228.88 227.96 227.89
15 1978-01-20 101.72 228.92 228.88 232.55
16 1978-01-23 103.62 234.00 228.92 232.49
17 1978-01-24 103.59 233.63 234.00 230.18
18 1978-01-25 103.55 234.50 233.63 227.96
19 1978-01-26 103.28 231.49 234.50 228.88
This creates a lag variable based on the prior observations, but shift can also take a time offset to specify the time to use in shift. For example, 1D and 5D can be used to lag by 1 and 5 days respectively.
First, a datetime index must be created from date.
df.set_index('date', inplace=True)
df['silver_lag_1d'] = df['silver'].shift(freq='1D')
df['silver_lag_5d'] = df['silver'].shift(freq='5D')
df.head(20)
gold silver silver_lag_1 silver_lag_5 silver_lag_1d silver_lag_5d
date
1977-12-30 100.00 223.42 NaN NaN NaN NaN
1978-01-02 100.00 223.42 223.42 NaN NaN NaN
1978-01-03 100.00 229.84 223.42 NaN 223.42 NaN
1978-01-04 100.00 224.58 229.84 NaN 229.84 223.42
1978-01-05 100.00 227.99 224.58 NaN 224.58 NaN
1978-01-06 100.00 227.19 227.99 223.42 227.99 NaN
1978-01-09 101.23 229.62 227.19 223.42 NaN 224.58
1978-01-10 100.95 228.97 229.62 229.84 229.62 227.99
1978-01-11 102.25 231.22 228.97 224.58 228.97 227.19
1978-01-12 100.88 227.89 231.22 227.99 231.22 NaN
1978-01-13 102.62 232.55 227.89 227.19 227.89 NaN
1978-01-16 103.09 232.49 232.55 229.62 NaN 231.22
1978-01-17 101.78 230.18 232.49 228.97 232.49 227.89
1978-01-18 100.87 227.96 230.18 231.22 230.18 232.55
1978-01-19 101.70 228.88 227.96 227.89 227.96 NaN
1978-01-20 101.72 228.92 228.88 232.55 228.88 NaN
1978-01-23 103.62 234.00 228.92 232.49 NaN 227.96
1978-01-24 103.59 233.63 234.00 230.18 234.00 228.88
1978-01-25 103.55 234.50 233.63 227.96 233.63 228.92
1978-01-26 103.28 231.49 234.50 228.88 234.50 NaN
Open source data transformations, without having to write SQL. Choose from a wide selection of predefined transforms that can be exported to DBT or native SQL.