LATEST NEWS:

Grab your copy of the free SaaS Metric Playbook! : Vital metrics you’ll need from leaders in SaaS and VC. >>

Rasgo Feature Store for Data Science
Tutorials
Community Dropdown button
Rasgo Quick Start
Data Analysis
Accelerators
Feature Engineering
Docs
Docs Homepage
Rasgo Quickstart
Tutorials
Insight Accelerators
Data Analysis
Model Accelerators
Feature Engineering
Community
Slack
GitHub
Community
Community Dropdown button
Slack
GitHub
Blogs
close menu overlay button
Login
Try For Free
Try For Free!
menu icon button

Sign-Up For Your
Free 30-Day Trial!

Rasgo can be configured to your data and dbt/git environments in under 20 minutes. Book time with your personal onboarding concierge and we'll get you all setup!

Not ready for a free trial?
Not ready for a free trial?
Private Demo
Click here to schedule time for a private demo
Book Demo
Try Rasgo’s FREE SQL Generator
A low-code web app to construct a SQL Query
SQL Generator
Tutorials that help Data Scientists get their pandas on.

Data Cleaning

Data Cleaning

Tutorials

How To Identify Date Gaps in Time Series Data with Pandas

How To Identify Data Gaps with Pyrasgo

How To Handle Data Type Mismatch In Pandas

How To Handle Data Type Mismatch with PyRasgo

Identify Duplicate Data in Pandas

Identify Duplicate Data in PyRasgo

Identify Missing Data with Pandas

Identify Missing Data with PyRasgo

Additional Featured Engineering Tutorials

Model Selection

Feature Transformation

Feature Selection

Feature Profiling

Feature Importance

How to Create Lag Variables in Pandas

This tutorial explains how to create lag variables in pandas using gold and silver price data from rdatasets.

Packages

The documentation for each package used in this tutorial is linked below:

  • pandas
  • statsmodels
  • statsmodels.api

import statsmodels.api as sm
import pandas as pd

Create initial dataset

The data is from rdatasets imported using the Python package statsmodels.


df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()

‍


df['date'] = pd.to_datetime(df.date)

Lag Variables

Create lag variables, using the shift function. shift(1) creates a lag of a single record, while shift(5) creates a lag of five records.


df.sort_values('date', inplace=True)
df['silver_lag_1'] = df['silver'].shift(1)
df['silver_lag_5'] = df['silver'].shift(5)

df.head(20)

	date	gold	silver	silver_lag_1	silver_lag_5
0	1977-12-30	100.00	223.42	NaN	NaN
1	1978-01-02	100.00	223.42	223.42	NaN
2	1978-01-03	100.00	229.84	223.42	NaN
3	1978-01-04	100.00	224.58	229.84	NaN
4	1978-01-05	100.00	227.99	224.58	NaN
5	1978-01-06	100.00	227.19	227.99	223.42
6	1978-01-09	101.23	229.62	227.19	223.42
7	1978-01-10	100.95	228.97	229.62	229.84
8	1978-01-11	102.25	231.22	228.97	224.58
9	1978-01-12	100.88	227.89	231.22	227.99
10	1978-01-13	102.62	232.55	227.89	227.19
11	1978-01-16	103.09	232.49	232.55	229.62
12	1978-01-17	101.78	230.18	232.49	228.97
13	1978-01-18	100.87	227.96	230.18	231.22
14	1978-01-19	101.70	228.88	227.96	227.89
15	1978-01-20	101.72	228.92	228.88	232.55
16	1978-01-23	103.62	234.00	228.92	232.49
17	1978-01-24	103.59	233.63	234.00	230.18
18	1978-01-25	103.55	234.50	233.63	227.96
19	1978-01-26	103.28	231.49	234.50	228.88

‍

This creates a lag variable based on the prior observations, but shift can also take a time offset to specify the time to use in shift. For example, 1D and 5D can be used to lag by 1 and 5 days respectively.

First, a datetime index must be created from date.


df.set_index('date', inplace=True)

df['silver_lag_1d'] = df['silver'].shift(freq='1D')
df['silver_lag_5d'] = df['silver'].shift(freq='5D')

df.head(20)

	gold	silver	silver_lag_1	silver_lag_5	silver_lag_1d	silver_lag_5d
date						
1977-12-30	100.00	223.42	NaN	NaN	NaN	NaN
1978-01-02	100.00	223.42	223.42	NaN	NaN	NaN
1978-01-03	100.00	229.84	223.42	NaN	223.42	NaN
1978-01-04	100.00	224.58	229.84	NaN	229.84	223.42
1978-01-05	100.00	227.99	224.58	NaN	224.58	NaN
1978-01-06	100.00	227.19	227.99	223.42	227.99	NaN
1978-01-09	101.23	229.62	227.19	223.42	NaN	224.58
1978-01-10	100.95	228.97	229.62	229.84	229.62	227.99
1978-01-11	102.25	231.22	228.97	224.58	228.97	227.19
1978-01-12	100.88	227.89	231.22	227.99	231.22	NaN
1978-01-13	102.62	232.55	227.89	227.19	227.89	NaN
1978-01-16	103.09	232.49	232.55	229.62	NaN	231.22
1978-01-17	101.78	230.18	232.49	228.97	232.49	227.89
1978-01-18	100.87	227.96	230.18	231.22	230.18	232.55
1978-01-19	101.70	228.88	227.96	227.89	227.96	NaN
1978-01-20	101.72	228.92	228.88	232.55	228.88	NaN
1978-01-23	103.62	234.00	228.92	232.49	NaN	227.96
1978-01-24	103.59	233.63	234.00	230.18	234.00	228.88
1978-01-25	103.55	234.50	233.63	227.96	233.63	228.92
1978-01-26	103.28	231.49	234.50	228.88	234.50	NaN
Try RasgoQL

Open source data transformations, without having to write SQL. Choose from a wide selection of predefined transforms that can be exported to DBT or native SQL.

Explore on Github
No items found.
How To Identify Date Gaps in Time Series Data with Pandas
How To Identify Data Gaps with Pyrasgo
How To Handle Data Type Mismatch In Pandas
How To Handle Data Type Mismatch with PyRasgo
Identify Duplicate Data in Pandas
Identify Duplicate Data in PyRasgo
Identify Missing Data with Pandas
Identify Missing Data with PyRasgo

© RASGO Intelligence, Inc. All rights reserved.

TUtorials
Rasgo Quick StartInsight AcceleratorsData AnalysisModel AcceleratorsFeature Engineering
COMMUNitY
GitHubSlackBlog
COMPANY
About Careers Privacy PolicyTerms of ServiceContact