Join us for thirty minutes and we'll show you the future of self-service analytics, powered by GPT.
This notebook explains how to use pyrasgo to create feature profiles of a pandas dataframe.
This tutorial uses:
Open up a Jupyter Notebook and enter the following:
import statsmodels.api as sm
import pandas as pd
import numpy as np
import pyrasgo
If you haven't done so already, head over to https://docs.rasgoml.com/rasgo-docs/onboarding/initial-setup and follow the steps outlined there to create your free account. This account gives you free access to the Rasgo API which will calculate dataframe profiles, generate feature importance score, and produce feature explainability for you analysis. In addition, this account allows you to maintain access to your analysis and share with your colleagues.
rasgo = pyrasgo.login(email='', password='')
The data is from rdatasets imported using the Python package statsmodels.
df = sm.datasets.get_rdataset('flights', 'nycflights13').data
Convert some of the fields into more meaningful fields to better understand the time flights depart and arrive. Next the original fields are dropped as they are now redundant.
df.dropna(inplace=True)
df['arr_hour'] = df.arr_time.apply(lambda x: int(np.floor(x/100)))
df['arr_minute'] = df.arr_time.apply(lambda x: int(x - np.floor(x/100)*100))
df['sched_arr_hour'] = df.sched_arr_time.apply(lambda x: int(np.floor(x/100)))
df['sched_arr_minute'] = df.sched_arr_time.apply(lambda x: int(x - np.floor(x/100)*100))
df['sched_dep_hour'] = df.sched_dep_time.apply(lambda x: int(np.floor(x/100)))
df['sched_dep_minute'] = df.sched_dep_time.apply(lambda x: int(x - np.floor(x/100)*100))
df.rename(columns={'hour': 'dep_hour',
'minute': 'dep_minute'}, inplace=True)
df.drop(columns=['time_hour', 'dep_time', 'sched_dep_time', 'arr_time', 'sched_arr_time', 'dep_delay'], inplace=True)
response = rasgo.evaluate.profile(df)
response
Open source data transformations, without having to write SQL. Choose from a wide selection of predefined transforms that can be exported to DBT or native SQL.