Feature Profiling with PyRasgo

This notebook explains how to use PyRasgo to create feature profiles of a pandas DataFrame.


Packages

This tutorial uses:

  • pandas
  • statsmodels (imported as statsmodels.api)
  • numpy
  • PyRasgo

Open up a Jupyter Notebook and enter the following:


import statsmodels.api as sm
import pandas as pd
import numpy as np

import pyrasgo

Connect to Rasgo

If you haven't done so already, go to https://docs.rasgoml.com/rasgo-docs/onboarding/initial-setup and follow the steps outlined there to create your free account. The account gives you free access to the Rasgo API, which calculates DataFrame profiles, generates feature importance scores, and produces feature explainability for your analysis. It also lets you keep access to your analysis and share it with your colleagues.


# Replace the empty strings with your Rasgo account email and password
rasgo = pyrasgo.login(email='', password='')
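
Typing credentials directly into a notebook makes them easy to leak when the notebook is shared. One option is a minimal sketch like the one below, assuming you store them in environment variables named RASGO_EMAIL and RASGO_PASSWORD. These are hypothetical names chosen for this example; PyRasgo does not read them on its own. The helper simply collects the values and passes them to the same login call used above.

```python
import os


def load_rasgo_credentials(email_var="RASGO_EMAIL", password_var="RASGO_PASSWORD"):
    """Read Rasgo credentials from environment variables (hypothetical helper).

    The variable names are assumptions for this example; PyRasgo does not
    look them up itself.
    """
    email = os.environ.get(email_var)
    password = os.environ.get(password_var)
    if not email or not password:
        # Fail early with a clear message instead of sending empty credentials
        raise RuntimeError(
            f"Set the {email_var} and {password_var} environment variables first"
        )
    return {"email": email, "password": password}


# Usage in the notebook (requires a Rasgo account and network access):
# rasgo = pyrasgo.login(**load_rasgo_credentials())
```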


Reading the data

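The data is the flights table from the R package nycflights13, loaded from the Rdatasets collection with the get_rdataset function in statsmodels. The function downloads the data, so an internet connection is needed.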


df = sm.datasets.get_rdataset('flights', 'nycflights13').data


Feature Engineering

Convert the times from floats or ints to hours and minutes

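Times such as dep_time and arr_time are stored as numbers in HHMM form (for example, 1345 means 13:45). Splitting each one into separate hour and minute fields makes it easier to see when flights depart and arrive. Next, the original time fields are dropped because they are now redundant, and dep_delay is dropped as well.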


# Drop rows with missing values (e.g., cancelled flights with no departure or arrival time)
df.dropna(inplace=True)

# Times are stored as HHMM numbers (e.g., 1345 means 13:45), so floor division
# by 100 gives the hour and the remainder gives the minute
df['dep_hour'] = (df.dep_time // 100).astype(int)
df['dep_minute'] = (df.dep_time % 100).astype(int)
df['arr_hour'] = (df.arr_time // 100).astype(int)
df['arr_minute'] = (df.arr_time % 100).astype(int)
df['sched_arr_hour'] = (df.sched_arr_time // 100).astype(int)
df['sched_arr_minute'] = (df.sched_arr_time % 100).astype(int)
df['sched_dep_hour'] = (df.sched_dep_time // 100).astype(int)
df['sched_dep_minute'] = (df.sched_dep_time % 100).astype(int)

# Drop the original fields. The dataset's own 'hour' and 'minute' columns hold the
# scheduled departure time, duplicating sched_dep_hour and sched_dep_minute.
df.drop(columns=['time_hour', 'hour', 'minute', 'dep_time', 'sched_dep_time',
                 'arr_time', 'sched_arr_time', 'dep_delay'], inplace=True)
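
The hour/minute split works because each time is an integer of the form HHMM. Below is a minimal, self-contained sketch of the same arithmetic on a toy Series (made-up values, not rows from the dataset). It also assumes, as a precaution, that midnight may be recorded as 2400, which you can check with df.arr_time.max() before dropping the column. The optional wrap with % 24 maps that value to hour 0; the tutorial's code does not do this, so treat it as an illustrative choice. The function name split_hhmm is hypothetical.

```python
import pandas as pd


def split_hhmm(times: pd.Series, wrap_midnight: bool = True) -> pd.DataFrame:
    """Split HHMM-encoded times (e.g. 1345 -> 13:45) into hour and minute columns."""
    hours = (times // 100).astype(int)
    minutes = (times % 100).astype(int)
    if wrap_midnight:
        # Treat 2400 as 00:00 so hours stay in the range 0-23
        hours = hours % 24
    return pd.DataFrame({"hour": hours, "minute": minutes})


# Toy values as floats, since columns that contained missing values are
# typically read as floats
sample = pd.Series([517.0, 1345.0, 2400.0, 5.0])
print(split_hhmm(sample))
```

Using floor division and modulo on the whole column avoids calling a Python lambda once per row, which matters on a table with hundreds of thousands of flights.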


Profile Features


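# Send the DataFrame to the Rasgo API to compute its feature profile
response = rasgo.evaluate.profile(df)
# Display the returned response in the notebook
response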
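
The tutorial does not show what the profile contains, and that depends on the Rasgo API. For a quick local look before (or instead of) calling the API, a minimal pandas-only sketch can compute a few common per-column summary statistics. This is an illustrative approximation, not a reproduction of rasgo.evaluate.profile, and the function name basic_profile is hypothetical.

```python
import numpy as np
import pandas as pd


def basic_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with simple summary statistics (hypothetical helper)."""
    rows = []
    for col in df.columns:
        s = df[col]
        row = {
            "column": col,
            "dtype": str(s.dtype),
            "missing": int(s.isna().sum()),
            "unique": int(s.nunique()),  # missing values are not counted
        }
        if pd.api.types.is_numeric_dtype(s):
            # Numeric-only statistics; pandas skips NaN values by default
            row.update(mean=s.mean(), std=s.std(), min=s.min(), max=s.max())
        rows.append(row)
    # Non-numeric columns get NaN in the numeric statistic columns
    return pd.DataFrame(rows).set_index("column")


# Toy data standing in for the flights DataFrame
toy = pd.DataFrame({
    "carrier": ["UA", "AA", "UA", None],
    "distance": [1400.0, 1089.0, np.nan, 719.0],
})
print(basic_profile(toy))
```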
© RASGO Intelligence, Inc. All rights reserved.