This tutorial explains how to identify and handle duplicate rows with pyrasgo.
This tutorial uses:
Open up a Jupyter Notebook and import the following:
If you haven't done so already, head over to https://docs.rasgoml.com/rasgo-docs/onboarding/initial-setup and follow the steps outlined there to create your free account. This account gives you free access to the Rasgo API, which will calculate dataframe profiles, generate feature importance scores, and produce feature explainability for your analysis. In addition, the account lets you retain access to your analyses and share them with your colleagues.
For this example, we will create a dataframe that contains several duplicated rows.
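A minimal sketch of such a dataframe, assuming pandas is available. The column names and values here are hypothetical and chosen only so that some rows repeat exactly; they are not the data from the original notebook.

```python
import pandas as pd

# Hypothetical example data (an assumption for illustration):
# row 3 repeats row 0, and rows 4 and 5 both repeat row 1.
df = pd.DataFrame(
    {
        "group": ["A", "B", "C", "A", "B", "B"],
        "value": [1, 2, 3, 1, 2, 2],
    }
)

print(df)
```

With this layout, two distinct rows are duplicated (one of them twice), which gives the duplicate check something non-trivial to find.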
The function evaluate.duplicate_rows will identify duplicates in the data.
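The exact signature and return value of evaluate.duplicate_rows are not shown here, so the sketch below does not call it. Instead, it uses plain pandas as a stand-in to illustrate the same idea on a small hypothetical dataframe: flag every row that has an exact copy elsewhere, count the redundant copies, and drop them. Treat it as an approximation of the step, not as the pyrasgo implementation.

```python
import pandas as pd

# Hypothetical example data (an assumption for illustration):
# row 3 repeats row 0, and rows 4 and 5 both repeat row 1.
df = pd.DataFrame(
    {
        "group": ["A", "B", "C", "A", "B", "B"],
        "value": [1, 2, 3, 1, 2, 2],
    }
)

# keep=False marks every member of a duplicated group,
# including the first occurrence.
dup_mask = df.duplicated(keep=False)
duplicates = df[dup_mask]

# The default keep="first" marks only the redundant copies,
# so the sum is the number of rows that would be removed.
n_extra = int(df.duplicated().sum())

# Handle the duplicates by keeping the first occurrence of each row.
deduped = df.drop_duplicates().reset_index(drop=True)

print(duplicates)
print("redundant rows:", n_extra)
print(deduped)
```

Passing a list of column names via the subset parameter of duplicated and drop_duplicates restricts the comparison to those columns, which is useful when only a key should be unique.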