This tutorial explains how to use pyrasgo to identify values in a column that have the wrong data type.
This tutorial uses:
Open a new Jupyter Notebook and import the following:
If you haven't done so already, head over to https://docs.rasgoml.com/rasgo-docs/onboarding/initial-setup and follow the steps outlined there to create your free account. This account gives you free access to the Rasgo API, which will calculate dataframe profiles, generate feature importance scores, and produce feature explainability for your analysis. In addition, this account allows you to maintain access to your analysis and share it with your colleagues.
We will create a dataframe that contains multiple occurrences of duplication for this example.
Next, add some mistyped data to the dataframe.
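As a minimal sketch of this step, the snippet below builds a small dataframe and overwrites a few entries with values of the wrong type. The column names (id, value, date) and the values themselves are hypothetical and chosen only for illustration; they are not taken from the original example.

```python
import pandas as pd

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [10.5, 20.1, 30.2, 40.0, 50.7],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05"],
})

# Cast to object so each column can hold values of mixed types.
df["value"] = df["value"].astype(object)
df["date"] = df["date"].astype(object)

# Introduce mistyped entries: a string in the numeric column
# and a string that is not a date in the date column.
df.loc[1, "value"] = "twenty"
df.loc[3, "date"] = "not a date"

print(df)
```

Casting to object first keeps the assignment explicit: the columns are meant to hold mixed types, which is exactly the situation the rest of the tutorial detects.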
Your dataframe should look something like:
The function evaluate.type_mismatches casts column to data_type and returns a dataframe containing the recast column, with any elements that were of the wrong type replaced by NaN.
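To illustrate the idea of recasting with mismatches turned into NaN, here is a pandas-only sketch. It does not call pyrasgo and is not its implementation; it assumes a hypothetical column named value and uses pd.to_numeric with errors="coerce" to get a comparable result for a numeric target type.

```python
import pandas as pd

# Hypothetical column mixing numbers and strings.
df = pd.DataFrame({"value": [10.5, "twenty", 30.2, "N/A", 50.7]})

# Cast to numeric; anything that cannot be parsed becomes NaN.
recast = pd.to_numeric(df["value"], errors="coerce").to_frame()

print(recast)
```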
Convert the recast column to a Boolean series using the pandas function isnull, and use that series to return the non-numeric data.
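The filtering step can be sketched as follows. To stay self-contained, the recast column is produced here with pd.to_numeric as a stand-in for the pyrasgo result, and the column name value is a hypothetical example.

```python
import pandas as pd

# Hypothetical column mixing numbers and strings.
df = pd.DataFrame({"value": [10.5, "twenty", 30.2, "N/A", 50.7]})

# Stand-in for the recast column: unparseable entries become NaN.
recast = pd.to_numeric(df["value"], errors="coerce")

# Boolean series that is True wherever the cast failed.
is_mismatch = recast.isnull()

# Use the Boolean series to select the original, non-numeric rows.
non_numeric = df[is_mismatch]

print(non_numeric)
```

Note that values that were already missing in the original column are also NaN after the cast, so they show up in this selection alongside the genuinely mistyped entries.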
Convert the recast column to a Boolean series using the pandas function isnull, and use that series to return the data that is not a datetime.
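The same pattern applies to datetimes. The sketch below again uses plain pandas rather than pyrasgo: pd.to_datetime with errors="coerce" stands in for the recast column, and the column name date and the "%Y-%m-%d" format are assumptions made for this example.

```python
import pandas as pd

# Hypothetical column of date strings with two invalid entries.
df = pd.DataFrame({"date": ["2021-01-01", "not a date", "2021-01-03", "2021-13-45"]})

# Stand-in for the recast column: unparseable entries become NaT,
# which isnull treats as missing.
recast = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Select the original rows whose value could not be read as a datetime.
not_datetime = df[recast.isnull()]

print(not_datetime)
```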