Pandas Missing Data

This tutorial explains how to identify missing data with pandas.

Packages

This tutorial uses:

pandas


Open a Jupyter Notebook and enter the following:


import pandas as pd

Creating the data


We will create a dataframe with 20 rows and four columns of different types (text, numeric, boolean, and datetime) for this example.


df = pd.DataFrame({'A': ['text']*20,
                   'B': [1, 2.2]*10,
                   'C': [True, False]*10,
                   'D': pd.to_datetime('2020-01-01')
                  })

Next, delete some of the entries to create missing data.


df.iloc[0,0] = None
df.iloc[1,0] = None
df.iloc[10,0] = None
df.iloc[5,1] = None
df.iloc[7,1] = None
df.iloc[4,2] = None
df.iloc[5,2] = None
df.iloc[9,2] = None
df.iloc[12,2] = None
df.iloc[2,3] = None
df.iloc[12,3] = None
df
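The assignments above use None in every column, but pandas stores a missing entry differently depending on the column's dtype. The following is a minimal sketch of that behavior on two small Series; the names numbers and dates are illustrative and not part of the tutorial's dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical example Series, separate from the tutorial's df.
numbers = pd.Series([1.0, None, 3.0])
dates = pd.Series(pd.to_datetime(['2020-01-01', None]))

# In a float column, None is stored as NaN.
print(numbers[1], np.isnan(numbers[1]))

# In a datetime column, None is stored as NaT.
print(dates[1], dates[1] is pd.NaT)
```

Both NaN and NaT count as missing values for the functions used in the rest of this tutorial.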

Identify missing data

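The function isna will identify missing values in the data. It returns a dataframe of the same shape in which each entry is True if the corresponding value is missing and False otherwise.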


missing = df.isna()
missing
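As a small illustration of what isna treats as missing, here is a sketch on a toy dataframe. The name toy and its column names are assumptions made for this example only. Note that an empty string is an ordinary value, not a missing one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, separate from the tutorial's df.
toy = pd.DataFrame({
    'text': ['a', None, ''],
    'number': [1.0, np.nan, 3.0],
    'when': pd.to_datetime(['2020-01-01', None, '2020-01-03']),
})

# True where the value is None, NaN, or NaT; False elsewhere.
toy_missing = toy.isna()
print(toy_missing)
```

Only the middle row is flagged in every column; the empty string in the last row is reported as not missing.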

Use sum to get the count of missing values in each column.


missing.sum()
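Because True counts as 1 and False as 0, the same boolean dataframe can also give the fraction of missing values per column (with mean) and the overall total. The sketch below uses a hypothetical toy dataframe rather than the tutorial's df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, separate from the tutorial's df.
toy = pd.DataFrame({
    'x': [1.0, np.nan, 3.0, np.nan],
    'y': ['a', 'b', None, 'd'],
})
toy_missing = toy.isna()

counts = toy_missing.sum()      # missing values per column
fractions = toy_missing.mean()  # share of missing values per column
total = int(counts.sum())       # missing values in the whole frame

print(counts)
print(fractions)
print(total)
```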

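The rows that contain missing data can be selected using the pandas function any with axis set to 1. This returns True for each row that has at least one missing value, and the result can be used as a mask to filter the dataframe.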


anymissing = missing.any(axis=1)
anymissing


df[anymissing]
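The same mask can be inverted to keep only complete rows, and all can replace any to find rows where every value is missing. This is a minimal sketch on a hypothetical toy dataframe; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, separate from the tutorial's df.
toy = pd.DataFrame({
    'x': [1.0, np.nan, 3.0, np.nan],
    'y': ['a', 'b', None, None],
})
toy_missing = toy.isna()

# Rows with at least one missing value.
rows_with_any = toy[toy_missing.any(axis=1)]

# Rows with no missing values (the mask is inverted with ~).
complete_rows = toy[~toy_missing.any(axis=1)]

# Rows where every column is missing.
rows_all_missing = toy[toy_missing.all(axis=1)]

print(rows_with_any)
print(complete_rows)
print(rows_all_missing)
```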
