 This tutorial explains how to identify data in columns with the wrong data type with pandas.

### Packages

This tutorial uses:

Open up a Jupyter notebook and import the following:

``````
import pandas as pd
import datetime
import numpy as np
``````

## Creating the data

We will create a dataframe that contains multiple occurrences of duplication for this example.

``````
df = pd.DataFrame({'A': ['text']*20,
'B': [1, 2.2]*10,
'C': [True, False]*10,
'D': pd.to_datetime('2020-01-01')
})
``````

Next, add some mistyped data to the dataframe.

``````
df.iloc[0,0] = 1
df.iloc[1,0] = -2
df.iloc[10,0] = pd.to_datetime('2021-01-01')
df.iloc[5,1] = '2.2'
df.iloc[7,1] = 'A+B'
df.iloc[4,2] = 1
df.iloc[5,2] = 'False'
df.iloc[9,2] = -12.6
df.iloc[12,2] = 'text'
df.iloc[2,3] = 12
df.iloc[12,3] = '2020-01-01'
df
``````

## Identify mistyped data

The function applymap and isinstance will return a Boolean dataframe with True when the data type matches and False when the data type does not match.

Check numeric

``````
numeric = df.applymap(lambda x: isinstance(x, (int, float)))
numeric
``````

Since only column B is supposed to be numeric, this can be made more specific by running applymap only on column B.

``````
numeric = df.applymap(lambda x: isinstance(x, (int, float)))['B']
numeric
``````

Your output should look something like:

``````
0      True
1      True
2      True
3      True
4      True
5     False
6      True
7     False
8      True
9      True
10     True
11     True
12     True
13     True
14     True
15     True
16     True
17     True
18     True
19     True
Name: B, dtype: bool
``````

Using this Boolean series to return the non-numeric data

``````
df[~numeric]
``````

### Check datetime

``````
dt = df.applymap(lambda x: isinstance(x, (datetime.datetime)))['D']
dt
``````

Output:

``````
0      True
1      True
2     False
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12    False
13     True
14     True
15     True
16     True
17     True
18     True
19     True
Name: D, dtype: bool
``````

Using this Boolean series to return the non-numeric data

``````
df[~dt]
``````

### Check string

``````
strings = df.applymap(lambda x: isinstance(x, (str)))['A']
strings
``````

Your output should look something like this:

``````
0     False
1     False
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10    False
11     True
12     True
13     True
14     True
15     True
16     True
17     True
18     True
19     True
Name: A, dtype: bool
``````

Using this Boolean series to return the non-numeric data

``````
df[~strings]
``````

### Check Boolean

``````
torf = df.applymap(lambda x: isinstance(x, (bool)))['C']
torf
``````

You should see the following output:

``````
0      True
1      True
2      True
3      True
4     False
5     False
6      True
7      True
8      True
9     False
10     True
11     True
12    False
13     True
14     True
15     True
16     True
17     True
18     True
19     True
Name: C, dtype: bool
``````

Using this Boolean series to return the non-numeric data

``````
df[~torf]
``````