How to check if any value is NaN in a pandas DataFrame

Posted by: AJ Welch

The official pandas documentation refers to what most developers would call null values as missing data. Within pandas, a missing value is most commonly denoted by NaN, although None and NaT are also treated as missing.

In most cases, the terms missing and null are interchangeable, but to follow the pandas convention, we’ll use missing throughout this tutorial.


Evaluating for missing data


At the base level, pandas offers two top-level functions to test for missing data: isnull() and notnull(). As you may suspect, these are simple functions that return a boolean value indicating whether the value passed in is missing (isnull()) or present (notnull()). Newer versions of pandas also expose the same functions under the names isna() and notna().
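As a quick illustration of the two functions, the sketch below calls them on a few scalar values and on a small list. The sample values are made up for this example and do not come from the tutorial's data.

```python
import numpy as np
import pandas as pd

# Scalars: each call returns a single boolean.
nan_is_missing = pd.isnull(np.nan)     # NaN counts as missing
none_is_missing = pd.isnull(None)      # None counts as missing too
text_is_missing = pd.isnull("text")    # an ordinary string is not missing
number_is_present = pd.notnull(3.14)   # notnull() is the inverse check

print(nan_is_missing, none_is_missing, text_is_missing, number_is_present)

# Array-like input: the result is a boolean array with one entry per element.
mask = pd.isnull([1.0, np.nan, None])
print(mask.tolist())  # [False, True, True]
```

Passing a list or array returns one boolean per element, which is the same behavior the Series and DataFrame methods build on.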

In addition to the above functions, pandas also provides methods of the same names, isnull() and notnull(), on Series and DataFrame objects. These methods evaluate each element in the Series or DataFrame and return a boolean for each one indicating whether it is missing.

For example, let’s create a simple Series in pandas:

import pandas as pd
import numpy as np

s = pd.Series([2,3,np.nan,7,"The Hobbit"])

Displaying the Series s shows each value as expected, including index 2, which we explicitly set as missing.

In [2]: s
Out[2]:
0             2
1             3
2           NaN
3             7
4    The Hobbit
dtype: object

To test the isnull() method on this series, we can use s.isnull() and view the output:

In [3]: s.isnull()
Out[3]:
0    False
1    False
2     True
3    False
4    False
dtype: bool

As expected, the only value evaluated as missing is index 2.
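The complementary method, notnull(), can be sketched the same way. The example below rebuilds the same Series and also uses the resulting boolean mask to keep only the non-missing entries; the variable names present and non_missing are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])

# notnull() is True wherever a value exists, the inverse of isnull().
present = s.notnull()
print(present.tolist())  # [True, True, False, True, True]

# A boolean mask can index the Series to keep only the non-missing entries.
non_missing = s[present]
print(non_missing.index.tolist())  # [0, 1, 3, 4]
```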

Determine if ANY value in a Series is missing


While the isnull() method is useful, sometimes we may wish to evaluate whether any value is missing in a Series.

There are a few possibilities involving chaining multiple methods together.

A commonly recommended fast approach is to chain .values.any(), which runs the check directly on the underlying NumPy array:

In [4]: s.isnull().values.any()
Out[4]:
True
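For comparison, here is a minimal sketch of three ways to ask the same question of a Series. All three should agree on the answer; no claim is made here about their relative speed, which may depend on the pandas version and the size of the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])

via_values = s.isnull().values.any()  # reduction on the underlying NumPy array
via_series = s.isnull().any()         # reduction through the pandas Series API
via_hasnans = s.hasnans               # Series property that reports the same fact

print(bool(via_values), bool(via_series), bool(via_hasnans))  # True True True
```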

In some cases, you may wish to determine how many missing values exist in the collection, in which case you can use .sum() chained on:

In [5]: s.isnull().sum()
Out[5]:
1

Count missing values in DataFrame


While the chain of .isnull().values.any() will work for a DataFrame object to indicate if any value is missing, in some cases it may be useful to also count the number of missing values across the entire DataFrame. Since a DataFrame is two-dimensional, a single .sum() only collapses one axis, so we must sum twice to get a single total.

For example, first we need to create a simple DataFrame with a few missing values:

In [6]: df = pd.DataFrame(np.random.randn(5,5))
   ...: df[df > 0.9] = np.nan

Now if we chain a .sum() method on, instead of getting the total number of missing values, we’re given a Series containing the count of missing values in each column. Because the data is random, the exact counts will differ from run to run:

In [7]: df.isnull().sum()
Out[7]:
0    3
1    0
2    1
3    1
4    0
dtype: int64

We can see that in this example, column 0 contains three missing values, and columns 2 and 3 contain one each.
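Since the random example is hard to reproduce, the sketch below uses a small hand-built DataFrame (the column names a, b and c and their values are hypothetical) to show per-column counts, per-row counts via axis=1, and how to list the columns that contain at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with three known missing values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

per_column = df.isnull().sum()      # one count per column
per_row = df.isnull().sum(axis=1)   # one count per row

# Boolean Series indexed by column name; keep only the True entries.
has_missing = df.isnull().any()
columns_with_missing = has_missing[has_missing].index.tolist()

print(per_column.to_dict())     # {'a': 1, 'b': 2, 'c': 0}
print(per_row.tolist())         # [1, 2, 0]
print(columns_with_missing)     # ['a', 'b']
```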

In order to get the total summation of all missing values in the DataFrame, we chain two .sum() methods together:

In [8]: df.isnull().sum().sum()
Out[8]:
5
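As a cross-check, the total can be computed in more than one way. The sketch below, again on a small hypothetical DataFrame with three known missing values, compares the chained .sum().sum() with a sum over the underlying NumPy array and with the difference between the number of cells and the number of non-missing cells.

```python
import numpy as np
import pandas as pd

# Hypothetical data with three known missing values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

total_chained = int(df.isnull().sum().sum())        # sum columns, then sum the result
total_numpy = int(df.isnull().values.sum())         # sum the boolean array directly
total_from_count = int(df.size - df.count().sum())  # all cells minus non-missing cells

print(total_chained, total_numpy, total_from_count)  # 3 3 3
```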