How to check if any value is NaN in a pandas DataFrame

Posted by: AJ Welch

The official pandas documentation refers to what most developers would call null values as missing data. Within pandas, a missing value is typically represented by NaN (not a number).

In most cases, the terms missing and null are interchangeable, but to abide by the standards of pandas, we’ll continue using missing throughout this tutorial.

Evaluating for missing data

At the base level, pandas offers two top-level functions to test for missing data: pd.isnull() and pd.notnull(). As you may suspect, these are simple functions that return a boolean indicating whether the argument passed in is missing (isnull()) or not missing (notnull()).
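Here is a small sketch calling these top-level functions on single values. In current pandas releases they are also available under the aliases pd.isna() and pd.notna(); the sketch sticks to the names used in this tutorial.

```python
import numpy as np
import pandas as pd

# The top-level functions accept scalars and return a single boolean.
print(pd.isnull(np.nan))        # True
print(pd.isnull(None))          # True: None is also treated as missing
print(pd.isnull("The Hobbit"))  # False
print(pd.notnull(7))            # True: notnull() is the logical inverse
```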

In addition to these functions, pandas provides methods of the same names, isnull() and notnull(), on Series and DataFrame objects. These methods evaluate each element of the Series or DataFrame and return a boolean object of the same shape indicating whether each value is missing.
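A minimal sketch of the element-wise methods on a small DataFrame; the column names a and b and their values are made up for illustration. It also checks that notnull() is simply the inverse of isnull():

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

missing = df.isnull()    # True where a value is missing
present = df.notnull()   # True where a value is present

print(missing)
# The two masks are exact complements of each other.
print((missing == ~present).all().all())  # True
```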

For example, let’s create a simple Series in pandas:

import pandas as pd
import numpy as np

s = pd.Series([2,3,np.nan,7,"The Hobbit"])

Evaluating the Series s shows each value as expected, including index 2, which we explicitly set as missing.

In [2]: s
Out[2]:
0             2
1             3
2           NaN
3             7
4    The Hobbit
dtype: object

To test the isnull() method on this series, we can use s.isnull() and view the output:

In [3]: s.isnull()
Out[3]:
0    False
1    False
2     True
3    False
4    False
dtype: bool

As expected, the only value evaluated as missing is index 2.
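Because isnull() returns a boolean Series aligned with s, it can also serve as a mask. The sketch below (an illustration, not part of the original walkthrough) selects the missing entries and, with notnull(), the remaining ones:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])

# Keep only the entries flagged as missing...
print(s[s.isnull()])
# ...or only the entries that are present.
print(s[s.notnull()])
```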

Determine if ANY value in a Series is missing

While the isnull() method is useful, sometimes we only want to know whether any value in a Series is missing.

There are a few ways to do this, each chaining multiple methods together.

One of the fastest approaches is to chain .values.any(), which applies NumPy's any() directly to the underlying boolean array:

In [4]: s.isnull().values.any()
Out[4]:
True
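The .values step matters more for DataFrames than for Series. A short sketch comparing the two forms on both types; the small df here is a hypothetical example. Going through .values hands the boolean data to NumPy's any(), which reduces over every element at once; that is the usual explanation for why it tends to be fast, though actual timings depend on data size and pandas version:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])
df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})  # hypothetical data

# On a Series, .any() already reduces to a single boolean,
# so .values.any() and .any() agree.
print(s.isnull().values.any())  # True
print(s.isnull().any())         # True

# On a DataFrame, .any() reduces per column and returns a Series...
print(df.isnull().any())
# ...while .values.any() reduces over every element and returns one boolean.
print(df.isnull().values.any())  # True
```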

In some cases, you may also want to know how many missing values the Series contains, in which case you can chain .sum() instead:

In [5]: s.isnull().sum()
Out[5]:
1
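Summing works because pandas treats True as 1 and False as 0. A quick sketch, including a cross-check with count(), which returns the number of non-missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])
mask = s.isnull()

# Booleans sum as integers (True == 1, False == 0),
# so summing the mask counts the missing values.
n_missing = mask.sum()
print(n_missing)  # 1

# Same count: total length minus the number of non-missing values.
print(len(s) - s.count())  # 1
```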

Count missing values in DataFrame

The .isnull().values.any() chain also works on a DataFrame to indicate whether any value is missing, but it is often useful to count the missing values across the entire DataFrame as well. Because a DataFrame is two-dimensional, this takes two summations: one down each column, then one over the resulting column totals.

First, let's create a simple DataFrame of random values and mark every value greater than 0.9 as missing:

In [6]: df = pd.DataFrame(np.random.randn(5,5))
df[df > 0.9] = np.nan
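Because np.random.randn draws new values on every run, your counts will not match the output shown here. If you want a reproducible version, a sketch along these lines uses NumPy's seeded generator (np.random.default_rng, available in NumPy 1.17 and later; the seed 0 is arbitrary) and assigns np.nan directly:

```python
import numpy as np
import pandas as pd

# Seeded generator so the same missing pattern appears on every run.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 5)))

# Remember which cells will be blanked out, then replace them with NaN.
to_blank = df > 0.9
df[to_blank] = np.nan

print(df.isnull().sum())
```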

If we now chain a single .sum(), instead of the total number of missing values we get a Series containing the count for each column:

In [7]: df.isnull().sum()
Out[7]:
0    3
1    0
2    1
3    1
4    0
dtype: int64

In this example, the first column (label 0) contains three missing values, and the columns labeled 2 and 3 contain one each. Because the data is random, your counts will differ.
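The same .sum() accepts an axis argument. A sketch on a small hypothetical DataFrame showing per-column counts (the default) next to per-row counts with axis=1:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a known missing pattern.
df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan],
    "b": [2.0, np.nan, np.nan],
    "c": [4.0, 5.0, 6.0],
})

# Default axis=0: one count per column.
print(df.isnull().sum())        # a: 2, b: 2, c: 0
# axis=1: one count per row instead.
print(df.isnull().sum(axis=1))  # 0: 1, 1: 1, 2: 2
```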

To get the total number of missing values in the DataFrame, we chain two .sum() calls together:

In [8]: df.isnull().sum().sum()
Out[8]:
5
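A sketch of the double sum on a small hypothetical DataFrame, alongside two equivalent ways to reach the same total: summing the underlying NumPy array in one call, and subtracting the non-missing count from the total number of cells (df.size):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two NaNs in column a, one in column b.
df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan],
    "b": [2.0, np.nan, 3.0],
})

# Sum per column, then sum those column totals.
total = df.isnull().sum().sum()
print(total)  # 3

# Equivalent: sum every element of the NumPy array in one step.
print(df.isnull().values.sum())  # 3

# Equivalent: total cells minus the non-missing cells.
print(df.size - df.count().sum())  # 3
```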
