Description
Context
Equality comparison with NaN behaves differently from equality comparison with None.
Problem
While None == None evaluates to True, np.nan == np.nan evaluates to False in NumPy. As Pandas treats None like np.nan for simplicity and performance reasons, a comparison of DataFrame elements with np.nan always returns False. If the developer is not aware of this, it may lead to unintentional bugs in the code.
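The behavior described above can be reproduced with a minimal sketch. The variable names (`none_equal`, `nan_equal`, `matches`) are illustrative assumptions, and the DataFrame mirrors the one used in the example further down.

```python
import numpy as np
import pandas as pd

# None compares equal to itself, but NaN never compares equal to anything,
# including itself.
none_equal = None == None
nan_equal = np.nan == np.nan

# The None in this numeric column is stored as NaN.
df = pd.DataFrame([1, None, 3])

# Elementwise comparison with np.nan: no cell matches, not even the missing one.
matches = df == np.nan

print(none_equal, nan_equal)
print("cells flagged by == np.nan:", int(matches.to_numpy().sum()))
print("cells actually missing:", int(df.isna().to_numpy().sum()))
```

The `==` comparison flags no cells even though one value is missing, which is the silent failure this smell warns about.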
Solution
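Developers should not test for missing values with == np.nan. Use a dedicated missing-value check instead, such as isna() in Pandas, which identifies NaN (and None) entries correctly.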
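As a hedged sketch of NaN-aware checks, the snippet below filters missing entries from a Series and tests individual scalars. The names `values`, `missing_mask`, `clean`, and `scalar_checks` are assumptions made for illustration. The calls used (`Series.isna`, `pd.isna`, `np.isnan`, `math.isnan`) are standard, but which one fits best depends on the data type at hand.

```python
import math

import numpy as np
import pandas as pd

values = pd.Series([1.0, None, 3.0])  # the None is stored as NaN

# Elementwise, NaN-aware mask; use this instead of `values == np.nan`.
missing_mask = values.isna()

# Keep only the entries that are present.
clean = values[~missing_mask]

# Scalar checks: pd.isna also accepts None, while np.isnan and math.isnan
# expect a numeric argument.
scalar_checks = (
    pd.isna(np.nan),
    pd.isna(None),
    np.isnan(np.nan),
    math.isnan(float("nan")),
)

print(missing_mask.tolist())
print(clean.tolist())
print(all(scalar_checks))
```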
Type
Generic
Existing Stage
Data Cleaning
Effect
Error-prone
Example
### Pandas & NumPy
import pandas as pd
- import numpy as np
df = pd.DataFrame([1, None, 3])
- df_is_nan = df == np.nan
+ df_is_nan = df.isna()
Source:
Paper
- MPA Haakman. 2020. Studying the Machine Learning Lifecycle and Improving Code Quality of Machine Learning Application. (2020).