Description
Context
`NaN` equivalence comparison behaves differently from `None` equivalence comparison.
Problem
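While `None == None` evaluates to `True`, `np.nan == np.nan` evaluates to `False` in NumPy, because under IEEE 754 floating-point semantics NaN is not equal to any value, including itself. Since pandas treats `None` like `np.nan` for simplicity and performance reasons, comparing DataFrame elements with `np.nan` via `==` always returns `False`, even for the entries that are actually missing. A developer who is not aware of this may write checks that silently fail to detect missing values, leading to unintentional bugs.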
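The following minimal sketch contrasts the two equality rules and shows pandas storing a `None` as NaN in a numeric column. The variable names are illustrative only.

```python
import math

import numpy as np
import pandas as pd

# None is a singleton, so it compares equal to itself.
none_eq = None == None        # True
# NaN compares unequal to everything, itself included (IEEE 754).
nan_eq = np.nan == np.nan     # False

# In a numeric column, pandas stores the None as NaN (float64).
df = pd.DataFrame([1, None, 3])
stored = df.iloc[1, 0]

print(none_eq, nan_eq)        # True False
print(df[0].dtype)            # float64
print(math.isnan(stored))     # True: the None became NaN
```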
Solution
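Developers need to be careful when comparing values against NaN. Instead of `== np.nan`, use a dedicated missing-value check such as `DataFrame.isna()`, or `pd.isna()` / `np.isnan()` for scalars.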
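A short sketch of the dedicated missing-value checks. The DataFrame and its column names `a` and `b` are hypothetical; the calls (`DataFrame.isna`, `Series.notna`, `pd.isna`, `np.isnan`) are standard pandas/NumPy functions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a numeric and an object column, each with one gap.
df = pd.DataFrame({"a": [1, None, 3], "b": ["x", None, "z"]})

# Element-wise missing-value mask; treats NaN and None alike.
missing = df.isna()
n_missing_a = int(missing["a"].sum())   # 1
n_missing_b = int(missing["b"].sum())   # 1

# Scalar checks: pd.isna accepts both None and NaN,
# while np.isnan only accepts numeric input (np.isnan(None) raises TypeError).
print(pd.isna(None), pd.isna(np.nan))   # True True
print(np.isnan(np.nan))                 # True

# Keep only the rows where column "a" is present.
present_a = df[df["a"].notna()]
print(present_a["a"].tolist())          # [1.0, 3.0]
```

`pd.isna()` is usually the safer scalar check because it also covers `None` in object columns.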
Type
Generic
Existing Stage
Data Cleaning
Effect
Error-prone
Example
### Pandas & NumPy
import pandas as pd
- import numpy as np
df = pd.DataFrame([1, None, 3])
- df_is_nan = df == np.nan
+ df_is_nan = df.isna()
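Running both sides of the diff on the same data makes the difference visible. This sketch only renames the removed line's result to `df_is_nan_wrong` so that both masks can be compared.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, None, 3])

# Removed line: == against NaN never matches, so nothing is flagged.
df_is_nan_wrong = df == np.nan
# Added line: isna() flags the missing entry.
df_is_nan = df.isna()

print(df_is_nan_wrong[0].tolist())  # [False, False, False]
print(df_is_nan[0].tolist())        # [False, True, False]
```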
Source:
Paper
- MPA Haakman. 2020. Studying the Machine Learning Lifecycle and Improving Code Quality of Machine Learning Application. (2020).