Description

Context

Developers may need to add a new, empty column to a DataFrame.

Problem

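If they use zeros or empty strings to initialize a new empty column in Pandas, the ability to use methods such as .isnull() or .notnull() is lost: the filler values count as real data, so unfilled cells can no longer be told apart from genuine zeros or empty strings. The same problem can occur when initializing other data structures or using other libraries.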
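The following minimal sketch illustrates the problem. The DataFrame contents and the column names `score` and `label` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: two rows, then two "empty" columns created with filler values.
df = pd.DataFrame({"id": [1, 2]})
df["score"] = 0   # filler zero
df["label"] = ""  # filler empty string

# Count the cells that pandas considers missing in each new column.
missing_counts = df[["score", "label"]].isnull().sum()
print(missing_counts)
```

Both counts are zero even though no real value has been written yet, so a later check with .isnull() cannot find the cells that still need to be filled.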

Solution

Use a NaN value (e.g. np.nan) if a new empty column in a DataFrame is needed. Do not use “filler values” such as zeros or empty strings.
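As a minimal sketch of the recommended initialization, the snippet below creates the column with np.nan and then writes one real value. The data and the column name `score` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three rows and a new column initialized as missing.
df = pd.DataFrame({"id": [1, 2, 3]})
df["score"] = np.nan  # float64 column in which every cell is missing

# Write a genuine zero into the first row only.
df.loc[0, "score"] = 0.0

# The genuine zero counts as present; the untouched cells still count as missing.
n_missing = int(df["score"].isnull().sum())
n_filled = int(df["score"].notnull().sum())
print(n_missing, n_filled)
```

Here a real zero and a cell that was never filled remain distinguishable, which is what the filler-value approach gives up.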

Type

Generic

Existing Stage

Data Cleaning

Effect

Robustness

Example

import pandas as pd
+ import numpy as np

df = pd.DataFrame([])
- df['new_col_int'] = 0
- df['new_col_str'] = ''
+ df['new_col_float'] = np.nan
+ df['new_col_int'] = pd.Series(dtype='int')
+ df['new_col_str'] = pd.Series(dtype='object')
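The example above starts from an empty DataFrame. On a DataFrame that already has rows, assigning an empty Series is aligned on the index, so the new cells come out as missing and an integer dtype is typically upcast to float. If an integer column that can hold missing values is wanted, one option (an assumption here, not part of the original example) is pandas' nullable "Int64" extension dtype. A sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical data: a DataFrame that already contains rows.
df = pd.DataFrame({"id": [1, 2]})

# Assigning an empty Series: alignment on the index leaves every cell missing.
df["aligned"] = pd.Series(dtype="int")

# Nullable integer column: all cells are pd.NA, but the dtype stays integer.
df["count"] = pd.array([pd.NA] * len(df), dtype="Int64")

print(df.dtypes)
print(df.isnull().sum())
```

In both cases .isnull() reports the unfilled cells; the nullable dtype additionally keeps the column integer-typed once values are written.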

Source:

Paper

Grey Literature

GitHub Commit

Stack Overflow

Documentation