Description

Context

Developers may need a new empty column in a DataFrame.

Problem

If they initialize a new empty column in Pandas with zeros or empty strings, methods such as .isnull() and .notnull() can no longer identify the unpopulated entries, because the filler values are treated as valid data.
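
The effect can be seen in a minimal sketch. The three-row DataFrame and the column names (`id`, `filled_zero`, `filled_str`, `empty_nan`) are assumptions made up for illustration; they do not come from the original entry.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; column names are illustrative only.
df = pd.DataFrame({"id": [1, 2, 3]})

df["filled_zero"] = 0      # filler value: looks like real data
df["filled_str"] = ""      # filler value: looks like real data
df["empty_nan"] = np.nan   # genuine missing value

# Count the entries that pandas recognizes as missing in each column.
missing_counts = df.isnull().sum()
print(missing_counts)
```

Only the column initialized with np.nan is reported as missing; the zero and empty-string columns are counted as fully populated.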

Solution

Use a NaN value (e.g. np.nan) when a new empty column in a DataFrame is needed. Do not use “filler values” such as zeros or empty strings.
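
As a rough sketch of why this helps, a NaN-initialized column lets a genuine zero be told apart from an entry that was never filled in. The `id` and `score` columns and the values below are hypothetical and serve only as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'score' starts out empty and is filled in later.
df = pd.DataFrame({"id": [1, 2, 3]})
df["score"] = np.nan

# Record a genuine zero for one row only.
df.loc[df["id"] == 2, "score"] = 0.0

# notnull() separates populated rows from rows that are still empty.
populated = df["score"].notnull()
pending_ids = df.loc[~populated, "id"].tolist()

# Aggregations skip NaN by default, so empty rows do not distort the mean.
mean_score = df["score"].mean()
```

Had the column been initialized with 0 instead, the recorded zero for id 2 would be indistinguishable from the two rows that were never populated.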

Type

API-Specific

Existing Stage

Data Cleaning

Effect

Robustness

Example

import pandas as pd
+ import numpy as np

df = pd.DataFrame([])
- df['new_col_int'] = 0
- df['new_col_str'] = ''
+ df['new_col_float'] = np.nan
+ df['new_col_int'] = pd.Series(dtype='int')
+ df['new_col_str'] = pd.Series(dtype='object')

Source:

Paper

Grey Literature

GitHub Commit

Stack Overflow

Documentation