Description
Context
In Pandas, df.to_numpy() and df.values can both convert a DataFrame to a NumPy array.
Problem
As noted in a Stack Overflow post, df.values has an inconsistency problem. With .values, it is unclear whether the returned value is the actual array, some transformation of it, or one of the Pandas custom arrays. However, the .values API has not been deprecated yet. Although the library developers flag this with a warning in the documentation, no warning or error is raised at runtime when the code uses .values.
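The inconsistency is easiest to see on a column backed by a Pandas extension type. The following is a minimal sketch, not taken from the original post: the categorical Series `s` is a hypothetical example chosen for illustration. On such data, `.values` returns a Pandas custom array, while `.to_numpy()` returns a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical illustration data: a Series with a pandas extension dtype.
s = pd.Series(["a", "b", "a"], dtype="category")

values_result = s.values      # a pandas Categorical, not a numpy.ndarray
numpy_result = s.to_numpy()   # a numpy.ndarray

# Print the concrete type returned by each API.
print(type(values_result).__name__)
print(type(numpy_result).__name__)
```

Code that assumes `.values` always yields a `numpy.ndarray` can therefore break silently once a column switches to an extension dtype.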
Solution
When converting a DataFrame to a NumPy array, it is better to use df.to_numpy() than df.values.
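Besides returning a predictable type, `to_numpy()` accepts optional `dtype` and `na_value` arguments that make the conversion explicit. The sketch below uses a small hypothetical DataFrame and assumes a Pandas version recent enough to support `na_value` (1.1 or later, as far as we know).

```python
import numpy as np
import pandas as pd

# Hypothetical illustration data with one missing value.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [0.2, 0.4]})

# Default conversion: float64 array, NaN kept as-is.
arr = df.to_numpy()

# Explicit target dtype.
arr32 = df.to_numpy(dtype="float32")

# Replace missing values during conversion (assumes pandas >= 1.1).
filled = df.to_numpy(na_value=0.0)

print(arr.dtype, arr32.dtype)
print(filled)
```

Stating the dtype and the missing-value policy at the call site keeps the conversion behavior visible to the reader, which `.values` does not allow.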
Type
API-Specific
Existing Stage
Data Cleaning
Effect
Consistency & Error-prone
Example
### NumPy & Pandas
import numpy as np
import pandas as pd
index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
df = pd.DataFrame({'A': a, 'B': b, 'C': c}, index=index)
df = df.rename_axis('ID')
- arr = df.values
+ arr = df.to_numpy()