Adding StandardScaler() of values as new column to DataFrame returns partly NaNs

I have a pandas DataFrame:

df['total_price'].describe()

returns

count    24895.000000
mean       216.377369
std        161.246931
min          0.000000
25%        109.900000
50%        174.000000
75%        273.000000
max       1355.900000
Name: total_price, dtype: float64

When I apply preprocessing.StandardScaler() to it:

import pandas as pd
from sklearn import preprocessing

x = df[['total_price']]
standard_scaler = preprocessing.StandardScaler()
x_scaled = standard_scaler.fit_transform(x)
df['new_col'] = pd.DataFrame(x_scaled)

My new column with the standardized values contains some NaNs:

df[['total_price', 'new_col']].head()

    total_price new_col
0   241.95      0.158596
1   241.95      0.158596
2   241.95      0.158596
3   81.95      -0.833691
4   81.95      -0.833691

df[['total_price', 'new_col']].tail()

        total_price new_col
28167   264.0       NaN
28168   264.0       NaN
28176   94.0        NaN
28177   166.0       NaN
28178   166.0       NaN

What's going wrong here?

Saying answered 14/11, 2018 at 18:44 Comment(4)
Your original column had 24895 entries, but the DataFrame's index goes all the way to 28178, so my first guess is that some sort of join or concatenation resulted in an index mismatch between the old and new DataFrames. Were there any intermediate steps not shown, like a train-test split? - Transpicuous
It's part of a larger df, and I removed rows earlier, but not in between the steps shown above. - Saying
After reading your comment I ran df = df.reset_index() and the problem was resolved. - Saying
Glad I could help. - Transpicuous

The indices in your dataframe have gaps:

28167 
28168  
28176  
28177  
28178

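When you call pd.DataFrame(x_scaled), you create a new frame with a contiguous index (0, 1, 2, ...). Assigning it as a column of the original dataframe aligns the two on index labels, so every row whose label does not exist in the new frame gets NaN, and rows whose labels do match may silently receive the value that belongs to a different row. You can resolve this by resetting the index of the original dataframe before assigning (df = df.reset_index(drop=True); reset_index returns a new frame unless inplace=True is passed), or by skipping the alignment altogether and assigning the scaled NumPy array directly (df['new_col'] = x_scaled[:, 0]).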
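
The behaviour described above can be reproduced on a small frame. This is only a sketch under a few assumptions: the toy data, the index labels, and the column names misaligned, scaled and scaled_aligned are made up for illustration, and the z-scores are computed with NumPy as a stand-in for StandardScaler().fit_transform, which likewise returns a plain (n, 1) array with no index attached.

```python
import numpy as np
import pandas as pd

# Toy frame whose index has gaps, as if rows had been dropped earlier.
# (Hypothetical data; only the shape of the problem matches the question.)
df = pd.DataFrame(
    {"total_price": [241.95, 81.95, 264.0, 94.0, 166.0]},
    index=[0, 1, 5, 8, 9],
)

# Stand-in for StandardScaler().fit_transform(df[["total_price"]]):
# an (n, 1) NumPy array of z-scores using the population std (ddof=0).
x = df[["total_price"]].to_numpy()
x_scaled = (x - x.mean(axis=0)) / x.std(axis=0)

# Reproduces the problem: the new frame is indexed 0..4, so only the
# labels 0 and 1 exist in both indexes; labels 5, 8 and 9 become NaN.
df["misaligned"] = pd.DataFrame(x_scaled)
n_missing = int(df["misaligned"].isna().sum())

# Option 1: assign the array column directly; no index alignment happens.
df["scaled"] = x_scaled[:, 0]

# Option 2: keep the DataFrame wrapper but reuse the original index.
df["scaled_aligned"] = pd.DataFrame(x_scaled, index=df.index)[0]

# Option 3: reset to a contiguous index first (drop=True discards the
# old labels instead of keeping them as an extra column).
df_reset = df[["total_price"]].reset_index(drop=True)
df_reset["scaled"] = pd.DataFrame(x_scaled)
```

Options 1 and 2 leave the original index labels untouched, which may matter if other code still refers to them; option 3 is the one the asker ended up using.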

Roseleeroselia answered 22/3, 2021 at 9:52 Comment(0)
