I have a pandas DataFrame:
df['total_price'].describe()
returns
count 24895.000000
mean 216.377369
std 161.246931
min 0.000000
25% 109.900000
50% 174.000000
75% 273.000000
max 1355.900000
Name: total_price, dtype: float64
When I apply preprocessing.StandardScaler() to it:
from sklearn import preprocessing

x = df[['total_price']]
standard_scaler = preprocessing.StandardScaler()
x_scaled = standard_scaler.fit_transform(x)
df['new_col'] = pd.DataFrame(x_scaled)
my new column with the standardized values contains some NaNs:
df[['total_price', 'new_col']].head()
total_price new_col
0 241.95 0.158596
1 241.95 0.158596
2 241.95 0.158596
3 81.95 -0.833691
4 81.95 -0.833691
df[['total_price', 'new_col']].tail()
total_price new_col
28167 264.0 NaN
28168 264.0 NaN
28176 94.0 NaN
28177 166.0 NaN
28178 166.0 NaN
What's going wrong here?
Your original DataFrame has 24895 entries, but the new column's index runs all the way to 28178, so my first guess is that some sort of join or concatenation resulted in an index mismatch between the old and new DataFrames. Were there any intermediate steps not shown, like a train-test split? – Transpicuous

df = df.reset_index() and the problem got resolved – Saying
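The behaviour in the comments can be reproduced with a small sketch. The key point is that `pd.DataFrame(x_scaled)` gets a fresh 0..n-1 RangeIndex, and column assignment in pandas aligns on index labels, so any row label missing from 0..n-1 receives NaN. The toy data and index values below are hypothetical, and the standardization is done with NumPy as a stand-in for `StandardScaler` (same formula, population std):

```python
import numpy as np
import pandas as pd

# Toy frame whose index has gaps, as happens after dropping rows
# or a train-test split (hypothetical values).
df = pd.DataFrame({'total_price': [241.95, 81.95, 264.0, 94.0]},
                  index=[0, 1, 28167, 28176])

# Standardize: (x - mean) / std, matching what StandardScaler computes.
x = df[['total_price']].to_numpy()
x_scaled = (x - x.mean()) / x.std()

# PROBLEM: pd.DataFrame(x_scaled) has index 0..3; assignment aligns on
# index, so the labels 28167 and 28176 are not found and become NaN.
df['bad'] = pd.DataFrame(x_scaled)

# FIX 1: assign the raw ndarray -- positional, no index alignment.
df['good'] = x_scaled

# FIX 2 (equivalent): keep the original labels on the helper frame.
df['also_good'] = pd.DataFrame(x_scaled, index=df.index)
```

`df.reset_index()` resolves the issue for the same reason: it rewrites the row labels back to 0..n-1, so they line up with the fresh RangeIndex of the scaled frame.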