I am reading a file into a Pandas DataFrame that may have invalid (i.e. NaN) rows. This is sequential data, so I have row_id+1 refer to row_id. When I use frame.dropna(), I get the desired structure, but the index labels stay as they were originally assigned. How can the index labels get reassigned 0 to N-1 where N is the number of rows after dropna()?
How to delete a row in a Pandas DataFrame and relabel the index?
Use pandas.DataFrame.reset_index(); the option drop=True will do what you are looking for.
In [12]: import numpy as np
In [13]: import pandas as pd
In [14]: df = pd.DataFrame(np.random.randn(5,4))
In [15]: df.iloc[::3] = np.nan
In [16]: df
Out[16]:
          0         1         2         3
0       NaN       NaN       NaN       NaN
1  1.895803  0.532464  1.879883 -1.802606
2  0.078928  0.053323  0.672579 -1.188414
3       NaN       NaN       NaN       NaN
4 -0.766554 -0.419646 -0.606505 -0.162188
In [17]: df = df.dropna()
In [18]: df.reset_index(drop=True)
Out[18]:
          0         1         2         3
0  1.895803  0.532464  1.879883 -1.802606
1  0.078928  0.053323  0.672579 -1.188414
2 -0.766554 -0.419646 -0.606505 -0.162188
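For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same steps. The variable names `dropped`, `with_old` and `clean` are illustrative only, and the data is random, so the values will differ from the output above.

```python
import numpy as np
import pandas as pd

# Illustrative data: every third row (labels 0 and 3) is set to NaN.
df = pd.DataFrame(np.random.randn(5, 4))
df.iloc[::3] = np.nan

# dropna() removes the NaN rows but keeps the surviving rows' old labels.
dropped = df.dropna()
print(list(dropped.index))      # [1, 2, 4]

# Without drop=True, the old labels are kept as a new column named "index".
with_old = dropped.reset_index()
print(list(with_old.columns))   # ['index', 0, 1, 2, 3]

# With drop=True, the old labels are discarded and rows are renumbered from 0.
clean = dropped.reset_index(drop=True)
print(list(clean.index))        # [0, 1, 2]
```

Whether to keep the old labels as a column depends on whether you still need to trace rows back to their position in the original file.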
Yay! reset_index() was exactly what I needed! Thanks for the clear example! – Lilli
Note to others finding this: you need to do df = df.reset_index(drop=True), otherwise the change won't "persist" in df. – Kella
In addition to the accepted answer:
You can also pass inplace=True to modify df directly instead of reassigning it:
df.reset_index(drop=True, inplace=True)
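To make the difference between the two approaches concrete, here is a small sketch. The column name `a` and the variables `unchanged` and `reassigned` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: rows 0 and 3 are NaN and get dropped.
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0, np.nan, 3.0]}).dropna()

# Calling reset_index() and ignoring the result leaves df untouched.
df.reset_index(drop=True)
unchanged = list(df.index)
print(unchanged)                 # [1, 2, 4]

# Option 1: keep the returned frame.
reassigned = df.reset_index(drop=True)
print(list(reassigned.index))    # [0, 1, 2]

# Option 2: modify df itself; with inplace=True the call returns None.
df.reset_index(drop=True, inplace=True)
print(list(df.index))            # [0, 1, 2]
```

Both options end with the same labels. Reassignment chains more naturally with other calls such as dropna(), so which one to use is largely a matter of style.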
If df is your dataframe after dropping NA, then try df.index = range(len(df)) – Hightower
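The direct index assignment from this comment can be sketched as follows. The column name `value` is hypothetical, and this is just one way to write it.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the NaN rows (labels 0 and 3) are removed by dropna().
df = pd.DataFrame({"value": [np.nan, 1.0, 2.0, np.nan, 3.0]}).dropna()
print(list(df.index))    # [1, 2, 4]

# Overwrite the labels directly with 0..N-1, where N is the remaining row count.
df.index = range(len(df))
print(list(df.index))    # [0, 1, 2]
```

This changes df itself and leaves the row order and values as they were; only the labels are replaced.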