I have a large dataset that is one huge table that actually should be many tables. The headers are buried in the rows for the subsets.
My goal is to pull those headers out into a new column such that I can filter by that column to get the data I want (one header at a time). I've created an empty header column for this. There is always a series of 3 NaN values in the SCORE
column where the first value in the NAME
column in that series is the HEADER
I want. So I'm thinking something about that relationship could be leveraged.
Current Pandas data frame has this structure:
s = '''HEADER,NAME,SCORE
NaN,Header 1,NaN
NaN,Random Junk,NaN
NaN,Random Junk,NaN
NaN,Ed,98
NaN,Gary,78
NaN,Floyd,89
NaN,Header 2,NaN
NaN,Random Junk,NaN
NaN,Random Junk,NaN
NaN,Mary,96
NaN,Steve,78'''
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(s))
HEADER NAME SCORE
NaN Header 1 NaN
NaN Random Junk NaN
NaN Random Junk NaN
NaN Ed 98
NaN Gary 78
NaN Floyd 89
... ... ...
NaN Header 2 NaN
NaN Random Junk NaN
NaN Random Junk NaN
NaN Mary 96
NaN Steve 78
and I want this:
HEADER NAME SCORE
Header 1 Header 1 NaN
Header 1 Random Junk NaN
Header 1 Random Junk NaN
Header 1 Ed 98
Header 1 Gary 78
Header 1 Floyd 89
... ... ...
Header 2 Header 2 NaN
Header 2 Random Junk NaN
Header 2 Random Junk NaN
Header 2 Mary 96
Header 2 Steve 78
so I can then remove the NaN rows and get what I'm truly after which is this:
HEADER NAME SCORE
Header 1 Ed 98
Header 1 Gary 78
Header 1 Floyd 89
... ... ...
Header 2 Mary 96
Header 2 Steve 78
After much searching, I'm not able to figure out how to do this conditional editing like this. Would appreciate any help you can give.
98 nan 89
? – Purposeful