I have a simple dataframe as such:
ID Col1 Col2 Col3 Col4
1 NA NA NA NA
1 5 10 NA NA
1 NA NA 15 20
2 NA NA NA NA
2 25 30 NA NA
2 NA NA 35 40
And I would like to reformat it as such:
ID Col1 Col2 Col3 Col4
1 5 10 15 20
2 25 30 35 40
(please note: the real data set has thousands of rows and the values are from biological data -- the NA
s follow no simple pattern, except that the NA
s are disjoint, and yes there are exactly 3 rows for each ID
).
STEP ONE: get rid of rows that have only NA
values.
On the surface this looked simple, but I've run across some problems.
complete.cases(DF)
returns all FALSE
, so I can't really use this to remove the rows with all NA
s, as in DF[complete.cases(DF),]
. This is because all rows contain at least one NA
.
Since NA
s want to propagate themselves, other schemes using is.na
fail for the same reason.
STEP TWO: collapse the remaining two rows into one.
Thinking about using something like aggregate
to pull this off, but there has got to be an easier way than this, which doesn't work at all.
Thanks for any advice.