I want to calculate rowMeans of a range of column but I cannot give the hard-coded value for colnames (e.g c(C1,C3)) or range (e.g. C1:C3) as both names and range are variable. My df looks like:
> df
chr name age MGW.1 MGW.2 MGW.3 HEL.1 HEL.2 HEL.3
1 123 abc 12 10.00 19 18.00 12 13.00 -14
2 234 bvf 24 -13.29 13 -3.02 12 -0.12 24
3 376 bxc 17 -6.95 10 -18.00 15 4.00 -4
This is just a sample, in reality I have columns ranging in MGW.1 ... MGW.196 and so. Here Instead of giving the exact colnames or an exact range I want to pass initial of colnames and want to get average of all columns having that initials. Something like: MGW=rowMeans(df[,MGW.*]), HEL=rowMeans(df[,HEL.*])
So my final output should look like:
> df
chr name age MGW Hel
1 123 abc 12 10.00 19
2 234 bvf 24 13.29 13
3 376 bxc 17 -6.95 10
I know these values are not correct but it is just to give you and idea. Secondly I want to remove all those rows from data frame which contains NA in the entire row except the first 3 values.
Here is the dput for sample example:
> dput(df)
structure(list(chr = c(123L, 234L, 376L), name = structure(1:3, .Label = c("abc",
"bvf", "bxc"), class = "factor"), age = c(12L, 24L, 17L), MGW.1 = c(10,
-13.29, -6.95), MGW.2 = c(19L, 13L, 10L), MGW.3 = c(18, -3.02,
-18), HEL.1 = c(12L, 12L, 15L), HEL.2 = c(13, -0.12, 4), HEL.3 = c(-14L,
24L, -4L)), .Names = c("chr", "name", "age", "MGW.1", "MGW.2",
"MGW.3", "HEL.1", "HEL.2", "HEL.3"), class = "data.frame", row.names = c(NA,
-3L))
df
with a "logical" vector of positions wherenames(df)
?startsWith
"MGW"
etc. Also, see,?complete.cases
to find rows that contain onlyNA
(after subsetting all but first three columns). – ComparableMGW.*
,HEL.*
) ? Or do you need to regroup them programmatically ? – Autograph