I have a dataframe of expression data where genes are rows and samples are columns. I also have a dataframe containing metadata for each sample in the expression dataframe. In reality my expr dataframe has 30,000+ rows and 100+ columns; below is an example with smaller data.
expr <- data.frame(sample1 = c(1,2,2,0,0),
                   sample2 = c(5,2,4,4,0),
                   sample3 = c(1,2,1,0,1),
                   sample4 = c(6,5,6,6,7),
                   sample5 = c(0,0,0,1,1))
rownames(expr) <- paste0("gene",1:5)
meta <- data.frame(sample = paste0("sample",1:5),
                   treatment = c("control","control",
                                 "treatment1",
                                 "treatment2", "treatment2"))
I want to find the mean of each gene per treatment. In the examples I've seen with split() or group_by(), people group by a column that is already present in the data.frame. However, I have a separate dataframe (meta) that defines the grouping of the columns in another dataframe (expr).
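A small illustration of the point above (a sketch, not the poster's code): split() does not need the grouping to be a column of expr. Assuming every column of expr has an exactly matching entry in meta$sample, the treatment of each column can be looked up with match(), and split.default() then splits the columns (not the rows) of expr into one data frame per treatment. The stringsAsFactors = FALSE argument is added only so the sketch behaves the same on R versions before 4.0; trt and expr_by_trt are illustrative names.

```r
# Toy data from the question (stringsAsFactors = FALSE added for R < 4.0)
expr <- data.frame(sample1 = c(1, 2, 2, 0, 0),
                   sample2 = c(5, 2, 4, 4, 0),
                   sample3 = c(1, 2, 1, 0, 1),
                   sample4 = c(6, 5, 6, 6, 7),
                   sample5 = c(0, 0, 0, 1, 1))
rownames(expr) <- paste0("gene", 1:5)
meta <- data.frame(sample = paste0("sample", 1:5),
                   treatment = c("control", "control", "treatment1",
                                 "treatment2", "treatment2"),
                   stringsAsFactors = FALSE)

# Treatment of each expr column, looked up in meta by sample name
# (trt is an illustrative name; assumes exact name matches)
trt <- meta$treatment[match(colnames(expr), meta$sample)]

# split.default() treats the data frame as a list of columns, so this
# returns one sub-data-frame of expr columns per treatment
expr_by_trt <- split.default(expr, trt)
names(expr_by_trt)
```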
I would like the output to be a dataframe with genes as rows, treatments as columns, and the per-treatment mean as values:
#       control treatment1 treatment2
# gene1 mean    mean       mean
# gene2 mean    mean       mean
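A minimal base-R sketch (my own, not taken from the question) that produces the layout above: look up each column's treatment in meta, split the columns of expr by treatment, apply rowMeans() to each group, and let sapply() bind the per-treatment vectors side by side into a genes x treatments matrix. It assumes every column name of expr appears exactly in meta$sample; trt and trt_means are illustrative names, and stringsAsFactors = FALSE is only there for R < 4.0.

```r
# Toy data from the question (stringsAsFactors = FALSE added for R < 4.0)
expr <- data.frame(sample1 = c(1, 2, 2, 0, 0),
                   sample2 = c(5, 2, 4, 4, 0),
                   sample3 = c(1, 2, 1, 0, 1),
                   sample4 = c(6, 5, 6, 6, 7),
                   sample5 = c(0, 0, 0, 1, 1))
rownames(expr) <- paste0("gene", 1:5)
meta <- data.frame(sample = paste0("sample", 1:5),
                   treatment = c("control", "control", "treatment1",
                                 "treatment2", "treatment2"),
                   stringsAsFactors = FALSE)

# Treatment of each expr column (illustrative name; exact name match assumed)
trt <- meta$treatment[match(colnames(expr), meta$sample)]

# rowMeans() on each group of columns gives one named vector per treatment;
# sapply() binds them column-wise, so genes end up as rows.
# A single-sample group (treatment1) still works because split.default()
# keeps it as a one-column data frame.
trt_means <- as.data.frame(sapply(split.default(expr, trt), rowMeans))
trt_means
```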
treatment1 simply expr["sample3"] and for treatment2 simply rowMeans(expr[c("sample4", "sample5")])? This can be done with colnames(expr) = with(meta, paste0(sample, "_", treatment)); m1 = expr["sample3_treatment1"]; m2 = rowMeans(expr[c("sample4_treatment2", "sample5_treatment2")])? – Plainlaid

colnames(expr) and sample names is more complex in the real data at hand compared to the toy data given. – Plainlaid
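An alternative sketch that needs no renaming of columns and no per-group loop: look each column up in meta by name, build a 0/1 samples x treatments indicator matrix, get all per-gene group sums with one matrix product, then divide by the group sizes. The toy columns are deliberately shuffled here to show that the column order of expr does not have to follow the row order of meta. This assumes the column names of expr match meta$sample exactly; if the real mapping is more complex, the key passed to match() would have to be derived first, and that rule is not given in the question. The names trt, groups, ind and trt_means are illustrative.

```r
# Toy data from the question (stringsAsFactors = FALSE added for R < 4.0)
expr <- data.frame(sample1 = c(1, 2, 2, 0, 0),
                   sample2 = c(5, 2, 4, 4, 0),
                   sample3 = c(1, 2, 1, 0, 1),
                   sample4 = c(6, 5, 6, 6, 7),
                   sample5 = c(0, 0, 0, 1, 1))
rownames(expr) <- paste0("gene", 1:5)
meta <- data.frame(sample = paste0("sample", 1:5),
                   treatment = c("control", "control", "treatment1",
                                 "treatment2", "treatment2"),
                   stringsAsFactors = FALSE)

# Shuffle the columns so they no longer follow the order of meta
expr <- expr[, c("sample5", "sample3", "sample1", "sample4", "sample2")]

# Treatment of each expr column, looked up by (exact) sample name
trt <- meta$treatment[match(colnames(expr), meta$sample)]
stopifnot(!anyNA(trt))  # every expr column must have a row in meta
groups <- sort(unique(trt))

# Samples x treatments indicator matrix: 1 if the sample is in that group
ind <- outer(trt, groups, "==") * 1
colnames(ind) <- groups

# Genes x treatments sums in one product, then divide by group sizes
sums <- as.matrix(expr) %*% ind
trt_means <- as.data.frame(sweep(sums, 2, colSums(ind), "/"))
trt_means
```

The matrix product keeps the gene names from expr as row names and the treatment names from ind as column names, so the result already has the genes x treatments shape asked for.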