You may want to think about using the `by` or `tapply` functions. This will allow you to skip the explicit call to `split`. Here's an example, since you haven't provided data.
# some example data
set.seed(1)
df <- data.frame(x = as.factor(rep(1:5, each=10)), y1=rnorm(50), y2=rnorm(50))
# with `tapply`
a <- do.call(rbind, sapply(df[,2:3], function(i) tapply(i, df$x, summary)))
# with `by`
a <- do.call(rbind, sapply(df[,2:3], function(i) by(i, df$x, summary)))
Here's the output:
> a
         Min.  1st Qu.    Median    Mean 3rd Qu.   Max.
 [1,] -0.8356 -0.54620  0.256600  0.1322  0.5537 1.5950
 [2,] -2.2150 -0.03775  0.491900  0.2488  0.9132 1.5120
 [3,] -1.9890 -0.39760  0.009218 -0.1337  0.5694 0.9190
 [4,] -1.3770 -0.32140 -0.056560  0.1207  0.6693 1.3590
 [5,] -0.7075 -0.23120  0.126100  0.1341  0.6619 0.8811
 [6,] -1.1290 -0.55080  0.103000  0.1435  0.5268 1.9800
 [7,] -1.8050 -0.02243  0.171000  0.4512  1.2720 2.4020
 [8,] -1.2540 -0.67980 -0.221100 -0.2477  0.2372 0.6107
 [9,] -1.5240 -0.26190  0.300000  0.1274  0.5380 1.1780
[10,] -1.2770 -0.56560  0.042540  0.1123  1.0450 1.5870
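To see why the explicit `split` is not needed, here is a minimal sketch that compares the two routes on a single column. The names `explicit`, `implicit` and `same` are only illustrative, and the example data is rebuilt so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
set.seed(1)
df <- data.frame(x = as.factor(rep(1:5, each = 10)), y1 = rnorm(50), y2 = rnorm(50))

# Explicit route: split the column by the factor, then summarise each piece
explicit <- lapply(split(df$y1, df$x), summary)

# tapply does the grouping internally and returns one summary per level
implicit <- tapply(df$y1, df$x, summary)

# Compare the summaries level by level (both are ordered by levels(df$x))
same <- vapply(seq_along(explicit), function(k) {
  isTRUE(all.equal(as.numeric(explicit[[k]]), as.numeric(implicit[[k]])))
}, logical(1))
all(same)
```

Both routes should give the same per-level summaries; `tapply` (and `by`) just save you from writing the `split` step yourself.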
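You might also want to combine this with the variable and level names so that each row is identifiable. `expand.grid` varies its first argument fastest, which matches the row order of `a` (all levels for `y1`, then all levels for `y2`):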
b <- expand.grid(level=levels(df$x),var=names(df[,2:3]))
cbind(b,a)
Here's the output of that:
> cbind(b,a)
   level var    Min.  1st Qu.    Median    Mean 3rd Qu.   Max.
1      1  y1 -0.8356 -0.54620  0.256600  0.1322  0.5537 1.5950
2      2  y1 -2.2150 -0.03775  0.491900  0.2488  0.9132 1.5120
3      3  y1 -1.9890 -0.39760  0.009218 -0.1337  0.5694 0.9190
4      4  y1 -1.3770 -0.32140 -0.056560  0.1207  0.6693 1.3590
5      5  y1 -0.7075 -0.23120  0.126100  0.1341  0.6619 0.8811
6      1  y2 -1.1290 -0.55080  0.103000  0.1435  0.5268 1.9800
7      2  y2 -1.8050 -0.02243  0.171000  0.4512  1.2720 2.4020
8      3  y2 -1.2540 -0.67980 -0.221100 -0.2477  0.2372 0.6107
9      4  y2 -1.5240 -0.26190  0.300000  0.1274  0.5380 1.1780
10     5  y2 -1.2770 -0.56560  0.042540  0.1123  1.0450 1.5870
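If you would rather not rely on the row order lining up with a separately built label table, one possible alternative (a sketch, not the approach used above) is to attach the labels while each variable is being summarised. The names `pieces` and `labelled` are hypothetical, and the example data is rebuilt so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
set.seed(1)
df <- data.frame(x = as.factor(rep(1:5, each = 10)), y1 = rnorm(50), y2 = rnorm(50))

# Summarise one variable at a time and label the rows immediately
pieces <- lapply(names(df)[2:3], function(v) {
  # One row per level of df$x; row names come from the factor levels
  s <- do.call(rbind, tapply(df[[v]], df$x, summary))
  data.frame(level = rownames(s), var = v, s,
             check.names = FALSE, row.names = NULL)
})

# Stack the per-variable tables into one labelled data frame
labelled <- do.call(rbind, pieces)
labelled
```

The trade-off is a slightly longer call in exchange for labels that cannot drift out of step with the summaries.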
by(final_data[,c(66:85)],Company,function(x) cor(x))
– Kissee