Breaking the tapply junkie habit
Asked Answered
A

1

9

I've learned R by toying, and I'm starting to think that I'm abusing the tapply function. Are there better ways to do some of the following actions? Granted, they work, but as they get more complex I wonder if I'm losing out on better options. I'm looking for some criticism, here:

tapply(var1, list(fac1, fac2), mean, na.rm=T)

tapply(var1, fac1, sum, na.rm=T) / tapply(var2, fac1, sum, na.rm=T)

cumsum(tapply(var1, fac1, sum, na.rm=T)) / sum(var1)

Update: Here's some example data...

     var1    var2 fac1           fac2
1      NA  275.54   10      (266,326]
2      NA  565.89   10      (552,818]
3      NA  815.41    6      (552,818]
4      NA  281.77    6      (266,326]
5      NA  640.24   NA      (552,818]
6      NA   78.42   NA     [78.4,266]
7      NA 1027.06   NA (818,1.55e+03]
8      NA  355.20   NA      (326,552]
9      NA  464.52   NA      (326,552]
10     NA 1397.11   10 (818,1.55e+03]
11     NA  229.82   NA     [78.4,266]
12     NA  542.77   NA      (326,552]
13     NA  829.32   NA (818,1.55e+03]
14     NA  284.78   NA      (266,326]
15     NA  194.97   10     [78.4,266]
16     NA  672.55    8      (552,818]
17     NA  348.01   10      (326,552]
18     NA 1550.79    9 (818,1.55e+03]
19 101.98  101.98    4     [78.4,266]
20     NA  292.80    6      (266,326]

Update data dump:

structure(list(var1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 101.98, NA), var2 = c(275.54, 
565.89, 815.41, 281.77, 640.24, 78.42, 1027.06, 355.2, 464.52, 
1397.11, 229.82, 542.77, 829.32, 284.78, 194.97, 672.55, 348.01, 
1550.79, 101.98, 292.8), fac1 = c(10L, 10L, 6L, 6L, NA, NA, NA, 
NA, NA, 10L, NA, NA, NA, NA, 10L, 8L, 10L, 9L, 4L, 6L), fac2 = structure(c(2L, 
4L, 4L, 2L, 4L, 1L, 5L, 3L, 3L, 5L, 1L, 3L, 5L, 2L, 1L, 4L, 3L, 
5L, 1L, 2L), .Label = c("[78.4,266]", "(266,326]", "(326,552]", 
"(552,818]", "(818,1.55e+03]"), class = "factor")), .Names = c("var1", 
"var2", "fac1", "fac2"), row.names = c(NA, -20L), class = "data.frame")
Amusement answered 16/9, 2009 at 17:32 Comment(5)
Just as a comment: while these are clear examples, it would be easier to help if you provided sample data for var1, fac1, etc.Neukam
Good point. Data sample added.Amusement
Suggestion: could you use the dput() function to extract the structure of that sample data, and then paste the results here? Makes it a breeze to import.Railhead
An other idea is to use something from the "datasets" package which comes with R: ?datasets. Then no extra work is required for replication.Neukam
I know I'm already in trouble if I can't even get the example right... added a dput of the example df. Keep in mind I'm unabashedly using attach() to get to the data in this scenario.Amusement
V
4

For part 1 I prefer aggregate because it keeps the data in a more R-like one observation per row format.

aggregate(var1, list(fac1, fac2), mean, na.rm=T)

Vermination answered 16/9, 2009 at 20:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.