Understanding ddply error message - argument "by" is missing, with no default

I am trying to figure out why I am getting an error message when using ddply.

Example data:

data<-data.frame(area=rep(c("VA","OC","ES"),each=4),
    sex=rep(c("Male","Female"),each=2,times=3),
    year=rep(c(2009,2010),times=6),
    bin=c(110,120,125,125,110,130,125,80,90,90,80,140),
    shell_length=c(.4,4,1,2,.2,5,.4,4,.8,4,.3,4))

bin7<-ddply(data, .(area,year,sex,bin), summarize,n_bin=length(shell_length))

Error message: Error in .fun(piece, ...) : argument "by" is missing, with no default

I got this error message yesterday. I restarted R and reran the code and everything was fine. This morning I got the error message again and restarting R did not solve the problem.

I also tried to run some example code and got the same error message.

  # Summarize a dataset by two variables
require(plyr)
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
 mean = round(mean(age), 2),
 sd = round(sd(age), 2))

R information

R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] Hmisc_3.17-0        ggplot2_1.0.1       Formula_1.2-1      
 [4] survival_2.38-1     car_2.0-26          MASS_7.3-40        
 [7] xlsx_0.5.7          xlsxjars_0.6.1      rJava_0.9-7        
[10] plyr_1.8.3          latticeExtra_0.6-26 RColorBrewer_1.1-2 
[13] lattice_0.20-31  

If someone could please explain why this is happening I would appreciate it.

Thanks

Grecoroman answered 19/11, 2015 at 15:12 Comment(8)
I can't reproduce the error using the same version of plyr. Did you load plyr and dplyr at the same time? - Mcglothlin
I don't get the error, but I named the data frame d. The formula interface also works: ddply(d, ~ area+year+sex+bin, summarize, n_bin=length(shell_length)) - Engle
Both dplyr and plyr have functions named summarize. You probably have dplyr loaded as well. Detach both packages and then load plyr before loading dplyr. - Mckelvey
Thank you for all of your help. I detached both packages and then reloaded them with plyr first. I did not know that the two packages needed to be loaded in a certain order. - Grecoroman
Looks like you loaded Hmisc last, which also has a summarize function (one with a by argument, as in your error). Watch the messages printed when loading packages: they give you important information about masking. - Actinon
Thank you for that information also. After thinking I had it figured out, I reran the code and got the same error message. I ended up loading the plyr package later on in my code and now everything is working. So, if I want to use plyr for summarizing data, do I need to load it last in my list of packages? I prefer to load all of my packages at the beginning of my code. This is the first time I have had to deal with this kind of masking between packages. - Grecoroman
You can also specify plyr::summarize(...) to be on the safe side. - Zee
Why isn't there a hint in the error message that several loaded packages have a function with that name? - Kemme
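
To see which `summarize` an unqualified call will actually pick up, you can ask the search path directly. The following is a minimal sketch using only base R (`find()`, `get()`, `environmentName()`); the helper name `who_wins` is made up for this illustration and is not part of any package.

```r
# Hypothetical helper: list every search-path entry that defines a function
# called `name`, and report the namespace of the version R resolves first.
who_wins <- function(name) {
  places <- find(name, mode = "function")  # entries in lookup order
  if (length(places) == 0) {
    return(list(defined_in = character(0), used = NA_character_))
  }
  fn <- get(name, mode = "function")
  list(defined_in = places, used = environmentName(environment(fn)))
}

# With plyr and Hmisc both attached, `defined_in` should list both packages;
# the first entry is the one an unqualified call uses.
who_wins("summarize")

# conflicts(detail = TRUE) gives the full list of masked names per entry.
```

If `used` is not "plyr", an unqualified `summarize` passed to `ddply` is not the plyr function.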

As stated in Narendra's comment to the question, this error can be caused by loading another package that has a function called summarize (or summarise) which masks the one in plyr and does not behave the same way. For instance:

library(plyr)
library(Hmisc)

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#> Error in .fun(piece, ...) : argument "by" is missing, with no default

One solution is to call the correct function with :: and the correct namespace:

ddply(iris, "Species", plyr::summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588

Alternatively, one can detach the package that has the wrong function:

detach(package:Hmisc)
ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588

Finally, if one needs both packages and does not want to bother with ::, one can load them in the other order:

library(Hmisc)
library(plyr)

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588
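
If you prefer to keep all `library()` calls at the top and not depend on load order, another possibility is to attach Hmisc without its `summarize`. This is only a sketch: it assumes R 3.6.0 or later (where `library()` gained the `exclude` argument) and that both packages are installed; the guard variable `have_both` is just for this illustration.

```r
# Sketch, assuming R >= 3.6.0 and that plyr and Hmisc are installed.
# The guard keeps the script from failing when they are not.
have_both <- requireNamespace("plyr", quietly = TRUE) &&
  requireNamespace("Hmisc", quietly = TRUE)

if (have_both) {
  library(plyr)
  # Attach Hmisc but keep its summarize() off the search path,
  # so plyr's version is not masked.
  library(Hmisc, exclude = "summarize")

  res <- ddply(iris, "Species", summarize,
               mean_sepal_length = mean(Sepal.Length))
  print(res)

  # The excluded function is still reachable as Hmisc::summarize(...).
}
```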
Analogue answered 10/2, 2016 at 19:33 Comment(0)

I had a similar problem (a different data set, but the same error message), and I discovered that ddply worked with the UK spelling "summarise". Once I made the spelling change, the code worked.

Here's the code I used. When I used the "z" spelling, I got the error message Error in .fun(piece, ...) : argument "by" is missing, with no default; but changing to "s" solved it.

library(plyr)
ddply(InsectSprays,.(spray),summarise,sum=sum(count))
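
A likely explanation, consistent with the other answers: plyr exports both spellings, while the masking package (Hmisc in the question) appears to define only `summarize`, so the "s" spelling is simply not masked. The check below is a sketch that assumes both packages are installed; `export_check` is a name chosen for this illustration, and nothing is printed if either package is missing.

```r
# Compare which spellings each namespace exports (only if both are installed).
if (requireNamespace("plyr", quietly = TRUE) &&
    requireNamespace("Hmisc", quietly = TRUE)) {
  spellings <- c("summarise", "summarize")
  export_check <- data.frame(
    name  = spellings,
    plyr  = spellings %in% getNamespaceExports("plyr"),
    Hmisc = spellings %in% getNamespaceExports("Hmisc")
  )
  print(export_check)
}
```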
Otiose answered 15/8, 2016 at 13:40 Comment(1)
Issue is referenced here: github.com/tidyverse/dplyr/issues/505 - Sardanapalus

@CoderGuy123's answer is great, but I want to add one more solution which I prefer to those suggested.

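If you want to load both packages even though they have a name conflict, you can control which function is used with a simple assignment: summarize <- plyr::summarize. The assignment creates a binding in the global environment, which R searches before any attached package.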

Example:

library(plyr)
library(Hmisc)

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#> Error in .fun(piece, ...) : argument "by" is missing, with no default

summarize <- plyr::summarize

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588
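
Why the assignment works can be shown without installing anything. The sketch below fakes two "packages" with `attach()`; the names `pkg_a` and `pkg_b` and their toy `summarize` functions are invented for the demonstration and stand in for plyr and Hmisc.

```r
# Two stand-in "packages" that both define summarize().
pkg_a <- list(summarize = function(x) "from pkg_a")
pkg_b <- list(summarize = function(x) "from pkg_b")

attach(pkg_a, name = "pkg_a")
attach(pkg_b, name = "pkg_b")  # attached last, so it masks pkg_a

first <- summarize(1)          # resolves to the pkg_b version

# A binding in the calling (here: global) environment is found before
# anything on the search path, so it overrides the masking.
summarize <- get("summarize", pos = "pkg_a")
second <- summarize(1)         # now the pkg_a version

# Clean up the demonstration objects.
rm(summarize)
detach("pkg_b")
detach("pkg_a")
```

One trade-off of this approach: the override lives in your workspace, so `rm(summarize)` or a fresh session brings the masking back.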
Scummy answered 11/5, 2021 at 10:46 Comment(0)
