In aggregate: sum not meaningful for factors

I am trying something that should be simple; any hint on what is going on is very welcome.

I have a large data frame with country imports for several municipalities. Some countries have 2 entries. I want to sum the imports for each municipality so that each country has a single row. I am using the aggregate function. For example (this is a small part of the data frame):

municipalities<-c("country",1100056, 1100106,1100205,1100304,1200104,1200252)
c1<-c("Afghanistan",2,34,23.4,5,0,0)    
c2<-c("Afghanistan",0,20,11.1,5.4,2,0)    
c3<-c("Albania",12,120,11.4,5.1,12,10)    
c4<-c("Albania",0,40,61.1,65.4,652,2)
df<-as.data.frame(rbind(municipalities,c1,c2,c3,c4))

Basically, I am trying:

df<-df[-1,]    
aggregate(df[,2:7],list(df[,1]),sum)

but I receive a message:

Error in Summary.factor(c(4L, 1L), na.rm = FALSE) : 
  sum not meaningful for factors

I have tried forcing df to be numeric, declaring the character columns as characters, etc., but nothing seems to help.

Vivanvivarium answered 21/10, 2013 at 11:25
How are you creating your actual data.frame? The way you build it in the example creates factors, which can't be summed. Consider giving a sample of your actual data; you could paste the output of dput(head(df)) into the question. - Aker
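Following the comment's suggestion to inspect the data, the sketch below rebuilds the question's example and checks the column types. It shows that no column ever becomes numeric. On R 4.0.0 and later, as.data.frame() no longer converts strings to factors by default, so the columns come out as character instead and the same aggregate() call fails with an "invalid 'type' (character) of argument" error rather than the factor one.

```r
# Rebuild the example exactly as in the question.
municipalities <- c("country", 1100056, 1100106, 1100205, 1100304, 1200104, 1200252)
c1 <- c("Afghanistan", 2, 34, 23.4, 5, 0, 0)
c2 <- c("Afghanistan", 0, 20, 11.1, 5.4, 2, 0)
c3 <- c("Albania", 12, 120, 11.4, 5.1, 12, 10)
c4 <- c("Albania", 0, 40, 61.1, 65.4, 652, 2)

# c() already turned every number into a string, so rbind() yields a character matrix.
m <- rbind(municipalities, c1, c2, c3, c4)
typeof(m)

# Every column is a factor (R < 4.0.0) or character (R >= 4.0.0), never numeric.
df <- as.data.frame(m)
sapply(df, class)
str(df)
```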

It is because of how you're creating your data frame. For example, c1 is a character vector, because a vector can hold only one type, so the numbers are stored as strings. When you put these vectors into a data frame, the character columns are further coerced to factors. Thus you're trying to run sum on factors. You figured this out already, but then tried to convert the factors to numeric, which probably gave you nonsensical results: as.numeric() on a factor returns its internal integer level codes, not the numbers shown as its labels.
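A small sketch of the factor pitfall described above, using a few values from the question's V3 column (the variable name f is illustrative only). It shows why converting a factor straight to numeric gives the wrong numbers, and why going through character first recovers the real values.

```r
# A factor built from numeric-looking strings, as as.data.frame() produced
# from a character matrix while stringsAsFactors defaulted to TRUE.
f <- factor(c("34", "20", "120", "40"))

# Levels are sorted as strings, not as numbers.
levels(f)

# as.numeric() returns the integer level codes: 3 2 1 4, not the imports.
as.numeric(f)

# Converting via character first recovers the actual values: 34 20 120 40.
as.numeric(as.character(f))

# sum() refuses to work on a factor at all; capture the error message.
msg <- tryCatch(sum(f), error = conditionMessage)
print(msg)
```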

The easy answer is to build your data frame column-wise rather than row-wise, so that each column keeps its own type and you avoid these coercion problems.
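A minimal sketch of the column-wise approach, assuming the same small sample as the question. The data frame name imports and the "m" prefix on the municipality column names are hypothetical choices (the prefix just keeps the names syntactic). Once the value columns are numeric, the question's own data-frame form of aggregate() works unchanged.

```r
# Build the data column by column: 'country' stays character and each
# municipality column (hypothetical names m<code>) is numeric.
imports <- data.frame(
  country  = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  m1100056 = c(2, 0, 12, 0),
  m1100106 = c(34, 20, 120, 40),
  m1100205 = c(23.4, 11.1, 11.4, 61.1),
  m1100304 = c(5, 5.4, 5.1, 65.4),
  m1200104 = c(0, 2, 12, 652),
  m1200252 = c(0, 0, 10, 2),
  stringsAsFactors = FALSE
)

str(imports)  # country is chr, every other column is num

# The call from the question now succeeds because the summed columns are numeric.
totals <- aggregate(imports[, 2:7], by = list(country = imports$country), FUN = sum)
print(totals)
```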

Given the data you already have, this will solve your problem:

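# Convert each column via character; as.is = TRUE keeps text columns as character
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))
# Sum all remaining columns within each country (V1)
aggregate(. ~ V1, df, sum)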

(Thanks to @AnandaMahto for the much cleaner way of doing that conversion than what I originally had.)

Result:

           V1 V2  V3   V4   V5  V6 V7
1 Afghanistan  2  54 34.5 10.4   2  0
2     Albania 12 160 72.5 70.5 664 12
Orthopter answered 21/10, 2013 at 11:33
Try: df[] <- lapply(df, function(x) type.convert(as.character(x))) (where df is from the step "df <- df[-1, ]" in the original question). A bit cleaner than what you propose. - Hermitage
Oh, and for the aggregate step, try aggregate(. ~ V1, df2, sum). - Hermitage
@Orthopter - What if the data set is large and lapply() is slow on it? - Tynan
@ChetanArvindPatil having factors where you want numeric is a problem that should be solved upstream. This answer shows a quick fix, but if you have a large data set that is being slow you should fix it at the time you read in the data. - Speciosity
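A minimal sketch of fixing the types at read time, as the last comment suggests. It assumes the data arrive as a CSV file whose header holds the municipality codes; the inline csv_text and the name imports are hypothetical stand-ins for the real file. Declaring colClasses means no factor or string columns are created for the values, so no conversion pass is needed afterwards.

```r
# Hypothetical CSV content standing in for the real import file.
csv_text <- "country,1100056,1100106,1100205,1100304,1200104,1200252
Afghanistan,2,34,23.4,5,0,0
Afghanistan,0,20,11.1,5.4,2,0
Albania,12,120,11.4,5.1,12,10
Albania,0,40,61.1,65.4,652,2"

# Declare the column types up front: country as character,
# the six municipality columns as numeric.
imports <- read.csv(text = csv_text,
                    colClasses = c("character", rep("numeric", 6)))

# With the default check.names = TRUE, numeric headers become X1100056, etc.
names(imports)

# Sum every municipality column within each country.
totals <- aggregate(. ~ country, data = imports, FUN = sum)
print(totals)
```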
