I have a numeric column ("value") in a dataframe ("df"), and I would like to generate a new column ("valueBin") based on "value." I have the following conditional code to define df$valueBin:
df$valueBin[which(df$value<=250)] <- "<=250"
df$valueBin[which(df$value>250 & df$value<=500)] <- "250-500"
df$valueBin[which(df$value>500 & df$value<=1000)] <- "500-1,000"
df$valueBin[which(df$value>1000 & df$value<=2000)] <- "1,000 - 2,000"
df$valueBin[which(df$value>2000)] <- ">2,000"
I'm getting the following error:
"Error in `$<-.data.frame`(`*tmp*`, "valueBin", value = c(NA, NA, NA, : replacement has 6530 rows, data has 6532"
Every element of df$value should fit into one of my which() statements, and there are no missing values in df$value. Yet even if I run just the first conditional statement (<=250), I get the exact same error, with "...replacement has 6530 rows...", although there are far fewer than 6530 records with value<=250, and value is never NA.
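The row count in the message is consistent with how R handles indexed assignment into a column that does not exist yet: df$valueBin is NULL at that point, and assigning into NULL by index builds a vector only as long as the largest index used, not as long as the data frame. Under that reading, 6530 would be the position of the last row with value<=250, not the number of matches. A minimal sketch of the behaviour, using a hypothetical five-row data frame rather than the real data:

```r
# Toy data frame (hypothetical values, not the data from the question)
df <- data.frame(value = c(100, 300, 200, 700, 900))

# Rows 1 and 3 satisfy the condition; the largest matching index is 3
idx <- which(df$value <= 250)

# Indexed assignment into NULL yields a vector of length max(idx)
partial <- NULL
partial[idx] <- "<=250"
length(partial)   # 3, while nrow(df) is 5

# The same happens inside the data-frame assignment, which then fails
# because the replacement is shorter than the data frame
res <- tryCatch({
  df$valueBin[idx] <- "<=250"
  "no error"
}, error = function(e) conditionMessage(e))
res
```

Here the replacement has 3 rows because row 3 is the last match, which mirrors the 6530 versus 6532 mismatch.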
This SO question (R aggregate error: "replacement has <foo> rows, data has <bar>") notes that a similar error when using aggregate() was a bug, but it recommends installing the version of R I already have. Plus, the bug report says it's fixed.
This SO question (R error in '[<-.data.frame'... replacement has # items, need #) seems more related to my issue. The problem there was conditional logic that caused fewer elements of the replacement array to be generated. I guessed that must be my issue as well, and figured at first I must have a "<=" instead of a "<" or vice versa, but after checking I'm pretty sure they're all correct and cover every value of "value" without overlaps.
cut. However, if you want to use your method, initialize the new column first, then give your commands: df$valueBin <- "" and then the other assignments. – Guthrey
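Both routes from the comment can be sketched as follows. This is an illustration on a hypothetical data frame, not the asker's data; the column name valueBin2 and the break points passed to cut() are assumptions chosen to match the bins in the question.

```r
# Toy data frame (hypothetical values covering every bin edge)
df <- data.frame(value = c(100, 250, 251, 500, 1000, 1500, 2000, 2500))

# Option 1: create the column at full length first, then fill it in
df$valueBin <- ""
df$valueBin[which(df$value <= 250)] <- "<=250"
df$valueBin[which(df$value > 250 & df$value <= 500)] <- "250-500"
df$valueBin[which(df$value > 500 & df$value <= 1000)] <- "500-1,000"
df$valueBin[which(df$value > 1000 & df$value <= 2000)] <- "1,000 - 2,000"
df$valueBin[which(df$value > 2000)] <- ">2,000"

# Option 2: cut() with right-closed intervals, e.g. (250, 500]
# valueBin2 is a hypothetical column name used only for comparison
df$valueBin2 <- cut(
  df$value,
  breaks = c(-Inf, 250, 500, 1000, 2000, Inf),
  labels = c("<=250", "250-500", "500-1,000", "1,000 - 2,000", ">2,000"),
  right = TRUE
)

df
```

Note that cut() returns a factor with the levels in bin order, which is usually convenient for tables and plots; wrap it in as.character() if a plain character column is needed.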