Error - replacement has [x] rows, data has [y]

I have a numeric column ("value") in a dataframe ("df"), and I would like to generate a new column ("valueBin") based on "value." I have the following conditional code to define df$valueBin:

df$valueBin[which(df$value<=250)] <- "<=250"
df$valueBin[which(df$value>250 & df$value<=500)] <- "250-500"
df$valueBin[which(df$value>500 & df$value<=1000)] <- "500-1,000"
df$valueBin[which(df$value>1000 & df$value<=2000)] <- "1,000 - 2,000"
df$valueBin[which(df$value>2000)] <- ">2,000"

I'm getting the following error:

"Error in $<-.data.frame(*tmp*, "valueBin", value = c(NA, NA, NA, : replacement has 6530 rows, data has 6532"

Every element of df$value should fit into one of my which() statements, and there are no missing values in df$value. Even if I run just the first conditional statement (<=250), I get the exact same error, with "...replacement has 6530 rows...", although there are far fewer than 6530 records with value<=250 and value is never NA.

This SO link notes that a similar error when using aggregate() was a bug, but it recommends installing the version of R I already have, and the bug report says it's fixed: R aggregate error: "replacement has <foo> rows, data has <bar>"

This SO link seems more related to my issue. There, the problem was conditional logic that generated fewer elements of the replacement array than expected. I guessed that must be my issue as well, and figured at first that I had a "<=" instead of a "<" or vice versa, but after checking I'm pretty sure the conditions cover every value of "value" without overlaps: R error in '[<-.data.frame'... replacement has # items, need #

Strychnic answered 23/4, 2015 at 5:59 Comment(1)
You need to follow what @akrun said and use cut. However, if you want to use your method, initialize the new column first and then give your commands: df$valueBin<-"" and then the other assignments. – Guthrey

You could use cut

 df$valueBin <- cut(df$value, c(-Inf, 250, 500, 1000, 2000, Inf), 
    labels=c('<=250', '250-500', '500-1,000', '1,000-2,000', '>2,000'))

data

 set.seed(24)
 df <- data.frame(value= sample(0:2500, 100, replace=TRUE))
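
One thing worth checking is that the cut() call reproduces the original boundary conditions. A small sketch, using made-up values that sit exactly on and just past each break (the vector below is an assumption for illustration, not the asker's data): cut() defaults to right = TRUE, so each interval is open on the left and closed on the right, which matches the ">a & <=b" tests in the question.

```r
# Hypothetical test values placed on and just after each break point.
value  <- c(0, 250, 251, 500, 1000, 1001, 2000, 2001)
breaks <- c(-Inf, 250, 500, 1000, 2000, Inf)
labels <- c("<=250", "250-500", "500-1,000", "1,000-2,000", ">2,000")

# right = TRUE is the default: intervals are (lower, upper].
binned <- cut(value, breaks = breaks, labels = labels)

# Show each value next to the bin it landed in.
data.frame(value = value, bin = binned)
```

Note that cut() returns a factor rather than a character vector; wrap it in as.character() if plain strings are needed.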
Maduro answered 23/4, 2015 at 6:15 Comment(5)
Hey, thanks a lot. Still not sure what was wrong with my original code, but this is definitely cleaner, and it works. – Strychnic
@MaxPower Glad to know that it works. As nicola mentioned in the comments, when you assign valueBin to a subset (based on the condition) without first creating valueBin as '' or NA, it will result in the length error. – Maduro
@Maduro Any idea why this gives such a specific length error? Doesn't the valueBin vector simply have length 0 if we inspect it with R's length() function? – Salzhauer
@Maduro This question has about 27K views, and this error is common. Could you edit your answer and add the reason for the error? Your solution certainly works, but it would be nice to have an explanation of why the OP was getting the error. – Chemotaxis
@Chemotaxis I think there is an answer below explaining the reason. I must not have read the comment correctly. – Maduro

The answer by @akrun certainly does the trick. For future googlers who want to understand why, here is an explanation...

The new variable needs to be created first.

The variable "valueBin" needs to already exist in the df for the conditional assignment to work. The syntax of the code itself is correct. Just add one line in front of the code chunk to create the column:

df$newVariableName <- NA

Then you continue with whatever conditional assignment rules you have, like

df$newVariableName[which(df$oldVariableName<=250)] <- "<=250"

I blame whoever wrote that package's error message... The debugging was made especially confusing by that error message. It is irrelevant information that you have two arrays in the df with different lengths. No. Simply create the new column first. For more details, consult this post https://www.r-bloggers.com/translating-weird-r-errors/
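
For anyone wondering where the odd row count in the message comes from, here is a small sketch on a made-up five-row data frame (the values are assumptions, not the asker's data). When the column does not exist, df$valueBin is NULL, and index-assigning into NULL builds a vector only as long as the largest index used. If the last rows of the data don't match the condition, that vector comes out shorter than the data frame, which is the "replacement" the error is counting.

```r
# Toy data: rows 1-4 satisfy value <= 250, the last row does not.
df <- data.frame(value = c(100, 200, 50, 240, 3000))

# The column has not been created, so this is NULL.
is.null(df$valueBin)

# Assigning by index into NULL grows the vector up to the largest index only.
idx <- which(df$value <= 250)
tmp <- NULL
tmp[idx] <- "<=250"
length(tmp)   # 4, one short of nrow(df)

# The same assignment through the data frame fails; capture the message.
msg <- tryCatch(
  { df$valueBin[idx] <- "<=250"; "no error" },
  error = function(e) conditionMessage(e)
)
msg

# Creating the column first gives the subassignment a full-length vector.
df$valueBin <- NA_character_
df$valueBin[idx] <- "<=250"
df
```

In the question, the last two of the 6532 rows presumably failed the condition being assigned, so the vector stopped at 6530.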

Comedy answered 4/2, 2017 at 3:12 Comment(1)
An upvote doesn't convey enough thanks for that last paragraph. – Spoken

TL;DR: late to the party, but this short explanation might help future googlers.

In general that error message means that the replacement doesn't fit into the corresponding column of the dataframe.

A minimal example:

df <- data.frame(a = 1:2); df$a <- 1:3

throws the error

Error in $<-.data.frame(*tmp*, a, value = 1:3) : replacement has 3 rows, data has 2

which is clear, because the column a of df has 2 entries (rows) whilst the vector we try to replace it with has 3 entries (rows).
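
To see which replacement lengths are accepted, here is a rough sketch with a four-row data frame. The helper try_assign is a hypothetical name introduced only for this illustration. As far as I can tell, a replacement is recycled when its length divides the number of rows evenly, and anything else raises the error.

```r
df <- data.frame(a = 1:4)

# Hypothetical helper: try to add column b, return "ok" or the error text.
try_assign <- function(df, value) {
  tryCatch(
    { df$b <- value; "ok" },
    error = function(e) conditionMessage(e)
  )
}

r1 <- try_assign(df, 0)     # length 1: recycled to 4 rows
r2 <- try_assign(df, 1:2)   # length 2 divides 4: recycled
r3 <- try_assign(df, 1:3)   # length 3 does not divide 4: error
r4 <- try_assign(df, 1:5)   # longer than the data: error

c(r1, r2, r3, r4)
```

So a too-short replacement can also slip through silently when its length happens to divide the row count, which is another reason to create the column first.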

Coastline answered 29/9, 2020 at 15:0 Comment(0)
