I often run into the problem that R converts my one-column data frames into character vectors, which I solve by using the drop=FALSE option.
However, there are some cases where I do not know how to work around this kind of behavior in R, and this is one of them.
I have a data frame like the following:
mydf <- data.frame(ID=LETTERS[1:3], value1=paste(LETTERS[1:3], 1:3), value2=paste(rev(LETTERS)[1:3], 1:3))
that looks like:
> mydf
  ID value1 value2
1  A    A 1    Z 1
2  B    B 2    Y 2
3  C    C 3    X 3
The task here is to replace spaces with _ in every column except the first, and I want to use an apply-family function for this, sapply in this case.
I do the following:
new_df <- as.data.frame(sapply(mydf[,-1,drop=F], function(x) gsub("\\s+","_",x)))
new_df <- cbind(mydf[,1,drop=F], new_df)
The resulting data frame looks exactly how I want it:
> new_df
  ID value1 value2
1  A    A_1    Z_1
2  B    B_2    Y_2
3  C    C_3    X_3
My problem starts with some rare cases where my input has only one row of data. For some reason I have never understood, R behaves completely differently in these cases, and no drop=FALSE option can save me here...
My input data frame now is:
mydf <- data.frame(ID=LETTERS[1], value1=paste(LETTERS[1], 1), value2=paste(rev(LETTERS)[1], 1))
which looks like:
> mydf
  ID value1 value2
1  A    A 1    Z 1
However, when I apply the same code, the resulting data frame comes out mangled, like this:
> new_df
ID sapply(mydf[, -1, drop = F], function(x) gsub("\\\\s+", "_", x))
value1 A A_1
value2 A Z_1
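To narrow down where the two cases diverge, here is a small diagnostic sketch that looks at what sapply returns before as.data.frame is called. The names df3, df1, res3 and res1 are made up for this illustration; the rest is base R.

```r
# Same transformation as above, applied to a three-row and a one-row input.
underscore <- function(x) gsub("\\s+", "_", x)

df3 <- data.frame(ID = LETTERS[1:3],
                  value1 = paste(LETTERS[1:3], 1:3),
                  value2 = paste(rev(LETTERS)[1:3], 1:3))
df1 <- df3[1, , drop = FALSE]

# Three rows: every column gives a length-3 result, so sapply simplifies to a matrix.
res3 <- sapply(df3[, -1, drop = FALSE], underscore)
is.matrix(res3)   # TRUE
dim(res3)         # 3 2

# One row: every column gives a length-1 result, so sapply simplifies to a
# named character vector instead, and as.data.frame turns that into one column.
res1 <- sapply(df1[, -1, drop = FALSE], underscore)
is.matrix(res1)   # FALSE
names(res1)       # "value1" "value2"
```

So the shape change seems to happen inside sapply's simplification step, which drop=FALSE on the input does not affect.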
How can I solve this so that the same line of code gives me the same kind of result for input data frames with any number of rows?
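For reference, one direction that might work is sketched below, assuming lapply is an acceptable substitute for sapply. lapply always returns a list with one element per column, so nothing gets simplified. The helper name clean_spaces is hypothetical, and this is only a sketch, not necessarily the best way to do it.

```r
# Hypothetical helper: replace runs of whitespace with "_" in every column
# except the first, keeping the result a data frame for any number of rows.
clean_spaces <- function(df) {
  out <- df
  # lapply returns a list of columns, which can be assigned back directly
  out[-1] <- lapply(df[-1], function(x) gsub("\\s+", "_", x))
  out
}

one_row <- data.frame(ID = LETTERS[1],
                      value1 = paste(LETTERS[1], 1),
                      value2 = paste(rev(LETTERS)[1], 1))
new_df <- clean_spaces(one_row)
str(new_df)  # still a data frame with 1 row and the columns ID, value1, value2
```

Note that gsub returns character vectors, so any factor columns among the cleaned ones would come back as character.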
A deeper question would be: why on earth does R do this? I keep having to go back to my code whenever some new weird input with a single row or column comes along, because it breaks everything... Thanks!