Why does the transpose function change numeric to character in R?

I've constructed a simple matrix in Excel with some character values and some numeric values (Screenshot of data as set up in Excel). I read it into R using the openxlsx package like so:

library(openxlsx)
data <- read.xlsx('~/desktop/data.xlsx')

After that I check the class:

sapply(data, class)
         x1         a         b          c
"character" "numeric" "numeric"  "numeric"

Which is exactly what I want. My problem occurs when I try to transpose the matrix, and then check for class again:

data <- t(data)

When I check with sapply now, all values are "character". Why are the classes not preserved when transposing?

Trengganu answered 19/10, 2015 at 13:49 Comment(10)
A matrix can hold only a single class, i.e. character when the classes are mixed. If you specify what you really want, there might be better options than transposing.Dreadful
Ok, thanks. But then why is the matrix fine with two different classes until I transpose it?Trengganu
Check the str of 'data'. It should be data.frameDreadful
What I really want: I have an xlsx file similar to the data.xlsx example, except it has around 100k rows by 100 columns. I read it into R and select maybe 10 rows for further processing. In the end I want to be able to write these 10 rows to a new xlsx file, but for readability I want to transpose the matrix.Trengganu
OK, so with my actual data I do read it as a data.frame. However, when I transpose the data.frame it's converted into a matrix. Isn't there a way to avoid this, or an alternative to transposing that performs the same task for a data.frame?Trengganu
As I mentioned, transposing results in a matrix. But you can wrap it as as.data.frame(t(data)) to make a data.frame. The problem remains that when transposing, you have at least one column that is character, which will turn all the columns into 'character' or 'factor' in the data.frame. One thing you can do is transpose only the numeric columns, i.e. t(data[-1])Dreadful
Thanks, this makes sense and the wrap works fine except, as you say, everything is converted to factors. I don't think it would work for me to transpose only the numeric columns, as I don't want to lose the relationship between "character" and "numeric" values.Trengganu
If the "character" column is not categorical, you could make it the "row.names" of your data and turn your data to "matrix"Ozenfant
Thanks. The problem is I actually have two columns with character values that I need to preserve.Trengganu
Each column of a data frame is of a single class. If the data frame you are transposing has two character columns, then all the columns of the transpose must be of class character.Waldenburg

First off, I don't get your result when I read in your spreadsheet, because the cells with comma-separated numbers appear as characters.

data <- read.xlsx("data.xlsx")
data
#  X1   a b   c
#1  x 0,1 3 4,5
#2  y 2,4 0 6,5
#3  z  24 0   0
sapply(data,class)
#         X1           a           b           c 
#"character" "character"   "numeric" "character" 
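
If the columns arrive as character because of decimal commas, as in the output above, one way to get numbers back is to swap the comma for a point before converting. This is a minimal sketch, assuming the values look like the ones printed above; the data frame `raw` and the helper `comma_to_numeric` are hypothetical names used only for illustration and are not part of openxlsx.

```r
# Hypothetical stand-in for the spreadsheet as it was read above
raw <- data.frame(X1 = c("x", "y", "z"),
                  a  = c("0,1", "2,4", "24"),
                  b  = c(3, 0, 0),
                  c  = c("4,5", "6,5", "0"),
                  stringsAsFactors = FALSE)

# Replace the decimal comma with a point, then convert to numeric
comma_to_numeric <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Convert only the columns that hold comma-formatted numbers
raw[c("a", "c")] <- lapply(raw[c("a", "c")], comma_to_numeric)
sapply(raw, class)
```

After the conversion only X1 should remain character, which matches the classes shown in the question.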

But the issue you are really seeing is that transposing the data frame mixes types in the same column, so R HAS TO convert the whole column to the broadest common type, which is character in this case.

mydata<-data.frame(X1=c("x","y","z"),a=c(1,2,24),b=c(3,0,0),c=c(4,6,0),stringsAsFactors = FALSE)
sapply(mydata,class)
#         X1           a           b           c 
#"character"   "numeric"   "numeric"   "numeric" 
# what you showed
t(mydata)
#   [,1] [,2] [,3]
#X1 "x"  "y"  "z" 
#a  " 1" " 2" "24"
#b  "3"  "0"  "0" 
#c  "4"  "6"  "0" 

mydata_t<-t(mydata)
sapply(mydata_t,class)
#          x           1           3           4           y           2           0           6           z          24 
#"character" "character" "character" "character" "character" "character" "character" "character" "character" "character" 
#          0           0 
#"character" "character" 
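
Note that sapply() on a matrix visits every cell, which is why twelve entries are listed above. To check the type of a matrix as a whole, calling is.matrix() and typeof() on the object itself is usually more direct. A small sketch using the same mydata as above, recreated here so the block runs by itself:

```r
mydata <- data.frame(X1 = c("x", "y", "z"), a = c(1, 2, 24),
                     b = c(3, 0, 0), c = c(4, 6, 0),
                     stringsAsFactors = FALSE)
mydata_t <- t(mydata)

is.data.frame(mydata_t)   # FALSE: t() returns a matrix
is.matrix(mydata_t)       # TRUE
typeof(mydata_t)          # "character": one type for every cell

# Dropping the character column first keeps the numbers numeric
typeof(t(mydata[-1]))     # "double"
```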

Do you want to work on the numbers in the transposed matrix and transpose them back after? If so, transpose a sub-matrix that has the character columns temporarily removed, then reassemble later, like so:

sub_matrix<-t(mydata[,-1])
sub_matrix
#  [,1] [,2] [,3]
#a    1    2   24
#b    3    0    0
#c    4    6    0
sub_matrix2<-sub_matrix*2
sub_matrix2
#  [,1] [,2] [,3]
#a    2    4   48
#b    6    0    0
#c    8   12    0
cbind(X1=mydata[,1],as.data.frame(t(sub_matrix2)))
#  X1  a b  c
#1  x  2 6  8
#2  y  4 0 12
#3  z 48 0  0
Louanne answered 19/10, 2015 at 18:24 Comment(3)
Thanks a lot. I'm on a Danish machine, which is why the comma-separated numbers are read as numeric. Your solution seems to be almost what I'm looking for, except I don't want to transpose it back. I basically need to transpose it, then find a way to add the character column (which is not included in the sub_matrix, cf. your example) as the first row of that newly transformed matrix. In the end the whole matrix, including the character column, should be transposed. Maybe I could create two different sub-matrices, transpose them individually and then reassemble them with rbind?Trengganu
A "matrix" must be all of the same type. A data frame can mix column types but not row types. You can have row names and column names, though, using rownames() and names(), respectively. Sorry I didn't pick up on the Euro decimal point convention.Louanne
I managed to solve my problem in a slightly different manner based on @Art's answer, and I believe that this answer will help others with a similar problem, so I'll throw a checkmark on it.Trengganu
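
The row-name idea from the comments can be sketched as follows: move the labels into the row names before transposing, so they become column names and the numbers stay numeric. This is only an illustration on the toy mydata from the answer, assuming a single label column with unique values; it is not necessarily what the asker ended up doing, and `numeric_part` and `transposed` are hypothetical names.

```r
mydata <- data.frame(X1 = c("x", "y", "z"), a = c(1, 2, 24),
                     b = c(3, 0, 0), c = c(4, 6, 0),
                     stringsAsFactors = FALSE)

# Use the character column as row names, then drop it from the data
rownames(mydata) <- mydata$X1
numeric_part <- mydata[-1]

# t() now sees only numeric columns, so the result is a numeric matrix;
# the old row names become the column names
transposed <- as.data.frame(t(numeric_part))
transposed
sapply(transposed, class)
```

With two label columns, as in the asker's real data, row names alone would not be enough, since a data frame has only one set of row names.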
