Is there a more compact way to print factor levels in a matrix than to use apply?

Asked 18/8 at 20:55 Answered 19/8 at 8:37

I have a matrix that would be a matrix of factors if R supported them. I want to print it with the factor level names rather than the integer codes, for readability. Indexing the names vector directly loses the matrix structure. Is there a tidier workaround than the ones below?

care_types = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 
6L, 5L, 4L, 1L, 6L, 4L, 4L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 6L, 5L, 
1L, 2L, 1L, 5L, 5L, 2L, 1L, 5L, 2L, 3L, 1L, 3L, 6L, 1L, 5L, 6L, 
5L, 5L, 1L, 5L, 6L, 4L, 5L, 3L, 1L, 2L, 2L, 1L, 3L, 5L, 5L), dim = c(10L, 6L))

care_type_names = c('M', 'F', 'O', 'I', 'H', 'C')


# This loses the dimensions
care_type_names[care_types]

# This works but is clunky
apply(care_types, 1:2, function(v) {return(care_type_names[[v]])})

# This doesn't work and I don't follow why
apply(care_types, 1:2, ~care_type_names[[.x]])

Retreat answered 18/8 at 20:55 Comment(3)

apply(care_types, 1:2, \(x) care_type_names[[x]])? – Rembrandt 18/8 at 21:2

Looks like apply(care_types, 2, \(x) care_type_names[x]) also works. – Retreat 18/8 at 21:23

@Retreat yes apply(care_types, 2, ...) works and saves a lot of time compared to apply(care_types, 1:2, ...). You can check the benchmark part in my answer! – Enloe 19/8 at 6:39

Though care_type_names[care_types] loses the dimensions, you can reshape it into an array() with the original dimensions.

array(care_type_names[care_types], dim(care_types))

#       [,1] [,2] [,3] [,4] [,5] [,6]
#  [1,] "M"  "H"  "M"  "F"  "O"  "I" 
#  [2,] "M"  "H"  "M"  "M"  "C"  "H" 
#  [3,] "M"  "H"  "H"  "H"  "M"  "O" 
#  [4,] "M"  "C"  "M"  "H"  "H"  "M" 
#  [5,] "M"  "H"  "F"  "F"  "C"  "F" 
#  [6,] "M"  "I"  "M"  "M"  "H"  "F" 
#  [7,] "M"  "M"  "M"  "H"  "H"  "M" 
#  [8,] "M"  "C"  "C"  "F"  "M"  "O" 
#  [9,] "M"  "I"  "H"  "O"  "H"  "H" 
# [10,] "M"  "I"  "M"  "M"  "C"  "H"

Benchmark on a larger matrix

array() outperforms replace() and column-wise apply() by roughly a factor of two, and cell-wise apply() by far more, when the matrix is large (1000 x 1000 in this case).

Update!

The `dim<-`() method proposed by @ThomasIsCoding further improves the performance.

care_types <- matrix(sample(1:26, 1e6, replace = TRUE), nrow = 1e3, ncol = 1e3)
care_type_names <- LETTERS

microbenchmark::microbenchmark(
  tic = `dim<-`(care_type_names[care_types], dim(care_types)),
  darren = array(care_type_names[care_types], dim(care_types)),
  thelatemail_1 = replace(care_types, , care_type_names[care_types]),
  thelatemail_2 = `[<-`(care_types, , care_type_names[care_types]),
  Mohan = apply(care_types, 2, \(x) care_type_names[x]),
  Bernhard = apply(care_types, 1:2, \(x) care_type_names[[x]]),
  times = 100L,
  check = "identical",
  unit = "relative"
)

# Unit: relative
#           expr        min         lq       mean     median         uq        max neval cld
#            tic   1.000000   1.000000   1.000000   1.000000   1.000000  1.0000000   100  a 
#         darren   1.840252   1.768748   1.270138   1.785627   1.757518  0.2196492   100  a 
#  thelatemail_1   3.226985   3.226587   2.469337   3.169915   3.138248  1.3020976   100  a 
#  thelatemail_2   3.090107   3.088408   2.252608   3.032059   4.644815  0.3800733   100  a 
#          Mohan   3.729082   3.745017   2.834293   3.737331   5.164634  1.0882580   100  a 
#       Bernhard 268.241097 299.505030 180.258010 287.793907 283.245930 15.8452711   100  b
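A likely explanation for the gap between the two apply() variants is the number of function calls: with MARGIN = 1:2 the function runs once per cell, while with MARGIN = 2 it runs once per column and the indexing itself is vectorised. The following is a minimal sketch that counts the calls on a small matrix; the `counter` environment and the 20 x 10 size are assumptions made only for this illustration.

```r
# Count how often apply() invokes its function for each MARGIN choice.
# 'counter' is a helper environment introduced only for this illustration.
counter <- new.env()
m <- matrix(sample(1:26, 200, replace = TRUE), nrow = 20, ncol = 10)

counter$n <- 0L
by_cell <- apply(m, 1:2, \(x) { counter$n <- counter$n + 1L; LETTERS[[x]] })
calls_by_cell <- counter$n    # one call per cell: 20 * 10

counter$n <- 0L
by_col <- apply(m, 2, \(x) { counter$n <- counter$n + 1L; LETTERS[x] })
calls_by_col <- counter$n     # one call per column: 10

c(by_cell = calls_by_cell, by_column = calls_by_col)
identical(by_cell, by_col)    # both return the same character matrix
```

On the 1000 x 1000 benchmark matrix the same reasoning gives a million calls against a thousand, which is in line with the measured difference.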

Enloe answered 19/8 at 6:18 Comment(1)

Good array approach plus a benchmark, +1! I think array is a nice starting point. If you want to boost the speed, you can try `dim<-` directly, which is shorter and seems faster. I guess array has some overhead when being initialized. – Delinquent 19/8 at 8:38

Still do your indexing directly, but replace the original structure:

replace(care_types, , care_type_names[care_types])

This is essentially the same as overwriting the contents of the original structure using `[<-`:

out <- care_types
out[] <- care_type_names[care_types]
out

Both give:

##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,] "M"  "H"  "M"  "F"  "O"  "I"
##  [2,] "M"  "H"  "M"  "M"  "C"  "H"
##  [3,] "M"  "H"  "H"  "H"  "M"  "O"
##  [4,] "M"  "C"  "M"  "H"  "H"  "M"
##  [5,] "M"  "H"  "F"  "F"  "C"  "F"
##  [6,] "M"  "I"  "M"  "M"  "H"  "F"
##  [7,] "M"  "M"  "M"  "H"  "H"  "M"
##  [8,] "M"  "C"  "C"  "F"  "M"  "O"
##  [9,] "M"  "I"  "H"  "O"  "H"  "H"
## [10,] "M"  "I"  "M"  "M"  "C"  "H"
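One practical difference from rebuilding the matrix by its dimensions alone: because the original object is overwritten in place, any row and column names it carries should survive. Here is a small sketch of that behaviour; the 2 x 3 matrix and its dimnames are made up for illustration and are not part of the question's data.

```r
care_type_names <- c("M", "F", "O", "I", "H", "C")
# Small labelled matrix; the row and column names are made up for illustration
m <- matrix(c(1L, 5L, 2L, 6L, 3L, 4L), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

kept <- replace(m, , care_type_names[m])       # same as out <- m; out[] <- ...
dropped <- array(care_type_names[m], dim(m))   # dimnames are not carried over
restored <- array(care_type_names[m], dim(m), dimnames(m))

dimnames(kept)
dimnames(dropped)
identical(kept, restored)
```

If the matrix has no dimnames, as in the question, the approaches are interchangeable and the choice comes down to speed and taste.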

Vamoose answered 18/8 at 22:0 Comment(1)

A one step variant might be `[<-`(care_types, care_type_names[care_types]) – Preferential 19/8 at 7:23

You can simply use `dim<-`:

care_types <- matrix(sample(1:26, 1e6, replace = TRUE), nrow = 1e3, ncol = 1e3)
care_type_names <- LETTERS

microbenchmark::microbenchmark(
  darren = array(care_type_names[care_types], dim(care_types)),
  tic = `dim<-`(care_type_names[care_types], dim(care_types)),
  check = "identical",
  unit = "relative"
)

which gives

Unit: relative
   expr      min       lq    mean   median       uq      max neval
 darren 2.402979 1.886056 1.78914 1.891474 1.822222 1.223973   100
    tic 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000   100
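For readers who have not called a replacement function directly before, the sketch below shows that the backtick form is the one-expression equivalent of indexing and then assigning `dim()`. The small 2 x 3 matrix is an assumption chosen to keep the output short.

```r
care_type_names <- c("M", "F", "O", "I", "H", "C")
m <- matrix(c(1L, 5L, 2L, 6L, 3L, 4L), nrow = 2)

# Two-step form: index first, then assign the dimensions
two_step <- care_type_names[m]
dim(two_step) <- dim(m)

# One-expression form: call the replacement function `dim<-` directly
one_step <- `dim<-`(care_type_names[m], dim(m))

identical(one_step, two_step)
one_step
```

The one-expression form is convenient inside a pipeline or a function argument, where an intermediate variable would otherwise be needed.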

Delinquent answered 19/8 at 8:37 Comment(0)

You could write an anonymous function with the \(x) shorthand (available from R 4.1.0), as in

apply(care_types, 1:2, \(x) care_type_names[[x]])

which returns

      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,] "M"  "H"  "M"  "F"  "O"  "I"
 [2,] "M"  "H"  "M"  "M"  "C"  "H"
 [3,] "M"  "H"  "H"  "H"  "M"  "O"
 [4,] "M"  "C"  "M"  "H"  "H"  "M"
 [5,] "M"  "H"  "F"  "F"  "C"  "F"
 [6,] "M"  "I"  "M"  "M"  "H"  "F"
 [7,] "M"  "M"  "M"  "H"  "H"  "M"
 [8,] "M"  "C"  "C"  "F"  "M"  "O"
 [9,] "M"  "I"  "H"  "O"  "H"  "H"
[10,] "M"  "I"  "M"  "M"  "C"  "H"

The result is a matrix of type character:

> apply(care_types, 1:2, \(x) care_type_names[[x]]) |> typeof()
[1] "character"
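As for why the `~care_type_names[[.x]]` attempt in the question fails: base apply() passes its FUN argument through match.fun(), which accepts a function or the name of one, and a one-sided formula is neither. The `~ .x` shorthand is a purrr-style convention that purrr's own mapping functions convert into a function; base R does not do that conversion. A minimal sketch on a small made-up matrix:

```r
care_type_names <- c("M", "F", "O", "I", "H", "C")
m <- matrix(c(1L, 5L, 2L, 6L), nrow = 2)

# A one-sided formula is not a function, so match.fun() inside apply() rejects it
res <- tryCatch(apply(m, 1:2, ~ care_type_names[[.x]]), error = function(e) e)
inherits(res, "error")

# The base lambda shorthand creates an ordinary function, so this works
ok <- apply(m, 1:2, \(x) care_type_names[[x]])
ok
```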

However, the tidiest solution might be to write your own print function for this kind of integer matrix, so that it prints like a matrix of factors. It can start as simply as the following and grow to your liking:

care_types = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L,
                         6L, 5L, 4L, 1L, 6L, 4L, 4L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 6L, 5L,
                         1L, 2L, 1L, 5L, 5L, 2L, 1L, 5L, 2L, 3L, 1L, 3L, 6L, 1L, 5L, 6L,
                         5L, 5L, 1L, 5L, 6L, 4L, 5L, 3L, 1L, 2L, 2L, 1L, 3L, 5L, 5L),
                       dim = c(10L, 6L))
care_type_names = c('M', 'F', 'O', 'I', 'H', 'C')


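mprint <- function(m, n = c('M', 'F', 'O', 'I', 'H', 'C')){
  # Print each row of m with its integer codes replaced by the labels in n.
  # seq_len() avoids the 1:0 trap when m has zero rows or columns.
  for(row in seq_len(nrow(m))){
    cat("  | ")
    for(column in seq_len(ncol(m))){
      cat("  ")
      cat(n[m[row, column]])
    }
    cat("  |\n")
  }
}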

mprint(care_types)
#>   |   M  H  M  F  O  I  |
#>   |   M  H  M  M  C  H  |
#>   |   M  H  H  H  M  O  |
#>   |   M  C  M  H  H  M  |
#>   |   M  H  F  F  C  F  |
#>   |   M  I  M  M  H  F  |
#>   |   M  M  M  H  H  M  |
#>   |   M  C  C  F  M  O  |
#>   |   M  I  H  O  H  H  |
#>   |   M  I  M  M  C  H  |

Created on 2024-08-18 with reprex v2.1.1
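One way such a print function could grow is into an S3 print method, so that the labels appear whenever the object is printed while the integer codes stay intact underneath. The following is only a sketch under assumptions: the class name `label_matrix` and the constructor `new_label_matrix()` are hypothetical names invented for this illustration, and the labels are stored in an attribute.

```r
# "label_matrix" and new_label_matrix() are hypothetical names for this sketch
new_label_matrix <- function(m, labels) {
  stopifnot(is.matrix(m), is.integer(m))
  structure(m, labels = labels, class = "label_matrix")
}

# Called automatically whenever a label_matrix is printed at the console
print.label_matrix <- function(x, ...) {
  labels <- attr(x, "labels")
  # as.integer() strips attributes, so rebuild the shape explicitly
  shown <- array(labels[as.integer(x)], dim(x), dimnames(x))
  print(shown, quote = FALSE, ...)
  invisible(x)
}

cm <- new_label_matrix(matrix(c(1L, 5L, 2L, 6L), nrow = 2),
                       c("M", "F", "O", "I", "H", "C"))
cm                      # prints the labels instead of the codes
as.integer(cm[2, 1])    # the underlying integer code is still there
```

Compared with a standalone function, this keeps the usual matrix layout and row and column headers, at the cost of carrying a class attribute around.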

Hexagon answered 18/8 at 21:0 Comment(0)
