Why are R's sapply and switch functions treating a character value like a function?

I am trying to use sapply and switch to apply descriptive names to data. I've used this approach many times without issue, but for one (and only one!) column in my most recent project it throws an error. My best guess is that, even though the value is stored as a character string, it is being treated as a reserved word in R. I've created a reproducible example below.

The actual values in my project are not gender related and could have many possible options. Can someone please tell me how to make this work with sapply/switch to avoid many nested ifelse statements in my code?

# create test data
testdta <- as.data.frame(cbind(userid = c("1", "2", "3", "4"), gender = c("F", "M", "F", "M")))

# sapply/switch works with strings that are not reserved words
testdta$uiddescription <- sapply(testdta$userid, switch, "1" = "1 - first", "2" = "2 - second", "3+ - third or beyond")
testdta

# sapply/switch won't work when trying to interpret gender (possibly because F is reserved?)
testdta$gdescription <- sapply(testdta$gender, switch, "F" = "F - female", "M" = "M - male")

The error I am receiving is "Error in get(as.character(FUN), mode = "function", envir = envir) : object 'F - female' of mode 'function' was not found."

Roshan answered 19/12, 2023 at 13:53 Comment(1)
Side note: don't use as.data.frame(cbind(..)): it is inefficient at best (as here), and if you have mixed-class columns, it will convert all of them to strings. - Pettit

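This happens because of partial argument matching in sapply. Its signature is sapply(X, FUN, ...), and the argument name "F" partially matches FUN, so sapply takes "F - female" as the function to apply, while switch is pushed into the ... arguments. If you are explicit and write FUN = switch, the exact match takes precedence and it works.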
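As a minimal sketch of the point above (the data frame is rebuilt with data.frame() rather than as.data.frame(cbind(...)), and the names err, gdescription and gdescription2 are only illustrative), the failing call can be compared with two ways of avoiding the partial match:

```r
# Same test data as in the question, built with data.frame() directly
testdta <- data.frame(userid = c("1", "2", "3", "4"),
                      gender = c("F", "M", "F", "M"))

# Without FUN =, the name "F" partially matches FUN, so sapply tries to
# look up a function called "F - female" and fails; capture the message
err <- tryCatch(
  sapply(testdta$gender, switch, "F" = "F - female", "M" = "M - male"),
  error = function(e) conditionMessage(e)
)

# Naming FUN explicitly sends "F" and "M" through ... to switch
gdescription <- sapply(testdta$gender, FUN = switch,
                       "F" = "F - female", "M" = "M - male")

# Wrapping switch in an anonymous function sidesteps the matching entirely
gdescription2 <- sapply(testdta$gender, function(g) {
  switch(g, "F" = "F - female", "M" = "M - male")
})
```

Both working variants return a character vector named by the input values; wrap the result in unname() if the names are not wanted.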

Loosejointed answered 19/12, 2023 at 14:3 Comment(0)

I think @joran's answer resolves why "F" will trigger the problem.

However, sapply/switch is perhaps not the most efficient way to do this (the benchmark below shows how much the approaches differ).

dictionary lookup

vec <- c("F" = "F - female", "M" = "M - male")
vec[testdta$gender]
#            F            M            F            M 
# "F - female"   "M - male" "F - female"   "M - male" 
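The userid example in the question also relies on switch's unnamed fallback value. A named-vector lookup returns NA for keys it does not contain, so a fallback has to be filled in separately. A small sketch, where uid, lookup and desc are illustrative names:

```r
# Keys "3" and "4" are deliberately missing from the lookup vector
uid <- c("1", "2", "3", "4")
lookup <- c("1" = "1 - first", "2" = "2 - second")

# Indexing by name yields NA for unmatched keys; unname() drops the names
desc <- unname(lookup[uid])

# Fill the unmatched entries with the default label from the question
desc[is.na(desc)] <- "3+ - third or beyond"
desc
```

This assumes the lookup vector itself holds no NA values, since those would also be overwritten by the default.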

merge/join

genders <- data.frame(gender=c("F", "M"), gender2=c("F - female", "M - male"))
merge(testdta, genders, by="gender", all.x=TRUE)
#   gender userid    gender2
# 1      F      1 F - female
# 2      F      3 F - female
# 3      M      2   M - male
# 4      M      4   M - male

The concept of merge/join is powerful but can get complicated if you are not familiar with it; see "How to join (merge) data frames (inner, outer, left, right)", "What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?", and (for data.table) "Left join using data.table".
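Note that the merge() output above comes back sorted by the key rather than in the original row order. If the original order matters, one base-R alternative (a sketch only, not part of the answer's benchmark) is to use match() against the lookup table:

```r
testdta <- data.frame(userid = c("1", "2", "3", "4"),
                      gender = c("F", "M", "F", "M"))
genders <- data.frame(gender = c("F", "M"),
                      gender2 = c("F - female", "M - male"))

# match() gives, for each row of testdta, the row of genders with the same
# key; indexing gender2 by that position keeps testdta's row order intact
testdta$gender2 <- genders$gender2[match(testdta$gender, genders$gender)]
testdta
```

Keys with no match in the lookup table come back as NA, which mirrors the all.x = TRUE behaviour of the merge.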

benchmark

bench::mark(
  sapply = sapply(testdta$gender, FUN=switch, "F" = "F - female", "M" = "M - male"),
  dict = vec[testdta$gender],
  join = merge(testdta, genders, by="gender", all.x=TRUE),
  check = FALSE)
# # A tibble: 3 × 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time                gc                   
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list> <list>              <list>               
# 1 sapply       9.29µs     11µs    88949.        NA     8.90  9999     1    112.4ms <NULL> <NULL> <bench_tm [10,000]> <tibble [10,000 × 3]>
# 2 dict         1.35µs   1.48µs   656398.        NA     0    10000     0     15.2ms <NULL> <NULL> <bench_tm [10,000]> <tibble [10,000 × 3]>
# 3 join       165.12µs 194.11µs     5169.        NA     6.33  2450     3    473.9ms <NULL> <NULL> <bench_tm [2,453]>  <tibble [2,453 × 3]> 

An easy column to compare at a glance is `itr/sec`, or iterations per second (higher is better).

The numbers will change a bit with larger data (4 rows is rather small), but if this is representative of your real data, it demonstrates clear performance differences.
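To check how the gap behaves on larger input, a rough sketch using only base R's system.time() could look like the following. The vector big and its size are arbitrary assumptions, and timings will vary by machine, so none are quoted here:

```r
set.seed(42)
# Hypothetical larger input: 100,000 randomly drawn gender codes
big <- sample(c("F", "M"), 1e5, replace = TRUE)
vec <- c("F" = "F - female", "M" = "M - male")

# Time the sapply/switch approach (names dropped for a fair comparison)
t_sapply <- system.time(
  r1 <- sapply(big, FUN = switch, "F" = "F - female", "M" = "M - male",
               USE.NAMES = FALSE)
)

# Time the named-vector lookup
t_dict <- system.time(r2 <- unname(vec[big]))

# Both approaches should produce the same labels
identical(r1, r2)
t_sapply["elapsed"]
t_dict["elapsed"]
```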

Pettit answered 19/12, 2023 at 14:3 Comment(1)
dplyr::case_when is an option for >2 categories and just ifelse or dplyr::if_else if there are only two categories. - Loosejointed

Some more options:

factor() with levels & labels

factor(testdta$gender, levels = c("F", "M"),
       labels = c("F - female", "M - male"))
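One thing to keep in mind is that this returns a factor, not a character vector. If plain strings are needed downstream, as.character() converts the labels back. A short sketch with illustrative variable names:

```r
gender <- c("F", "M", "F", "M")

# levels are the raw codes, labels are the descriptive names, in the same order
g <- factor(gender, levels = c("F", "M"),
            labels = c("F - female", "M - male"))

# Convert back to a plain character vector when a factor is not wanted
g_chr <- as.character(g)
g_chr
```

Values that are not listed in levels become NA in the resulting factor.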

dplyr::case_when/case_match

Slightly clunkier in this case, since testdta$gender == has to be repeated for each condition:

dplyr::case_when(testdta$gender == "F" ~ "F - female",
                 testdta$gender == "M" ~ "M - male")

A better alternative, from this Q&A, is dplyr::case_match():

dplyr::case_match(testdta$gender,
                  "F" ~ "F - female",
                  "M" ~ "M - male")
Akilahakili answered 19/12, 2023 at 14:17 Comment(1)
TIL dplyr::case_match. - Pettit
