Behaviour of case_when with numeric(0)

Asked 28/1, 2021 at 17:22 Answered 28/1, 2021 at 17:22

I am having trouble understanding how dplyr::case_when works. Take this pretty simple call:

library(tidyverse)
case_when(TRUE ~ 50,
          FALSE ~ numeric(0))

I get numeric(0), although TRUE is obviously TRUE, so it should return 50. Besides, FALSE is FALSE, so it should never return numeric(0). I don't have this problem if I write:

case_when(TRUE ~ 50,
          FALSE ~ NaN)

Here I get 50, which is correct. What am I missing?

Beneficence answered 28/1, 2021 at 17:22 Comment(11)

I think the problem is that numeric(0) returns a vector of length 0. If you try numeric(1) (which is a vector of length 1 with a value of 0) then it works. case_when should be reporting an error I would say, but it's not. – Wavawave 28/1, 2021 at 17:35
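To illustrate the length difference this comment points at, here is a minimal base R sketch. It involves no dplyr and only inspects the two vectors; the variable names `empty` and `one` are made up for illustration.

```r
# numeric(0) is an empty double vector; numeric(1) is a length-1 vector holding 0.
empty <- numeric(0)
one   <- numeric(1)

length(empty)      # 0
length(one)        # 1
identical(one, 0)  # TRUE: numeric(1) is just the value 0
```

How case_when treats a zero-length right-hand side may depend on the installed dplyr version, so this sketch deliberately stops at the vectors themselves.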

For me this is unwanted behaviour, and I wasn't aware of it. Maybe you can notify the dplyr team on GitHub. Generally, every outcome of case_when should have the same type and the same length. For example, case_when(TRUE ~ 1:3, FALSE ~ 1:2) throws an error. – Concettaconcettina 28/1, 2021 at 18:2

Huh, on rereading the question, I was assuming (and mis-reading) that the first code block failed. It should, in my mind. I'm with @Cettt, this is unwanted behavior. – Problematic 28/1, 2021 at 18:8

Apparently the dplyr team sees this as a feature? – Gauntlett 28/1, 2021 at 18:10

It is complicated though. My immediate reaction is that I don't want case_when evaluating things it doesn't need to. I'd forego length checking for efficiency. case_when(TRUE ~ 1, FALSE ~ {Sys.sleep(10); 0}) takes 10 seconds to return, but it could be instant. – Gauntlett 28/1, 2021 at 18:11

if_else and case_when are not short-circuited, @GregorThomas; while I agree that it would be a great thing, I don't think it's in the cards to make it so. :-( – Problematic 28/1, 2021 at 18:13

Apparently not. I had assumed that was one of the things if_else did to improve performance over ifelse, but base::ifelse(TRUE, 1, {Sys.sleep(10); 0}) actually is short-circuited! – Gauntlett 28/1, 2021 at 18:15
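The observation about base::ifelse can be checked without waiting 10 seconds. The sketch below swaps Sys.sleep() for a side effect on a flag (the names `no_evaluated` and `no_evaluated_vec` are hypothetical), so it shows only whether the `no` argument was evaluated at all.

```r
# Use a flag instead of Sys.sleep() to detect evaluation of the `no` argument.
no_evaluated <- FALSE
result <- ifelse(TRUE, 1, {no_evaluated <- TRUE; 0})
result        # 1
no_evaluated  # FALSE: `no` was never touched because no test element is FALSE

# As soon as one test element is FALSE, `no` has to be evaluated.
no_evaluated_vec <- FALSE
result_vec <- ifelse(c(TRUE, FALSE), 1, {no_evaluated_vec <- TRUE; 0})
result_vec        # 1 0
no_evaluated_vec  # TRUE
```

This relies on R's lazy argument evaluation: ifelse only forces `yes` or `no` when some element of the test needs it.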

I am opening a new issue, because the documentation seems murky at the very least. – Gauntlett 28/1, 2021 at 18:16

@GregorThomas, I disagree about optimizing out length-checking: R recycling, as long as it's been around, has led to so many bugs when not recognized. When recycling is not desired but it just happens that one vector's length is a multiple of the other's, recycling happens and likely corrupts the data. In my head, recycling should be length-same or length-1, nothing else unless explicitly allowed </rant>. Unlikely to change in base R, unfortunately. But dplyr makes an intentional effort on things similar to this (enforcing class, e.g., when ifelse does not), so I'm surprised about this. – Problematic 28/1, 2021 at 18:17
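The silent-recycling hazard described in this comment can be reproduced in base R. This is only a sketch of the general rule, with made-up vectors `a` and `b`; it does not involve case_when.

```r
# Length 6 and length 2: 2 divides 6, so base R recycles `b` without any warning.
a <- 1:6
b <- 1:2
silent_sum <- a + b
silent_sum  # 2 4 4 6 6 8

# Length 3 and length 2: not a multiple, so base R still recycles but warns.
got_warning <- FALSE
partial_sum <- withCallingHandlers(
  1:3 + 1:2,
  warning = function(w) {
    got_warning <<- TRUE
    invokeRestart("muffleWarning")
  }
)
partial_sum  # 2 4 4
got_warning  # TRUE
```

The first case is the dangerous one: nothing signals that `b` was reused three times.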

I agree with you 100% on recycling - I love data.table's approach there as well. But this seems more restrictive. Why does this throw warnings? x <- 1:-1; case_when(x > 0 ~ log(x), TRUE ~ as.numeric(x)). – Gauntlett 28/1, 2021 at 18:34
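One likely explanation for the warning, assuming case_when evaluates every right-hand side on the full vector before picking values (consistent with the earlier comment that it is not short-circuited): log(x) is computed for the negative element too. The base R sketch below shows only that step; the names `vals` and `warned` are made up for illustration.

```r
x <- 1:-1  # 1, 0, -1

# Evaluate log() on the whole vector and record whether a warning was raised.
warned <- FALSE
vals <- withCallingHandlers(
  log(x),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
vals    # 0 -Inf NaN
warned  # TRUE: log(-1) produces NaN with a warning
```

Under that assumption, the warning comes from evaluating log() on values that the x > 0 condition would later discard.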

fcase warns, too ... and it does no recycling (a problem in my book), so TRUE would need to be rep(TRUE,3) here (cf. github.com/Rdatatable/data.table/issues/4258, still open). – Problematic 28/1, 2021 at 19:5
