Count number of non-NA values by group
Asked Answered
T

3

5

For example, I have this data frame (df):

Color    X1      X2    X3    X4
Red      1       1     0     2
Blue     0       NA    4     1 
Red      3       4     3     1
Green    2       2     1     0

I would like to create a function that counts up the number of non-NAs in "X2" by group (i.e. by "color"). I would like the output of this function in a new data frame named newdf. This is what I would like for output:

Color    X2     
Red      2      
Blue     NA    
Green    1

So far, I have this code:

Question <- function(Color){
  Result <-
    rowsum((df[c("X2")] > 0) + 0, df[["X2"]], na.rm = TRUE) 
  rowSums(Result)[[Color]]
  }
  Question("Red") 

The output this function gives is just Question("Red")= 2 and I would like to instead get my results of all the colors in a new data frame (newdf). Can anyone help with this? Thanks!

Trochee answered 25/1, 2017 at 13:52 Comment(2)
You can also have a named vector tapply(ifelse(is.na(df$X2), NA, 1), df$Color, FUN=sum)Molini
using rowsum: rowsum(as.numeric(!is.na(dt$X2)), dt$Color)Josi
H
5

Or if you wanted to use data.table:

library(data.table)

dt[,sum(!is.na(X2)),by=.(Color)]

  Color V1
1:   Red  2
2:  Blue  0
3: Green  1

Also its easy enough to use an ifelse() in your data.table to get an NA for blue instead of 0. See:

dt[,ifelse(sum(!is.na(X2)==0),as.integer(NA),sum(!is.na(X2))),by=.(Color)]

   Color V1
1:   Red  2
2:  Blue NA
3: Green  1

Data:

 dt <- as.data.table(fread("Color    X1      X2    X3    X4
Red      1       1     0     2
Blue     0       NA    4     1 
Red      3       4     3     1
Green    2       2     1     0"))
Homeomorphism answered 25/1, 2017 at 13:58 Comment(0)
E
7
library(dplyr)
df1 <-  df %>%
           group_by(Color) %>%
           summarise(sum(!is.na(X2)))
df1
#  (chr)           (int)
#1   Red               2
#2  Blue               0
#3 Green               1

and if you really want NA instead of that 0 then

df1[df1 ==0]<-NA
Exothermic answered 25/1, 2017 at 13:57 Comment(0)
H
5

Or if you wanted to use data.table:

library(data.table)

dt[,sum(!is.na(X2)),by=.(Color)]

  Color V1
1:   Red  2
2:  Blue  0
3: Green  1

Also its easy enough to use an ifelse() in your data.table to get an NA for blue instead of 0. See:

dt[,ifelse(sum(!is.na(X2)==0),as.integer(NA),sum(!is.na(X2))),by=.(Color)]

   Color V1
1:   Red  2
2:  Blue NA
3: Green  1

Data:

 dt <- as.data.table(fread("Color    X1      X2    X3    X4
Red      1       1     0     2
Blue     0       NA    4     1 
Red      3       4     3     1
Green    2       2     1     0"))
Homeomorphism answered 25/1, 2017 at 13:58 Comment(0)
H
0

With base R we can use aggregate with na.action parameter as na.pass to allow NA values

aggregate(X2~Color, df, function(x) sum(!is.na(x)), na.action = na.pass)

#  Color X2
#1  Blue  0
#2 Green  1
#3   Red  2
Headstall answered 25/1, 2017 at 14:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.