Replace all NA values for variable with one row equal to 0
Asked Answered
B

9

17

Slightly difficult to phrase, as far as I saw none of the similar questions answered my problem.

I have a data.frame such as:

df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))

df1

   id val
1   a  NA
2   a  NA
3   a  NA
4   a  NA
5   b   1
6   b   2
7   b   2
8   b   3
9   c  NA
10  c   2
11  c  NA
12  c   3

and I want to get rid of all the NA values (easy enough using e.g. filter() ) but make sure that if this removes all of one id value (in this case it removes every instance of "a") that one extra row is inserted of (e.g.) a = 0

so that:

  id val
1  a   0
2  b   1
3  b   2
4  b   2
5  b   3
6  c   2
7  c   3

obviously easy enough to do this in a roundabout way but I was wondering if there's a tidy/elegant way to do this. I thought tidyr::complete() might help but not entirely sure how to apply it to a case like this

I don't care about the order of the rows

Cheers!

edit: updated with clearer desired output. might make desired answers submitted before that a bit less clear

Breakwater answered 3/1, 2019 at 12:43 Comment(3)
So you want to add rows with 0 only if all the values for particular id is 0?Christos
only if they're all NA for a particular idBreakwater
@RobertHickman There seems to be some confusion about your desired output. Could you update your question with the expected output based on this df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3)) ? Thanks to @VivekKalyanarangan for the data.Jonjona
H
9

Another idea using dplyr,

library(dplyr)

df1 %>% 
 group_by(id) %>% 
 mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>% 
 na.omit()

which gives,

# A tibble: 5 x 2
# Groups:   id [2]
  id      val
  <fct> <dbl>
1 a         0
2 b         1
3 b         2
4 b         2
5 b         3
Hypocorism answered 3/1, 2019 at 13:34 Comment(2)
(+1) Seems like the most robust answer here. Would be marginally more concise using replace(val, all(is.na(val)) * 1, 0) instead of the ifelse(...).Ulyssesumayyad
@MikkoMarttila Good suggestion. I usually try and avoid ifelse in generalHypocorism
C
3

We may do

df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups:   id [2]
#   id      val
#   <fct> <dbl>
# 1 a         0
# 2 b         1
# 3 b         2
# 4 b         2
# 5 b         3

After grouping by id, if everything in val is NA, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit.

In a more readable format that would be

df1 %>% group_by(id) %>% 
  do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))

(Here I presume that you indeed want to get rid of all NA values; otherwise there is no need for na.omit.)

Crossly answered 3/1, 2019 at 13:11 Comment(3)
@markus, right, I had assumed that that's the goal. Thanks!Crossly
It looks like op wants to retain the first row and replace the val column of that row with 0 where all val is NA for a group. Check my ans pls. Agree with @markus, it does seem trickyReturn
@VivekKalyanarangan, that's what I initially thought, but "and I want to get rid of all the NA values" suggests otherwise.Crossly
Q
2
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]

  id val
1  a   0
5  b   1
6  b   2
7  b   2
8  b   3
Quillon answered 3/1, 2019 at 13:2 Comment(2)
Would this work for ids that contain NAs and non-NAs? Try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3))Jonjona
I think this is the best so far (I'll leave it open for another hour or so to see) would maybe change to df %>% replace(is.na(.), 0) %>% .[!(duplicated(.$id) & .$val == 0), ]Breakwater
C
1

Base R option is to find groups with all NAs and transform them by changing their val to 0 and select only unique rows so that there is only one row per group. We rbind this dataframe with the groups which are !all_NA.

all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])

#  id val
#1  a   0
#5  b   1
#6  b   2
#7  b   2
#8  b   3

dplyr option looks ugly but one way is to make two groups of dataframes one with groups of all NA values and other with groups of all non-NA values. For groups with all NA values we add row with it's id and val as 0 and bind this to the other group.

library(dplyr)

bind_rows(df1 %>%
            group_by(id) %>%
            filter(all(!is.na(val))), 
          df1 %>%
             group_by(id) %>%
             filter(all(is.na(val))) %>%
             ungroup() %>%
             summarise(id = unique(id), 
                       val = 0)) %>%
arrange(id)


#   id      val
#  <fct> <dbl>
#1  a         0
#2  b         1
#3  b         2
#4  b         2
#5  b         3
Christos answered 3/1, 2019 at 12:56 Comment(0)
M
1

Here is an option too:

df1 %>% 
  mutate_if(is.factor,as.character) %>% 
 mutate_all(funs(replace(.,is.na(.),0))) %>% 
  slice(4:nrow(.))

This gives:

 id val
1  a   0
2  b   1
3  b   2
4  b   2
5  b   3

Alternative:

df1 %>% 
  mutate_if(is.factor,as.character) %>% 
 mutate_all(funs(replace(.,is.na(.),0))) %>% 
  unique()

UPDATE based on other requirements: Some users suggested to test on this dataframe. Of course this answer assumes you'll look at everything by hand. Might be less useful if you have to look at everything by "hand" but here goes:

df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))


df1 %>% 
  mutate_if(is.factor,as.character) %>% 
  mutate(val=ifelse(id=="a",0,val)) %>% 
  slice(4:nrow(.))

This yields:

 id val
1  a   0
2  b   1
3  b   2
4  b   2
5  b   3
6  c  NA
7  c   2
8  c  NA
9  c   3
Mccullum answered 3/1, 2019 at 13:18 Comment(6)
where did 4 come from?Hypocorism
The solution produces four 0s. We're only interested in having 1?Mccullum
What if one group has 4 and another 3?Hypocorism
Sorry I only answered based on the question. Maybe then we could twist things up, not sure though!Mccullum
Consider this example - df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3)) I think here OP wants to remove NA values for A group only, not the restReturn
Plus not sure what OP's intention is yet. Seems everyone interpreted the question differently.Mccullum
R
1

Changed the df to make example more exhaustive -

df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
  mutate(val=ifelse(is.na(val)&case,0,val)) %>%
  filter( !(case&row_num!=1) ) %>%
  select(id, val)

Output

  id      val
  <fct> <dbl>
1 a         0
2 b         1
3 b         2
4 b         2
5 b         3
6 c        NA
7 c         2
8 c        NA
9 c         3
Return answered 3/1, 2019 at 13:25 Comment(0)
G
1

Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:

df1 <- na.omit(df1)

df1 <- rbind(
  df1, 
  data.frame(
    id  = levels(df1$id)[!levels(df1$id) %in% df1$id], 
    val = 0)
  )

I do personally prefer the dplyr approach given by Sotos, as I don't like rbind-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id column with a unique(df1$id) variable.

Graphics answered 3/1, 2019 at 16:9 Comment(0)
T
0

Here is a base R solution.

res <- lapply(split(df1, df1$id), function(DF){
  if(anyNA(DF$val)) {
    i <- is.na(DF$val)
    DF$val[i] <- 0
    DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
  }
  DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
#  id val
#1  a   0
#2  b   1
#3  b   2
#4  b   2
#5  b   3

Edit.

A dplyr solution could be the following. It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2 and df3, respectively.

library(dplyr)

na2zero <- function(DF){
  DF %>%
    group_by(id) %>%
    mutate(val = ifelse(is.na(val), 0, val),
           crit = val == 0 & duplicated(val)) %>%
    filter(!crit) %>%
    select(-crit)
}

na2zero(df1)
na2zero(df2)
na2zero(df3)
Tupper answered 3/1, 2019 at 13:3 Comment(2)
Rui, try with df1 <- data.frame(id = rep(c("a", "b"), each = 2), val = c(NA, 1, 2, 3)). Unfortunately your solution doesn't return a data frame with only three rows.Jonjona
@Jonjona No, it doesn't. The NA is replaced by a 0 and the other value of val is not NA so both must be in the output. At least that's how I'm understanding the OP's problem.Tupper
G
0

One may try this :

df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
#   id val
#1   a  NA
#2   a  NA
#3   a  NA
#4   a  NA
#5   b   1
#6   b   2
#7   b   2
#8   b   3
#9   c  NA
#10  c   2
#11  c  NA
#12  c   3

Task is to remove all rows corresponding to any id IFF val for the corresponding id is all NAs and add new row with this id and val = 0.
In this example, id = a.

Note : val for c also has NAs but all the val corresponding to c are not NA therefore we need to remove the corresponding row for c where val = NA.

So lets create another column say, val2 which indicates 0 means its all NAs and 1 otherwise.

library(dplyr)

df1 = df1 %>% 
     group_by(id) %>%
     mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false =  1))
df1

# A tibble: 12 x 3
# Groups:   id [3]
#   id      val  val2
#   <fct> <dbl> <dbl>
#1 a        NA     0
#2 a        NA     0
#3 a        NA     0
#4 a        NA     0
#5 b         1     1
#6 b         2     1
#7 b         2     1
#8 b         3     1
#9 c        NA     1
#10 c        2     1
#11 c       NA     1
#12 c        3     1

Get the list of ids with corresponding val = NA for all.

all_na = unique(df1$id[df1$val2 == 0])

Then remove theids from the dataframe df1 with val = NA.

df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups:   id [2]
# id      val  val2
# <fct> <dbl> <dbl>
# 1 b         1     1
# 2 b         2     1
# 3 b         2     1
# 4 b         3     1
# 5 c         2     1
# 6 c         3     1

And create a new dataframe with ids in all_na and val = 0

all_na_df = data.frame(id = all_na, val = 0) 
all_na_df
# id val
# 1  a   0

then combine these two dataframes.

df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1

#    id val
# 1  a   0
# 2  b   1
# 3  b   2
# 4  b   2
# 5  b   3
# 6  c   2
# 7  c   3

Hope this helps and Edits are most welcomed :-)

Gelignite answered 8/1, 2019 at 10:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.