How can I fill the empty cells in each row of a dataframe with the value in the last non-empty cell in that row?
Asked Answered
S

7

5

I have a dataframe (or datatable, if that's easier) with incomplete rows:

Input

ID Var1 Var2 Var3
1     2    5    1
2    12    3
3     8
4     4

Code

d <- data.frame(
  ID = 1:4,
  Var1 = c(2, 12, 8, 4),
  Var2 = c(5, 3, NA, NA),
  Var3 = c(1, NA, NA, NA)
)

library(data.table)
d <- fread("
  ID Var1 Var2 Var3
  1 2 5 1
  2 12 3 NA
  3 8 NA NA
  4 4 NA NA
")

The empty cells are always at the end of a row.

I would like to fill the empty cells in each row with the value in the last non-empty cell in that row, e.g.:

ID Var1 Var2 Var3
1     2    5    1
2    12    3 -> 3
3     8 -> 8 -> 8
4     4 -> 4 -> 4

How do I do that?


I don't want to use dplyr and I don't want to fill columns.

Surrogate answered 28/5, 2024 at 10:17 Comment(0)
T
4

Another answer using the collapse package, which has the advantage of being particularly fast(er than data.table):

library(collapse)
dapply(d, na_locf, MARGIN = 1)

#   ID Var1 Var2 Var3
# 1  1    2    5    1
# 2  2   12    3    3
# 3  3    8    8    8
# 4  4    4    4    4

Microbenchmark:

# Unit: microseconds
#        expr     min       lq      mean   median      uq     max neval
#    collapse    69.5   112.95   244.847   135.45   161.7  9964.4   100
#          dt   592.9   788.70  1237.643   874.70  1186.6 14563.1   100
#       tidyr 32283.2 36170.80 41293.420 40501.55 43809.1 75417.8   100
#   Reduce_dt   645.0   803.70  1083.373   954.05  1222.6  2367.5   100
#  Reduce_TiC   383.9   499.25   661.475   586.40   687.6  5179.1   100

Code for benchmark:

microbenchmark::microbenchmark(
  collapse = dapply(d, na_locf, MARGIN = 1),
  dt = data.table::transpose(
    setnafill(data.table::transpose(d, keep.names = "ID"), 
              type = "locf", cols = 1:nrow(d) + 1), make.names = "ID"),
  tidyr = d %>% 
    pivot_longer(-ID) %>% 
    fill(value, .direction = "down") %>% 
    pivot_wider(),
  Reduce_dt = d[, (sprintf("Var%d", 1:3)) := Reduce(\(x, y) ifelse(is.na(y), x, y), .SD, accumulate = TRUE), .SDcols = sprintf("Var%d", 1:3)],
  Reduce_TiC = Reduce(\(x, y) ifelse(is.na(y), x, y), d[-1], accumulate = TRUE)
)
Tuddor answered 28/5, 2024 at 13:2 Comment(3)
Nice one, collapse - new package for me, it is part of fastverse which includes data.table, too! :DRenfroe
collapse is a whole new world where every new function is a blessing. I really like this package! :)Voyles
I wonder if that dataset is not too small for benchmarking. it only contains 4 rows. You are likely benchmarking mostly the overheads of these functions than the algorithm. I suggest you increase the number of rowsUninspired
R
3

Transpose and fill missing, then transpose again.

library(data.table)

transpose(
  setnafill(transpose(d, keep.names = "ID"), 
            type = "locf", cols = 1:nrow(d) + 1),
  make.names = "ID")

#       ID  Var1  Var2  Var3
#    <int> <int> <int> <int>
# 1:     1     2     5     1
# 2:     2    12     3     3
# 3:     3     8     8     8
# 4:     4     4     4     4
Renfroe answered 28/5, 2024 at 10:50 Comment(0)
P
2

Here is an approach in base:

(start <- structure(list(
  ID = 1:4,
  Var1 = c(2L, 12L, 8L, 4L),
  Var2 = c(5L, 3L, NA, NA),
  Var3 = c(1L, NA, NA, NA)
),
class = "data.frame",
row.names = c(NA, -4L)))


start$last <- apply(start,MARGIN = 1,FUN = \(row){row |> 
                                                  as.matrix() |> 
                                                  na.omit() |> 
                                                  tail(n=1)})

start$last

(fin <- apply(start,MARGIN = 1,FUN = \(row){ 
  n <- length(row)
  p <- ifelse(is.na(row),row[n] ,row)
  head(p,-1)
}) |> t() |> data.frame())
Pushup answered 28/5, 2024 at 10:51 Comment(0)
A
2

A tidyr::pivot/fill take:

library(tidyverse) # `tidyr`

# toy data
aux <- tibble::tribble(
  ~ID, ~Var1, ~Var2, ~Var3,
  1,     2,     5,     1,
  2,    12,     3,    NA,
  3,     8,    NA,    NA,
  4,     4,    NA,    NA)

# Pivot down, fill down and and pivot back
new_aux <- aux %>% 
  pivot_longer(-ID) %>% 
  fill(value, .direction = "down") %>% 
  pivot_wider()

The output:

> new_aux
# A tibble: 4 × 4
     ID  Var1  Var2  Var3
  <dbl> <dbl> <dbl> <dbl>
1     1     2     5     1
2     2    12     3     3
3     3     8     8     8
4     4     4     4     4

Created on 2024-05-28 with reprex v2.1.0

Acrilan answered 28/5, 2024 at 11:15 Comment(0)
T
1
d[, (cols) := Reduce(\(x, y) ifelse(is.na(y), x, y), .SD, accumulate = TRUE), .SDcols = cols]

Or

d[, (cols) := Reduce(fcoalesce, .SD, right = TRUE, accumulate = TRUE), .SDcols = cols]

Output

#       ID  Var1  Var2  Var3
#    <int> <int> <int> <int>
# 1:     1     2     5     1
# 2:     2    12     3     3
# 3:     3     8     8     8
# 4:     4     4     4     4

Where

cols <- sprintf("Var%d", 1:3)
d <- data.table(
  ID = 1:4,
  Var1 = c(2L, 12L, 8L, 4L),
  Var2 = c(5L, 3L, 8L, 4L),
  Var3 = c(1L, 3L, 8L, 4L)
)
Typebar answered 28/5, 2024 at 11:16 Comment(0)
B
1

You can simply run Reduce like below

> df[-1] <- Reduce(\(x, y) ifelse(is.na(y), x, y), df[-1], accumulate = TRUE)

or

df[-1] <- Reduce(\(x, y) rowSums(cbind(x, y), TRUE) - x * !is.na(y), df[-1], accumulate = TRUE)

which gives

> df
  ID Var1 Var2 Var3
1  1    2    5    1
2  2   12    3    3
3  3    8    8    8
4  4    4    4    4
Belenbelesprit answered 28/5, 2024 at 12:31 Comment(2)
Interesting! Wasn't aware of Reduce. Could you explain \(x, y)?Surrogate
@Ben it is short-hand expression of function(x,y) in recent R versionsBelenbelesprit
U
0

Using the commands set and fcoalesce from data.table package:

library(data.table)

# way 1
cols = c("Var1", "Var2", "Var3")
set(d, j=cols, value=Reduce(\(x, y) fcoalesce(y, x), d[cols], accumulate=TRUE))


# way 2
cols = c("Var2", "Var3")
p = d$Var1
for(cl in cols) set(d, j=cl, value=(p <- fcoalesce(d[[cl]], p)))

  ID Var1 Var2 Var3
1  1    2    5    1
2  2   12    3    3
3  3    8    8    8
4  4    4    4    4
Uninspired answered 29/5, 2024 at 11:8 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.