Remove duplicated elements from list

Asked 26/7, 2017 at 5:18 Answered 19/7, 2021 at 8:21

I have a list of character vectors:

my.list <- list(e1 = c("a","b","c","k"),e2 = c("b","d","e"),e3 = c("t","d","g","a","f"))

I'm looking for a function that, for any character appearing more than once across the list's vectors (within a single vector a character appears at most once), keeps only its first appearance.

The result list for this example would therefore be:

res.list <- list(e1 = c("a","b","c","k"),e2 = c("d","e"),e3 = c("t","g","f"))

Note that an entire vector may be eliminated, so the resulting list does not necessarily have the same number of elements as the input list.
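The rule can be sketched as a plain loop that carries a running set of values already seen. This is only an illustration of the requirement, not one of the answers below; the helper name keep_first and its drop_empty argument are hypothetical.

```r
# Hypothetical helper (not from any answer): walk the list in order and keep
# only values that have not appeared in an earlier vector.
keep_first <- function(lst, drop_empty = FALSE) {
  seen <- character(0)                # values encountered so far
  for (i in seq_along(lst)) {
    v <- lst[[i]]
    lst[[i]] <- v[!v %in% seen]       # drop values already seen earlier
    seen <- c(seen, v)                # remember this vector's values
  }
  if (drop_empty) {
    lst <- Filter(length, lst)        # remove vectors that ended up empty
  }
  lst
}

my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))
res.list <- list(e1 = c("a","b","c","k"), e2 = c("d","e"), e3 = c("t","g","f"))
identical(keep_first(my.list), res.list)
#[1] TRUE
```

Adding an element such as e4 = c("a", "b") shows the elimination case: keep_first() returns character(0) for e4, and with drop_empty = TRUE that element is dropped from the result.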

Celestina asked 26/7, 2017 at 5:18

We can unlist the list, get a logical index with duplicated, split it back into the list's shape with relist, and use it to extract the elements of 'my.list'.

un <- unlist(my.list)
res <- Map(`[`, my.list, relist(!duplicated(un), skeleton = my.list))
identical(res, res.list)
#[1] TRUE
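To make the steps visible, here is the same pipeline with the intermediate logical index kept in its own variable. The names keep and idx are introduced only for illustration.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

un <- unlist(my.list)                    # one flat character vector
keep <- !duplicated(un)                  # TRUE for the first appearance of each value
idx <- relist(keep, skeleton = my.list)  # split the flags back into the list's shape
# idx$e1: TRUE TRUE TRUE TRUE
# idx$e2: FALSE TRUE TRUE        ("b" was already seen in e1)
# idx$e3: TRUE FALSE TRUE FALSE TRUE  ("d" and "a" were seen earlier)
res <- Map(`[`, my.list, idx)            # subset each element with its own flags
```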

Manganite answered 26/7, 2017 at 5:21

n.b. this does not work for lists of mixed type: x=list(a=1,b='2'); relist(unlist(x),skel=x) yields list(a='1',b='2') – Leucoderma 31/7 at 9:7
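A short check of the comment's point in base R. x is the comment's own example; y is a made-up list added here to show how the coercion plays out in the Map/relist answer: relist only reshapes a logical index there, so the returned elements keep their types, but duplicate detection runs on the coerced values.

```r
# The coercion the comment describes: unlist() turns everything into character
x <- list(a = 1, b = '2')
relist(unlist(x), skeleton = x)   # list(a = "1", b = "2")

# In the answer, the original elements are subset, so their types survive;
# however, the number 2 and the string "2" are treated as the same value
y <- list(a = c(1, 2), b = c("2", "3"))
res_y <- Map(`[`, y, relist(!duplicated(unlist(y)), skeleton = y))
res_y                             # list(a = c(1, 2), b = "3")
```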

Here is an alternative using mapply with setdiff and Reduce.

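# make a copy of my.list
res.list <- my.list
# for every element after the first, drop the values that already appeared in
# the preceding elements; SIMPLIFY = FALSE keeps the result a list even when
# all the differences happen to have the same length
res.list[-1] <- mapply(setdiff, res.list[-1],
                       head(Reduce(c, my.list, accumulate = TRUE), -1),
                       SIMPLIFY = FALSE)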

We keep the first element of the list unchanged and apply setdiff to each subsequent element paired with the cumulative vector of all elements before it; these cumulative vectors are produced by Reduce with c and the accumulate = TRUE argument. head(..., -1) drops the final accumulated vector, which contains every element, so that the lengths align.

This returns

res.list
$e1
[1] "a" "b" "c" "k"

$e2
[1] "d" "e"

$e3
[1] "t" "g" "f"
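The pairing between elements and accumulated vectors can be inspected directly on the question's data. The names acc and seen_before are introduced only for illustration.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# Running concatenation: acc[[i]] holds everything in my.list[[1]] ... my.list[[i]]
acc <- Reduce(c, my.list, accumulate = TRUE)
# Drop the last entry so seen_before[[i]] lines up with my.list[[i + 1]]
seen_before <- head(acc, -1)
# seen_before[[1]]: "a" "b" "c" "k"
# seen_before[[2]]: "a" "b" "c" "k" "b" "d" "e"
setdiff(my.list$e3, seen_before[[2]])   # "t" "g" "f"
```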

Note that in Reduce we could replace c with function(x, y) unique(c(x, y)) and get the same final result; the accumulated vectors are then simply free of repeats.
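A sketch of that variant, assuming the question's my.list and res.list; res2 and seen_before are illustrative names. SIMPLIFY = FALSE is passed to mapply as a precaution, since mapply would otherwise collapse equal-length results into a matrix.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))
res.list <- list(e1 = c("a","b","c","k"), e2 = c("d","e"), e3 = c("t","g","f"))

# Each accumulated vector is de-duplicated as it grows
seen_before <- head(Reduce(function(x, y) unique(c(x, y)), my.list, accumulate = TRUE), -1)
res2 <- my.list
res2[-1] <- mapply(setdiff, res2[-1], seen_before, SIMPLIFY = FALSE)
identical(res2, res.list)
#[1] TRUE
```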

Romaineromains answered 26/7, 2017 at 13:48

I found the solutions here too complex to follow and looked for a simpler technique. Suppose you have the following list.

my_list <- list(a = c(1,2,3,4,5,5), b = c(1,2,2,3,3,4,4),
                d = c("Mary", "Mary", "John", "John"))

The following much simpler code removes the duplicates within each element.

sapply(my_list, unique)

You will end up with the following.

$a
[1] 1 2 3 4 5

$b
[1] 1 2 3 4

$d
[1] "Mary" "John"

There is beauty in simplicity!
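One caveat with sapply: it simplifies its result when it can, so if every de-duplicated vector has the same length you get a matrix instead of a list. lapply always returns a list. z is a made-up example.

```r
# Two vectors whose de-duplicated versions happen to have the same length
z <- list(x = c(1, 1, 2), y = c(3, 4, 4))
sapply(z, unique)   # simplified to a 2 x 2 matrix, not a list
lapply(z, unique)   # always a list: $x is 1 2, $y is 3 4
```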

Teryl answered 19/7, 2021 at 8:21

This is not what was asked by the OP. – Ealasaid 1/3, 2022 at 20:17
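To illustrate the comment on the question's data: per-element unique() leaves my.list unchanged, because the repeats the question is about occur across vectors, not within them. The variable name within is introduced only for illustration.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))
res.list <- list(e1 = c("a","b","c","k"), e2 = c("d","e"), e3 = c("t","g","f"))

# unique() only removes repeats inside each vector
within <- sapply(my.list, unique)
identical(within, my.list)    # TRUE: no vector had internal repeats
identical(within, res.list)   # FALSE: "b", "d" and "a" are still repeated across vectors
```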
