C# remove duplicates from List<List<int>>

Asked 8/10, 2012 at 15:29 Answered 10/2, 2018 at 14:26

I'm having trouble coming up with the most efficient algorithm to remove duplicates from List<List<int>>, for example (I know this looks like a list of int[], but just doing it that way for visual purposes:

my_list[0]= {1, 2, 3};
my_list[1]= {1, 2, 3};
my_list[2]= {9, 10, 11};
my_list[3]= {1, 2, 3};

So the output would just be

new_list[0]= {1, 2, 3};
new_list[1]= {9, 10, 11};

Let me know if you have any ideas. I would really appreciate it.

Goodnatured answered 8/10, 2012 at 15:29 Comment(3)

Is {1, 2, 3} equal to {3, 2, 1}? – Coadjutor 8/10, 2012 at 15:36

Well I know I can sort each element in that instance and those two will end up the same, so for the purposes here I'll say no. – Goodnatured 8/10, 2012 at 15:47

I would look at the answers below that use Linq, as that greatly simplifies your code (vs the ones using EqualityComparers). – Memorial 8/10, 2012 at 18:0

Build custom of EqualityComparer<List<int>>:

public class CusComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> x, List<int> y)
    {
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<int> obj)
    {
        int hashCode = 0;

        for (var index = 0; index < obj.Count; index++)
        {
            hashCode ^= new {Index = index, Item = obj[index]}.GetHashCode();
        }

        return hashCode;
    }
}

Then you can get the result by using Distinct with custom comparer method:

var result = my_list.Distinct(new CusComparer());

Edit:

Include the index into method GetHashCode to make sure different orders will not be equal

Thrum answered 8/10, 2012 at 15:37 Comment(3)

That hashcode will lead to a lot of collisions - e.g. {a,a} will collide for any a, {a,b} will collide with {b,a} as will any permutation... (Although it may be that you want permutations to collide, in which case, nice answer!) – Eskew 8/10, 2012 at 15:42

@Rawling: you are correct, waiting the answer form Tim's comment, I am trying to fix – Thrum 8/10, 2012 at 15:46

Here is a hash code generator I feel is better: return obj.Take(5).Aggregate(1, (current, item) => (current * 37) + item.GetHashCode()); First off, I wouldn't iterate the whole sequence. Hashes are only efficient if they are generated quickly; iterating the whole list defeats that purpose. The first 5 or so (edit that number as needed) should be good. If the first several are the same the lists are likely different. Next, a good general algorithm form combining N different hashes into one is to go through each, multiply the current a prime number, and then add the next hash. – Kwang 8/10, 2012 at 16:9

This simple program does what you want:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            List<List<int>> lists = new List<List<int>>();

            lists.Add(new List<int> { 1, 2, 3 });
            lists.Add(new List<int> { 1, 2, 3 });
            lists.Add(new List<int> { 9, 10, 11 });
            lists.Add(new List<int> { 1, 2, 3 });

            var distinct = lists.Select(x => new HashSet<int>(x))
                    .Distinct(HashSet<int>.CreateSetComparer());

            foreach (var list in distinct)
            {
                foreach (var v in list)
                {
                    Console.Write(v + " ");
                }

                Console.WriteLine();
            }
        }
    }
}

Aegisthus answered 8/10, 2012 at 15:40 Comment(2)

This is really be best answer as it makes full use of Linq to simplify the problem. – Memorial 8/10, 2012 at 18:0

Very fast solution. This is the correct answer. Thanks! +1 – Kwiatkowski 10/2, 2018 at 14:9

    var finalList = lists.GroupBy(x => String.Join(",", x))
                         .Select(x => x.First().ToList())
                         .ToList();

Mehalek answered 8/10, 2012 at 15:40 Comment(0)

You can use the LINQ Distinct overload that takes a comparer. The comparer should see if the lists are equal. Note that the default equals operations of lists won't do what you're really looking for, so the comparer will need to loop through each for you. Here's an example of such a comparer:

public class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    IEqualityComparer<T> itemComparer;
    public SequenceComparer()
    {
        this.itemComparer = EqualityComparer<T>.Default;
    }

    public SequenceComparer(IEqualityComparer<T> itemComparer)
    {
        this.itemComparer = itemComparer;
    }

    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (object.Equals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        return x.SequenceEqual(y, itemComparer);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        if (obj == null)
            return -1;
        int i = 0;
        return obj.Aggregate(0, (x, y) => x ^ new { Index = i++, ItemHash = itemComparer.GetHashCode(y) }.GetHashCode());
    }
}

Update: I got the idea of using an anonymous type to make a better hash from Cuong Le's answer, and I LINQ-ified it and made it work in my class.

Fulgor answered 8/10, 2012 at 15:31 Comment(5)

Note that T in List<T> has to implement IComparable. If T is a custom type, you'll have to do this yourself. – Allianora 8/10, 2012 at 15:32

@MichaelSallmen I've shown an implementation that optionally takes an IEqualityComparer<T> to specify how to compare T object. Good point, though. (the default equality comparer on a type without appropriate comparison interfaces defined will only check for reference equality) – Fulgor 8/10, 2012 at 15:47

@Kwang yes, indeed. It's just an example implementation, though. Writing a good one is not trivial (e.g. see Coung Le's better implementation, that still has issues). Perhaps multiplying each item's hash by the next larger prime number before XORing would be better? – Fulgor 8/10, 2012 at 15:49

@TimS. I would just leave a //TODO generate hash rather than putting what you have; at least that way readers will know they need to find a good algorithm on their own without thinking that this is fine. – Kwang 8/10, 2012 at 15:56

@Kwang I've replaced it with a good implementation. Otherwise, I'd agree with you. – Fulgor 8/10, 2012 at 16:6

For small sets of data, a comparer could be useful, but if you have 1000 or more List> then trying to compare them all could begin to take a long amount of time.

I suggest that you instead use your data to build a distinct tree. The building of the tree will be much faster and when you are done you can always bring your data back into your old data structure.

Berar answered 14/10, 2012 at 21:56 Comment(0)

I wanted to compare the performance of the answers of @Leniel Macaferi and @L.B as I wasn't sure which would be more performant, or if the difference would be significant. It turns out that the difference is very significant:

Method 1: 00:00:00.0976649 @Leniel Macaferi
Method 2: 00:00:32.0961650 @L.B

Here is the code I used to benchmark them:

public static void Main(string[] args)
        {
            var list = new List<List<int>> {new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11, 1}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6, 7}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}};

            var sw1 = new Stopwatch();
            sw1.Start();

            for (var i = 0; i < 1_000_000; i++)
            {
                var distinct = list.Select(x => new HashSet<int>(x)).Distinct(HashSet<int>.CreateSetComparer());
            }

            sw1.Stop();
            Console.WriteLine($"Method 1: {sw1.Elapsed}");

            var sw2 = new Stopwatch();
            sw2.Start();
            for (var i = 0; i < 1_000_000; i++)
            {
                var distinct = list.GroupBy(a => string.Join(",", a)).Select(a => a.First()).ToList();

            }
            sw2.Stop();
            Console.WriteLine($"Method 2: {sw2.Elapsed}");

            Console.ReadKey();
        }

Kwiatkowski answered 10/2, 2018 at 14:26 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags