Most efficient way to remove multiple items from a IList<T>
Asked Answered
D

4

14

What is the most efficient way to remove multiple items from an IList<T> object. Suppose I have an IEnumerable<T> of all the items I want to remove, in the same order of occurrence that in the original list.

The only way I have in mind is:

IList<T> items;
IEnumerable<T> itemsToDelete;
...

foreach (var x in itemsToDelete)
{
    items.Remove(x);
}

But I guess it's not efficient, because it has to go over the list from the beggining every time the method Remove is called.

Douceur answered 2/8, 2013 at 23:23 Comment(7)
Have you profiled your code? how much performance improvement do you need?Salami
As John Skeet says: Downvoter, care to comment?Aplacental
It because I work with many lists, and I will have to do it many times.Aplacental
Computer's can usually do things "many times" very quickly. Are you sure there is even a performance issue here?Panada
Is items in order? Is itemsToDelete in order? Have you considered if you can use an ISet for fast exclusions? Have you benchmarked creating a new list instead of editing one in place with var set = new HashSet<T>(itemsToDelete); var newList = items.Where(i => !set.Contains(i)).ToList();? Have you benchmarked at all?Elka
Both could be not ordered, but both have the same order of appearance. I can't create a new List, because I don't know if the IList<T> instance is actually a List<T>, and maybe the underlying libraries that use it has a custom implementation of IList<T>, not List<T>. And no, I didn't have benchmarked at all. Could not be a significant performance issue, but as some teacher once said me: "It's just for the sake of my own soul".Aplacental
but as some teacher once said me: "It's just for the sake of my own soul" Yes and a wiser teacher said "Premature optimization is the root of all evil".Elka
T
13

As the number of items to remove gets larger, you will probably find traversing the list and checking each item against a hashset of "items to remove" is more efficient. An extension method like this might help:

static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
{
    var set = new HashSet<T>(itemsToRemove);

    var list = iList as List<T>;
    if (list == null)
    {
        int i = 0;
        while (i < iList.Count)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i++;
        }
    }
    else
    {
        list.RemoveAll(set.Contains);
    }
}

I benchmarked using this little program below. (Note that it uses an optimized path if IList<T> is actually a List<T>.)

On my machine (and using my test data), this extention method took 1.5 seconds to execute vs 17 seconds for the code in your question. However, I have not tested with different sizes of data. I'm sure for removing just a couple of items RemoveAll2 will be faster.

static class Program
{
    static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
    {
        var set = new HashSet<T>(itemsToRemove);

        var list = iList as List<T>;
        if (list == null)
        {
            int i = 0;
            while (i < iList.Count)
            {
                if (set.Contains(iList[i])) iList.RemoveAt(i);
                else i++;
            }
        }
        else
        {
            list.RemoveAll(set.Contains);
        }
    }

    static void RemoveAll2<T>(this IList<T> list, IEnumerable<T> itemsToRemove)
    {
        foreach (var item in itemsToRemove)
            list.Remove(item);
    }

    static void Main(string[] args)
    {
        var list = Enumerable.Range(0, 10000).ToList();
        var toRemove = new[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
                              43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,  97, 101,
                             103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
                             173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239,
                             241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
                             317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397,
                             401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467,
                             479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569,
                             571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643,
                             647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
                             739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823,
                             827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
                             919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997};
        list.RemoveAll(toRemove); // JIT 
        //list.RemoveAll2(toRemove); // JIT 

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
        {
            list.RemoveAll(toRemove);
            //list.RemoveAll2(toRemove);
        }
        sw.Stop();
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

UPDATE (for @KarmaEDV's & Mark Sowul's comments below): If you need to use a custom equality comparer, the extension method could have an overload that takes such a comparer:

public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove, IEqualityComparer<T> comparer = null)
{
    var set = new HashSet<T>(itemsToRemove, comparer ?? EqualityComparer<T>.Default);

    if (iList is List<T>)
    {
        list.RemoveAll(set.Contains);
    }
    else
    {
        int i = iList.Count - 1;
        while (i > -1)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i--;
        }
    }
}
Tandratandy answered 3/8, 2013 at 2:17 Comment(6)
I would recommend iterating through the list in the opposite order. If you remove the first item from the list, you have to shift all the remaining items. It's more efficient to remove from back to front.Diver
This solution is great, but the optimized List-Path doesn't work reliably if the List overrides the Contains Method.Soper
@Soper could you maybe clarify? List<T>.RemoveAll doesn't call into Contains, and you can't override Contains since it's not virtual.Prole
@ErenErsönmez Im not sure what you mean with "doesnt call into Contains". Yes, sorry, not override, but overload. IEnumerable<T>.Contains(item, customComparer) is an overload if you need a custom comparer. If you use that you need to slightly adapt the proposed solution.Soper
@Soper If understand you, yes, you would need a slight modification in case you need to use a custom comparer. I've posted such a variant in the answer.Prole
Small improvement: Run the list form behind, this saves the most copy operations when multiple elements are removed.Isidore
P
7

If the IList<T> reference happens to refer to an instance of List<T>, casting to that type and using RemoveAll is apt to yield better performance than any other approach that doesn't rely upon the particulars of its implementation.

Otherwise, while the optimal approach will depend upon the relative fraction of items that are going to be removed and the nature of the IList<T>, I would suggest that your best bet might be to copy the IList<T> to a new List<T>, clear it, and selectively re-add items. Even if the items in the list are not conducive to efficient hashing, the fact that the items in the IEnumerable<T> are in the same sequence as those in the IList<T> would render that irrelevant. Start by reading an item from the IEnumerable<T>. Then copy items from the array to the list until that one is found. Then read the next item from the IEnumerable<T> and copy from the array to the list until that one is found, etc. Once the IEnumerable<T> is exhausted, copy the balance of the array to the List<T>.

This approach will be fast with many implementations of IList<T>. It has one major disadvantage, though: the fact that it deletes and re-adds each item might have unwanted side-effects on things like observable lists. If a list might be observable, one may have to use a much slower N^2 algorithm to ensure correctness. [BTW, it irks me that IList<T> has a Remove(T) method but lacks a much more useful RemoveAll(Func<T,bool>) method. The Remove(T) is largely redundant with IndexOf and RemoveAt, while RemoveAll would allow O(N) implementations of many operations that are O(N^2) in its absence if one isn't allowed to remove and re-add items.

Parcenary answered 2/8, 2013 at 23:44 Comment(0)
D
1

Maybe this helps. Other ideas of the same type could be included.

IList<T> items;

IEnumerable<T> itemsToDelete;
...
{
   if(items.Equals(itemsToDelete)) //Equal lists?
     {
      items.Clear(); 
      return true;
     }


   if(  (double) items.Count/itemsToDelete.Count < 1){
      /* It is faster to iterate the small list first. */ 
              foreach (var x in items)
              {
                if(itemsToDelete.Contains(x)){/**/} 

              }
    }
   else{
           foreach (var x in itemsToDelete)
              {
               items.Remove(x);
              }
   }
}
Dereism answered 2/8, 2013 at 23:55 Comment(2)
You might want to try comparing items.Count < itemsToDelete.Count() rather than using doubles. Also it avoids a compiler error.Elka
I didn't put the explanation: the division is in case you want a percentage close: you could replace the 1 for .9 for 90%.Dereism
M
0

This problem would be easier to solve if there was available an extension method RemoveAll for the IList<T> interface. So here is one:

/// <summary>
/// Removes all the elements that match the conditions defined by the
/// specified predicate.
/// </summary>
public static int RemoveAll<T>(this IList<T> list, Func<T, int, bool> predicate)
{
    ArgumentNullException.ThrowIfNull(list);
    ArgumentNullException.ThrowIfNull(predicate);

    int i = 0, j = 0;
    try
    {
        for (; i < list.Count; i++)
        {
            if (predicate(list[i], i)) continue;
            if (j < i) list[j] = list[i];
            j++;
        }
    }
    finally
    {
        if (j < i)
        {
            for (; i < list.Count; i++, j++)
                list[j] = list[i];
            while (list.Count > j)
                list.RemoveAt(list.Count - 1);
        }
    }
    return i - j;
}

This is a modified version of a custom List<T>.RemoveAll implementation that is found in this answer. Because of the absence of the RemoveRange method in the IList<T> interface, the rightmost leftover slots in the IList<T> are cleared with repeated removals of the last element. This should be a pretty fast operation in most IList<T> implementations.

Now the original problem of removing multiple items from a IList<T> can be solved efficiently like this:

IList<T> items;
IEnumerable<T> itemsToDelete;
//...

HashSet<T> itemsToDeleteSet = new(itemsToDelete);
items.RemoveAll((x, _) => itemsToDeleteSet.Contains(x));

Online demo.

Moussaka answered 7/1, 2023 at 19:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.