ArrayList remove vs removeAll
Asked Answered
D

2

6

What is better to use, if i want to remove a collection from an arraylist? I think the removeAll method in ArrayList is written for this task, but in a test I wrote, just iterating through the objects and removing them individual was a few seconds faster.

What are you using for this purpose?

edit:

the code of removeAll i found on grepcode calls batchRemove (c, false):

private boolean More ...batchRemove(Collection c, boolean complement) {

700         final Object[] elementData = this.elementData;
701         int r = 0, w = 0;
702         boolean modified = false;
703         try {
704             for (; r < size; r++)
705                 if (c.contains(elementData[r]) == complement)
706                     elementData[w++] = elementData[r];
707         } finally {
708             // Preserve behavioral compatibility with AbstractCollection,
709             // even if c.contains() throws.
710             if (r != size) {
711                 System.arraycopy(elementData, r,
712                                  elementData, w,
713                                  size - r);
714                 w += size - r;
715             }
716             if (w != size) {
717                 // clear to let GC do its work
718                 for (int i = w; i < size; i++)
719                     elementData[i] = null;
720                 modCount += size - w;
721                 size = w;
722                 modified = true;
723             }
724         }
725         return modified;
726     }

i actually dont understand it..

my test code was this:

public class RemoveVsRemovall {

    public static void main(String[] args){
        ArrayList<String> source = new ArrayList<>();
        ArrayList<String> toRemove = new ArrayList<>();
        for(int i = 0; i < 30000; i++){
            String s = String.valueOf(System.nanoTime());
            source.add(s);
            if(i % 2 == 0) toRemove.add(s);
        }
        long startTime = System.nanoTime();
        removeList1(source, toRemove);
        long endTime = System.nanoTime();
        System.out.println("diff: " + (endTime - startTime) * 1e-9);
    }

    static void removeList1(ArrayList<String> source, ArrayList<String> toRemove){
        source.removeAll(toRemove);
    }

    static void removeList2(ArrayList<String> source, ArrayList<String> toRemove){
        for(String s : toRemove){
            source.remove(s);
        }
    }
}

calling it a few times with different list sizes and switching beteween the two methods.

Draughts answered 1/3, 2015 at 12:50 Comment(10)
I expect that there was a flaw in your test. Show us your test code. (I find it hard to believe that there is really a significant difference in performance. And writing benchmarks that give accurate results in Java is rather hard.)Avion
Why don't you look into the code for the remove and removeAll methods? Nevertheless, this question doesn't deserve a downvote. +1 from me. There are worse questions on SO with 200 + upvotes than this one..Karyotype
@bot and may I ask where to progress is?Chelton
@Chelton I am not sure what you mean.Karyotype
This questions somewhat discusses removeAll. Complexity of O(n^2). Maybe the use of an Iterator makes it a bit slower, but not sure you can do really betterTrabzon
@bot I find it unjust to promote a question based on the fact there are apparently inferior posts but rated significantly better; different times, different scopes - unfair judgement imo. Then again, this is just one side of the view.Chelton
@Chelton Fair enough. Even if we don't compare this question to other questions, a little tolerance before downvoting is not much to ask for. This question was downvoted the second it was posted.Karyotype
As I suspected, the benchmark code is flawed. You are not "warming" up the JVM properly, and this probably affecting the removeAll case more than the remove case.Avion
@StephenC, can you please explain what "warming up the JVM" means? Are you referring to JIT?Thermodynamic
@Thermodynamic - Read this: #504603Avion
R
4

There are several reason it's hard to give a general answer to that question.

First, you have to understand that these performance characteristics are implementation-dependent. It's quite possible that the implementation varies depending on the platform and version of the JDK.

Having said that, there are mostly 2 strategies for implementing removeAll:

  1. For each element of your ArrayList, check if it is in the other Collection; if so, remove it.
  2. For each element of the Collection, check if it is in the ArrayList; if so, remove it.

If the Collection performs contains in constant-time, strategy 1 (asymptotically) wins. On the other hand, if contains is performed by scanning the whole connection and Collection iterates very slowly, strategy 2 generally has an edge, because it iterates on Collection only once; but even in that case, if the Collection is very big and most of the elements of the ArrayList are among the first elements of the Collection, strategy 1 wins again... there's no end to it.

You're probably better off trusting the implementation of removeAll(); if that fails, try changing data structures; and if that fails too, implement your own method from empirical benchmarks.

Redblooded answered 1/3, 2015 at 13:9 Comment(0)
T
2

Another thing to consider :

The code of Java has been battletested for ages, and is written so as to adapt to a lot of different and special cases (see the comment Preserve behavioral compatibility with AbstractCollection).

So, actually it's likely you can write your own implementation of the methods, that will run faster. But on the other hand, are you sure you can treat all the special cases the Java devs have been confronted to since the birth of Java ?

Also take into consideration that some Java functions might be using some C implementation to speed things up. This is apparently not the case here, but it could.

Trabzon answered 1/3, 2015 at 13:17 Comment(2)
so you're recommending using removeAll ?Draughts
Unless you really care about optimal performance, and you know you only have to deal with specific datasets, that would behave similarly to your benchmark code (hell I'm not even sure it's possible to check that easily), yes.Trabzon

© 2022 - 2024 — McMap. All rights reserved.