Optimizing ArrayList.removeAll
Asked Answered
F

2

9

A lot of people have said that ArrayList.removeAll is really slow with large size arrays.

This article provides two optimized solutions to the ArrayList.removeAll speed, but requires implementing them in the class itself, and cannot be used externally as a fix.

Is there any way to apply this sort of fix short of copying the ArrayList source code and using my own version of it?

Edit: I suppose I should add my need for this, as there is probably a way to do what I want without ArrayList.removeAll.

I have two lists of around 70,000 longs each. They are almost identical, but one list has a few more numbers that the second list doesn't have, and I want to find them. The only way I know to find them is to do first.removeAll(second) to find the difference. Is there another way?

Fulmer answered 24/7, 2011 at 0:39 Comment(3)
you can submit a bugreport and hope it gets addressed before the next releaseNd
It has already been submitted as a bug, and will be fixed, but I need a fix that will work now, and in non-updated vms.Fulmer
make your own LongArray then. this will even be more efficient then a ArrayList<Long> as you won't need to (un)box the valuesNd
C
9

What about using a data structure that has a much better removal time, such as HashSet or TreeSet? So the big reason to use an arraylist is due to the fast access time O(1) to access records. But if you are trying to set difference then maybe you should use sets. Just a thought.

Claret answered 24/7, 2011 at 2:35 Comment(4)
I guess I spent to much time trying to fix ArrayList rather than just using HashSet, which took no time at all to do what I wanted it to do. Thanks :DFulmer
It makes sense why ArrayList is slow. When you remove an item in the middle of an array, you need to copy all the values after it to fill in the gap.Mendel
@Sam Barnum Yes, but done right you need to copy each element at most once. Under the (quite realistic) assumption that lookup in the other collection is O(1), you get O(n) instead of O(n*n).Undulant
Sets if the passed collection is smaller than the current what does is check that every element of the set is in the passed collection. If the passed collection is a list that means iterating through it.Sallysallyann
G
1

You can create a subclass of ArrayList to optimize that method (and possibly others).

Greenberg answered 24/7, 2011 at 0:48 Comment(2)
I considered that, but the optimized methods have to have access to private fields, so I can't just subclass it.Fulmer
I'm convinced that you could devise a reasonably efficient implementation using just existing ArrayList methods in your new optimized removeAll method (if you copy smartly). The performance problem is most probably that removeAll calls remove in a loop, resulting in potentially O(n^2) runtimeBerkeleianism

© 2022 - 2024 — McMap. All rights reserved.