Performance of traditional for loop vs Iterator/foreach in Java
Asked Answered
K

9

63

Is there any performance testing results available in comparing traditional for loop vs Iterator while traversing a ArrayList,HashMap and other collections?

Or simply why should I use Iterator over for loop or vice versa?

Kelson answered 10/12, 2009 at 7:39 Comment(2)
Note that the Reason a for loop is slower with a linked list, is that each call to get(i) iterates from the head of the list i times. I'm sure that is intuitively obvious to everyone else here, but it took me a minute to figure out the why of it.Dickman
@Kelson InsightfulTorchier
T
91

Assuming this is what you meant:

// traditional for loop
for (int i = 0; i < collection.size(); i++) {
  T obj = collection.get(i);
  // snip
}

// using iterator
Iterator<T> iter = collection.iterator();
while (iter.hasNext()) {
  T obj = iter.next();
  // snip
}

// using iterator internally (confirm it yourself using javap -c)
for (T obj : collection) {
   // snip
}

Iterator is faster for collections with no random access (e.g. TreeSet, HashMap, LinkedList). For arrays and ArrayLists, performance differences should be negligible.

Edit: I believe that micro-benchmarking is root of pretty much evil, just like early optimization. But then again, I think it's good to have a feeling for the implications of such quite trivial things. Hence I've run a small test:

  • iterate over a LinkedList and an ArrayList respecively
  • with 100,000 "random" strings
  • summing up their length (just something to avoid that compiler optimizes away the whole loop)
  • using all 3 loop styles (iterator, for each, for with counter)

Results are similar for all but "for with counter" with LinkedList. All the other five took less than 20 milliseconds to iterate over the whole list. Using list.get(i) on a LinkedList 100,000 times took more than 2 minutes (!) to complete (60,000 times slower). Wow! :) Hence it's best to use an iterator (explicitly or implicitly using for each), especially if you don't know what type and size of list your dealing with.

Tess answered 10/12, 2009 at 7:48 Comment(6)
Your LinkedList result shows what happens when you go from O(n) to O(n^2) (or more)Warhead
All the other five took less than 20 milliseconds to iterate over the whole list looks like the JVM dead-code optimization kicked in... The difference between iteration of LinkedList and ArrayList are significant (in favor of the ArrayList)Gullet
@Gullet no, it certainly didn't. I've generated 100,000 random strings (actually UUIDs) and summed their lengths which was printed to stdout after the loop. Sure, UUIDs have the same length which makes the output predictable, but the compiler isn't that smart. Believe it or not, but a modern CPU can do that in 20 ms. To give another perspective: my CPU has 4,000 BogoMips per core. So we're talking about billions of instructions per s or millions per ms. Thus, iterating over 100,000 strings with several millions of instructions is feasible. CPUs are faster than most developers think :)Tess
summing up it's a viable option and the compiler won't optimize anything (besides prefetching like mad). The case would fit perfectly into the L2 cache too (even w/ LinkedList). If not all the elements are added consequentially, going out of the L2 cache would have more effect on the LinkedList.Gullet
what about mixed way? )) Iterator<T> iter = collection.iterator(); int l = collection.size(); for (int i = 0, i < l; i++) { T obj = iter.next(); // snip }Filmore
@Filmore performance is basically the same as using iterator as hasNext() shouldn't be a heavy operation regardless of actual implementation. It should certainly be considered bad use of the API though.Tess
W
24

The first reason to use an iterator is obvious correctness. If you use a manual index, there may be very innocuous off-by-one errors that you can only see if you look very closely: did you start at 1 or at 0? Did you finish at length - 1? Did you use < or <=? If you use an iterator, it is much easier to see that it is really iterating the whole array. "Say what you do, do what you say."

The second reason is uniform access to different data structures. An array can be accessed efficiently through an index, but a linked list is best traversed by remembering the last element accessed (otherwise you get a "Shlemiel the painter"). A hashmap is even more complicated. By providing a uniform interface from these and other data structures (e.g., you can also do tree traversals), you get obvious correctness again. The traversing logic has to be implemented only once, and the code using it can concisely "say what it does, and do what it says."

Warnke answered 10/12, 2009 at 8:16 Comment(0)
S
6

Performance is similar in most cases.

However, whenever a code receives a List, and loops on it, there is well-known case:
the Iterator is way better for all List implementations that do not implement RandomAccess (example: LinkedList).

The reason is that for these lists, accessing an element by index is not a constant time operation.

So you can also consider the Iterator as more robust (to implementation details).


As always, performance should not be hide readability issues.
The java5 foreach loop is a big hit on that aspect :-)

Shockley answered 10/12, 2009 at 7:50 Comment(4)
Thanks but what about ArrayList?Kelson
ArrayList implements RandomAccess, hence list.get(i) is fast. performance differences should be pretty much negligible.Tess
Note: While I don't know if the LinkedList in the JDK is written in such a way, it would be trivial to write a LinkedList implementation where a traditional for loop would perform as fast as random access. All that would be need would be to keep an internal pointer to the last element where random access to requested. This seems like such a trivial implementation which would speed up so many pieces of code that I can't image it not being in there.Koontz
@tster: actually that's exactly what the iterator does.Tess
T
3

Yes, it does make a difference on collections which are not random access based like LinkedList. A linked list internally is implemented by nodes pointing to the next(starting at a head node).

The get(i) method in a linked list starts from the head node and navigates through the links all the way to the i'th node. When you iterate on the linked list using a traditional for loop, you start again from the head node each time, thus the overall traversal becomes quadratic time.

for( int i = 0; i< list.size(); i++ ) {
    list.get(i); //this starts everytime from the head node instead of previous node
}

While the for each loop iterates over the iterator obtained from the linked list and calls its next() method. The iterator maintains the states of the last access and thus does not start all the way from head everytime.

for( Object item: list ) {
    //item element is obtained from the iterator's next method.
}
Tripody answered 5/2, 2015 at 12:19 Comment(0)
C
1

Use JAD or JD-GUI against your generated code, and you will see that there is no real difference. The advantage of the new iterator form is that it looks cleaner in your codebase.

Edit: I see from the other answers that you actually meant the difference between using get(i) versus an iterator. I took the original question to mean the difference between the old and new ways of using the iterator.

Using get(i) and maintaining your own counter, especially for the List classes is not a good idea, for the reasons mentioned in the accepted answer.

Carioca answered 10/12, 2009 at 7:44 Comment(0)
F
1

One of the best reasons to use an iterator over the i++ syntax is that not all data structures will support random access let alone have it perform well. You should also be programming to the list or collection interface so that if you later decided that another data structure would be more efficient you'd be able to swap it out without massive surgery. In that case (the case of coding to an interface) you won't necessarily know the implementation details and it's probably wiser to defer that to the data structure itself.

Fioritura answered 10/12, 2009 at 8:2 Comment(0)
R
1

One of the reasons I've learned to stick with the for each is that it simplifies nested loops, especially over 2+ dimensional loops. All the i's, j's, and k's that you may end up manipulating can get confusing very quickly.

Renteria answered 10/12, 2009 at 8:7 Comment(0)
S
1

I don't believe that

for (T obj : collection) {

calculates .size() each time thru the loop and is therefore faster than

for (int i = 0; i < collection.size(); i++) {
Shaddock answered 12/12, 2009 at 6:13 Comment(2)
Easily remedied with for (int i = 0, l = collection.size(); i < l; i++) {Urbane
the first one obtains the collections iterator by calling collection.iterator() method and then iterates by calling iterator's next() and hasNext() method.Tripody
K
0

+1 to what sfussenegger said. FYI, whether you use an explicit iterator or an implicit one (i.e. for each) won't make a performance difference because they compile to the same byte code.

Kandrakandy answered 10/12, 2009 at 10:38 Comment(2)
They do not get compiled to same byte code. The forEach loop iterates over an iterable and obtains iterator which iterates through the list. For linked list get(i) method starts from the first node, traverses all the way and returns the object. So if you are using i=1 to 5 each time it starts from beginning. see my answer below.Tripody
My answer was comparing forEach to explicitly using an Iterator, not comparing it to a traditional for loop using index variables. docs.oracle.com/javase/specs/jls/se7/html/…Kandrakandy

© 2022 - 2024 — McMap. All rights reserved.