Why doesn't the String class in Java implement Iterable?
C

8

59

Many Java framework classes implement Iterable; however, String does not. It makes sense to iterate over the characters in a String, just as one can iterate over the items in a regular array.

Is there a reason why String does not implement Iterable?

Cheri answered 5/5, 2010 at 10:57 Comment(7)
Where's the problem with iterating through the string's char array? (strInput.toCharArray())Cocci
Tim: String#toCharArray creates an array with a copy of the String's characters. Even if it works, it imposes unnecessary overhead just to iterate over the characters.Promiscuity
@jambjo Iterator<Character> would be less overhead???Latisha
@Tom: Depending on the situation Iterator<Character> could have a MUCH smaller overhead than toCharArrayCarmarthenshire
@Carmarthenshire No, I don't think that is reasonable. An iterator optimised for generating long sequences of the same character?Latisha
@Tom: If the iterator used autoboxing (e.g. Character c = 'c'), the resulting code would use Character.valueOf('c'), which according to the Java docs should use a cache instead of creating new instances for all characters. In Sun's VM, Character instances are cached for all chars with a value <= 127.Promiscuity
@Tom: As I said, it depends on the situation: If you have a long string and use the enumerator to only get a few entries, it would be MUCH better. Extreme sample: E.g. if you had a 1GB string and used an enumerator to get the first 100 chars 100 times, then you would have basically 10,000 accesses in the enumerator case, but when using toCharArray you would have 100 copies of the string which alone result in 5,000,000,000 accesses and you still need the iteration so it would be 10,000 vs 5,000,010,000. Pretty clear which is better, isn't it? (And yes, this is a constructed extreme case.)Carmarthenshire
R
31

There really isn't a good answer. An iterator in Java specifically applies to a collection of discrete items (objects). You would think that a String, which implements CharSequence, should be a "collection" of discrete characters. Instead, it is treated as a single entity that happens to consist of characters.

In Java, it seems that iterators are only really applied to collections and not to strings. There is no reason why it is this way (as near as I can tell; you would probably have to talk to Gosling or the API writers); it appears to be convention or a design decision. Indeed, there is nothing preventing CharSequence from implementing Iterable.

That said, you can iterate over the characters in a string like so:

for (int i = 0; i < str.length(); i++) {
  System.out.println(str.charAt(i));
}

Or:

for(char c : str.toCharArray()) {
  System.out.println(c);
}

Or:

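// chars() returns an IntStream, so cast back to char to print characters rather than numbers
"Java 8".chars().forEach(c -> System.out.println((char) c));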

Also note that you cannot modify a character of a String in place because Strings are immutable. The mutable companion to a String is StringBuilder (or the older StringBuffer).
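As a small sketch of that last point (the class and method names here are made up for illustration): a String has to be copied into a StringBuilder before a character can be replaced, and the original String stays unchanged.

```java
class MutableCompanionDemo {
    // Hypothetical helper: returns a copy of s with the character at index replaced.
    static String withCharAt(String s, int index, char replacement) {
        StringBuilder sb = new StringBuilder(s); // copies the characters
        sb.setCharAt(index, replacement);        // in-place change on the builder only
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Java 7";
        String changed = withCharAt(original, 5, '8');
        System.out.println(original); // the original String is untouched
        System.out.println(changed);
    }
}
```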

EDIT

To clarify, based on the comments on this answer: I'm trying to explain a possible rationale as to why there is no Iterator on a String. I'm not trying to say that it's not possible; indeed, I think it would make sense for CharSequence to implement Iterable.

String provides CharSequence, which, if only conceptually, is different from a String. A String is usually thought of as a single entity, whereas CharSequence is exactly that: a sequence of characters. It would make sense to have an iterator on a sequence of characters (i.e., on CharSequence), but not simply on a String itself.

As Foxfire has rightly pointed out in the comments, String implements the CharSequence interface, so type-wise, a String is a CharSequence. Semantically, it seems to me that they are two separate things - I'm probably being pedantic here, but when I think of a String I usually think of it as a single entity that happens to consist of characters. Consider the difference between the sequence of digits 1, 2, 3, 4 and the number 1234. Now consider the difference between the string abcd and the sequence of characters a, b, c, d. I'm trying to point out this difference.

In my opinion, asking why String doesn't have an iterator is like asking why Integer doesn't have an iterator so that you can iterate over the individual digits.

Rasmussen answered 5/5, 2010 at 11:2 Comment(14)
Surely treating a string as a collection of letters isn't entirely without precedent, and to argue it on a "makes sense" case seems a little spurious.Walters
@Walters that's true - I was actually at a loss for words - I think I wanted to say "it doesn't make sense in some cases" or even "it doesn't make sense in most cases" considering what iterators really are. I will edit my answer.Rasmussen
"A String is not really a "collection" of discrete characters.". Well, it is. In fact it even implements CharSequence, which is exactly that: an ordered collection of discrete characters!Carmarthenshire
@Vivin: there is no specific implication that Iterator must act on a collection. Infinite iterators seem to be acceptable in the right context.Sixpence
@Foxfire, agreed - but a String by itself is not a CharSequence. A CharSequence is a sequence of characters that is created from a String. It would make sense to have an iterator on a CharSequence but not on just the String itself.Rasmussen
@Sixpence I am not saying that you can't have an Iterator on a string. I'm only trying to explain why. You can have an iterator on anything you want. The question is if it makes sense.Rasmussen
You can do foreach (char c in s) in C#, just beautiful!Solecism
@Vivin: CharSequence is an INTERFACE (exactly as Iterable). So it is the String itself implementing the interface. It is not created from the String.Carmarthenshire
@Foxfire, point noted. I realize I may be pedantic here, but to me a String and a CharSequence are two separate things.Rasmussen
@Vivin: Then imho you should just try to answer the original question as: "Why does CharSequence not implement Iterable". (Which of course technically still means "Why does String not implement Iterable")Carmarthenshire
@Foxfire, indeed. It would make sense to have it on CharSequence imho (I've alluded to that in my answer). If the CharSequence interface specified an iterator, I think it would make more sense rather than String having it. Thanks for the fruitful discussion :)Rasmussen
I don't think this has been pointed out so far, but using Iterable<Character> would not be efficient. Since generics only exist at compile time, Iterable<Character> gets compiled down to Iterable (or, in another sense, Iterable<Object>). Creating a new Character for each item in a large string would get quite ridiculous (O(n)).Collazo
Don't quite agree with the assertion that a String should only be thought of as a single entity in the same sense as the number 1234. If it were so, there wouldn't be a method charAt(). There is no such method in class Integer for instance. charAt() shows that the String is indeed, even conceptually, a CharSequence. So except for efficiency reasons, it should implement Iterable.Idiotic
@Idiotic Since I've written this, I've come to change my mind regarding that assertion as well.Rasmussen
C
13

The reason is simple: The string class is much older than Iterable.

And obviously nobody ever wanted to add the interface to String (which is somewhat strange because it does implement CharSequence which is based on exactly the same idea).

However, it would be somewhat inefficient, because an Iterator returns objects, so every char returned would have to be wrapped in a Character.

Edit: Just as a comparison: .NET does support enumerating a String; however, in .NET IEnumerable<T> also works on value types, so no wrapping is required as it would be in Java.
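To make the wrapping cost concrete, here is a rough sketch (class and method names are hypothetical). Putting a char into a collection, or returning it from an Iterator<Character>, goes through Character.valueOf. The Javadoc guarantees caching only for '\u0000' to '\u007F'; outside that range a new object may be allocated for each character.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // Boxes every char of s, as an Iterator<Character> would have to.
    static List<Character> boxAll(String s) {
        List<Character> boxed = new ArrayList<>(s.length());
        for (int i = 0; i < s.length(); i++) {
            boxed.add(s.charAt(i)); // autoboxing: compiles to Character.valueOf(char)
        }
        return boxed;
    }

    public static void main(String[] args) {
        List<Character> boxed = boxAll("aa");
        // ASCII values come from the cache, so both elements are the same object.
        System.out.println(boxed.get(0) == boxed.get(1)); // true
    }
}
```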

Carmarthenshire answered 5/5, 2010 at 11:11 Comment(7)
"adding Iterable to String class makes it inefficient" makes sense, but that nobody added Iterable to the String class just because it was old seems a bit odd. Can you please explain some more?Gremial
String existed long before Iterable. So you would have to add the interface later. While that is possible it may - in some corner cases - be a breaking change. And taking into consideration how often String is used this might have been something considered risky. This is just guessing. I have no knowledge if these considerations were really affecting that decision. But it seems most likely.Carmarthenshire
I can't see adding Iterable (or any type) to String as being a breaking change. It's not like you can subclass String (thank god).Latisha
@Tom: Surely in 99.9% of the cases it won't be. But it is easy enough to construct cases (e.g. reflecting on the interfaces) where it could break. Taking into account that basically EVERY application uses String somewhere that still might be a reason.Carmarthenshire
Any code like that which gets broken, deserves to be broken. I think I am safe in saying it is not a reason brought into consideration.Latisha
Your main reason "string class is much older than Iterable" is not correct. Prior to Java 1.2 there was a Vector class, almost the same as ArrayList. Java 1.2 introduced the Collections framework, and Vector was backfitted into this framework (it was made to implement List). They added methods to it to implement the interface, without breaking its legacy API.Lustrate
Rather, I guess it's a design decision: in Java, String is treated almost like a primitive type -- not a collection (unlike Python, for example).Lustrate
A
13

For what it's worth, my coworker Josh Bloch strongly wishes to add this feature to Java 7:

for (char c : aString) { ... }

and

for (int codePoint : aString) { ... }

This would be the easiest way to loop over chars and over logical characters (code points) ever. It wouldn't require making String implement Iterable, which would force boxing to happen.

Without that language feature, there's not going to be a really good answer to this problem. And he seems very optimistic that he can get this to happen, but I'm not sure.
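Until something like that exists, the code point loop has to be written by hand. A minimal sketch using only String.codePointAt and Character.charCount (the class and method names are illustrative, not part of any proposal):

```java
class CodePointLoopDemo {
    // Counts code points by stepping over surrogate pairs.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            i += Character.charCount(codePoint); // 1 for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (one code point, two chars), 'b'
        System.out.println(s.length());         // 4 chars
        System.out.println(countCodePoints(s)); // 3 code points
    }
}
```

The gap between the two numbers is why a loop over char and a loop over int code points are genuinely different operations.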

Advise answered 5/5, 2010 at 22:56 Comment(5)
Too bad that did not make it into Java 7’s project coin.Jointer
If they someday plan to do so, they should make sure it works for any object that implements CharSequence, rather than for String only.Incunabula
@Jointer Nor into Java 8 or 9... RIP.Batson
@JoaaoVerona Not directly, as in having CharSequence or String implement Iterable. But Java 8 extended the CharSequence interface with (default) methods chars() and codePoints() which return an IntStream. That interface has a forEach(IntConsumer action) method which is the next best thing. You can write "test".chars().forEach(c -> ...) and it wouldn't be very different from a for loop. I suspect one reason for not having String or CharSequence implement Iterable is that you can iterate over its characters or its code points. An important distinction.Artur
You worked with Josh Bloch?!?!!?!Clarendon
O
2

One of the main reasons for making String implement Iterable is to enable the simple for(each) loop, as mentioned above. So, a reason for not making String implement Iterable could be the inherent inefficiency of a naïve implementation, since it requires boxing the result. However, if the implementation of the resulting Iterator (as returned by String.iterator()) is final, the compiler could special-case it and generate byte-code free from boxing/unboxing.
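For comparison, Java 8 and later already ship a boxing-free iterator type, PrimitiveIterator.OfInt, which IntStream.iterator() returns. It does not make String usable in a for-each loop, but the sketch below (class and method names are illustrative) shows iteration through an iterator object without a wrapper being created per element.

```java
import java.util.PrimitiveIterator;

class PrimitiveIteratorDemo {
    // Sums the char values of s using nextInt(), which returns a primitive int.
    static long sumOfChars(CharSequence s) {
        PrimitiveIterator.OfInt it = s.chars().iterator();
        long sum = 0;
        while (it.hasNext()) {
            sum += it.nextInt(); // no boxing; next() would box to Integer
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfChars("AB")); // 'A' (65) + 'B' (66) = 131
    }
}
```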

Onega answered 4/6, 2010 at 7:4 Comment(0)
L
1

If you are really interested in iterating, here is how:

String str = "StackOverflow";

for (char c: str.toCharArray()){
     //here you go
}
Lorenzalorenzana answered 5/5, 2010 at 11:4 Comment(4)
-1 Sorry, but I don't see what this answer has to do with the question asked.Promiscuity
A problem might be that toCharArray creates a new array. So this is VERY inefficient.Carmarthenshire
@Helper: String is immutable. However, the returned array is not, and changing the array must not affect the String. So it DOES make a complete copy.Carmarthenshire
+1 - For small strings, creating a char[] is roughly as expensive as creating an Iterator - it's an object allocation (and a small amount of memory initialization and copying). As the strings become longer, the memory initialization/copy overhead becomes significant, but still nowhere near as significant as boxing each character.Prakrit
J
1

They simply forgot to do so.

Jointer answered 26/5, 2010 at 10:19 Comment(1)
Do you have any evidence for this assertion? It seems more likely to me that it is because String predates the Iterable interface (Strings presumably date back to Java 1.0, Iterable dates back to Java 1.5), and once the language specifiers had gotten used to not treating String as one of the collections, they continued to treat it that way.Impedance
C
0

I'm not sure why this is still not implemented in 2020. My guess would be that Strings are given so much special treatment in Java (the compiler overloading the + operator for string concatenation, string literals, string constants stored in a common pool, etc.) that this feature might be harder to implement than it looks (or it might interfere with too many things to be worth the effort from the implementers' point of view).

On the other hand, implementing something close to this is not too much work. I wanted this in one of my projects, so I wrote these simple classes:

// CharIterable.java
import java.util.Iterator;

public class CharIterable implements Iterable<Character> {
  public CharIterable(CharSequence seq) {
    this.seq = seq;
  }

  @Override
  public Iterator<Character> iterator() {
    return new CharIterator(seq);
  }

  private final CharSequence seq;
}

// CharIterator.java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CharIterator implements Iterator<Character> {
  public CharIterator(CharSequence sequence) {
    this.sequence = sequence;
  }

  @Override
  public synchronized boolean hasNext() {
    return position < sequence.length();
  }

  @Override
  public synchronized Character next() {
    // Honor the Iterator contract rather than leaking an IndexOutOfBoundsException
    if (position >= sequence.length()) {
      throw new NoSuchElementException();
    }
    return sequence.charAt(position++);
  }

  /**
   * Character sequence to iterate over
   */
  private final CharSequence sequence;

  /**
   * Current position of iterator which is the position of the item that
   * will be returned by {@link #next()}.
   */
  private int position = 0;
}

With these I can do this:

for (Character c: new CharIterable("This is a test")) {
  // do something with c
}

Now, this looks like a lot for such a simple thing, but it allows strings to be treated like an iterable array of characters and to work transparently with methods designed to work on collections of things (lists, sets, etc.).
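On Java 8 or later, much the same effect can be had with a lambda, since Iterable has a single abstract method. This is only a sketch of an alternative, not a replacement for the classes above; charsOf and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CharsOfDemo {
    // Hypothetical helper: views a CharSequence as an Iterable<Character>.
    // Each call to iterator() builds a fresh stream, so the Iterable is reusable.
    static Iterable<Character> charsOf(CharSequence seq) {
        return () -> seq.chars().mapToObj(c -> (char) c).iterator();
    }

    public static void main(String[] args) {
        List<Character> collected = new ArrayList<>();
        for (char c : charsOf("test")) { // auto-unboxing from Character to char
            collected.add(c);
        }
        System.out.println(collected); // [t, e, s, t]
    }
}
```

Like the hand-written version, this boxes every character, so it trades efficiency for convenience.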

Cupriferous answered 14/11, 2020 at 6:38 Comment(0)
L
-2

Iterable of what? Iterable<Integer> would make the most sense, where each element represents a Unicode code point. Even Iterable<Character> would be slow and pointless when we have toCharArray.

Latisha answered 5/5, 2010 at 11:59 Comment(3)
I know it's late, but toCharArray always copies the whole string. If you only ever iterate over a small part of a long string, toCharArray is a greater overhead than autoboxing (which might get optimized away anyway).Stantonstanway
@RedCrafterLP I would be very surprised if there were an interesting collection of cases where you are iterating over a very small head of a large string. And if there were, you would do something other than creating an Iterator<Integer> or Iterator<Character>.Latisha
@Tom Hawtin For example, you might want to inspect the first few characters of a string to interpret a command. Using String.split is pretty wasteful. It would arguably be better to provide a way to split off slices of strings without copying, something that can be done, for example, in C# with ReadOnlySpan.Stantonstanway
