Your original code is fine and I do not recommend changing it.
The original version allocates once: inside String::with_capacity
.
The second version allocates at least twice: first, it creates a Vec<&str>
and grows it by push
ing &str
s onto it. Then, it counts the total size of all the &str
s and creates a new String
with the correct size. (The code for this is in the join_generic_copy
method in str.rs
.) This is bad for several reasons:
- It allocates unnecessarily, obviously.
- Grapheme clusters can be arbitrarily large, so the intermediate
Vec
can't be usefully sized in advance -- it just starts at size 1 and grows from there.
- For typical strings, it allocates way more space than would actually be needed just to store the end result, because
&str
is usually 16 bytes in size while a UTF-8 grapheme cluster is typically much less than that.
- It wastes time iterating over the intermediate
Vec
to get the final size where you could just take it from the original &str
.
On top of all this, I wouldn't even consider this version idiomatic, because it collect
s into a temporary Vec
in order to iterate over it, instead of just collect
ing the original iterator, as you had in an earlier version of your answer. This version fixes problem #3 and makes #4 irrelevant but doesn't satisfactorily address #2:
input.graphemes(true).rev().collect()
collect
uses FromIterator
for String
, which will try to use the lower bound of the size_hint
from the Iterator
implementation for Graphemes
. However, as I mentioned earlier, extended grapheme clusters can be arbitrarily long, so the lower bound can't be any greater than 1. Worse, &str
s may be empty, so FromIterator<&str>
for String
doesn't know anything about the size of the result in bytes. This code just creates an empty String
and calls push_str
on it repeatedly.
Which, to be clear, is not bad! String
has a growth strategy that guarantees amortized O(1) insertion, so if you have mostly tiny strings that won't need to be reallocated often, or you don't believe the cost of allocation is a bottleneck, using collect::<String>()
here may be justified if you find it more readable and easier to reason about.
Let's go back to your original code.
let mut result = String::with_capacity(input.len());
for gc in input.graphemes(true).rev() {
result.push_str(gc);
}
This is idiomatic. collect
is also idiomatic, but all collect
does is basically the above, with a less accurate initial capacity. Since collect
doesn't do what you want, it's not unidiomatic to write the code yourself.
There is a slightly more concise, iterator-y version that still makes only one allocation. Use the extend
method, which is part of Extend<&str>
for String
:
fn reverse(input: &str) -> String {
let mut result = String::with_capacity(input.len());
result.extend(input.graphemes(true).rev());
result
}
I have a vague feeling that extend
is nicer, but both of these are perfectly idiomatic ways of writing the same code. You should not rewrite it to use collect
, unless you feel that expresses the intent better and you don't care about the extra allocation.
Related
Graphemes
hassize_hint()
, which is being used byString
for buffer size estimation in itsFromIterator
implementation, so I don't think there will be huge overhead due to use ofcollect()
. – Anarchism.fold(String::with_capacity(input.len()), |result, gc| result + gc)
instead of.collect::<Vec<&str>>().concat()
? – Incidentsize_hint
; didn't know about it. Would the number of memory allocation requests/calls be one like in the first approach? I think for every grapheme-cluster there'll be one allocation due to the correspondingVec::push
and then a final allocation forconcat
. The reason I ask isn't specific to this toy example, I'm trying to understand how the second approach works. Knowing it will be helpful in a larger project. – Incidentsize_hint()
implementation I realized that it uses1
as the lower bound, and the code which reserves the space based on the hint relies on the lower bound too (both forString
andVec
), so it feels like there actually will be problems with excessive allocations with this particular type (Graphemes
). – Anarchism.collect::<String>()
instead of.collect::<Vec<&str>>().concat()
– Fbi.collect()
like in trentcl's answer. – Incident