The concept of a "character" is ambiguous and can mean different things depending on the data you are working with. The most obvious answer is the chars method. However, it may not do what you expect: chars iterates over Unicode scalar values (Rust's char type), and what looks like a single "character" to you may be made up of several of them, which can lead to surprising results:
"a̐".chars().collect::<Vec<char>>() // => ['a', '\u{310}']
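To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper name sizes is made up for illustration; it reports the byte length and the number of char values for a string.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of `char` values).
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // 'a' followed by U+0310, a combining mark: one perceived character.
    let s = "a\u{310}";
    let (bytes, scalars) = sizes(s);
    println!("bytes = {}, chars = {}", bytes, scalars);

    // char_indices pairs each char with its starting byte offset.
    for (offset, c) in s.char_indices() {
        println!("{}: {:?}", offset, c);
    }
}
```

Neither number is 1, which is what a reader would count when looking at the rendered text.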
For a lot of string processing, you want to work with graphemes (more precisely, grapheme clusters). A grapheme cluster consists of one or more Unicode code points and is yielded as a string slice (&str). These map better to the human perception of "characters". To create an iterator of graphemes, you can use the unicode-segmentation crate:
use unicode_segmentation::UnicodeSegmentation;

// `true` selects extended grapheme clusters; each item is a &str.
for grapheme in my_str.graphemes(true) {
    // ...
}
If you are working with pure ASCII, none of the above applies: every character is exactly one byte, so you can simply use the bytes iterator:
for byte in my_str.bytes() {
    // ...
}
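Iterating over bytes is only equivalent to iterating over characters when the input really is ASCII, so it can be worth checking first. The following is a sketch under that assumption; the function name count_vowels_ascii is hypothetical.

```rust
/// Hypothetical helper: counts ASCII vowels by scanning bytes.
/// Returns None when the input contains any non-ASCII byte.
fn count_vowels_ascii(s: &str) -> Option<usize> {
    if !s.is_ascii() {
        return None;
    }
    let count = s
        .bytes()
        .filter(|b| matches!(b.to_ascii_lowercase(), b'a' | b'e' | b'i' | b'o' | b'u'))
        .count();
    Some(count)
}

fn main() {
    println!("{:?}", count_vowels_ascii("Hello World"));
    println!("{:?}", count_vowels_ascii("h\u{e9}llo"));
}
```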
Although, if you are working with ASCII then arguably you shouldn't be using String/&str at all and should instead use Vec<u8>/&[u8] directly.
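As a rough illustration of the byte-oriented approach, the sketch below keeps the data in a Vec<u8> and uses the ASCII helpers that the standard library provides on byte slices. The function name shout is made up for this example.

```rust
/// Hypothetical helper: uppercases ASCII letters in place.
/// Non-letter bytes are left untouched.
fn shout(data: &mut [u8]) {
    data.make_ascii_uppercase();
}

fn main() {
    // A byte string literal gives you bytes without going through &str.
    let mut buf: Vec<u8> = b"hello, world".to_vec();
    shout(&mut buf);

    // Case-insensitive comparison is also available on byte slices.
    assert!(buf.eq_ignore_ascii_case(b"Hello, World"));

    // Convert to text only at the boundary, e.g. for printing.
    println!("{}", String::from_utf8_lossy(&buf));
}
```

With this design, indexing such as buf[0] is a plain byte access, and no UTF-8 validation happens until you explicitly convert to a string.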