Is there any difference between range over string and range over rune slice?

Ranging over string

func main() {
    str := "123456"
    for _, s := range str {
        fmt.Printf("type of v: %s, value: %v, string v: %s \n", reflect.TypeOf(s), s, string(s))
    }
}

https://play.golang.org/p/I1JCUJnN41h

And ranging over rune slice ([]rune(str))

func main() {
    str := "123456"
    for _, s := range []rune(str) {
        fmt.Printf("type : %s, value: %v ,string : %s\n", reflect.TypeOf(s), s, string(s))
    }
}

https://play.golang.org/p/rJvyHH6lkl_t

I got the same results. Are the two loops equivalent?

Eckmann answered 2/3, 2018 at 2:54 Comment(1)
Also: converting to []rune allocates. - Martian

Yes, there is a difference. Given

for i, c := range v {

c will be the same whether v is a string or a rune slice, but i will differ if the string contains multibyte characters: for a string it is a byte offset, for a rune slice it is the element index.

String Indexing

Strings are sequences of bytes, so indexing a string works like indexing a byte slice: str[i] is a single byte, not a character. Unless you are intentionally reading or manipulating bytes rather than code points, or you are sure your input contains no multibyte characters, convert to a rune slice wherever you would otherwise index the string.
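
A minimal sketch of that difference. The sample string "héllo" and the helper names byteAt and runeAt are assumptions made up for illustration; they are not from the question.

```go
package main

import "fmt"

// byteAt (hypothetical helper) returns the byte at index i of s.
// Indexing a string always yields a byte, never a whole character.
func byteAt(s string, i int) byte {
	return s[i]
}

// runeAt (hypothetical helper) returns the i-th code point of s
// by converting the string to a rune slice first.
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "h\u00e9llo" // "héllo"; 'é' is U+00E9, two bytes in UTF-8: 0xC3 0xA9

	fmt.Println(len(s), len([]rune(s))) // 6 5
	fmt.Printf("%#x\n", byteAt(s, 1))   // 0xc3, the first byte of 'é'
	fmt.Printf("%q\n", runeAt(s, 1))    // 'é'
}
```

Index 1 means two different things here: the second byte of the string, or the second code point of the rune slice.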

Range Loops are Special

for i, c := range str {

Range loops over strings are special. Instead of treating the string simply as a slice of bytes, range treats the string partly like a slice of bytes and partly like a slice of runes.

i is the byte index at which the code point begins, and c is a rune whose UTF-8 encoding may occupy more than one byte. This means i can increase by more than one per iteration, because the previous code point was a multibyte character.

Besides the axiomatic detail that Go source code is UTF-8, there's really only one way that Go treats UTF-8 specially, and that is when using a for range loop on a string. We've seen what happens with a regular for loop. A for range loop, by contrast, decodes one UTF-8-encoded rune on each iteration. Each time around the loop, the index of the loop is the starting position of the current rune, measured in bytes, and the code point is its value.

See more in the official Go Blog post the above is excerpted from: Strings, bytes, runes and characters in Go
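
The decoding described in the excerpt can be approximated by hand with the standard unicode/utf8 package. This is only a sketch: the helper name runeStarts and the sample string are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts (hypothetical helper) returns the byte offset at which each
// code point of s begins, decoding manually instead of using range.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size // advance by the encoded width of the rune just decoded
	}
	return starts
}

func main() {
	s := "a\u00e9\u4e16" // 'a' (1 byte), 'é' (2 bytes), '世' (3 bytes)

	// range reports the same offsets that the manual loop computes.
	for i, r := range s {
		fmt.Printf("%d: %q (%d bytes)\n", i, r, utf8.RuneLen(r))
	}
	fmt.Println(runeStarts(s)) // [0 1 3]
}
```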

Scaife answered 2/3, 2018 at 3:21 Comment(0)

You got the same results only because you didn't include any multi-byte characters, and ignored the indexes.

// ranges over runes, indexed by byte offset in the string
for i, r := range s {

// ranges over runes, indexed by position in the []rune
for i, r := range []rune(s) {

For example: https://play.golang.org/p/ZLCc3UNL2dR

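package main

import "fmt"

func main() {
    s := "こんにちは世界"

    // Ranging over the string: i is the byte offset where each rune starts.
    fmt.Println("range s")
    for i, r := range s {
        fmt.Printf("%d: %q\n", i, r)
    }

    // Ranging over the rune slice: i is the position in the slice.
    fmt.Println("\nrange []rune(s)")
    for i, r := range []rune(s) {
        fmt.Printf("%d: %q\n", i, r)
    }
}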

Which prints

range s
0: 'こ'
3: 'ん'
6: 'に'
9: 'ち'
12: 'は'
15: '世'
18: '界'

range []rune(s)
0: 'こ'
1: 'ん'
2: 'に'
3: 'ち'
4: 'は'
5: '世'
6: '界'
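
If both numbers are needed at once, one option is to range over the string and keep a separate counter, which gives the rune position without converting to []rune. A rough sketch; the position type and the positions helper are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// position (hypothetical type) pairs both kinds of index for one code point.
type position struct {
	runeIndex  int
	byteOffset int
	r          rune
}

// positions (hypothetical helper) walks s once with range and counts
// runes as it goes, so no []rune conversion of s is needed.
func positions(s string) []position {
	var out []position
	n := 0
	for i, r := range s {
		out = append(out, position{runeIndex: n, byteOffset: i, r: r})
		n++
	}
	return out
}

func main() {
	for _, p := range positions("世界") {
		fmt.Printf("rune %d at byte %d: %q\n", p.runeIndex, p.byteOffset, p.r)
	}
}
```
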
Hydro answered 2/3, 2018 at 3:21 Comment(4)
I think this issue has been fixed in Go. - Russel
@FlashNoob: there is no issue to fix. What are you talking about? - Hydro
If I run your example locally, both give me the same results. - Russel
@FlashNoob, I assure you nothing has changed. Iteration over runes in a string is part of the language specification. Click the playground link, which uses the latest Go release, and the same indexes are still printed. - Hydro
