Slice unicode/ascii strings in golang?

2

17

I need to slice a string in Go. The values can contain Latin characters and/or Arabic or Chinese characters. In the following example, the slice notation [:1] on the Arabic string returns an unexpected value instead of the first character.

    package main
    
    import "fmt"
    
    func main() {
        a := "a"
        fmt.Println(a[:1]) // works
        
        b := "ذ"
        fmt.Println(b[:1]) // does not work
        fmt.Println(b[:2]) // works
    
        fmt.Println(len(a) == len(b)) // false
    }

http://play.golang.org/p/R-JxaxbfNL

Dragster answered 14/7, 2015 at 22:17 Comment(1)
You can use the At method of golang.org/x/exp/utf8string. – Midstream
36

First of all, you should really read about strings, bytes and runes in Go.
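As a rough illustration of that distinction: a Go string is a sequence of bytes, and a non-ASCII character such as "ذ" takes more than one byte in UTF-8, so b[:1] cuts the character in half. The sketch below lists the byte offset at which each rune starts; the helper name runeOffsets is made up for this example and is not part of any library.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s begins.
// Hypothetical helper, for illustration only.
func runeOffsets(s string) []int {
	var offsets []int
	// Ranging over a string decodes one rune per iteration;
	// i is the byte index where that rune starts.
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "aذb"

	// Indexing and slicing work on bytes: "ذ" occupies two of them.
	fmt.Printf("% x\n", s) // 61 d8 b0 62

	// The runes start at byte offsets 0, 1 and 3.
	fmt.Println(runeOffsets(s)) // [0 1 3]
}
```

Slicing at any offset from that list keeps whole characters; slicing at offset 2 would split "ذ".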

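And here is how you can achieve what you want, by converting the string to []rune before slicing so that the indices count characters rather than bytes: Go playground (I was not able to paste Arabic symbols properly, but if Chinese works, Arabic should work too).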

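    // ASCII only: one byte per character, so byte slicing is safe.
    s := "abcdefghijklmnop"
    fmt.Println(s[2:9])

    // Multi-byte characters: slice the runes, then convert back to a string.
    s = "维基百科:关于中文维基百科"
    fmt.Println(string([]rune(s)[2:9]))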

The output is:

    cdefghi
    百科:关于中文
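If you need this in several places, the conversion can be wrapped in a small helper. This is only a sketch: the name substr and the choice to clamp out-of-range indices instead of panicking are my assumptions, not something from the standard library. Note that []rune(s) copies the whole string, which may matter for very long inputs.

```go
package main

import "fmt"

// substr returns the runes of s in the half-open range [start, end).
// Out-of-range indices are clamped instead of causing a panic.
// Hypothetical helper; name and clamping behaviour are assumptions.
func substr(s string, start, end int) string {
	r := []rune(s) // one element per Unicode code point
	if start < 0 {
		start = 0
	}
	if end > len(r) {
		end = len(r)
	}
	if start >= end {
		return ""
	}
	return string(r[start:end])
}

func main() {
	fmt.Println(substr("ذ", 0, 1))                // the whole character, not half of it
	fmt.Println(substr("维基百科:关于中文维基百科", 2, 9)) // same range as the example above
	fmt.Println(substr("abc", 1, 100))            // end is clamped to the string length
}
```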
Metagnathous answered 14/7, 2015 at 22:41 Comment(1)
It worked, thanks. Note: instead of len(s), I used utf8.RuneCountInString(s) to get the string's size. len(s) counts bytes, not characters. golang.org/pkg/builtin/#len – Dragster
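To make the difference in that comment concrete, here is a minimal sketch comparing the two counts. The helper name byteAndRuneLen is invented for the example; len and utf8.RuneCountInString are the real calls.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen returns the size of s in bytes and in runes.
// Hypothetical helper, for illustration only.
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"a", "ذ", "维基百科"} {
		nBytes, nRunes := byteAndRuneLen(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, nBytes, nRunes)
	}
}
```

Unlike len([]rune(s)), utf8.RuneCountInString counts the runes without building a rune slice first.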
1

You can use the utf8string package:

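    package main

    import "golang.org/x/exp/utf8string"

    func main() {
        a := utf8string.NewString("🎈🎄🎀🎢👓")
        // example 1: the rune at index 1
        r := a.At(1)
        // example 2: runes 1 up to (not including) 3, as a string
        s := a.Slice(1, 3)
        // example 3: number of runes in the string
        n := a.RuneCount()
        // print: all three comparisons are true
        println(r == '🎄', s == "🎄🎀", n == 5)
    }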

https://pkg.go.dev/golang.org/x/exp/utf8string

Osmo answered 5/1, 2021 at 3:29 Comment(0)
