Go has a function for this instead of an operator: strings.Repeat. Here's a port of your Python example:
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

func main() {
	x := "my new text is this long"
	y := strings.Repeat("#", utf8.RuneCountInString(x))
	fmt.Println(x)
	fmt.Println(y)
}
Note that I've used utf8.RuneCountInString(x) instead of len(x); the former counts "runes" (Unicode code points), while the latter, when called on a string, counts bytes. In the case of "my new text is this long", the difference doesn't matter since all the runes are only one byte each, but it's good to get into the habit of specifying what you mean:
len("ā") //=> 2
utf8.RuneCountInString("ā") //=> 1
An alternative to calling RuneCountInString is to convert the string to a slice of runes and then call len on that:
y := strings.Repeat("#", len([]rune(x)))
But if all you're doing with the runes is counting them, I think it's clearer to use the utf8 function.
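To check that the two counting approaches agree, here is a minimal sketch. The helper name underline and the sample text are my own choices for illustration, not part of the standard library:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// underline is a hypothetical helper: it returns one copy of mark
// for every rune (not byte) in s.
func underline(s, mark string) string {
	return strings.Repeat(mark, utf8.RuneCountInString(s))
}

func main() {
	x := "na\u00efve caf\u00e9" // "naïve café", with two 2-byte runes
	fmt.Println(x)
	fmt.Println(underline(x, "#"))

	// Both rune-counting approaches give the same number;
	// len on the string itself counts bytes instead.
	fmt.Println(utf8.RuneCountInString(x), len([]rune(x)), len(x)) // 10 10 12
}
```

The []rune conversion allocates a new slice just to measure it, which is another reason to prefer the utf8 function when counting is all you need.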
Since this was a Python comparison question, note that Python's len also counts different things depending on what you call it on. In Python 2, it counted bytes on plain strings and runes on Unicode strings (u'...'):
Python 2.7.18 (default, Sep 10 2022, 16:30:21)
>>> len('ā') #=> 2
>>> len(u'ā') #=> 1
Whereas in modern Python, plain strings are Unicode strings; if you want to count bytes, you need to encode the string into a bytes object first:
Python 3.12.0 (main, Oct 13 2023, 15:35:30)
>>> len('ā') #=> 1
>>> len('ā'.encode('UTF-8')) #=> 2
So Python has multiple types of string; Go has only one kind of string, but different ways of dealing with its contents.
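As a rough illustration of those "different ways" on the Go side, the sketch below looks at the same string as bytes and as runes. The helper name views is made up for this example:

```go
package main

import "fmt"

// views is a hypothetical helper that returns the byte view and
// the rune view of the same string.
func views(s string) ([]byte, []rune) {
	return []byte(s), []rune(s)
}

func main() {
	s := "\u0101" // "ā", U+0101
	b, r := views(s)
	fmt.Printf("% x\n", b) // c4 81
	fmt.Printf("%U\n", r)  // [U+0101]

	// Indexing a string yields a byte; ranging over it yields runes.
	fmt.Println(s[0]) // 196
	for i, c := range s {
		fmt.Printf("%d %c\n", i, c) // 0 ā
	}
}
```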
Oh, it's also worth noting that the Go concept of a "rune" doesn't (and can't) solve the problem that in Unicode, the question "How much string is one character?" does not always have a well-defined answer. I used "ā" above as an example of a string that's two bytes long containing only one rune (specifically U+0101 LATIN SMALL LETTER A WITH MACRON). But you could get what looks like that same string ("ā") by instead combining two runes (U+0061 LATIN SMALL LETTER A and U+0304 COMBINING MACRON), giving a rune count of 2 and a byte len() of 3. Proper Unicode processing will treat both forms as equal to each other (and convert one to the other depending on which Normalization Form is selected), but there's not really any sense in which the Platonic ideal string they're both equivalent to can be said to contain a definite number of runes.
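You can check the counts for the two forms directly. This is only a sketch using the standard library; the helper name counts is mine, and the strings are written with escapes so that an editor can't silently normalize them:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning the byte length and
// the rune count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u0101" // U+0101 LATIN SMALL LETTER A WITH MACRON
	decomposed := "a\u0304" // U+0061 followed by U+0304 COMBINING MACRON

	pb, pr := counts(precomposed)
	db, dr := counts(decomposed)
	fmt.Println(pb, pr) // 2 1
	fmt.Println(db, dr) // 3 2

	// Go's == compares bytes, so the two forms are not equal as strings.
	fmt.Println(precomposed == decomposed) // false
}
```

Go's standard library doesn't normalize for you; if you need the two forms to compare equal, the golang.org/x/text/unicode/norm package is the usual place to look.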