Golang complex fold grüßen

1

20

I'm trying to get case folding to be consistent between three languages (C++, Python and Golang) because I need to be able to check if a string matches the one saved no matter the language.

An example problematic word is the German word "grüßen" which in uppercase is "GRÜSSEN" (Note the 'ß' becomes two characters as 'SS').

Is there some way to do this that I'm missing, or does the bug noted at the end of the unicode package's documentation apply to all uses of text conversion in Go? If so, what are my options for case folding other than writing it in cgo?

Concur answered 28/3, 2017 at 2:59 Comment(7)
Given that Go implements the case-mapping function as func to(_case int, r rune, caseRange []CaseRange) rune {, is it even possible to return multiple runes at all?Seeker
Yeah, that's what I'm trying to get at. There are languages where one "rune" can become two through case folding / capitalization, so there should be a way to handle such a thing in golang.Concur
If you end up creating an issue, could you please post a link here? (I don't think there is anything in there that converts it properly.)Seeker
Will do. I just didn't want to create an issue until I had done more research / reached out for help.Concur
Interesting, and kind of relevant: unicode.org/Public/UCD/latest/ucd/CaseFolding.txt. The entries with a full case folding are the tricky Unicode code points (and won't work in Go).Seeker
Not in the core: please look at what golang.org/x/text can do for you.Marceline
Awesome! Thanks kostix. If you turn that into an answer I will accept it. Basically using import "golang.org/x/text/cases" I can do c := cases.Fold() then c.String("grüßen") and it works.Concur
11

Advanced (Unicode-aware) text processing is not part of the Go stdlib;¹ it lives in a host of ("blessed") supplementary packages under the golang.org/x/text/ umbrella, which are maintained by the Go project but versioned outside the main tree.
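To see what the stdlib leaves out, here is a minimal sketch that uses only the strings and unicode packages. The helper names stdlibUpper and stdlibMatch are made up for this illustration and are not part of any library; the point is only that the stdlib maps one rune to one rune, so 'ß' can never expand to "SS".

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stdlibUpper (hypothetical helper) upper-cases s with the stdlib only.
// strings.ToUpper maps rune to rune, so it cannot turn 'ß' into "SS".
func stdlibUpper(s string) string {
	return strings.ToUpper(s)
}

// stdlibMatch (hypothetical helper) compares two strings with
// strings.EqualFold, which uses simple (one-to-one) case folding.
func stdlibMatch(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(stdlibUpper("grüßen"))            // 'ß' is left as-is
	fmt.Println(unicode.ToUpper('ß') == 'ß')      // true: no single-rune uppercase
	fmt.Println(stdlibMatch("grüßen", "GRÜSSEN")) // false: "ß" vs "SS" needs full folding
	fmt.Println(stdlibMatch("grün", "GRÜN"))      // true: one-to-one folding is enough
}
```

So the stdlib is fine for words like "grün", but anything that needs a one-to-many mapping has to go through golang.org/x/text.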

As Shawn figured out by himself, one can do

package main

import (
	"fmt"

	"golang.org/x/text/cases"
)

func main() {
	c := cases.Fold() // a Caser that applies Unicode case folding
	fmt.Println(c.String("grüßen"))
}

to get "grüssen" back.


¹ That's because whatever ships in the stdlib is subject to the Go 1 compatibility promise. At the time Go 1 was released, certain functionality was unavailable or incomplete, or its APIs were still in flux, so such bits were kept out of the core to let them mature.

Marceline answered 28/3, 2017 at 7:28 Comment(1)
c.String("grüßen") actually returns grüssen, not GRÜSSEN.Concur
