How to create a case insensitive map in Go?

I want a map whose string keys are case insensitive. Is this supported by the language, or do I have to build it myself? Thank you.

Edit: What I am looking for is a way to make it by default instead of having to remember to convert the keys every time I use the map.

Yeargain answered 20/6, 2012 at 17:12 Comment(5)
Map to Unicode foldcase each time, manually.Oma
SCL, are you concerned with Unicode in this case? That is, do your needs include either rejection of unexpected Unicode code points or careful attention to handling of expected Unicode code points?Selfrighteous
@sonia, hi, I was considering only ASCII. But since you are asking, how would I handle Unicode?Yeargain
@SCL For non-ASCII, you have a problem, because Go does not to my knowledge provide a toFoldcase map to make this feasible. Sonya’s code only works on ASCII, but screws up on Unicode.Oma
I understand that there are a number of issues. I think it deserves a separate question, ideally based on your case. Tell where your data is coming from, what you expect to be in it, what kinds of outcomes you want.Selfrighteous
S
12

Edit: My initial code actually still allowed map syntax and thus allowed the methods to be bypassed. This version is safer.

You can "derive" a type; in Go we just say "declare" one. Then you define methods on your type. It takes only a very thin wrapper to provide the functionality you want. Note, though, that you must call get and set with ordinary method call syntax. There is no way to keep the index syntax or the optional ok result that built-in maps have.

package main

import (
    "fmt"
    "strings"
)

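// ciMap wraps a map so that callers can only reach it through
// methods that normalize the key first.
type ciMap struct {
    m map[string]bool
}

func newCiMap() ciMap {
    return ciMap{m: make(map[string]bool)}
}

// set stores b under the lowercased form of s.
func (m ciMap) set(s string, b bool) {
    m.m[strings.ToLower(s)] = b
}

// get looks up the lowercased form of s; ok reports whether the key exists.
func (m ciMap) get(s string) (b, ok bool) {
    b, ok = m.m[strings.ToLower(s)]
    return
}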

func main() {
    m := newCiMap()
    m.set("key1", true)
    m.set("kEy1", false)
    k := "keY1"
    b, _ := m.get(k)
    fmt.Println(k, "value is", b)
}
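
For readers on a newer toolchain, the same wrapper idea can be sketched with type parameters so the value type is not fixed to bool. This assumes Go 1.18 or later (generics did not exist when this answer was written), and the names CIMap, NewCIMap and norm are hypothetical. It still normalizes with strings.ToLower, so it carries the same ASCII-only caveat raised in the comments.

```go
package main

import (
	"fmt"
	"strings"
)

// CIMap is a hypothetical generic variant of ciMap. The inner map is
// unexported, so every access goes through a method that normalizes the key.
type CIMap[V any] struct {
	m map[string]V
}

// NewCIMap returns an empty, ready-to-use map.
func NewCIMap[V any]() *CIMap[V] {
	return &CIMap[V]{m: make(map[string]V)}
}

// norm is the single place where keys are normalized. strings.ToLower is
// only reliable for ASCII keys; swap in a fold-case function for Unicode.
func norm(k string) string { return strings.ToLower(k) }

// Set stores v under the normalized key.
func (c *CIMap[V]) Set(k string, v V) { c.m[norm(k)] = v }

// Get returns the value and whether the normalized key is present.
func (c *CIMap[V]) Get(k string) (V, bool) {
	v, ok := c.m[norm(k)]
	return v, ok
}

// Delete removes the normalized key if present.
func (c *CIMap[V]) Delete(k string) { delete(c.m, norm(k)) }

// Len reports the number of distinct normalized keys.
func (c *CIMap[V]) Len() int { return len(c.m) }

func main() {
	m := NewCIMap[int]()
	m.Set("Content-Type", 1)
	m.Set("content-type", 2) // overwrites the entry above
	v, ok := m.Get("CONTENT-TYPE")
	fmt.Println(v, ok, m.Len()) // prints: 2 true 1
}
```

Keeping the normalization in one function means the choice between ToLower, ToUpper or a fold-case mapping can be changed without touching the callers.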
Selfrighteous answered 20/6, 2012 at 18:16 Comment(3)
Mapping to lowercase doesn’t work for Unicode data, only for ASCII. You should be mapping to Unicode foldcase here, not lowercase. Otherwise yours is a Sisyphean task, since lowercase of Σίσυφος is σίσυφος, while lowercase of its uppercase, ΣΊΣΥΦΟΣ, is the correct σίσυφοσ, which is indeed the foldcase of all of those. Do you now understand why Unicode has a separate map? The casemappings are too complex for blindly mapping to anything not designed for that explicit purpose, and hence the presence of a 4th casemap in the Unicode casing tables: uppercase, titlecase, lowercase, foldcase.Oma
The requirement was strings. Go uses Unicode for strings, not ASCII. They asked for a case-insensitive map. You provided an ASCII-only solution without even bothering to mention this. My comments are perfectly on topic, because you did not answer the question as asked and worded, which had no ASCII-only restriction. Now, it turns out that this person actually had nothing but ASCII, and so your solution sneaked by even though it is wrong in the general case. Write solutions that work for Unicode, and they’ll work for ASCII too, but the reverse does not hold, which is why your code is buggy.Oma
Note that ToUpper should be preferred - see my answerWandie
I
4

Two possibilities:

  1. Convert to uppercase/lowercase if your input set is guaranteed to be restricted to characters for which a conversion to uppercase/lowercase will yield correct results (may not be true for some Unicode characters)

  2. Convert to Unicode fold case otherwise:

Use unicode.SimpleFold(rune) to convert a Unicode rune to fold case. Obviously this is a dramatically more expensive operation than simple ASCII-style case mapping, but it is also more portable to other languages. See the source code for strings.EqualFold to see how this is used, including how to extract Unicode runes from your source string.

Obviously you'd abstract this functionality into a separate package instead of re-implementing it everywhere you use the map. This should go without saying, but then you never know.

Indult answered 20/6, 2012 at 17:21 Comment(3)
But that would be error prone since maybe it is exposed as a library or I can forget to do it. Is there any way to create a derived type that can do it automatically?Yeargain
@Oma Read groups.google.com/d/msg/golang-nuts/0sS1VCdK8UU/KtG8DAsRm8YJ. Thanks.Primitivism
Downvote converted to upvote. I’m still a little unsure on unicode.SimpleFold because it seems to iterate through the simple fold possibilities, rather than actually producing the foldcase map as you would want with a toSimpleFold or toFullFold string mapping. The thing about "tschüß" and "tschüss" is that it is not a locale-specific thing. This is the casefold for any language, according to the tables. "tschuess" equivalence, on the other hand, would be a locale-specific thing. These are different matters, actually.Oma
W
2

For something more robust than strings.ToLower, you can use the golang.org/x/text/cases package. Example:

package main

import "golang.org/x/text/cases"

func main() {
   s := cases.Fold().String("March")
   println(s == "march")
}

If you would rather stay within the standard library, I ran this test to compare strings.ToLower and strings.ToUpper on runes that fold together:

package main

import (
   "strings"
   "unicode"
)

func main() {
   var (
      lower, upper int
      m = make(map[string]bool)
   )
   for n := '\u0080'; n <= '\u07FF'; n++ {
      q, r := n, n
      for {
         q = unicode.SimpleFold(q)
         if q == n { break }
         for {
            r = unicode.SimpleFold(r)
            if r == n { break }
            s, t := string(q), string(r)
            if m[t + s] { continue }
            if strings.ToLower(s) == strings.ToLower(t) { lower++ }
            if strings.ToUpper(s) == strings.ToUpper(t) { upper++ }
            m[s + t] = true
         }
      }
   }
   println(lower == 951, upper == 989)
}

With ToUpper equating 989 of the tested pairs against 951 for ToLower, ToUpper is the marginally better choice.

Wandie answered 24/12, 2020 at 2:3 Comment(0)
