How to convert ansi text to utf8

How can I convert ANSI text to UTF-8 in Go? I am trying to convert an ANSI-encoded string to a UTF-8 string.

Stinkstone answered 3/8, 2011 at 13:56 Comment(1)
6

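Go strings are UTF-8 by convention. You can convert a []byte to a string using the conversion described here, but the conversion copies the bytes as they are and does not transcode them, so the bytes must already be UTF-8: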

http://golang.org/doc/go_spec.html#Conversions
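To illustrate why a plain string conversion is not enough for ANSI input, here is a minimal sketch using only the standard library's unicode/utf8 package. The helper name looksLikeUTF8 is hypothetical and not part of any answer in this thread; the sample bytes assume "é" as the test character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// looksLikeUTF8 is a hypothetical helper: it reports whether b is already
// valid UTF-8, in which case string(b) is all that is needed.
func looksLikeUTF8(b []byte) bool {
	return utf8.Valid(b)
}

func main() {
	utf8Bytes := []byte{0xC3, 0xA9} // "é" encoded as UTF-8
	ansiBytes := []byte{0xE9}       // "é" encoded as Windows-1252 or Latin-1

	fmt.Println(looksLikeUTF8(utf8Bytes), string(utf8Bytes))
	// string(ansiBytes) would compile, but the result is not valid UTF-8.
	fmt.Println(looksLikeUTF8(ansiBytes))
}
```

If the check fails, the bytes need a real decoder for their code page, such as the ones shown in the other answers.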

Dieppe answered 3/8, 2011 at 14:5 Comment(0)
5

Here is a newer method, using the golang.org/x/text packages.

package main

import (
    "bytes"
    "fmt"
    "io" // io.ReadAll needs Go 1.16+; older versions use ioutil.ReadAll

    "golang.org/x/text/encoding/traditionalchinese"
    "golang.org/x/text/transform"
)

// Decode converts Big5-encoded bytes to UTF-8.
func Decode(s []byte) ([]byte, error) {
    // Wrap the input in a reader that decodes Big5 on the fly.
    r := transform.NewReader(bytes.NewReader(s), traditionalchinese.Big5.NewDecoder())
    return io.ReadAll(r)
}

func main() {
    s := []byte{0xB0, 0xAA}
    b, err := Decode(s)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(b))
}

I used to use iconv-go for this kind of conversion. You must know which ANSI code page your text uses; in my case it is 'big5'.

package main
import (
    "fmt"
    //iconv "github.com/djimenez/iconv-go"
    iconv "github.com/andelf/iconv-go"
    "log"
)

func main() {
    ibuf := []byte{170,76,80,67}
    var obuf [256]byte

    // Method 1: use Convert directly
    nR, nW, err := iconv.Convert(ibuf, obuf[:], "big5", "utf-8")
    if err != nil {
        log.Fatalln(err)
    }
    log.Println(nR, ibuf)
    log.Println(obuf[:nW])
    fmt.Println(string(obuf[:nW]))

    // Method 2: build a converter at first
    cv, err := iconv.NewConverter("big5", "utf-8")
    if err != nil {
        log.Fatalln(err)
    }
    nR, nW, err = cv.Convert(ibuf, obuf[:])
    if err != nil {
        log.Fatalln(err)
    }
    log.Println(string(obuf[:nW]))
}
Ludwigg answered 31/5, 2013 at 4:13 Comment(0)
3

I've written a function that was useful for me; maybe someone else can use it. It converts from Windows-1252 to UTF-8. It remaps the byte values that Windows-1252 treats as printable characters but Unicode considers to be control characters (http://en.wikipedia.org/wiki/Windows-1252).

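// Requires: import "bytes"
func fromWindows1252(str string) string {
    var buf bytes.Buffer
    var r rune

    // Iterate over the raw bytes; ranging over the string itself would decode it as UTF-8.
    for _, b := range []byte(str) {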
        switch b {
        case 0x80:
            r = 0x20AC
        case 0x82:
            r = 0x201A
        case 0x83:
            r = 0x0192
        case 0x84:
            r = 0x201E
        case 0x85:
            r = 0x2026
        case 0x86:
            r = 0x2020
        case 0x87:
            r = 0x2021
        case 0x88:
            r = 0x02C6
        case 0x89:
            r = 0x2030
        case 0x8A:
            r = 0x0160
        case 0x8B:
            r = 0x2039
        case 0x8C:
            r = 0x0152
        case 0x8E:
            r = 0x017D
        case 0x91:
            r = 0x2018
        case 0x92:
            r = 0x2019
        case 0x93:
            r = 0x201C
        case 0x94:
            r = 0x201D
        case 0x95:
            r = 0x2022
        case 0x96:
            r = 0x2013
        case 0x97:
            r = 0x2014
        case 0x98:
            r = 0x02DC
        case 0x99:
            r = 0x2122
        case 0x9A:
            r = 0x0161
        case 0x9B:
            r = 0x203A
        case 0x9C:
            r = 0x0153
        case 0x9E:
            r = 0x017E
        case 0x9F:
            r = 0x0178
        default:
            r = rune(b)
        }

        buf.WriteRune(r)
    }

    return buf.String()
}
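The same mapping can also be written as a lookup table, which some readers may find easier to audit than a long switch. This is only a sketch under the same assumptions as the function above: the names cp1252High and decodeWindows1252 are hypothetical, and the five byte values that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through unchanged, as the default branch of the switch does.

```go
package main

import (
	"fmt"
	"strings"
)

// cp1252High maps bytes 0x80..0x9F to Unicode code points.
// Undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to themselves.
var cp1252High = [32]rune{
	0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
	0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
}

// decodeWindows1252 converts Windows-1252 bytes to a UTF-8 string.
func decodeWindows1252(in []byte) string {
	var sb strings.Builder
	sb.Grow(len(in)) // lower bound; non-ASCII bytes grow to 2 or 3 bytes
	for _, b := range in {
		if b >= 0x80 && b <= 0x9F {
			sb.WriteRune(cp1252High[b-0x80])
		} else {
			// Outside 0x80..0x9F the byte value equals the code point.
			sb.WriteRune(rune(b))
		}
	}
	return sb.String()
}

func main() {
	in := []byte{0x80, '5', ' ', 0x93, 'c', 'a', 'f', 0xE9, 0x94}
	fmt.Println(decodeWindows1252(in))
}
```

Taking a []byte rather than a string makes it explicit that the input is not yet UTF-8.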
Tayler answered 13/9, 2012 at 15:36 Comment(0)
2

Go's standard library offers no way to do this; you have to write the conversion yourself or use a third-party package. You could try this one: http://code.google.com/p/go-charset
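As a rough idea of what writing the conversion yourself involves, here is a minimal sketch for the simplest case. It assumes the input is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point; other code pages need a mapping table. The function name latin1ToUTF8 is hypothetical.

```go
package main

import "fmt"

// latin1ToUTF8 converts ISO-8859-1 bytes to a UTF-8 string.
// In Latin-1 each byte value is also the Unicode code point.
func latin1ToUTF8(in []byte) string {
	runes := make([]rune, len(in))
	for i, b := range in {
		runes[i] = rune(b)
	}
	// Converting []rune to string encodes each code point as UTF-8.
	return string(runes)
}

func main() {
	fmt.Println(latin1ToUTF8([]byte{'c', 'a', 'f', 0xE9}))
}
```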

Predisposition answered 3/8, 2011 at 21:17 Comment(0)
2

The golang.org/x/text/encoding/charmap package has functions for exactly this problem:

import "golang.org/x/text/encoding/charmap"

func DecodeWindows1250(enc []byte) string {
    dec := charmap.Windows1250.NewDecoder()
    out, _ := dec.Bytes(enc)
    return string(out)
}

func EncodeWindows1250(inp string) []byte {
    enc := charmap.Windows1250.NewEncoder()
    out, _ := enc.String(inp)
    return out
}

Edit: fixed an "undefined: ba" error by replacing ba with enc.

Latinity answered 13/7, 2017 at 9:12 Comment(1)
I just use: out, _ := charmap.Windows1250.NewDecoder().String(input) - Sadness
