How to detect the current terminal encoding and convert user input to and from UTF-8?

I am writing a golang command-line program that accepts user input. The input string has to be converted to UTF-8 and sent to another server for processing. On Linux the terminal encoding is almost always UTF-8, but this does not seem to be the case on Windows. I tried setting the code page on Windows to 65001 using

chcp 65001

and also made sure the terminal font is set to Lucida Console. However, the bytes read by

fmt.Scanf()

are not in UTF-8 format. I want to be able to detect the character encoding and convert the strings to UTF-8. Similarly, I should be able to convert from UTF-8 to the local encoding before printing to the screen.

Python has a "locale" module that can report the default encoding, and strings can then be decoded from and encoded to any specified encoding. Is there an equivalent of this for golang?
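As far as I know, Go's standard library has no direct counterpart to Python's locale module; the usual route is the golang.org/x/text/encoding/charmap package, which provides decoders and encoders for code pages such as CodePage437 and Windows1252. The stdlib-only sketch below only illustrates the idea for one encoding, and it assumes the terminal uses ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point. Code page 437 from the question is not Latin-1, so this is not a drop-in fix for it. decodeLatin1 and encodeLatin1 are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// decodeLatin1 converts ISO-8859-1 bytes to a UTF-8 string. In Latin-1
// every byte value is also its Unicode code point, so each byte becomes
// exactly one rune.
func decodeLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// encodeLatin1 converts a UTF-8 string back to ISO-8859-1 bytes. It fails
// on invalid UTF-8 and on runes above U+00FF, which Latin-1 cannot represent.
func encodeLatin1(s string) ([]byte, error) {
	if !utf8.ValidString(s) {
		return nil, errors.New("input is not valid UTF-8")
	}
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %q has no Latin-1 encoding", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	utf := decodeLatin1([]byte{0x63, 0x61, 0x66, 0xE9}) // "café" in Latin-1
	fmt.Printf("%s -> % x\n", utf, []byte(utf))        // café -> 63 61 66 c3 a9

	back, err := encodeLatin1(utf)
	fmt.Printf("% x %v\n", back, err) // 63 61 66 e9 <nil>

	// Greek capital omega is outside Latin-1, so encoding reports an error.
	_, err = encodeLatin1("Ω")
	fmt.Println(err)
}
```

The same decode-on-input, encode-on-output shape applies to other code pages; only the byte-to-rune table changes.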

Most Stack Overflow discussions suggest running chcp 65001 to switch the Windows terminal to UTF-8. This doesn't seem to work for me.

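package main

import "fmt"

func main() {
    foo := ""
    fmt.Printf("Enter: ")
    // Scanln stops at the newline and stores the first space-separated word in foo.
    if _, err := fmt.Scanln(&foo); err != nil {
        fmt.Println("Error while scanning: ", err)
    }
    // Print the raw bytes in hex so the encoding is visible.
    fmt.Printf("Scanned bytes: % x", foo)
    fmt.Println()
}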
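A variant of the test program that also reports whether the bytes it read are valid UTF-8, which makes the Linux and Windows runs easier to compare. It is a sketch: the describe helper is a hypothetical name, and it swaps fmt.Scanln for bufio.Reader.ReadString so that input containing spaces is read whole. That change is not expected to fix the EOF seen with chcp 65001.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"unicode/utf8"
)

// describe reports the raw bytes of s, whether they form valid UTF-8,
// and how many runes they decode to (each invalid byte counts as one rune).
func describe(s string) string {
	return fmt.Sprintf("bytes: % x | valid UTF-8: %t | runes: %d",
		s, utf8.ValidString(s), utf8.RuneCountInString(s))
}

func main() {
	fmt.Print("Enter: ")
	// Read the whole line, spaces included, instead of a single Scanln token.
	line, err := bufio.NewReader(os.Stdin).ReadString('\n')
	if err != nil && line == "" {
		fmt.Println("Error while reading:", err)
		return
	}
	// Strip the line ending; Windows consoles send "\r\n".
	line = strings.TrimRight(line, "\r\n")
	fmt.Println(describe(line))
}
```

On Linux, entering © should print `bytes: c2 a9 | valid UTF-8: true | runes: 1`, matching the c2 a9 reported in the Linux run.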

On Linux:

// ASCII
$ go run test.go
Enter: hello
Scanned bytes: 68 65 6c 6c 6f

// Unicode
$ go run test.go
Enter: ©
Scanned bytes: c2 a9

// Unicode
$ go run test.go
Enter: ΆΏΑΓΔΘΞ
Scanned bytes: ce 86 ce 8f ce 91 ce 93 ce 94 ce 98 ce 9e ce a3 ce a8 ce a9 ce aa ce ad ce b1 ce b2 ce ba

On Windows:

PS C:\> chcp
Active code page: 437

PS C:\> go run .\test.go
Enter: hello
Scanned bytes: 68 65 6c 6c 6f

PS C:\> go run .\test.go
Enter: ΆΏΑΓΔΘΞ
Scanned bytes: 3f 3f 61

// Change to Unicode
PS C:\> chcp 65001
Active code page: 65001
PS C:\> go run .\test.go
Enter: ΆΏΑΓΔΘΞ
Error while scanning:  EOF
Scanned bytes:

Appreciate any help/pointers.

Scrimp asked 6/10, 2016 at 1:23. Comments (7):
Periphery: I don't think there is a standard way for a terminal environment to expose its encoding (since you included Windows). The best solution I can think of is to add an environment variable for this manually and read it with os.Getenv().
Scrimp: Thanks, @KoalaYeung. Do you know how Unicode is handled in golang on Windows in general? Scanf and bufio's Read don't seem to read non-UTF-8 input correctly.
Periphery: Sorry, I have no experience with golang on Windows. I read an article saying you can run chcp 65001 to change the code page to UTF-8. Do you think this helps?
Scrimp: Apparently it did not. I mentioned that in the question.
Periphery: Sorry, missed that part.
Periphery: Have you tried fmt.Fscanf with iconv's Reader? You'd need to know beforehand what the encoding is (maybe from environment variables).
Scrimp: @KoalaYeung Yes, I tried iconv, but again I first need to determine the terminal encoding. For now, I plan to call nl_langinfo(CODESET) from some C code in the Go program.
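Following the environment-variable idea from the comments, here is a cgo-free sketch that approximates nl_langinfo(CODESET) on Unix-like systems by reading LC_ALL, LC_CTYPE and LANG in that (POSIX) precedence order and taking the part after the dot, as in en_US.UTF-8. It is an approximation: it never calls setlocale, does not resolve locale aliases, and the helper names codesetFromLocale and localeCodeset are hypothetical. On Windows these variables are normally unset, so it says nothing about the console code page there.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// codesetFromLocale extracts the codeset from a POSIX locale name of the
// form language[_territory][.codeset][@modifier], e.g. "en_US.UTF-8".
// It returns "" when the name carries no codeset.
func codesetFromLocale(name string) string {
	if i := strings.IndexByte(name, '@'); i >= 0 {
		name = name[:i] // drop the @modifier first
	}
	if i := strings.IndexByte(name, '.'); i >= 0 {
		return name[i+1:]
	}
	return ""
}

// localeCodeset checks the variables in POSIX precedence order:
// LC_ALL, then LC_CTYPE, then LANG. The first non-empty one wins,
// even if it names no codeset.
func localeCodeset() string {
	for _, key := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := os.Getenv(key); v != "" {
			return codesetFromLocale(v)
		}
	}
	return ""
}

func main() {
	cs := localeCodeset()
	if cs == "" {
		// Unset variables and plain "C"/"POSIX" locales end up here.
		fmt.Println("codeset unknown")
		return
	}
	fmt.Println("codeset:", cs)
}
```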

You could do something like the following to find out whether you are on Windows and which code page the console is using, and then use that information to convert the input to UTF-8 if it is not UTF-8 already. That is, if I understand the question correctly; you would have to adapt it to your code.

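package main

import (
    "fmt"
    "os"
    "os/exec"
    "runtime"
    "strings"
)

func main() {
    // chcp only exists on Windows, so only run the check there.
    if runtime.GOOS == "windows" {
        fmt.Println("You are on Windows!")

        // Run chcp through cmd.exe. The child process is attached to the
        // same console, so it reports that console's active code page.
        output, err := exec.Command("cmd", "/C", "chcp").Output()
        if err != nil {
            fmt.Println("Error:", err)
            // A non-zero exit status tells the caller the check failed.
            os.Exit(1)
        }

        // English Windows prints "Active code page: 437", but the label is
        // localized, so take the last field rather than a fixed index and
        // trim a trailing period that some languages may add.
        fields := strings.Fields(string(output))
        if len(fields) == 0 {
            fmt.Println("Unexpected chcp output:", string(output))
            os.Exit(1)
        }
        outputCode := strings.TrimSuffix(fields[len(fields)-1], ".")

        // See Microsoft's code page identifier list for what each number means.
        fmt.Println(outputCode)

        switch outputCode {
        case "65001":
            fmt.Println("UTF-8")
        case "437":
            // 437 is the original IBM PC code page, not plain US-ASCII:
            // its upper half holds accented letters and box-drawing characters.
            fmt.Println("IBM437 (OEM United States)")
        default:
            fmt.Println("Unrecognized code page:", outputCode)
        }
    }
}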

Microsoft's documentation for the chcp command, in case you need it:

https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/chcp
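If the parsing step should be testable without running chcp, it can be pulled into a pure function. This sketch assumes the code page number is the last whitespace-separated token of chcp's output, possibly followed by a period in some localized versions; parseChcpOutput and codePageName are hypothetical names, the German sample string is an assumed example of localized output, and the name table only covers a few identifiers from Microsoft's list.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseChcpOutput pulls the numeric code page out of chcp's output,
// e.g. "Active code page: 437". The label text is localized, so this
// takes the last field, strips a trailing period, and parses it as an int.
func parseChcpOutput(out string) (int, error) {
	fields := strings.Fields(out)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty chcp output")
	}
	last := strings.TrimSuffix(fields[len(fields)-1], ".")
	return strconv.Atoi(last)
}

// codePageName maps a few common identifiers to names. The list is
// deliberately short.
func codePageName(cp int) string {
	switch cp {
	case 65001:
		return "UTF-8"
	case 437:
		return "IBM437 (OEM United States)"
	case 850:
		return "IBM850 (OEM Multilingual Latin 1)"
	case 1252:
		return "Windows-1252 (ANSI Latin 1)"
	default:
		return "unknown"
	}
}

func main() {
	// Sample outputs: English, and an assumed localized variant.
	for _, out := range []string{"Active code page: 437\r\n", "Aktive Codepage: 850."} {
		cp, err := parseChcpOutput(out)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cp, codePageName(cp))
	}
}
```

Returning an int also rejects non-numeric tokens via strconv.Atoi, instead of silently falling through to the default case.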

Dekker answered 22/8 at 13:56

I ran this code on Windows and checked it in Git Bash, PowerShell and other terminals, and it works correctly.

Your problem may be caused by your system's language (locale) settings.

For more information about UTF-8, I recommend reading this package's documentation.
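The standard unicode/utf8 package can also pinpoint where input stops being valid UTF-8, which helps tell a wrong terminal encoding apart from a reading problem. A minimal sketch; invalidOffsets is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidOffsets returns the byte offsets in s where UTF-8 decoding fails.
// utf8.DecodeRuneInString returns (RuneError, 1) for an invalid byte, which
// distinguishes it from a genuine U+FFFD (encoded as 3 bytes).
func invalidOffsets(s string) []int {
	var bad []int
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			bad = append(bad, i)
		}
		i += size
	}
	return bad
}

func main() {
	fmt.Println(invalidOffsets("caf\xe9")) // [3]  (Latin-1 é is not valid UTF-8)
	fmt.Println(invalidOffsets("café"))    // []
}
```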

Mackay answered 15/10, 2022 at 8:56
