Encoding issue with python3 and click package

When the click library detects that it is running under Python 3 with ASCII as the environment encoding, it aborts the program:

RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either switch to Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.

I found the cause in my case: when I connect to my Linux host from my Mac, Terminal.app sets the SSH session locale to my Mac's locale (es_ES.UTF-8), but my Linux host doesn't have that locale installed (only en_US.utf-8).
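A quick way to confirm this diagnosis from inside Python is to check whether the preferred encoding resolves to the ascii codec. This is only a minimal sketch: the helper name is_ascii_encoding is made up for illustration, and it relies on nothing beyond the standard codecs and locale modules.

```python
import codecs
import locale


def is_ascii_encoding(name):
    """Return True if the encoding name is an alias of the ascii codec."""
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        # Unknown encoding names are treated as "not ASCII" in this sketch
        return False


# Report what the running interpreter currently sees
current = locale.getpreferredencoding()
print(current, "-> ascii" if is_ascii_encoding(current) else "-> not ascii")
```

On a host where the forwarded locale is missing, the reported name is typically 'ANSI_X3.4-1968', which codecs.lookup() maps to ascii.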

I applied an initial workaround (but it had many issues, see the accepted answer):

import codecs
import locale
import os

# On the affected hosts, locale.getpreferredencoding() == 'ANSI_X3.4-1968'
if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
    os.environ['LANG'] = 'en_US.utf-8'

EDIT: For a better patch see my accepted answer.

All my Linux hosts have the 'en_US.utf-8' locale installed (Fedora uses it as the default).

My question is: is there a better (more robust) way to choose or force the locale in a Python 3 script? For instance, by setting one of the locales available on the system.

Maybe there is a different approach to fix this issue but I didn't find it.

Venitavenite answered 26/8, 2015 at 18:46 Comment(1)
FYI for those wondering "why can python3 error out because of something as small as an unset locale (i.e. the env vars LANG, LC_ALL)?": read PEP 538 and the related PEP 540. The error appears to be an issue only for Python 3.0 to 3.6, because PEP 538 fixes it for Python >= 3.7. - Pickings
V
5

If you have Python >= 3.7, you should not need to do anything. If you are on Python 3.6, see the original solution below.

EDIT 2017-12-08

I've seen that there is a PEP 538 for Python 3.7 that will change the entire behavior of Python 3 encoding management during startup. I think the new approach will fix the original problem: https://www.python.org/dev/peps/pep-0538/

IMHO the changes targeted at Python 3.7 for encoding issues should have been planned years ago, but better late than never, I guess.

EDIT 2015-09-01

There is an open issue (enhancement), http://bugs.python.org/issue15216, that will make it easy to change the encoding of an already created (but not yet used) stream (sys.std*). It is targeted at Python 3.7, so we'll have to wait for a while.

Original solution that targets python version 3.6

NOTE: this solution should not be needed by anyone running Python >= 3.7; see PEP 538.
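If the same script has to run on both old and new interpreters, the workaround could be guarded by a version check. The following is only a sketch under that assumption; the helper name needs_locale_workaround is hypothetical and not part of click or the standard library.

```python
import sys


def needs_locale_workaround(version_info=sys.version_info):
    """Return True for interpreters older than 3.7, where PEP 538 is not available."""
    return tuple(version_info[:2]) < (3, 7)


if needs_locale_workaround():
    # Hypothetical hook: this is where the workaround for old interpreters would be applied
    print("Python < 3.7 detected, the locale workaround may be required")
```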

Well, my initial workaround had many flaws. I managed to pass the click library's encoding check, but the encoding itself was not fixed, so I got exceptions whenever the input parameters or the output contained non-ASCII characters.

I had to implement a more complex method with three steps: set the locale, fix the encoding of the standard streams, and re-decode the command line parameters. I also added a "friendly" exit in case the first attempt to set the locale doesn't work as expected:

def prevent_ascii_env():
    """
    To avoid issues reading unicode chars from stdin or writing to stdout, we need to ensure
    that the python3 runtime is correctly configured. If it is not, we try to force utf-8;
    if that isn't possible, we exit with a friendlier message than the original one.
    """
    import codecs, io, locale, os, sys
    # With a broken locale, locale.getpreferredencoding() == 'ANSI_X3.4-1968'
    if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
        os.environ['LANG'] = 'en_US.utf-8'
        if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
            print("The current locale is not correctly configured in your system")
            print("Please set the LANG env variable to the proper value before calling this script")
            sys.exit(-1)
        # Once locale.getpreferredencoding() is correct, we can re-wrap the current std streams
        _, encoding = locale.getdefaultlocale()
        sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding=encoding, errors="replace", line_buffering=True)
        sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=encoding, errors="replace", line_buffering=True)
        sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding=encoding, errors="replace", line_buffering=True)
        # Finally, re-decode the command line parameters
        for i, p in enumerate(sys.argv):
            sys.argv[i] = os.fsencode(p).decode()

This patch solves almost all issues, but it has a caveat: shutil.get_terminal_size() raises a ValueError because sys.__stdout__ has been detached. The click library uses that function to print the help, so to fix it I had to monkey-patch click:

def wrapper_get_terminal_size():
    """
    Replace the original function termui.get_terminal_size (click lib) by a new one 
    that uses a fallback if ValueError exception has been raised
    """
    from click import termui, formatting
    
    old_get_term_size = termui.get_terminal_size
    def _wrapped_get_terminal_size():
        try:
            return old_get_term_size()
        except ValueError:
            import os
            sz = os.get_terminal_size()
            return sz.columns, sz.lines
    termui.get_terminal_size = _wrapped_get_terminal_size
    formatting.get_terminal_size = _wrapped_get_terminal_size

With these changes all my scripts now work fine when the environment has a wrong locale configured but the system supports en_US.utf-8 (it's the Fedora default locale).

If you find any issue with this approach or have a better solution, please add a new answer.

Venitavenite answered 27/8, 2015 at 15:36 Comment(3)
Looks like the issue is no longer there in Python 3.7. Also, if it were, one could use the new sys.stdout/err/in.reconfigure(encoding='utf-8') calls instead. - Nomi
A much simpler solution I found is two lines of bash, export LANG=en_US.utf8 and export LC_ALL=en_US.utf8, run before your Python code executes (replace the values with whatever locale your system is configured for). This solution is even given in the PEP 538 document. - Pickings
No, that is not a solution for this question. The question is about how to fix it within the Python code; doing it before the Python code runs is not an option. - Venitavenite
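The reconfigure() call mentioned in the first comment above exists on io.TextIOWrapper from Python 3.7 onwards. A minimal sketch of how it could replace the detach-and-rewrap step follows; the helper name reconfigure_to_utf8 is an assumption made for illustration, and the demonstration uses an in-memory stream rather than the real sys.stdout.

```python
import io


def reconfigure_to_utf8(stream):
    """Switch an existing text stream to UTF-8 in place; return False if unsupported."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Not an io.TextIOWrapper, or an interpreter older than 3.7
        return False
    reconfigure(encoding="utf-8", errors="replace")
    return True


# Demonstration on an in-memory stream that starts out as ASCII
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii")
changed = reconfigure_to_utf8(stream)
stream.write("señal")
stream.flush()
encoded = raw.getvalue()
```

In a real script the same helper would be called on sys.stdin, sys.stdout and sys.stderr. Because the stream is never detached, sys.__stdout__ stays usable, so the terminal-size monkey-patch from the answer should not be needed with this variant.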
S
3

It's an old thread, but this answer might help others (or myself) in the future. If you are on *nix, check whether LC_ALL is set:

env | grep LC_ALL

If it is set, unset it. That's all there is to it:

unset LC_ALL

Sullyprudhomme answered 11/6, 2018 at 16:1 Comment(5)
The problem was never how to fix it from the console, but how to fix it from Python code once the application is running. - Venitavenite
@reberto you need to unset LC_ALL, if it is set, before Python starts. - Sullyprudhomme
Thanks @Sullyprudhomme, but as I said, fixing the environment before the Python program is launched is not an option. The question is about how to deal with a broken locale environment inside a Python program that is already running. - Venitavenite
I understand that this is not the answer you wanted, but it is by far the easiest way to fix the problem. Use a wrapper script to launch your app, and inside it set export LC_ALL=en_US.UTF-8; export LANG=en_US.UTF-8 on macOS, or export LC_ALL=C.UTF-8; export LANG=C.UTF-8 on Linux, before starting your Python program. - Saltatorial
@Venitavenite you can try import locale; locale.setlocale(locale.LC_ALL, ''). Reference: docs.python.org/2/library/locale.html and docs.python.org/3/library/locale.html - Sullyprudhomme
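The setlocale suggestion in the last comment can be combined with the question's idea of picking one of the locales installed on the system. This is a rough sketch: the function name pick_utf8_locale and the candidate list are assumptions, and which names exist depends on the host.

```python
import locale


def pick_utf8_locale(candidates=("C.UTF-8", "en_US.UTF-8", "en_US.utf8")):
    """Try each candidate locale in order and return the first one the system accepts."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            # This locale is not installed on the host, try the next one
            continue
        return name
    return None


chosen = pick_utf8_locale()
print("Selected locale:", chosen)
```

Note that setlocale() changes process-wide state, and it does not re-encode sys.stdin, sys.stdout or sys.stderr, which were already created with the old encoding. It would therefore only stand in for the "set locale" step of the accepted answer, not for the stream and argument fixes.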
P
1

If you are running python 3.6 then you will still get this error. Here is a simple solution that the authors of click recommend:

#!/bin/bash
# before your python code executes set two environment variables
export LANG=en_US.utf8
export LC_ALL=en_US.utf8
Pickings answered 18/10, 2021 at 15:25 Comment(1)
The question is about how to fix it within Python code, not before the program is called. - Venitavenite
S
0

I hadn't seen this simple method mentioned (re-exec the script with a proper environment before doing anything else), so I'll add it for future travellers stuck on an old Python version for some reason. Add it right below the imports (it needs os and sys) so that it runs first:

# .get() avoids a KeyError when a variable is unset; the rest of the environment is kept
if os.environ.get("LC_ALL") != "C.UTF-8" or os.environ.get("LANG") != "C.UTF-8":
    os.execve(sys.executable,
              [sys.executable] + sys.argv,
              dict(os.environ, LC_ALL="C.UTF-8", LANG="C.UTF-8"))

Sachet answered 7/12, 2020 at 18:4 Comment(0)
