Problems with umlauts in Python APPDATA environment variable

I can't find a correct way to get the environment variable for the APPDATA path in Python.

The problem is that my user name includes special characters (the German umlauts ä and ü). I made a workaround with PyQt for Vista and Windows 7, but it doesn't work on XP systems.

Does anybody know the correct encoding of these environment variables or another solution for this problem?

Coronel answered 9/4, 2010 at 14:19 Comment(0)

As Mike says, you can get the system codepage from sys.getfilesystemencoding(). This encoding is used to convert Windows's native Unicode strings into bytes for all C stdio functions used by Python, including the filesystem calls that take byte string filepaths, and for os.environ.

What this means is that you will be able to read a string with non-ASCII characters from os.environ and use it directly as a filepath without any special encode/decode step.
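If you do want a Unicode string, for example to display the path or join it with other Unicode strings, the decode step might look like the sketch below. to_text is a hypothetical helper name, not part of any library, and the sketch assumes the value really was encoded with the filesystem encoding.

```python
import os
import sys

def to_text(value, encoding=None):
    """Decode a byte string with the filesystem encoding; pass text through unchanged."""
    if isinstance(value, bytes):
        # On Python 2, os.environ values are byte strings and take this branch.
        return value.decode(encoding or sys.getfilesystemencoding())
    return value

# APPDATA is normally only set on Windows, so fall back to an empty string elsewhere.
print(repr(to_text(os.environ.get("APPDATA", ""))))
```

For plain filesystem calls this step is not needed, as described above.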

Unfortunately, if the %APPDATA% variable contains Unicode characters that are not present in the system codepage (for example, if on a German (cp1252) Windows install your path was C:\Documents and Settings\αβγ\Application Data), then those characters will already have been mangled before you get the chance to use them. Decoding the byte string you get to Unicode using the filesystem encoding won't help in that case.
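A rough illustration of that loss, as a sketch only: Python's "replace" error handler stands in here for whatever substitution Windows actually performs, which may differ. roundtrip is a hypothetical helper name.

```python
def roundtrip(text, codepage="cp1252"):
    """Encode text into the code page and decode it back, replacing unmappable characters."""
    return text.encode(codepage, "replace").decode(codepage)

umlaut_name = u"SuperH\u00f6\u00e4\u00fc\u00dfnz"  # SuperHöäüßnz, all present in cp1252
greek_name = u"\u03b1\u03b2\u03b3"                 # αβγ, not present in cp1252

print(roundtrip(umlaut_name) == umlaut_name)  # True: the umlauts survive
print(roundtrip(greek_name) == greek_name)    # False: the Greek letters are lost
```

Once the original characters have been replaced, no decode step can bring them back.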

Here's a function you can use on recent Python versions that have the ctypes extension, to read Windows native Unicode environment variables.

import ctypes

def getEnvironmentVariable(name):
    name= unicode(name) # make sure string argument is unicode
    # with no buffer, the call returns the required size including the trailing NUL
    n= ctypes.windll.kernel32.GetEnvironmentVariableW(name, None, 0)
    if n==0:
        return None # variable not set (or the call failed)
    buf= ctypes.create_unicode_buffer(n)
    ctypes.windll.kernel32.GetEnvironmentVariableW(name, buf, n)
    return buf.value

In Python 3, the os.environ dictionary contains Unicode strings taken directly from Windows with no codepage encoding, so you don't have to worry about this problem there.
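On Python 3 the lookup could be as simple as the sketch below. get_appdata is a hypothetical helper name, and the optional environ parameter exists only so the function can be tried with a plain dict.

```python
import os

def get_appdata(environ=None):
    """Return %APPDATA% as a text string, or None when it is not set."""
    if environ is None:
        environ = os.environ
    return environ.get("APPDATA")

path = get_appdata()
if path is not None:
    # The value is already text, so it can be passed straight to filesystem calls.
    print(repr(path), os.path.isdir(path))
```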

Ize answered 9/4, 2010 at 14:45 Comment(9)
Hi, thanks bobince for your answer. This way I get the correct APPDATA path, but it doesn't solve the problem with the umlauts. I can't find a way to decode the Unicode string from buf.value correctly. - Coronel
buf.value is already a Unicode string. You don't need to decode it. You can use Unicode strings directly as filenames on Windows from Python 2.3 onwards (PEP 277). - Ize
But os.path.exists(buf.value) returns False... if I try it with a name without umlauts, it works. - Coronel
getEnvironmentVariable(u'TEMP') works fine and I get: E:\DOKUME~1\SUPERH~1\LOKALE~1\Temp (my user name for testing is SuperHöäüßnz). But when I call getEnvironmentVariable(u'APPDATA') I get the wrong path: E:\Dokumente und Einstellungen\SuperH”„�ánz\Anwendungsdaten. Do you know how to fix this? And thanks for the earlier answers, they were already a big help. - Coronel
What does repr(getEnvironmentVariable(u'APPDATA')) look like? - Ize
E:\\Dokumente und Einstellungen\\SuperH\xe2\x80\x9d\xe2\x80\x9e\xef\xbf\xbd\xc3\xa1nz\\Anwendungsdaten - Coronel
OK, I just found out that it works in the console but not in Eclipse (PyDev etc.)... that's strange. - Coronel
Aaahhh. OK. The IDE is having trouble reading/writing environment variables itself, which just goes to show how common this problem is. Java has a whole host of problems with reading environment variables... although TBH I have no idea how it managed to end up with quite such a badly mangled string as that! - Ize
Yes, I think this is the problem... if I type in the path directly (with the correct encoding specified) it doesn't work either... hmm, maybe bad for testing, but it will work in the release. Thank you very much for your help ;) - Coronel
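One possible explanation for the mangled string, not confirmed anywhere in the thread: the bytes in the repr output are exactly what you get if the user name is written in the OEM codepage cp850, read back as cp1252, and then encoded as UTF-8. The sketch below reproduces this; the choice of codepages is an assumption and misdecode is a hypothetical helper name.

```python
def misdecode(text, written_as="cp850", read_as="cp1252"):
    """Encode text in one code page and decode the resulting bytes as another."""
    return text.encode(written_as).decode(read_as, "replace")

name = u"SuperH\u00f6\u00e4\u00fc\u00dfnz"  # SuperHöäüßnz
mangled = misdecode(name)

# Compare these bytes with the repr() output quoted in the comments above.
print(repr(mangled.encode("utf-8")))
```

In cp850 the letters ö, ä, ü and ß are the bytes 0x94, 0x84, 0x81 and 0xE1. Read as cp1252, those become ”, „, an undefined byte (shown as the replacement character) and á.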
