subprocess.Popen with a unicode path

I have a unicode filename that I would like to open. The following code:

import subprocess
cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
cmd = cmd.encode('utf-8')
subprocess.Popen(cmd)

produces

'C:\Pokיmon.mp3' is not recognized as an internal or external command, operable program or batch file.

even though the file does exist. Why is this happening?
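To see what the encoded command actually contains, here is a small sketch written in Python 3 (the question's code is Python 2, where `u'...'.encode()` returns a byte string). UTF-8 stores é as two bytes, so anything that decodes those bytes with a Windows ANSI code page sees two unrelated characters instead. cp1252 is only an example here; the real code page depends on the machine's locale.

```python
# Python 3 sketch of what the question's cmd.encode('utf-8') produces.
cmd = 'cmd /c "C:\\Pok\xe9mon.mp3"'

utf8_bytes = cmd.encode("utf-8")
# 'é' (U+00E9) becomes the two bytes 0xC3 0xA9 in UTF-8.
print(utf8_bytes)

# Decoding those bytes with an ANSI code page (cp1252 as an example)
# turns the single 'é' into two characters; ascii() keeps the output
# printable on any console.
misread = utf8_bytes.decode("cp1252")
print(ascii(misread))

# Encoding with the same code page the receiver decodes with round-trips.
print(cmd.encode("cp1252").decode("cp1252") == cmd)
```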

Yvette asked 30/3, 2012 at 10:15. Comments:
I take it that 'cmd' stands in for something else? (Savina)
I removed the double quotes, even though they are not related to the question. (Yvette)
Have you added the Python path to your PATH environment variable? Assuming your Python installation is in C:\Python25, your new PATH variable should be: %PATH%;C:\Python25 (Mcgregor)
Yes, I have, but what does PATH have to do with anything? (Yvette)
Related: Unicode filename to python subprocess.call() (Philosophy)

It looks like you're using Windows and Python 2.X. Use os.startfile:

>>> import os
>>> os.startfile(u'Pokémon.mp3')

Non-intuitively, getting the command shell to do the same thing is:

>>> import subprocess
>>> import locale
>>> subprocess.Popen(u'Pokémon.mp3'.encode(locale.getpreferredencoding()),shell=True)
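This approach only works when the preferred encoding can represent every character of the name. A minimal sketch of a pre-check, assuming you would rather get a clear error than a garbled file name; `encode_for_ansi` is a hypothetical helper name, and the code avoids Python 3-only syntax so it could also sit in a Python 2 script:

```python
import locale


def encode_for_ansi(text, encoding=None):
    """Hypothetical helper: encode `text` for a byte-oriented API or fail loudly.

    `encoding` defaults to locale.getpreferredencoding(False), which is used
    here as a stand-in for the Windows ANSI code page.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        raise ValueError(
            "%r cannot be represented in %s; a Unicode API is needed instead"
            % (text, encoding)
        )
```

On Python 2 it would be used as `subprocess.Popen(encode_for_ansi(u'Pokémon.mp3'), shell=True)`. With cp1252 it returns `b'Pok\xe9mon.mp3'`; with a Hebrew code page such as cp1255, which has no é, it raises `ValueError` instead of launching a command with the wrong name.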

On my system, the command shell (cmd.exe) uses code page cp437, but Windows programs use cp1252. Popen wanted shell commands encoded as cp1252. This seems like a bug, and it seems to be fixed in Python 3.X:
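To illustrate why the two code pages mentioned here are not interchangeable, a short Python 3 sketch: é is a different byte in each, so bytes produced for one and read with the other spell a different name.

```python
name = "Pok\xe9mon.mp3"

oem = name.encode("cp437")    # console (OEM) code page on the answerer's system
ansi = name.encode("cp1252")  # ANSI code page on the answerer's system

print(oem)   # é is byte 0x82 in cp437
print(ansi)  # é is byte 0xE9 in cp1252

# Reading cp1252 bytes as cp437 gives a different character
# (ascii() escapes it so any console can print it).
print(ascii(ansi.decode("cp437")))
```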

>>> import subprocess
>>> subprocess.Popen('Pokémon.mp3',shell=True)
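On Python 3, subprocess takes str arguments and, on Windows, goes through the Unicode CreateProcessW API, so no manual encoding step is needed. A minimal, portable sketch: it deliberately uses an argument list instead of `shell=True`, and the child process is just the current Python interpreter echoing its first argument, so it runs on any OS.

```python
import subprocess
import sys

name = "Pok\xe9mon.mp3"

# The child prints ascii(argv[1]) so its output is plain ASCII on every platform.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(ascii(sys.argv[1]))", name],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
print(result.stdout.strip())  # the argument arrived intact, shown escaped
```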
Montmartre answered 31/3, 2012 at 0:14. Comments:
Thanks! I didn't know about os.startfile. (Yvette)
On Windows on Python 2, Popen(u'Pokémon.mp3'.encode(encoding)) works iff Popen(u'Pokémon.mp3'.encode('mbcs')) works, i.e., it should succeed with cp1252 and fail with cp437 in your case. Does shell=True change that? What are the values of sys.getfilesystemencoding() and locale.getpreferredencoding()? In general, u"é" might be unrepresentable using mbcs. Python 3 uses the Unicode API directly. (Philosophy)
On Windows with Python 2, if you want to use a Unicode command line (as Python 3 does), you can use this workaround, which leverages ctypes to patch subprocess.Popen(...). (Primeval)
os.startfile works, but u'Pokémon.mp3'.encode(locale.getpreferredencoding()) will of course fail in any locale whose ANSI code page doesn't map "é". In 2.x, subprocess.Popen calls CreateProcessA, which decodes the command line as ANSI, so it is limited to commands that can be encoded that way. If you need a command line that can't be encoded as ANSI, you must do something else via ctypes, cffi, or an extension module, such as calling CreateProcessW or a CRT function such as _wsystem. (Assai)
CMD is a Unicode application. It only uses code pages to decode bytes when working with files and pipes, such as reading a line of a batch script or a for /f loop that reads stdout from a command. In that case its default code page is ANSI if it isn't attached to a console; otherwise it uses the console's input or output code page (CMD is not the console), which defaults to OEM unless changed via chcp.com. In any case, the encoding CMD uses for files is irrelevant here: by the time CMD sees its command line, Windows has already decoded it as Unicode. (Assai)
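For Python 2, the `_wsystem` option mentioned in the comments above can be sketched with ctypes. This assumes msvcrt.dll exports the CRT's `_wsystem`, which takes a wide (UTF-16) command string and hands it to cmd.exe; the `wsystem` wrapper name is hypothetical, and the guard simply refuses to run outside Windows.

```python
import ctypes
import sys


def wsystem(command):
    """Hypothetical wrapper: run a unicode `command` via the CRT's _wsystem.

    Windows-only sketch; assumes msvcrt.dll exports _wsystem(const wchar_t *).
    """
    if sys.platform != "win32":
        raise OSError("_wsystem is only available on Windows")
    crt = ctypes.cdll.msvcrt
    crt._wsystem.argtypes = [ctypes.c_wchar_p]
    crt._wsystem.restype = ctypes.c_int
    return crt._wsystem(command)
```

The Unicode counterpart of the question's command would then be `wsystem(u'"C:\\Pok\xe9mon.mp3"')`, since `_wsystem` already runs its argument through `cmd.exe /c`.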

Your problem can be solved with the smart_str function from Django.

Use this code:

import subprocess
from django.utils.encoding import smart_str

cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
smart_cmd = smart_str(cmd)  # on Python 2 this encodes to a UTF-8 byte string by default
subprocess.Popen(smart_cmd)
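For comparison, here is a rough, dependency-free approximation of what smart_str does with a unicode string on Python 2, assuming Django's default `encoding='utf-8'`. `to_bytes` is a hypothetical name, and unlike Django it does not handle non-string objects. For a unicode command it yields exactly the same bytes as the question's `cmd.encode('utf-8')`.

```python
def to_bytes(s, encoding="utf-8", errors="strict"):
    """Hypothetical, simplified stand-in for Django's smart_str on Python 2.

    Byte strings are returned unchanged; text is encoded with `encoding`.
    """
    if isinstance(s, bytes):
        return s
    return s.encode(encoding, errors)


cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
print(to_bytes(cmd) == cmd.encode("utf-8"))  # same bytes as the question's code
```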

You can find information on how to install Django on Windows here. You can first install pip, then install Django by opening a command shell with administrator privileges and running this command:

pip install Django

This will install Django in your Python installation's site-packages directory.

Mcgregor answered 30/3, 2012 at 11:18. Comments:
I won't install a whole new framework just to encode unicode correctly. The fix should be one or two lines long, not 1000+ lines of complex code. (Yvette)
OK, sorry; I have updated my answer. Maybe it is more helpful now. (Mcgregor)
First, the latin-1 encoding is not Unicode; it won't work in all Unicode cases. Second, it still doesn't work. Try it yourself. (Yvette)
OK, I work on Linux, and when I tested it with os.popen it worked. Maybe it doesn't work on Windows. :( I've removed the updated part of my answer. (Mcgregor)
Someone made a standalone module out of Django's smart_str: smartencoding (Selfstarter)
@Selfstarter that's really helpful! Good idea to isolate that functionality in its own module. Thanks! (Mcgregor)
>>> subprocess.call(['start', u'avión.mp3'.encode('latin1')], shell=True)
0

There's no need to call cmd if you use the shell parameter. AFAIK, the correct way to launch the associated program is to use cmd's built-in start command.

My 2c, HIH.
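A Python 3 sketch of the start approach. The empty "" argument is not in the answer above: start treats its first quoted argument as the window title, so without it a path that needs quoting (for example one containing spaces) would be taken as the title. `subprocess.list2cmdline` is the helper subprocess itself uses to join argument lists on Windows; it is not formally part of the documented API. `start_command` and `open_with_start` are hypothetical names.

```python
import subprocess
import sys


def start_command(path):
    """Build a cmd.exe `start` command line that opens `path` with its associated program.

    The empty "" argument fills start's window-title slot, so a quoted path
    (e.g. one containing spaces) is not mistaken for the title.
    """
    return subprocess.list2cmdline(["start", "", path])


def open_with_start(path):
    """Windows-only sketch for Python 3: run the start command through the shell."""
    if sys.platform != "win32":
        raise OSError("cmd.exe's start built-in is only available on Windows")
    return subprocess.call(start_command(path), shell=True)


print(start_command("My Pokemon.mp3"))
```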

Brillatsavarin answered 30/3, 2012 at 15:51. Comment:
Thanks for the side note, but this still doesn't fix the Unicode problem. It works on your system because your locale's MBCS code page has the ó character. This code won't work on computers that have Hebrew or Japanese as their locale language. (Yvette)

I think Windows uses 16-bit characters; I'm not sure whether it's UCS-2, UTF-16, or something like that. So I guess it could have an issue with UTF-8.

Nystrom answered 30/3, 2012 at 10:34. Comment:
Setting it to 'utf-16' raises TypeError: must be string without null bytes or None, not str, so I guess that's wrong. (Yvette)
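The TypeError in that comment follows from what UTF-16 does to mostly-ASCII text: every ASCII character is stored as its byte followed by a zero byte, and the error message says Python 2's Popen only accepts strings without null bytes. A Python 3 sketch of the bytes involved (utf-16-le is used so that no byte-order mark is added):

```python
cmd = 'cmd /c "C:\\Pok\xe9mon.mp3"'

# UTF-16 little-endian (the form Windows uses internally) stores each
# ASCII character as that byte followed by a zero byte.
utf16 = cmd.encode("utf-16-le")
print(utf16[:6])         # the first three characters, 'cmd'
print(b"\x00" in utf16)  # NUL bytes are present
```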

© 2022 - 2024 — McMap. All rights reserved.