I don't know how to explain this (in fact, an explanation is what I'm looking for), so I'll just list the steps to reproduce the issue. Hopefully someone will be able to understand and elaborate:
- Python 3.5.0 on Windows 8.1. (However this should be reproducible regardless of Python and Windows version.)
- The Persian standard keyboard layout installed. (It can be downloaded from here. Again, I'm sure the problem is not limited to this specific keyboard, and some characters in other languages have the same problem; I mention it just for the sake of reproducibility.)
- Open IDLE, set the keyboard's layout to Persian and type some characters.
- Some characters, like 'آ' (Shift+h), are typed perfectly fine.
- Some other characters, like 'ی' (d), are converted to a similar character, in this case 'ي' (notice the small dots under the glyph).
- Some characters can't be typed at all. For example, '﷼' (Shift+4) is typed as '?' in IDLE.
- All the above characters can be typed in almost any other program that I have installed. One of the simplest ones being notepad.exe.
- We can type the same characters in another program, e.g. notepad.exe, and then copy and paste them into IDLE. This shows that IDLE can store and display these Unicode characters; it just can't receive them correctly from the keyboard.
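To compare what was typed with what was pasted, it helps to look at code points rather than glyphs, since 'ی' and 'ي' are easy to confuse on screen. The following is a minimal sketch; the helper name `describe` is my own, and it only relies on the standard `unicodedata` module:

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per character in text."""
    return [
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The four characters from the steps above, written as escapes so that
# this snippet does not depend on how the editor handles keyboard input.
for line in describe("\u0622\u06cc\u064a\ufdfc"):
    print(line)
```

Pasting a string typed in IDLE into `describe(...)` and doing the same with one copied from notepad.exe should make any substitution visible.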
I'm a fan of IDLE. It's a lightweight IDE that ships with the standard Python installation, and I don't want to switch to another IDE just because of this. But this is the most annoying thing about IDLE for me. Whenever I need to write a program with some Persian characters in it, I can't trust IDLE to type them correctly, so I have to open some other program and use the copy-paste method.
What I'm looking for is:
- Why does this happen? Where is the problem?
- Are there any workarounds?
- Any documentation or bug reports directly related to this issue.
Also this information may be helpful:
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1256')
>>> locale.getpreferredencoding()
'cp1256'
>>> locale.getlocale()
('English_United States', '1252')
>>>
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
Thanks.
Update:
Please see the first three comments below. It seems that this issue is caused by usage of WindowsBestFit mappings while typing in tkinter apps.
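As a rough check of this hypothesis, the sketch below pushes the same characters through the ANSI code page reported by `locale.getpreferredencoding()` above (`cp1256`). The helper name `to_ansi` is hypothetical. Note that Python's own `cp1256` codec does not apply the best-fit table, so it can only show which characters are missing from the code page, not what the best-fit table substitutes for them:

```python
def to_ansi(ch, codepage="cp1256"):
    """Encode ch to the given code page, replacing unencodable characters with b'?'."""
    return ch.encode(codepage, errors="replace")


# A character with a cp1256 byte survives; one without it becomes b'?'.
for ch in ("\u0622", "\u064a", "\u06cc", "\ufdfc"):
    print("U+{:04X} -> {!r}".format(ord(ch), to_ansi(ch)))
```

U+064A has a byte in `cp1256` while U+06CC and U+FDFC do not, which is consistent with typed input passing through a lossy conversion to the ANSI code page at some point.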
To test whether the cause is some bad configuration in the Python/tkinter bindings or Tcl/Tk itself, I downloaded and installed Tkabber, an application written in Tcl/Tk. The exact same problem exists there, i.e. I can't type the above characters but can copy and paste them. So my conclusion is that the root of the problem lies in Tcl/Tk itself and not in IDLE, Python or tkinter.
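The same check can be done from Python without IDLE. This is only a sketch of a bare tkinter window that prints the code points of each key press; `format_key` and `main` are names I made up, and I'm assuming that `event.char` carries whatever Tk received from the keyboard:

```python
def format_key(char, keysym):
    """Describe one key event as code points plus the Tk keysym."""
    points = " ".join("U+{:04X}".format(ord(c)) for c in char) or "(no char)"
    return "{} keysym={}".format(points, keysym)


def main():
    # tkinter is imported here so format_key can be used without a GUI.
    import tkinter as tk

    root = tk.Tk()
    entry = tk.Entry(root, width=40)
    entry.pack(padx=10, pady=10)
    # Print only ASCII descriptions, so the console encoding does not matter.
    entry.bind(
        "<KeyPress>",
        lambda event: print(format_key(event.char, event.keysym)),
    )
    entry.focus_set()
    root.mainloop()
```

Calling `main()` from a script started with python.exe in a console (not from IDLE) keeps the printed output separate from the widget under test. If pressing the 'ی' key reports U+064A here too, the substitution happens below IDLE.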
My questions still hold.
hex(ord('X')), where X is your entry. What values do you get for your entries? – Mantra
c.encode('unicode_escape'): In step five, what I'm trying to type is b'\\u06cc' (copy-paste), but what I get by typing it directly into IDLE is b'\\u064a'. In step 6, b'\\ufdfc' is converted to b'?'. – Rausch
b'\\u06cc' to b'\\u064a' is according to the Windows best fit mapping table. IDLE is definitely dependent on WindowsBestFit somewhere. But where exactly? I don't know. – Rausch
U+06CC and U+064A exist in the former but not in the latter). However, even if I change the font to Lucida (which does not include the required Arabic glyphs), there seems to be a font fallback mechanism involved, such that it shows the Arabic parts in some other fancy font. – Rausch