Why can't some characters be typed in Python's IDLE?

I don't know how to explain this (an explanation is actually what I'm looking for), so I'll just list the steps to reproduce the issue. Hopefully someone will be able to understand and elaborate:

  1. Python 3.5.0 on Windows 8.1. (However, this should be reproducible regardless of Python and Windows version.)
  2. The Persian standard keyboard layout is installed. (It can be downloaded from here. Again, I'm sure the problem is not limited to this specific keyboard, and some characters in other languages have the same problem; I mention it only for the sake of reproducibility.)
  3. Open IDLE, set the keyboard layout to Persian and type some characters.
  4. Some characters, like 'آ' (Shift+h), are typed perfectly fine.
  5. Some other characters, like 'ی' (d), are converted to a similar character, in this case 'ي' (notice the small dots under the glyph).
  6. Some characters can't be typed at all, for example '﷼' (Shift+4). These are typed as '?' in IDLE.
  7. All of the above characters can be typed in almost any other program that I have installed, one of the simplest being notepad.exe.
  8. We can type the same characters in another program, e.g. notepad.exe, and then copy and paste them into IDLE. This shows that IDLE supports Unicode characters; it just can't receive them correctly from the keyboard.
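
One way to check what IDLE actually received in steps 4 to 8 is to print the code point and Unicode name of each character. The following is a minimal sketch using only the standard library. The helper name `describe` is made up for illustration, and the sample characters are written as escapes so that they do not depend on how they were typed.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per character in text."""
    return [
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The character intended in step 5 versus the one that shows up after typing.
print(describe("\u06cc"))  # intended character
print(describe("\u064a"))  # character reported after typing in IDLE
```

Typing a character between the quotes of `describe('...')` in IDLE, and then doing the same with a pasted copy, should show whether the keyboard input was altered before it reached Python.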

I'm a fan of IDLE. It's a lightweight IDE that ships with the standard Python installation, and I don't want to switch to another IDE just because of this. But the above is the most annoying thing about IDLE for me. Whenever I need to write a program with some Persian characters in it, I can't trust IDLE to type them correctly, and I have to open some other program and use the copy-paste method.

What I'm looking for is:

  • Why does this happen? Where is the problem?
  • Are there any workarounds?
  • Any documentation or bug reports directly related to this issue.

Also this information may be helpful:

>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1256')
>>> locale.getpreferredencoding()
'cp1256'
>>> locale.getlocale()
('English_United States', '1252')
>>> 
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

Thanks.

Update:

Please see the first three comments below. It seems that this issue is caused by the use of WindowsBestFit mappings while typing in tkinter apps.

To test whether the cause is some bad configuration in the Python/tkinter bindings or Tcl/Tk itself, I downloaded and installed Tkabber, an application written in Tcl/Tk. The exact same problem exists there, i.e. I can't type the above characters but can copy and paste them. So my conclusion is that the root of the problem lies in Tcl/Tk itself and not in IDLE/Python/tkinter.

My questions still hold.

Rausch answered 6/12, 2015 at 10:7 Comment(5)
IDLE has almost nothing to do with characters entered into the tk Text widget from the keyboard. After characters are entered, what you see is determined by tk and the font you specify. I use Lucida Console. What about you? A font may use the same glyph for more than one character, as in point 5. Or it may substitute a 'cannot display' symbol, as in point 6. Lucida Console uses boxes ࣉ rather than ?. (But on tk, the hex digits in the box are not visible.) To see what is actually entered into Python, use >>> hex(ord('X')), where X is your entry. What values do you get for your entries? - Mantra
@TerryJanReedy My IDLE font is set to Courier New, but this does not seem to be a font issue. If it were, then even the copy-and-paste method should have produced the same result as typing, shouldn't it? Anyway, to make sure, I stored the values resulting from typing and from copy-paste in a variable c and called c.encode('unicode_escape'). In step 5, what I'm trying to type is b'\\u06cc' (copy-paste), but what I get by typing it directly into IDLE is b'\\u064a'. In step 6, b'\\ufdfc' is converted to b'?'. - Rausch
The conversion from b'\\u06cc' to b'\\u064a' follows the Windows best-fit mapping table. IDLE definitely depends on WindowsBestFit somewhere. But where exactly? I don't know. - Rausch
Changing between Courier New and Lucida Console makes a difference in the glyphs displayed for Arabic characters. Whether the Courier versions are accurate alternatives or not I cannot say, since I do not know what they should look like. irdb seems to have found an answer below. - Mantra
Actually, Courier New supports Arabic but Lucida Console does not. (If you follow the links you can see that both U+06CC and U+064A exist in the former but not in the latter.) However, even if I change the font to Lucida Console (which does not include the required Arabic glyphs), there seems to be a font fallback mechanism involved, such that it shows the Arabic parts in some other fancy font. - Rausch

After some searching I found this ticket on Tk's bug tracker. It pretty much explains what's happening behind the scenes: Tcl/Tk internally uses code pages to translate keyboard input to UTF-8.
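
The effect of squeezing input through a code page can be imitated in plain Python. The sketch below is an illustration only, not Tk's actual code. It assumes the ANSI code page is cp1256 (as reported by `locale.getpreferredencoding()` in the question) and uses Python's own `cp1256` codec, which has no best-fit table. A character missing from the code page therefore becomes '?' here, whereas the Windows best-fit mapping would first try a look-alike, such as U+064A for U+06CC. The helper name `through_codepage` is hypothetical.

```python
def through_codepage(text, codepage="cp1256"):
    """Encode text to a legacy code page and decode it back.

    Characters that the code page cannot represent are replaced by '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


# Characters from steps 4, 5 and 6 of the question, written as escapes.
samples = {
    "alef with madda (step 4)": "\u0622",
    "Farsi yeh (step 5)": "\u06cc",
    "rial sign (step 6)": "\ufdfc",
}

for label, ch in samples.items():
    result = through_codepage(ch)
    status = "survives" if result == ch else "lost, becomes {!r}".format(result)
    print("{}: U+{:04X} {}".format(label, ord(ch), status))
```

Only the first sample is part of cp1256, which matches the observation that it is the only one of the three that can be typed unchanged.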

Unfortunately, there has been no activity around this bug since 2014-09-18, which is a sad thing. The bug has a huge impact on many languages: both those that have a Windows code page (listed here) and, even more so, the many others that don't have any code page associated with them (like Bengali).

IMO, this should have been one of the highest priorities of the Tcl/Tk development team. In its current state, users should not rely on Tcl/Tk for applications that require Unicode input support on Windows.
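
Until the bug is fixed, one possible workaround besides copy and paste is to avoid typing the affected characters at all and to write them as escape sequences, which need only ASCII keystrokes. This is a minimal sketch; the variable names are illustrative.

```python
# Escapes by code point or by official Unicode character name.
farsi_yeh = "\u06cc"
rial_sign = "\N{RIAL SIGN}"
alef_madda = "\N{ARABIC LETTER ALEF WITH MADDA ABOVE}"

# Verify that the strings hold the intended code points.
for ch in (farsi_yeh, rial_sign, alef_madda):
    print("U+{:04X}".format(ord(ch)))
```

This keeps the source file pure ASCII, at the cost of readability for longer Persian strings, so it is probably practical only for a handful of characters.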

Rausch answered 7/12, 2015 at 7:47 Comment(4)
Thank you for tracking this down. The answer is a disappointment, but better than none at all. I am thinking about how to document this limitation on Unicode entry on Windows. - Mantra
Thanks Terry, it is greatly appreciated. At least users will know what to expect and hopefully there will be less confusion. - Rausch
Speaking as one of the team in question, there's a little bit too much accusation round here. There's plenty of other pressing tasks too. But I'm not a Windows developer; that message-loop-level stuff is out of my expertise set and I don't have a development environment set up to poke around. - Alfonzoalford
I've boosted the priority on the bug; the main problem is that I'm not quite sure who's the best developer to assign this to. - Alfonzoalford