Tkinter and 32-bit Unicode duplicating – any fix?
D

3

7

I only want to show Chip, but I get both Chip AND Dale. It doesn't seem to matter which 32 bit character I put in, tkinter seems to duplicate them - it's not just chipmunks.

I'm thinking that I may have to render them to png and then place them as images, but that seems a bit ... heavy-handed.

Any other solutions? Is tkinter planning on fixing this?

import tkinter as tk

# Python 3.8.3
class Application(tk.Frame):
    def __init__(self, master=None):
        self.canvas = None
        self.quit_button = None
        tk.Frame.__init__(self, master)
        self.grid()
        self.create_widgets()

    def create_widgets(self):
        self.canvas = tk.Canvas(self, width=500, height=420, bg='yellow')
        self.canvas.create_text(250, 200, font="* 180", text='\U0001F43F')
        self.canvas.grid()

        self.quit_button = tk.Button(self, text='Quit', command=self.quit)
        self.quit_button.grid()

app = Application()
app.master.title('Emoji')
app.mainloop()

Chip and Dale on Mac OS

  • Apparently this works fine on Windows - so maybe it’s a MacOS issue.
  • I've run it on two separate Macs, both on the latest OS (Catalina 10.15.5), and both show the problem
  • The bug shows with the standard Python installer from python.org - Python 3.8.3 with Tcl/Tk 8.6.8
  • Supposedly it might be fixed with Tcl/Tk 8.6.10 - but I don't really see how I can upgrade Tcl/Tk using the normal installer.
  • This is also reported as a bug cf. https://bugs.python.org/issue41212
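
Since the behaviour seems to depend on the Tcl/Tk patch level, it may help to check which one a given Python build is linked against. The following is a minimal sketch: `parse_patchlevel` and `linked_tcl_patchlevel` are hypothetical helper names, and 8.6.10 is only the version mentioned above as a possible fix, not a confirmed threshold.

```python
import re


def parse_patchlevel(patchlevel: str) -> tuple:
    """Turn a Tcl patch level such as '8.6.8' into a comparable tuple of ints.

    Crude on purpose: a pre-release such as '8.7a5' becomes (8, 7, 5).
    """
    return tuple(int(part) for part in re.findall(r"\d+", patchlevel))


def linked_tcl_patchlevel() -> str:
    """Ask the Tcl interpreter embedded in this Python build for its version."""
    import tkinter  # imported lazily so the parsing helper works without Tk

    # tkinter.Tcl() creates an interpreter without opening a window.
    return tkinter.Tcl().eval("info patchlevel")


def older_than(patchlevel: str, minimum: str) -> bool:
    """Return True if patchlevel sorts before minimum."""
    return parse_patchlevel(patchlevel) < parse_patchlevel(minimum)
```

Calling `older_than(linked_tcl_patchlevel(), "8.6.10")` would then tell you whether the interpreter in use predates the release that is supposed to behave better.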

One of the Python contributors believes that Tcl/Tk cannot or will not support variable-width encodings (internally it always converts to a fixed-width encoding), which suggests to me that Tcl/Tk is not suitable for general UTF-8 development.

Dottie answered 3/7, 2020 at 10:42 Comment(10)
Couldn't reproduce your issue. It shows only one squirrel on my PC.Glyceride
I'm on macOS, and it is working as expected. What TK/TCL version are you on? Try text='Hello' and check if it displays "Hello" twice.Tilney
How did you install python? I installed it using anaconda and update Tcl/Tk with conda install -c conda-forge tk and conda install -c conda-forge/label/cf201901 tk commands.Tilney
Did you try with the equivalent \uD83D\uDC3F?Nephrosis
@Jerry, yeah: "UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed"Dottie
How about use PIL to generate this emoji,and use PIL.ImageTk to put it on the canvas?Belong
@jizhihaoSAMA, yes that may work for individual emoji, but the general problem is more severe, so currently we are assessing PyQt.Dottie
I'm showing the same issue on my Mac (Catalina). Interestingly if you put two spaces in front of it, it works correctly (text=' \U0001F43F'), but this loophole doesn't work for more complex emoji that are more than one 32-bit codepoint, like 🏴󠁧󠁢󠁳󠁣󠁴󠁿Mirilla
@Dottie could you clarify why the UTF-16 solution is not applicable to your situation ?Exile
@Space, yes - it doesn't really resolve the problem - which is that TCL/TK doesn't support astral characters. Secondly, it doesn't work on my machine - I just get a yellow rectangle. If I follow your description, I see the error: "UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed". The best workaround was delivered by jdaz, but it's just a workaround as he points out. The bug report messages follow the same analysis as Donal.Dottie
F
7

The fundamental problem is that Tcl and Tk are not very happy with non-BMP (Unicode Basic Multilingual Plane) characters. Prior to 8.6.10, what happens is anyone's guess; the implementation simply assumed such characters didn't exist and was known to be buggy when they actually turned up (there's several tickets on various aspects of this). 8.7 will have stronger fixes in place (see TIP #389 for the details) — the basic aim is that if you feed non-BMP characters in, they can be got out at the other side so they can be written to a UTF-8 file or displayed by Tk if the font engine deigns to support them — but some operations will still be wrong as the string implementation will still be using surrogates. 9.0 will fix things properly (by changing the fundamental character storage unit to be large enough to accommodate any Unicode codepoint) but that's a disruptive change.

With released versions, if you can get the surrogates over the wall from Python to Tcl, they'll probably end up in the GUI engine, which might do the right thing in some cases (not including any build I've currently got, FWIW, but I've got strange builds so don't read very much into that). With 8.7, sending over UTF-8 will be able to work; that's part of the functionality profile that will be guaranteed. (The encoding functions exist in older versions, but with 8.6 releases they will do the wrong thing with non-BMP UTF-8 and break weirdly with older versions than that.)

Friesian answered 7/7, 2020 at 15:58 Comment(5)
This was going to be a comment as it doesn't offer any workaround, but it's way too long!Friesian
that's very interesting. We are probably going to 'jump ship' because of this issue. It's a real concern that tk hasn't been able to keep up with the times on something so fundamental. Much that I would prefer to use a solution baked into Python, I'm guessing we will probably migrate to PyQt (and no doubt have a load of other issues with that).Dottie
Tk has always been slow to update, in part because producing cross platform changes is difficult. The last time I did significant work on it, I had to reimplement the platform-specific part 4 times and fix weird bugs on three of them. (Font rendering engines are ghastly.) The UTF-8 fixes have been slow because they weren't very pressing for most developers; as a community, the people who write the code must not use a lot of emoji…Friesian
I believe you. There's more to it than just showing emoji though - you probably agree that undefined behaviour often indicates a security threat. Tk was given a huge boost by being the default / baked-in GUI library for python - but a concern I have is that its slow development may well end up dragging Python down: Active development is so important on these projects - and developers are spread so thin. I guess that we (the OSS dev. community) could stop inventing yet more stuff to maintain - but somehow I doubt that will happen... 🐒Dottie
I rewarded you the bounty because you answered the question best - but also, you deserve some sort of credit for sweating out the long hours developing Tk, albeit in the past.Dottie
A
1

The problem

Several things could have happened:
  • That is what the emoji is. There is no way to fix it, except change the source emoji.
  • Tk and/or Tcl are confused with the emoji. This means that it isn't sure what emoji to put, so it puts 2 chipmunks. When I tried that emoji on my Linux computer, it threw an error.

The solution

The only solution may be to save the emoji as a file, then create an image. But there could be other, slightly more complicated ways. For example, you could create a rectangle or a Frame over the second chipmunk to hide it.
Aloes answered 9/7, 2020 at 14:53 Comment(1)
@Mirilla found a workaround for the simple case of a single emoji, but my concern is that there is a general problem, which would not really work for us.Dottie
E
0

As you pointed out, your code works as is on Windows (tested on Windows 10), however for macOS, the following workaround should work:

  1. Convert the emoji's encoding from UTF-32 to UTF-16. Nothing is lost: UTF-16 is a variable-length encoding, so any code point that can be represented in UTF-32 can also be represented in UTF-16. For non-BMP characters such as modern emoji, the UTF-16 form takes 32 bits (two 16-bit units), the same as UTF-32, so it should be able to represent Unicode v11 characters.
  2. Pass the resulting string to the embedded Tcl/Tk interpreter.

UTF-16 Programming with Unicode

In UTF-16, characters in ranges U+0000–U+D7FF and U+E000–U+FFFD are stored as a single 16 bits unit. Non-BMP characters (range U+10000–U+10FFFF) are stored as “surrogate pairs”, two 16 bits units: an high surrogate (in range U+D800–U+DBFF) followed by a low surrogate (in range U+DC00–U+DFFF).

For Tcl to perform the substitution of a unicode-escaped string (with its character/emoji representation), the string itself must be of the form "\uXXXX" or "\uXXXX\uXXXX".

The chipmunk Emoji's encoding must be converted to UTF-16 => "\ud83d\udc3f"
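
To make the arithmetic behind that pair explicit, here is a small sketch of the standard UTF-16 surrogate calculation. The function name `surrogate_pair` is hypothetical and only used for illustration; it is not part of tkinter or Tcl.

```python
def surrogate_pair(code_point: int) -> tuple:
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only non-BMP code points have a surrogate pair")
    offset = code_point - 0x10000       # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1F43F)     # U+1F43F CHIPMUNK
tcl_escape = "\\u{:04x}\\u{:04x}".format(high, low)
print(tcl_escape)  # \ud83d\udc3f
```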


    # The tcl/tk code
    set chipmunk "\ud83d\udc3f"
    
    pack [set c [canvas .c -highlightcolor blue -highlightbackground black -background yellow]] -padx 4cm -pady 4cm -expand 1 -fill both
    
    set text_id [$c create text 0 0 -text $chipmunk -font [list * 180]]
    
    $c moveto $text_id 0 0

Unicode chipmunk in Tcl/Tk

The equivalent code in Python will at some point have to bypass tkinter and issue Tcl commands directly to the embedded/linked interpreter:

import tkinter as tk

# the top-level window
top = tk.Tk()

# the canvas
c = tk.Canvas(top, highlightcolor = 'blue', highlightbackground = 'black', background = 'yellow')

# create the text item, with placeholder text
text_id = c.create_text(0,0, font = '* 180', text = 'to be replaced')

# pack it
c.pack(side = 'top', fill = 'both' , expand = 1, padx = '4c' , pady = '4c')

# The 'Bypassing' aka issuing tcl/tk calls directly
# For Tk calls use => c.tk.call(...), we will not use this.
# For bare Tcl => c.tk.eval(...)

# chipmunk in UTF-16 (in this instance it is using 32-bits to represent the codepoint)
# as a raw string

chipmunk = r"\ud83d\udc3f"

# create another variable in tcl/tk
c.tk.eval('set the_tcl_chipmunk {}'.format(chipmunk))

# set the text_id item's -text property/option as the value of variable the_tcl_chipmunk, gotten by calling the tcl's set command

c.tk.eval( '{} itemconfig {} -text [set the_tcl_chipmunk]'.format( str(c), text_id ) )

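# Apparently a hack to get the chipmunk in position. Some Tk builds lack the
# canvas "moveto" subcommand (one commenter got: bad option "moveto"); if yours
# does, comment this line out and set the position in create_text instead.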
c.tk.eval( '{} moveto {} 0 0'.format( str(c), text_id ) )

# the main gui event loop
top.mainloop()

Unicode chipmunk in python

Getting the UTF-16 of chipmunk

There are two avenues you could pursue:

  1. Getting it from a website. I use fileformat.info all the time: look up the chipmunk on fileformat.info and copy the value shown for C/C++/Java source code.

  2. Doing the conversion from UTF-32 to UTF-16 in Python


# A UTF-32 string, since it's of the form "\UXXXX_XXXX" ( _ is not part of the syntax, a mere visual aid for illustrative purposes)
chipmunk_utf_32 = '\U0001F43F'

# convert/encode it to UTF-16 (big endian), to get a bytes object

chipmunk_utf_16 = chipmunk_utf_32.encode('utf-16-be')

# obtain the hex representation
chipmunk_utf_16 = chipmunk_utf_16.hex()

#format it to be an escaped UTF-16 tcl string
chipmunk = '\\u{}\\u{}'.format(chipmunk_utf_16[0:4], chipmunk_utf_16[4:8])
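
The snippet above handles exactly one non-BMP character. A possible generalization to whole strings is sketched below; `to_tcl_escapes` is a hypothetical helper name, and whether Tk then renders the result correctly still depends on the Tcl/Tk build in use.

```python
def to_tcl_escapes(text: str) -> str:
    r"""Encode text as a run of Tcl \uXXXX escapes, one per UTF-16 code unit."""
    data = text.encode("utf-16-be")
    # Every UTF-16 code unit is two bytes; non-BMP characters yield two units.
    units = [data[i:i + 2].hex() for i in range(0, len(data), 2)]
    return "".join("\\u" + unit for unit in units)


print(to_tcl_escapes("\U0001F43F"))   # \ud83d\udc3f
print(to_tcl_escapes("A\U0001F43F"))  # \u0041\ud83d\udc3f
```

Because every character is escaped, including spaces, the result is a single Tcl word and can be dropped into a command string such as `'set the_tcl_chipmunk {}'.format(...)` without extra quoting.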

EDIT: The whole script

import tkinter as tk

# A UTF-32 string, since it's of the form "\UXXXX_XXXX" ( _ is not part of the syntax, a mere visual aid for illustrative purposes)
chipmunk_utf_32 = '\U0001F43F'

# convert/encode it to UTF-16 (big endian), to get a bytes object

chipmunk_utf_16 = chipmunk_utf_32.encode('utf-16-be')

# obtain the hex representation
chipmunk_utf_16 = chipmunk_utf_16.hex()

#format it to be an escaped UTF-16 tcl string
chipmunk = '\\u{}\\u{}'.format(chipmunk_utf_16[0:4], chipmunk_utf_16[4:8])

# the top-level window
top = tk.Tk()

# the canvas
c = tk.Canvas(top, highlightcolor = 'blue', highlightbackground = 'black', background = 'yellow')

# create the text item, with placeholder text
text_id = c.create_text(0,0, font = '* 180', text = 'to be replaced')

# pack it
c.pack(side = 'top', fill = 'both' , expand = 1, padx = '4c' , pady = '4c')

# The 'Bypassing' aka issuing tcl/tk calls directly
# For Tk calls use => c.tk.call(...), we will not use this.
# For bare Tcl => c.tk.eval(...)

# chipmunk in UTF-16 (in this instance it is using 32-bits to represent the codepoint)
# as a raw string

#print(chipmunk)
#chipmunk = r"\ud83d\udc3f"

# create another variable in tcl/tk
c.tk.eval('set the_tcl_chipmunk {}'.format(chipmunk))

# set the text_id item's -text property/option as the value of variable the_tcl_chipmunk, gotten by calling the tcl's set command

c.tk.eval( '{} itemconfig {} -text [set the_tcl_chipmunk]'.format( str(c), text_id ) )

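# Apparently a hack to get the chipmunk in position. Some Tk builds lack the
# canvas "moveto" subcommand (one commenter got: bad option "moveto"); if yours
# does, comment this line out and set the position in create_text instead.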
c.tk.eval( '{} moveto {} 0 0'.format( str(c), text_id ) )

top.mainloop()
Exile answered 12/7, 2020 at 20:29 Comment(10)
Tested on Tcl/Tk 8.6.10 and Python 3.8.3 respectively.Exile
Your Python code does not work for me, it's just a blank canvas. And it shows up as soon as c.tk.eval( '{} itemconfig {} -text [set the_tcl_chipmunk]'.format( str(c), text_id ) ) is runMirilla
the entire script is on EDIT, (I use PyScripter as my IDE but that shouldn't matter). if you're still having problems with a blank (yellow) canvas, could you change padx = 4c to padx = 0 and likewise for padyExile
Still blank when running on my Mac, even with padx = 0 and pady = 0Mirilla
what IDE are you using ?Exile
No IDE, just saving your code and running from command lineMirilla
That's the problem. If you have IDLE (it should be installed by default), copy the entire script from EDIT and save it as a new file (chipmunk.py for instance) from a text editor. Then run IDLE and go to File -> Open... (select chipmunk.py); a new window should open. From that window's menu bar, go to Run -> Run ModuleExile
c.tk.eval( '{} moveto {} 0 0'.format( str(c), text_id ) ) _tkinter.TclError: bad option "moveto": must be addtag, bbox, bind, canvasx, canvasy, cget, configure, coords, create, dchars, delete, dtag, find, focus, gettags, icursor, index, insert, itemcget, itemconfigure, lower, move, postscript, raise, scale, scan, select, type, xview, or yviewMirilla
Ok just comment it, to change the location of the text_id, you could change c.create_text(0,0, font = '* 180', text = 'to be replaced') as appropriateExile
Let us continue this discussion in chat.Exile
