I am trying to query and display UTF-8 encoded characters in a GUI built on tkinter, and therefore on Tcl. However, I have found that tkinter cannot display characters that take 4 bytes in UTF-8, i.e. Unicode code points greater than U+FFFF. Why is this the case? What limitations would supporting non-BMP characters impose on Tcl?
I can't query non-BMP characters through my GUI, but if one comes up in a result I can copy and paste it and see the correct character and code point on unicode-table.com, even though my system does not display it. So it seems that the character is displayed as U+FFFD but stored in the view with the correct code point.
I am running a Python 3.6.4 script on Windows 7.
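To make the "4-byte" description concrete, here is a minimal sketch that lists the characters in a string that fall outside the BMP and reports how many UTF-8 bytes and UTF-16 code units each one needs. The helper name `non_bmp_chars` and the sample string are only for illustration and are not part of my project.

```python
def non_bmp_chars(text):
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


sample = "ok \U0001F624"
for ch in non_bmp_chars(sample):
    # A non-BMP character takes 4 bytes in UTF-8 and 2 code units
    # (a surrogate pair) in UTF-16.
    utf8_len = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2
    print(f"U+{ord(ch):04X}: {utf8_len} UTF-8 bytes, {utf16_units} UTF-16 code units")
# prints: U+1F624: 4 UTF-8 bytes, 2 UTF-16 code units
```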
Update: For context, here is the error I get when a 4-byte character's code point is outside the BMP range and can't be handled by Tcl:
      File "Project/userInterface.py", line 569, in populate_tree
        iids.append(self.detailtree.insert('', 'end', values=entry))
      File "C:\Program Files (x86)\Python36-32\Lib\tkinter\ttk.py", line 1343, in insert
        res = self.tk.call(self._w, "insert", parent, index, *opts)
    _tkinter.TclError: character U+1f624 is above the range (U+0000-U+FFFF) allowed by Tcl
I handle this by using a regular expression to replace out-of-range Unicode characters with the replacement character (U+FFFD).
    # Requires `import re` at module level.
    # Matches characters outside the BMP (above U+FFFF, i.e. 4 bytes in UTF-8),
    # which tkinter/Tcl cannot handle or display. Lone surrogates also match.
    re_pattern = re.compile(u'[^\u0000-\uD7FF\uE000-\uFFFF]', re.UNICODE)
    for item in entries:
        entry = list(item)
        for i, col in enumerate(entry):
            if col and isinstance(col, str):
                # Replace U+10000 and greater with the replacement character U+FFFD
                entry[i] = re_pattern.sub(u'\uFFFD', col)
        entry = tuple(entry)
        iids.append(self.detailtree.insert('', 'end', values=entry))
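The same substitution can be pulled out of the GUI code so it can be checked without a Tk window. This is a sketch under my own naming: `replace_non_bmp` and `sanitize_row` are hypothetical helpers, not tkinter functions, and they use the same pattern as above.

```python
import re

# Same character class as in the loop above: everything that is not a BMP
# scalar value. This covers U+10000 and above, and also lone surrogates
# (U+D800-U+DFFF).
_NON_BMP = re.compile('[^\u0000-\uD7FF\uE000-\uFFFF]')


def replace_non_bmp(text, replacement='\uFFFD'):
    """Return text with every non-BMP character replaced (U+FFFD by default)."""
    return _NON_BMP.sub(replacement, text)


def sanitize_row(row):
    """Apply replace_non_bmp to the string columns of a row; leave others as-is."""
    return tuple(
        replace_non_bmp(col) if isinstance(col, str) else col
        for col in row
    )


print(ascii(sanitize_row(('angry \U0001F624', 42, None))))
# prints: ('angry \ufffd', 42, None)
```

The sanitized tuple can then be passed as `values=` to `Treeview.insert` in place of the original row.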
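'surrogatepass' error handler. I don't think this is possible with UTF-8 in Unix. For example:

    from tkinter import Tk

    # Encode to UTF-16, then decode each 2-byte code unit on its own so the
    # string holds a surrogate pair instead of one non-BMP character.
    title_bytes = '\U0001F60A'.encode('utf-16le')
    title = ''.join(title_bytes[n:n+2].decode('utf-16le', 'surrogatepass')
                    for n in range(0, len(title_bytes), 2))
    root = Tk()
    root.title(title)
    root.mainloop()

– Corelli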
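The split into UTF-16 code units that the comment performs with `'surrogatepass'` can also be written arithmetically, which makes it easier to apply to whole strings. This is only a sketch: `to_surrogate_pairs` is a hypothetical helper, and whether Tk then renders the pair as a single glyph depends on the platform and the Tcl/Tk build, so treat it as something to experiment with rather than a guaranteed fix.

```python
def to_surrogate_pairs(text):
    """Replace each non-BMP character with its UTF-16 surrogate pair.

    BMP characters are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high (lead) surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low (trail) surrogate
        else:
            out.append(ch)
    return ''.join(out)


print(ascii(to_surrogate_pairs('\U0001F60A')))
# prints: '\ud83d\ude0a'
```

The resulting string contains lone surrogates, so it cannot be encoded as UTF-8 without an error handler such as `'surrogatepass'`; it is only meant to be handed to Tk.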