Why does Tcl/tkinter only support BMP characters?

I am trying to query and display UTF-8 encoded characters in a GUI built on tkinter, and thus on Tcl. However, I have found that tkinter cannot display characters that take 4 bytes in UTF-8, i.e. Unicode code points above U+FFFF. Why is this the case? What limitations would supporting non-BMP characters impose on Tcl?
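To make the "4-byte" distinction concrete, here is a minimal sketch in plain Python (no tkinter involved). The helper name `describe` and the two sample characters are only illustrative choices.

```python
def describe(ch):
    """Return (code point, UTF-8 byte length, UTF-16 code units, is_bmp) for one character."""
    cp = ord(ch)
    utf8_len = len(ch.encode("utf-8"))
    # Two bytes per UTF-16 code unit; non-BMP characters need a surrogate pair.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return cp, utf8_len, utf16_units, cp <= 0xFFFF


bmp = describe("\u20ac")         # EURO SIGN, inside the BMP
astral = describe("\U0001F624")  # a code point above U+FFFF

print(bmp)
print(astral)
```

Any character for which the last field is `False` is one of the characters I cannot display.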

I can't query non-BMP characters through my GUI, but if one comes up in a result, I can copy and paste it into unicode-table.com and see the character and its code point, even though my system does not display it. So it seems that the character is displayed as U+FFFD but stored in the view with the correct code point.

I am running a Python 3.6.4 script on Windows 7.

Update: For context, here is the error I get when a 4-byte code point falls outside the BMP range and cannot be handled by Tcl:

  File "Project/userInterface.py", line 569, in populate_tree
    iids.append(self.detailtree.insert('', 'end', values=entry))
  File "C:\Program Files (x86)\Python36-32\Lib\tkinter\ttk.py", line 1343, in insert
    res = self.tk.call(self._w, "insert", parent, index, *opts)
_tkinter.TclError: character U+1f624 is above the range (U+0000-U+FFFF) allowed by Tcl

I work around this by using a regular expression to replace out-of-range Unicode characters with the replacement character, U+FFFD:

    # Requires `import re` at the top of the module.
    # Matches anything outside the BMP (plus lone surrogates), which tkinter/Tcl cannot handle.
    re_pattern = re.compile('[^\u0000-\uD7FF\uE000-\uFFFF]')
    for item in entries:
        entry = list(item)
        for i, col in enumerate(entry):
            if col and isinstance(col, str):
                # Replace U+10000 and above with U+FFFD, the replacement character.
                entry[i] = re_pattern.sub('\uFFFD', col)
        iids.append(self.detailtree.insert('', 'end', values=tuple(entry)))
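The same filtering can be pulled out into a standalone helper so that it can be checked without a Treeview. This is only a sketch: `to_bmp` and `sanitize_row` are names I made up for illustration, and it assumes that replacing each offending character with U+FFFD is acceptable.

```python
import re

# Matches any character outside the BMP, plus lone surrogates (U+D800-U+DFFF).
_NON_BMP = re.compile('[^\u0000-\uD7FF\uE000-\uFFFF]')


def to_bmp(text, replacement='\uFFFD'):
    """Return text with every non-BMP character replaced."""
    return _NON_BMP.sub(replacement, text)


def sanitize_row(row):
    """Apply to_bmp to the string columns of a row; leave other values untouched."""
    return tuple(to_bmp(col) if isinstance(col, str) else col for col in row)


print(sanitize_row(('ok', 42, None, 'angry \U0001F624')))
```

With this, the loop body reduces to `self.detailtree.insert('', 'end', values=sanitize_row(item))`.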
