Overview
I’m trying to add markdown syntax highlighting to a text editor for my project, but I'm having some trouble making it user-proof, so to speak, while keeping it performance friendly.
Basically, I'm after something like Visual Studio Code's markdown highlighting.
I’m talking about simple highlighting of bold, italic, lists, etc. to indicate the style that will be applied when the user previews their markdown file.
My Solution
I originally set up this method for my project (simplified for the question, and using colours to make the styles clearer for debugging):
```python
import tkinter

root = tkinter.Tk()
root.title("Markdown Text Editor")
editor = tkinter.Text(root)
editor.pack()

# bind each key release to the markdown checker, passing the current line number
editor.bind("<KeyRelease>", lambda event: check_markdown(editor.index("insert").split(".")[0]))

# configure markdown styles
editor.tag_config("bold", foreground="#FF0000")         # red for debugging clarity
editor.tag_config("italic", foreground="#00FF00")       # green for debugging clarity
editor.tag_config("bold-italic", foreground="#0000FF")  # blue for debugging clarity

# regex expressions (raw strings, so "\*" is not an invalid escape) and empty tag length
search_expressions = {
    # <tag name>: [<regex expression>, <empty tag size>]
    "italic": [r"\*(.*?)\*", 2],
    "bold": [r"\*\*(.*?)\*\*", 4],
    "bold-italic": [r"\*\*\*(.*?)\*\*\*", 6],
}

def check_markdown(current_line):
    # loop through each tag with its matching regex expression
    for tag, (expression, empty_size) in search_expressions.items():
        # start and end indices for the search area
        start_index, end_index = f"{current_line}.0", f"{current_line}.end"
        # remove all tag instances on this line
        editor.tag_remove(tag, start_index, end_index)
        # while there is still text to search
        while True:
            length = tkinter.IntVar()
            # get the index of the text matching 'expression' on 'current_line'
            index = editor.search(expression, start_index, count=length,
                                  stopindex=end_index, regexp=True)
            # stop if the expression was not matched on the current line
            if not index:
                break
            # only tag non-empty matches ('**' <- empty italic)
            if length.get() != empty_size:
                editor.tag_add(tag, index, f"{index}+{length.get()}c")
            # always continue searching after the match; without this, an empty
            # match would be found again forever and freeze the program
            start_index = f"{index}+{length.get()}c"
            # update the display - stops the program appearing frozen
            root.update_idletasks()

root.mainloop()
```
My reasoning was that removing all formatting on each KeyRelease and then rescanning the current line would reduce how often syntax is misinterpreted (for example, bold-italic being read as bold or italic) and stop tags from stacking on top of each other. This works well for a few sentences on a single line, but if the user types a lot of text on one line, performance drops fast, with long waits for the styles to be applied, especially when many different kinds of markdown syntax are involved.
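For comparison, here is a minimal sketch of the same per-line rescan done in a single pass, assuming it is acceptable to match with Python's `re` module instead of `Text.search`. The line is read once with `editor.get`, and one combined pattern tries bold-italic, then bold, then italic at each position, so `***` is never split into overlapping bold and italic tags. `find_spans`, `highlight_line`, `EMPHASIS` and `GROUP_TO_TAG` are hypothetical names; I haven't benchmarked this against the original.

```python
import re

# One combined pattern, tried left to right at each position, so "***" is
# claimed by bold-italic before bold or italic can match inside it.
# ".+?" (not ".*?") means empty markers like "**" never match at all.
EMPHASIS = re.compile(
    r"\*\*\*(?P<bold_italic>.+?)\*\*\*"
    r"|\*\*(?P<bold>.+?)\*\*"
    r"|\*(?P<italic>.+?)\*"
)

# Map regex group names to the Text tag names used in the question
GROUP_TO_TAG = {"bold_italic": "bold-italic", "bold": "bold", "italic": "italic"}


def find_spans(line_text):
    """Return (tag, start_column, end_column) for every emphasis run in one line."""
    return [
        (GROUP_TO_TAG[m.lastgroup], m.start(), m.end())
        for m in EMPHASIS.finditer(line_text)
    ]


def highlight_line(editor, line_number):
    """Hypothetical helper: re-tag a single line of a tkinter.Text widget."""
    start, end = f"{line_number}.0", f"{line_number}.end"
    for tag in GROUP_TO_TAG.values():
        editor.tag_remove(tag, start, end)
    # One Tk call to read the line; all matching then happens in Python
    for tag, first, last in find_spans(editor.get(start, end)):
        # assumes one Python character == one Tk column (true for ordinary text)
        editor.tag_add(tag, f"{line_number}.{first}", f"{line_number}.{last}")
```

It could replace `check_markdown` in the existing binding, e.g. `editor.bind("<KeyRelease>", lambda event: highlight_line(editor, editor.index("insert").split(".")[0]))`. The expected saving comes from making one `editor.get` call per line instead of three `editor.search` loops with an `update_idletasks` call per match.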
I used Visual Studio Code's markdown language highlighting as a comparison, and it could handle far more syntax on a single line before it removed the highlighting for "performance reasons".
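VS Code's cutoff comes from a setting, `editor.maxTokenizationLineLength` (20000 characters by default, as far as I know). Below is a minimal sketch of the same kind of guard. The cap value is simply borrowed from VS Code rather than tuned for Tkinter, and `should_highlight` and `MAX_HIGHLIGHT_LENGTH` are hypothetical names.

```python
# Hypothetical cap, borrowed from VS Code's editor.maxTokenizationLineLength
# default (20000 characters); a lower value may suit tkinter better.
MAX_HIGHLIGHT_LENGTH = 20_000


def should_highlight(line_text, limit=MAX_HIGHLIGHT_LENGTH):
    """Return False for lines too long to re-highlight on every keystroke."""
    return len(line_text) <= limit
```

In the KeyRelease handler, the line's tags could be removed and the function could return early when `should_highlight(editor.get(f"{line}.0", f"{line}.end"))` is false. Very long lines would then simply lose their highlighting, as they do in VS Code.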
I understand this is an extremely large amount of looping to do on every KeyRelease, but the alternatives I found were vastly more complicated without really improving performance.
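Another way to cut the per-KeyRelease cost is debouncing, a technique not used in the original code: record which lines changed and rescan them once, after typing pauses. The sketch below assumes a scheduler with Tk's `after(ms, func)` and `after_cancel(id)` methods, which `tkinter.Tk` and `tkinter.Text` both provide. `DeferredRescan` and the 150 ms delay are hypothetical.

```python
class DeferredRescan:
    """Collect dirty line numbers and rescan them once typing pauses.

    `scheduler` is assumed to offer Tk's after(ms, func) -> id and
    after_cancel(id), so a tkinter.Tk or Text widget can be passed directly.
    """

    def __init__(self, scheduler, delay_ms, rescan_line):
        self.scheduler = scheduler
        self.delay_ms = delay_ms
        self.rescan_line = rescan_line  # e.g. the question's check_markdown
        self._dirty = set()
        self._pending = None

    def mark_dirty(self, line_number):
        self._dirty.add(str(line_number))
        # restart the quiet-period timer on every keystroke
        if self._pending is not None:
            self.scheduler.after_cancel(self._pending)
        self._pending = self.scheduler.after(self.delay_ms, self._flush)

    def _flush(self):
        # runs once the timer expires: rescan each dirty line exactly once
        self._pending = None
        dirty, self._dirty = self._dirty, set()
        for line_number in sorted(dirty, key=int):
            self.rescan_line(line_number)


# Usage with the question's code (not executed here):
# rescan = DeferredRescan(root, 150, check_markdown)
# editor.bind("<KeyRelease>", lambda e: rescan.mark_dirty(editor.index("insert").split(".")[0]))
```

Collecting lines in a set, rather than remembering only the last one, means that moving between lines during the delay does not leave a line unhighlighted.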
Alternative Solutions
I thought I'd try to decrease the load. I’ve tested re-checking only when the user types markdown syntax such as asterisks and dashes, and validating only tags that have been edited (a key release within a tag's range). But there are so many variables to consider with the user's input, such as text being pasted into the editor: it is difficult to determine what effect a given combination of syntax could have on the markdown in the surrounding document, and all of that would need to be checked and validated.
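One observation may simplify the paste case. In the code above, every emphasis regex is matched within a single line, so an edit can only change markup on the lines it covers. Under that assumption (which would no longer hold if emphasis were allowed to span lines, as full Markdown allows), a paste only needs those lines rescanned. Here is a minimal sketch of the two checks; `MARKDOWN_CHARS`, `needs_rescan` and `lines_touched` are hypothetical names, and the trigger character set is a guess.

```python
# Hypothetical trigger set: characters that can open or close markdown syntax
MARKDOWN_CHARS = frozenset("*_-#`")


def needs_rescan(changed_text, line_has_tags):
    """Rescan if the edit could change markup: the inserted or deleted text
    contains a markdown character, or the line already carries style tags."""
    return line_has_tags or any(ch in MARKDOWN_CHARS for ch in changed_text)


def lines_touched(start_line, inserted_text):
    """Line numbers covered by text inserted at start_line (e.g. a paste)."""
    return list(range(start_line, start_line + inserted_text.count("\n") + 1))
```

In Tkinter, `line_has_tags` could come from `editor.tag_nextrange(tag, f"{line}.0", f"{line}.end")` for each style tag. For pastes, the pasted text could be read with `editor.clipboard_get()` in a `<<Paste>>` binding, and each line in `lines_touched(...)` passed to the rescan function.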
Is there a better, more intuitive way to highlight markdown that I haven’t thought of yet? Is there a way to drastically speed up my original idea? Or are Python and Tkinter simply not able to do what I’m trying to do fast enough?
Thanks in advance.