I'm masochistically writing an open-source text editor for Mac and have finally reached the point at which I want to add syntax highlighting. I've been going back and forth on various solutions for the past few days, and I've finally decided to open the question to a wider audience.
Here are the options I see:
- Define languages with a series of regex patterns (similar to how TextMate defines its languages)
- Define languages with a formal grammar like BNF or PEG
Regex pattern matching seems less than ideal, since it cannot describe a language's structure nearly as well as a formal grammar can. On the other hand, some less formal languages will have a hard time fitting into BNF (e.g. Markdown, though I know there's a great PEG implementation of it).
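To make the first option concrete, here is a minimal sketch of what I mean by regex-based highlighting. It's in Python only for brevity, and the rule set, the toy keywords, and the names `RULES` and `highlight_spans` are all made up for illustration. TextMate's real grammars are richer than this (they also have nested begin/end rules), so treat it as a rough picture of the idea rather than how TextMate actually works.

```python
import re

# Hypothetical rules for a toy language: (scope name, regex pattern).
# Order matters: at any given position, the first alternative that matches wins.
RULES = [
    ("comment", r"#[^\n]*"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("number", r"\b\d+(?:\.\d+)?\b"),
    ("keyword", r"\b(?:if|else|while|return)\b"),
]

# Combine all rules into one alternation of named groups so the text
# is scanned in a single left-to-right pass.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def highlight_spans(text):
    """Return a list of (start, end, scope) tuples for each matched token."""
    return [(m.start(), m.end(), m.lastgroup) for m in MASTER.finditer(text)]
```

Calling `highlight_spans('if x return 42 # done')` gives spans for the two keywords, the number, and the trailing comment, which the editor would then map to colors. Note there is no notion of nesting or context here beyond what each individual pattern encodes, which is exactly the limitation I'm worried about.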
What are the performance tradeoffs for live syntax highlighting? What about flexibility for a wide range of languages?
If I go the BNF route, Todd Ditchendorf's awesome ParseKit framework would work nicely out of the box. Does anyone know of anything similar for PEGs?