Syntax Highlighting: How Does Eclipse Do It So Fast?

I've developed a syntax highlighter in Java for Android and it's working well, but the problem is it can be slow with big files.

So I'm wondering how source code editors like Eclipse and Gedit (Ubuntu) highlight what you've just written so quickly. For example, if you enter the closing greater-than symbol when writing an HTML tag, it highlights the tag instantly.

How is it so quick, even with big files? Is there a specific way they go about doing it or do they just perform the syntax highlighting for the line you're on?

Thanks, Alex

Disused answered 30/8, 2011 at 12:50 Comment(1)
I'd imagine that it stores and highlights only whatever you see in the viewport, not the rest of the code that is out of view. - Vermin

I cannot speak for Gedit, but in Eclipse, we cheat :-)

If you look very carefully, you can actually see that syntax coloring for structured languages like Java is a two-phase process.

First, a presentation reconciler is run to do very basic syntax coloring. This is triggered immediately by changes in the editor's document and is expected to be extremely fast. It is not really syntax-based coloring, but lexically-based coloring. So the focus is on tokens like strings, keywords, words, numbers, comments, etc. - all tokens that can be recognized easily based on simple character tables or similar. Thus there is no difference between a class name, a variable name or a static method name at this stage, even though they may be colored differently in the end. For many languages, this is the only coloring done.
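To make the first phase concrete, here is a minimal sketch of a purely lexical colorer in Java. It is not Eclipse's actual code: the class `LexicalColorer`, its `Token` type and the tiny keyword table are all hypothetical names made up for illustration. The point is only that a single pass over the characters, with no parsing, is enough to pick out keywords, words, numbers, strings and line comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical single-pass lexical colorer; illustrative only, not Eclipse API. */
final class LexicalColorer {
    enum Kind { KEYWORD, WORD, NUMBER, STRING, COMMENT }

    /** A colored run covering offsets [start, end) of the scanned text. */
    static final class Token {
        final Kind kind;
        final int start;
        final int end;
        Token(Kind kind, int start, int end) {
            this.kind = kind;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed keyword table; a real one would list every keyword of the language.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("class", "static", "int", "return", "new"));

    static List<Token> scan(String text) {
        List<Token> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            int start = i;
            if (c == '/' && i + 1 < n && text.charAt(i + 1) == '/') {
                // Line comment: runs to the end of the line.
                while (i < n && text.charAt(i) != '\n') i++;
                tokens.add(new Token(Kind.COMMENT, start, i));
            } else if (c == '"') {
                // String literal: ends at the closing quote or at end of line.
                i++;
                while (i < n && text.charAt(i) != '"' && text.charAt(i) != '\n') {
                    if (text.charAt(i) == '\\' && i + 1 < n) i++; // skip escaped char
                    i++;
                }
                if (i < n && text.charAt(i) == '"') i++;
                tokens.add(new Token(Kind.STRING, start, i));
            } else if (Character.isDigit(c)) {
                // Number: digits plus trailing letters such as 10L or 0xFF.
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, start, i));
            } else if (Character.isJavaIdentifierStart(c)) {
                // A word is a keyword only if it is in the table; class names,
                // variables and methods are indistinguishable at this stage.
                while (i < n && Character.isJavaIdentifierPart(text.charAt(i))) i++;
                String word = text.substring(start, i);
                Kind kind = KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.WORD;
                tokens.add(new Token(kind, start, i));
            } else {
                i++; // whitespace and punctuation keep the default color
            }
        }
        return tokens;
    }
}
```

Calling `LexicalColorer.scan("static int x = 42; // hi")` yields two keyword tokens, one word, one number and one comment. Block comments are left out here on purpose, since they are the case that needs state carried across lines.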

Next, a syntax reconciler is run to build an abstract syntax tree (AST) for the document, or as close to one as you can get in the face of syntax or semantic errors. This is triggered by a timer, and for some languages an attempt is made to do just a partial update of the AST (not easy). The completed AST is then used to update the outline view and to do additional syntax coloring based on the extra information, e.g. static method names. (The AST is often used for many other things as well, like hover information, folding, hyperlinking, etc.)
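The "triggered by a timer" part can be sketched as a simple debounce: every change restarts a short delay, and the expensive parse only runs once typing pauses. This is just one plausible way to do it with the standard `java.util.concurrent` classes, not a description of Eclipse's internals; the class name `DelayedReconciler` and the `reparse` callback are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical debounced trigger for the slow, AST-based phase. */
final class DelayedReconciler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable reparse;   // assumed callback: build AST, recolor
    private final long delayMillis;
    private ScheduledFuture<?> pending;

    DelayedReconciler(Runnable reparse, long delayMillis) {
        this.reparse = reparse;
        this.delayMillis = delayMillis;
    }

    /** Call on every document change; only the last call in a burst leads to a parse. */
    synchronized void documentChanged() {
        if (pending != null) {
            pending.cancel(false); // drop the run scheduled by the previous keystroke
        }
        pending = executor.schedule(reparse, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the background thread, e.g. when the editor is closed. */
    void shutdown() {
        executor.shutdownNow();
    }
}
```

On Android, the same idea should work with a `Handler` and `postDelayed`, as long as the parse itself stays off the UI thread.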

For both the initial presentation reconciler and the later syntax-based reconciler, some rather elaborate logic determines how big a region of the document must be parsed. For the presentation reconciler the decision can be based on any existing coloring, whereas for the syntax-based coloring a separate damage/repair phase is run to determine the size of the region.

Some extreme examples that always complicate matters are when block comments are added or removed:

a = b /* c + 1 /* remember the offset! */;

If the first slash is removed or added, the presentation reconciler must process a larger area than one would naively expect...
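One common way to bound that larger area, sketched below under my own assumptions rather than taken from Eclipse, is to remember for each line whether it ends inside a block comment. After an edit you rescan from the edited line downward and stop as soon as a line's end state matches the stored one; everything after that point is known to be unaffected. The class `CommentDamageTracker` is hypothetical, and it deliberately ignores strings and `//` comments to stay short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracker that computes how many lines an edit "damages". */
final class CommentDamageTracker {
    private final List<String> lines;
    // endsInComment.get(i) is true if line i ends inside an open /* ... */ comment.
    private final List<Boolean> endsInComment = new ArrayList<>();

    CommentDamageTracker(List<String> initialLines) {
        this.lines = new ArrayList<>(initialLines);
        boolean state = false;
        for (String line : lines) {
            state = scanLine(line, state);
            endsInComment.add(state);
        }
    }

    /** Returns whether we are inside a block comment at the end of the line. */
    static boolean scanLine(String line, boolean inComment) {
        int i = 0;
        while (i < line.length()) {
            if (inComment && line.startsWith("*/", i)) {
                inComment = false;
                i += 2;
            } else if (!inComment && line.startsWith("/*", i)) {
                inComment = true;
                i += 2;
            } else {
                i++;
            }
        }
        return inComment;
    }

    /**
     * Replaces one line and rescans downward until the comment state at the
     * end of a line is the same as before the edit. Returns the number of
     * lines that had to be rescanned (and would need recoloring).
     */
    int replaceLine(int index, String newText) {
        lines.set(index, newText);
        boolean state = index == 0 ? false : endsInComment.get(index - 1);
        int line = index;
        while (line < lines.size()) {
            state = scanLine(lines.get(line), state);
            boolean unchanged = state == endsInComment.get(line);
            endsInComment.set(line, state);
            line++;
            if (unchanged) {
                break; // later lines start in the same state as before
            }
        }
        return line - index;
    }
}
```

With the lines `a = b /* c + 1`, `still comment`, `*/ x = 1;` and `y = 2;`, removing the slash on the first line makes `replaceLine` rescan three lines, while changing only the last line rescans one.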

Cello answered 30/8, 2011 at 13:23 Comment(0)
