I am in the middle of transforming a usual full-restart tokenizer (ported from the original language parser, the language is irrelevant here) into a more advanced incremental one. This implies the following:
a) it has to be fast, very fast;
b) upon every text update (be it insertion or deletion) it has to find damaged tokens and repair the token list accordingly.
The original tokenizer version just builds a list of tokens while going through the buffer text using regexps; every token in the list is a vector of 4 elements (['TOKENTYPE "token-lexeme" linum charnum]). Linum/charnum are plain numbers specifying token position in the buffer at the moment lexing was done. Easy pie.
Now, to the point. Whenever (well.. not every time, but often enough) a user adds or deletes a character (or a string) the new tokenizer has to find a token built using the text in a change position, and, probably, dependant tokens for later deletions/updates.
There are two problems here:
a) token positions should be dynamic (i.e. if a user adds some text in the beginning of a buffer -> we shouldn't bother fixing tokens in the buffer end);
b) a way to get to a damaged token(or lots of tokens in general).
Right now I am trying to use overlays for the task as overlays have a nice interface suiting my needs: overlays-at/overlays-in functions help the search; and overlay start/end move in an apropriate way.
I can happily do that for a smaller file. But it turns out (and I have to admit that I was warned by the docs) that the solution does not scale: even an average 1K LOC file can have CONST * LOC overlays, which is just too much for Emacs.
That was my first try, and it wasn't a successful one. I am considering alternatives such as:
1) managing a handwritten token search tree using plain numbers;
2) same tree, but using markers;
3) some kind of a mixed approach including both plain numbers and markers.
Any alternatives to mentioned approaches? Or maybe there's a better way to handle lots of overlays?
Wisent
andsemantic
in emacs. Probably, you already have checked those ones. There is also some highlighting support e.g., `gnu.org/software/emacs/manual/html_mono/…. – Gunpoint