emacs: is there a clear example of multi-line font-locking?

Asked 26/2, 2012 at 11:17 Answered 6/3, 2013 at 5:32

Some background: I'm comfortable with Emacs Lisp and have written a lot of it. However, I've never written a major mode, so I'm fairly new to how the font-locking mechanism works.

For my current project, I'd like to add inlined javascript and css highlighting to html-mode. Currently, I do this with MMM-mode, but it's bulky and I don't use other features of it, so I'd just like to make a minor-mode or even just a hack that I can add to the sgml-mode-hook to just do the highlighting.

I've found this section of the manual, which sorely lacks an example, and this emacswiki page of broken code.

Can someone show me a clear example of how this can be done?

EDIT: I should clarify that I don't want to see mode-specific font-locking within the javascript/css chunks. The only requirement is that I'm able to see the chunks by applying a different face to them.

Winnow answered 26/2, 2012 at 11:17 Comment(0)

I'll outline a simple major mode for highlighting <style> (CSS) and <script> (JavaScript, etc.) blocks. To get multiline font lock working reasonably well, you'll need to first enable it by setting font-lock-multiline to t and write a function to add to font-lock-extend-region-functions which will extend the relevant search region to contain larger blocks of text. Then, you'll need to write multiline matchers—either regular expressions or functions—and add them to font-lock-defaults.

Here's a basic major mode definition that names the font lock keywords list (here, test-font-lock-keywords), enables multiline font lock, and adds the region extension function test-font-lock-extend-region.

(define-derived-mode test-mode html-mode "Test"
  "Major mode for highlighting JavaScript and CSS blocks."
  ;; Basic font lock
  (set (make-local-variable 'font-lock-defaults)
       '(test-font-lock-keywords))
  ;; Multiline font lock
  (set (make-local-variable 'font-lock-multiline) t)
  (add-hook 'font-lock-extend-region-functions
            'test-font-lock-extend-region))

The region extension function should look something like this:

(defun test-font-lock-extend-region ()
  "Extend the search region to include an entire block of text."
  ;; Avoid compiler warnings about these global variables from font-lock.el.
  ;; See the documentation for variable `font-lock-extend-region-functions'.
  (eval-when-compile (defvar font-lock-beg) (defvar font-lock-end))
  (save-excursion
    (goto-char font-lock-beg)
    (let ((found (or (re-search-backward "\n\n" nil t) (point-min))))
      (goto-char font-lock-end)
      (when (re-search-forward "\n\n" nil t)
        (beginning-of-line)
        (setq font-lock-end (point)))
      (setq font-lock-beg found))))

This function looks at the global variables font-lock-beg and font-lock-end, which contain the starting and ending positions of the search region, and extends the region to contain an entire block of text (separated by blank lines, or "\n\n").
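For the HTML case in the question, the region could instead be extended to the enclosing block rather than to blank lines. The following is only a sketch under stated assumptions: the names my-inside-block-start and my-extend-region-to-block are hypothetical, tags are assumed to be lowercase and not nested, and the function returns non-nil only when it actually changed the region, as the documentation of font-lock-extend-region-functions asks.

```elisp
;; Declared in font-lock.el; repeated here to silence the byte compiler.
(defvar font-lock-beg)
(defvar font-lock-end)

(defun my-inside-block-start (pos)
  "Return the start of the <script>/<style> block containing POS, or nil.
Hypothetical helper: looks at the nearest tag before POS and
succeeds only if that tag is an opening one."
  (save-excursion
    (goto-char pos)
    (and (re-search-backward "<\\(/\\)?\\(?:script\\|style\\)\\b" nil t)
         (not (match-beginning 1))      ; group 1 matched means a closing tag
         (match-beginning 0))))

(defun my-extend-region-to-block ()
  "Extend the font-lock region to cover an enclosing block.
Return non-nil only if the region was changed."
  (let ((changed nil)
        (start (my-inside-block-start font-lock-beg)))
    ;; Move the start back to the opening tag, if we are inside a block.
    (when (and start (< start font-lock-beg))
      (setq font-lock-beg start
            changed t))
    ;; Move the end forward past the closing tag, if the end is inside a block.
    (when (my-inside-block-start font-lock-end)
      (save-excursion
        (goto-char font-lock-end)
        (when (re-search-forward "</\\(?:script\\|style\\)>" nil t)
          (setq font-lock-end (point)
                changed t))))
    changed))
```

Once the region covers the whole block, a second call finds nothing more to extend and returns nil, so font-lock does not keep re-running the list of extension functions.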

Now that Emacs will be searching for matches in larger regions, we need to set up the test-font-lock-keywords list. There are two reasonably good ways to go about matching multiline constructs: a regular expression which will match across lines and a matching function. I'll give examples of both. This keyword list contains a regular expression for matching <style> blocks and a function for matching <script> blocks:

(defvar test-font-lock-keywords
  (list
   (cons test-style-block-regexp 'font-lock-string-face)
   (cons 'test-match-script-blocks '((0 font-lock-keyword-face)))
   )
  "Font lock keywords for inline JavaScript and CSS blocks.")

The first item in the list is straightforward: a regular expression and a face for highlighting matches of that regular expression. The second looks a bit more complicated, but can be generalized to specify different faces for different groups defined in the match data specified by the function. Here, we just highlight group zero (the entire match) using font-lock-keyword-face. (The relevant documentation for these matchers is in the Search-based fontification section of the Emacs manual.)

A basic regular expression for matching <style> blocks would be:

(defconst test-style-block-regexp
  "<style>\\(.\\|\n\\)*</style>"
  "Regular expression for matching inline CSS blocks.")

Note that we have to put \n in the inner group because . does not match newlines.
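One thing to watch with the regular expression above is that `*` is greedy, so if the searched region contains several blocks, the match runs from the first <style> to the last </style>. A possible variant, shown here as a sketch with the hypothetical name my-style-block-regexp-lazy, uses the non-greedy `*?` operator and a shy group so that each block is matched separately:

```elisp
(defconst my-style-block-regexp-lazy
  ;; \\(?: ... \\) is a non-capturing group; *? stops at the first </style>.
  "<style>\\(?:.\\|\n\\)*?</style>"
  "Regular expression matching a single inline CSS block (non-greedy sketch).")

;; Quick check in the scratch buffer: only the first block is matched.
(let ((text "<style>a {}\n</style> <p>x</p> <style>b {}</style>"))
  (when (string-match my-style-block-regexp-lazy text)
    (match-string 0 text)))
```

Like the original, this only matches a bare <style> tag without attributes.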

The matching function, on the other hand, needs to look for the first <script> block in the region from the point to the single given argument, last:

(defun test-match-script-blocks (last)
  "Match JavaScript blocks from the point to LAST."
  (cond ((search-forward "<script" last t)
         (let ((beg (match-beginning 0)))
           (cond ((search-forward-regexp "</script>" last t)
                  (set-match-data (list beg (point)))
                  t)
                 (t nil))))
        (t nil)))

This function sets the match data, which is a list of the form begin-0 end-0 begin-1 end-1 ... giving the beginning and end of the zeroth group, first group, and so on. Here, we only give bounds on the entire block that was matched, but you could do something more sophisticated, such as setting different faces for the tags and the contents.
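As a rough illustration of the "more sophisticated" option, a matcher can record the opening tag, the body, and the closing tag as separate groups. This is a sketch, not part of the mode above; the function name and the choice of faces are assumptions made for the example.

```elisp
(defun my-match-script-blocks-with-groups (limit)
  "Match a <script> block before LIMIT.
Group 0 is the whole block, group 1 the opening tag, group 2 the
body, and group 3 the closing tag."
  (when (re-search-forward "<script[^>]*>" limit t)
    (let ((open-beg (match-beginning 0))
          (open-end (match-end 0)))
      (when (search-forward "</script>" limit t)
        (let ((close-beg (match-beginning 0))
              (close-end (match-end 0)))
          (set-match-data (list open-beg close-end    ; group 0
                                open-beg open-end     ; group 1
                                open-end close-beg    ; group 2
                                close-beg close-end)) ; group 3
          t)))))

;; A keywords entry using the groups (faces chosen arbitrarily here):
(defvar my-script-block-keywords
  '((my-match-script-blocks-with-groups
     (1 font-lock-keyword-face)
     (2 font-lock-string-face)
     (3 font-lock-keyword-face)))
  "Font lock keywords highlighting tags and body of <script> blocks.")
```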

If you combine all of this code into a single file and run M-x test-mode, it should highlight these two types of blocks. Note that test-style-block-regexp must be defined before test-font-lock-keywords, because the defvar evaluates it when the file is loaded. While I believe this does the job, if there is a more efficient or proper way of going about it, I'd be curious to know.

Denazify answered 6/3, 2013 at 5:32 Comment(1)

font-lock-defaults and font-lock-multiline are automatically buffer-local when set (Emacs 25.3) – Bootlick 3/5, 2018 at 19:40

In the example below, I use the "anchored" form of font-lock keywords, which allows you to search beyond the current line. The "trick" is that the "pre" hook does two things: 1) it lets you position point at the start of the search, and 2) it lets you limit the search by returning the end position. In the example below, I have used the second property.

Note that this is only a proof-of-concept. You will need to make sure that the font-lock-multiline variable and the font-lock keywords are applied to the correct buffer.

(defun my-end-of-paragraph-position (&rest foo)
  "Return position of next empty line."
  (save-excursion
    (while (not (or (eobp)             ; Stop at end of buffer.
                    (and (bolp)        ; Or at an empty line.
                         (eolp))))
      (forward-line))
    (point)))

(setq font-lock-multiline t)

(font-lock-add-keywords nil
                        '(("^FOO"
                           (0 font-lock-keyword-face)  ;; Face for FOO
                           ("BAR"
                            (my-end-of-paragraph-position)
                            nil
                            (0 font-lock-variable-name-face)))))

Below, the first two lines of BAR will be highlighted, but not the last:

FOO BAR BAR BAR BAR
BAR BAR BAR BAR

BAR BAR BAR BAR
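To apply the proof-of-concept to the right buffers only, one option is to do the setup from a mode hook, with buffer-local settings. This is a minimal sketch: my-foo-bar-keywords and my-foo-bar-font-lock-setup are hypothetical names, text-mode-hook is just an example hook, and my-end-of-paragraph-position is the function defined above.

```elisp
(defvar my-foo-bar-keywords
  '(("^FOO"
     (0 font-lock-keyword-face)
     ("BAR"
      (my-end-of-paragraph-position)   ; pre-form: returns the search limit
      nil
      (0 font-lock-variable-name-face))))
  "The anchored keywords from the example, kept in a variable.")

(defun my-foo-bar-font-lock-setup ()
  "Enable the FOO/BAR highlighting in the current buffer only."
  (setq-local font-lock-multiline t)
  ;; A nil first argument adds the keywords to the current buffer.
  (font-lock-add-keywords nil my-foo-bar-keywords)
  ;; Refontify if font-lock is already active (font-lock-flush: Emacs 25+).
  (when font-lock-mode
    (font-lock-flush)))

(add-hook 'text-mode-hook #'my-foo-bar-font-lock-setup)
```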

Hartshorn answered 26/2, 2012 at 20:15 Comment(5)

In this thread, Stefan Monnier says that font-lock-multiline is appropriate for things that are usually single-line, but occasionally multi-line. Comments? – Winnow 4/3, 2012 at 17:25

Well, Stefan Monnier is the definitive reference when it comes to anything font-lock related, so any advice from him is well worth following. Using the syntactic system, as he suggests, is one way to do it. Another is to use a "match function" rather than a regexp, which could match the entire block. You should see my answer as a direct answer to the question (provide an example of the multi-line font-lock feature) and maybe not as an answer on the best way to accomplish whatever the original poster asked about. – Hartshorn 4/3, 2012 at 17:43

Anchoring might be an unreliable solution for multi-line font locking. Current versions of the manual warn, ‘It is generally a bad idea to return a position greater than the end of the line’ — as your pre-form function my-end-of-paragraph-position does — ‘in other words, the anchored-matcher search should not span lines.’ – Middling 31/5, 2019 at 15:48

@MichaelAllan, I guess that that is a bug in the manual. Font-lock contains code that specifically handles multi-line anchored matches (when font-lock-multiline is non-nil); see font-lock-fontify-anchored-keywords. However, one needs to take great care to handle this correctly -- in particular, when font-lock rehighlights a region, it should be expanded to include all relevant lines; see font-lock-extend-region-functions. – Hartshorn 31/5, 2019 at 18:42

A bug in the manual, I think you’re right. It does work for me. – Middling 9/6, 2019 at 8:45

It's probably not the best possible example, but you can look at how haml-mode has solved the problem of syntax highlighting in submode regions. Here's the blog post with a high-level description.

Note that the current haml-mode has some problems with Emacs 24 compatibility, but a couple of forks have fixes for this.

Regarding multiline font-locking, I think you may be asking the wrong question. But basically, this solves the problem of what to do if the user has made an edit in the middle or at the end of a multiline syntactic construct. Initially, font-lock starts refontifying the buffer from the position of the point. The two default font-lock-extend-region-functions, font-lock-extend-region-wholelines and font-lock-extend-region-multiline, move the start of the refontification region to the beginning of the line, and then maybe somewhere even further, depending on the font-lock-multiline property. If you need it to move further up, you either add another function to font-lock-extend-region-functions, or make sure to backtrack programmatically while parsing certain constructs, inside font-lock-fontify-region-function or syntax-propertize-function.

One example of the latter approach would be Ruby's heredoc and ruby-syntax-propertize-heredoc in the Emacs trunk. It is called from two places in ruby-syntax-propertize-function: first to handle the case when we are already inside a heredoc literal, and then for any subsequent heredocs.

Upland answered 26/2, 2012 at 20:40 Comment(0)
