How to highlight all R function names with highlight.js?

I want to extend the highlight.js capabilities for the R language so that (1) all function names followed by an opening parenthesis ( and (2) all package names followed by the :: and ::: operators are highlighted (as in RStudio, see Fig.1). The parentheses (, ) and the operators ::, ::: themselves should not be highlighted.

Fig.1. Desired highlighting of R code parts (function and package names).

My example consists of two files: index.html and r.min.js.

HTML file:

<html lang="en-us">
<head> <meta charset="utf-8">
    <link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head>

<body>

<pre class="r"><code>doc_name &lt;-
    officer::read_docx() %&gt;% 
    flextable:::body_add_flextable(table_to_save) %&gt;% 
    print(target = &quot;word.docx&quot;)

.libPaths()

c("a", "b")

package::function()$field
</code></pre> 

<script src="https://cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>
<script src="r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>

</body>
</html>

r.min.js file:

hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},

/* My attempt... */
/* ... to highlight function names between double 
and triple colons and opening parenthesis (in red as symbol): */
{cN:"symbol",b:":::|::",e:"\\(",eB:!0,eE:!0},

/* ... to highlight other function names (in red as symbol): */
{cN:"symbol",  b:"([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*",e:"\\(",eE:!0},

/* ... to highlight package names (in cyan as variable): */
{cN:"variable",b:"(?<!\w)",e:":::|::",eE:!0},

]}});

r.min.js is based on this file and contains highlight.js rules that identify R code elements. The lines I added are below the comment "My attempt". Meanings of the abbreviations: cN - CSS class name, b - "begins", e - "ends", eB - "exclude begin", eE - "exclude end"; other abbreviations are explained here.
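The minified keys are easier to read once expanded. As a sketch, the hypothetical helper expandMode below renames the abbreviations to their long names and recurses into nested modes. The entries for r, k, l, c and v (relevance, keywords, lexemes, contains, variants) are an assumption based on highlight.js 9.x naming; they are not listed in the post.

```javascript
// Abbreviations used in minified highlight.js 9.x language files.
// cN, b, e, eB, eE are from the list above; r, k, l, c, v are assumed.
const LONG_NAMES = new Map([
  ["cN", "className"],
  ["b", "begin"],
  ["e", "end"],
  ["eB", "excludeBegin"],
  ["eE", "excludeEnd"],
  ["r", "relevance"],
  ["k", "keywords"],
  ["l", "lexemes"],
  ["c", "contains"],
  ["v", "variants"],
]);

// Hypothetical helper: returns a copy of a mode object with abbreviated keys
// renamed, recursing into nested modes (arrays and plain objects).
function expandMode(value) {
  if (Array.isArray(value)) return value.map(expandMode);
  if (value === null || typeof value !== "object" || value instanceof RegExp) {
    return value; // strings, numbers, booleans and regexes are kept as-is
  }
  const out = {};
  for (const [key, v] of Object.entries(value)) {
    out[LONG_NAMES.get(key) ?? key] = expandMode(v);
  }
  return out;
}

// One of the rules from "My attempt" (note that !0 is simply true).
const rule = { cN: "symbol", b: ":::|::", e: "\\(", eB: !0, eE: !0 };
console.log(JSON.stringify(expandMode(rule)));
// {"className":"symbol","begin":":::|::","end":"\\(","excludeBegin":true,"excludeEnd":true}
```

This only helps with reading the rules; it is not needed to register the language.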

The result I get (Fig.2) is not satisfactory. It seems that the regular expressions I use do not find the correct beginnings and ends of the desired parts of the R code.

Fig.2. The result using modified r.min.js
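One likely contributor can be checked in plain JavaScript, without highlight.js: the begin pattern of the "other function names" rule is a string with a single backslash ("\.[a-zA-Z.]"), while the original var r uses "\\.". In a JavaScript string literal "\." is just ".", so the regex receives an unescaped dot that matches any character. The same happens to "\w" in the lookbehind of the package rule. The snippet below only demonstrates the escaping; it is not a complete fix.

```javascript
// The pattern from the "other function names" rule, as written in the question.
// Inside a string literal "\." becomes "." (the backslash is silently dropped).
const asWritten = "([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*";
// Doubling the backslash keeps it, so the regex sees a literal dot.
const escaped = "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";

console.log(asWritten); // ([a-zA-Z]|.[a-zA-Z.])[a-zA-Z0-9._]*
console.log(escaped);   // ([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*

// With the backslash lost, "." matches any character, e.g. a space before a name.
const loose = new RegExp("^" + asWritten);
const strict = new RegExp("^" + escaped);
console.log(loose.test(" x"));  // true
console.log(strict.test(" x")); // false

// Same issue in the package rule: "\w" inside a string is just "w".
console.log("(?<!\w)"); // (?<!w)
```

Writing the patterns as regex literals, e.g. /\.[a-zA-Z.]/, avoids the double escaping.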

What is the correct highlight.js code in r.min.js to get these parts of R code highlighted as in RStudio?

Trici asked 30/6, 2018 at 0:51
This doesn't answer your question, but an alternative approach is to use RMarkdown and get the highlighting done by R, not by highlight.js. The highlighter there uses the R parser, not regular expression matching, so it is guaranteed to match the R language. – Bombay

Sounds like a worthwhile improvement, so I tinkered with it for a while.

At first glance, this should be fairly easy.

A regex to capture the package name prefixes could be written like this (demo):

\w+(?=:::?)

and for function names like this (demo):

\.?\w+(?=\()

Unfortunately, these patterns are not so easily applied to highlight.js language parsing rules.
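To see what the two patterns catch on their own, here is a quick check with plain JavaScript regexes (using String.prototype.matchAll, so Node 12+ or a current browser) on lines from the question's sample. It runs outside highlight.js, so it says nothing about how the highlighter applies the patterns.

```javascript
// Sample lines taken from the question's R snippet.
const rCode = [
  "doc_name <-",
  "    officer::read_docx() %>%",
  "    flextable:::body_add_flextable(table_to_save) %>%",
  '    print(target = "word.docx")',
  ".libPaths()",
  'c("a", "b")',
  "package::function()$field",
].join("\n");

// Package prefixes: a word followed by :: or ::: (the operator is not consumed).
const packageRe = /\w+(?=:::?)/g;
// Function names: optional leading dot, a word, then "(" (not consumed).
const functionRe = /\.?\w+(?=\()/g;

const packages = [...rCode.matchAll(packageRe)].map((m) => m[0]);
const functions = [...rCode.matchAll(functionRe)].map((m) => m[0]);

console.log(packages.join(" "));  // officer flextable package
console.log(functions.join(" ")); // read_docx body_add_flextable print .libPaths c function
```

Since \w does not include the dot, a name such as data.frame( is only partly matched by the second pattern (as .frame).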

After some back and forth and trial and error, I settled on the following code, which gives pretty consistent highlighting:

/* ... to highlight other function names (in orange as a keyword): */
{
    cN: "keyword",
    b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/
},
/* ... to highlight package names (in red as meta): */
{
    cN: "meta",
    b: /(^|\s*)\w+(?=:::?|$)/,
    r: 0
},
  • I use the class name keyword (cN/className) for functions: this is what they are, and it interferes less with the predefined style for functions.
  • The same goes for package names, for which I suggest the class name meta. This is what other packages use for similar constructs, and again, it gives more consistent results with the built-in styles, e.g. for numbers.
  • I've also added print and c to the list of keywords; the function rule above only matches names preceded by ::, ::: or a dot, so bare calls such as print( are not caught by it. The keyword list for the R language is obviously somewhat incomplete. Arguably every function name (even from 3rd-party packages) should be added as a keyword (this is how some other languages do it), but that's not very practical.
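To get a feel for what the two begin patterns match, they can be run as plain JavaScript regexes on a few lines from the sample. highlight.js applies them in its own way, so this is only an approximation, but it shows two things: the :: operator is part of the function match (so it gets colored too), and a bare call such as print( is not matched by the function rule at all. The helper firstMatch is hypothetical, just for the demo.

```javascript
// The two begin patterns from the answer, tested as plain regexes.
// highlight.js compiles and applies them itself; this only shows what text
// each pattern can match on a single line.
const functionBegin = /(^|\s*)(:::?|\.)\w+(?=\(|$)/;
const packageBegin = /(^|\s*)\w+(?=:::?|$)/;

// Hypothetical helper: the first match in a string, or null.
const firstMatch = (re, text) => {
  const m = text.match(re);
  return m ? m[0] : null;
};

// The operator is part of the function match.
console.log(firstMatch(functionBegin, "officer::read_docx()")); // ::read_docx
console.log(firstMatch(functionBegin, ".libPaths()"));          // .libPaths
// No ::, ::: or leading dot, so the function rule does not fire.
console.log(firstMatch(functionBegin, 'print(target = "x")'));  // null
// The package rule stops right before the operator.
console.log(firstMatch(packageBegin, "flextable:::body_add_flextable(x)")); // flextable
```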

This is what I get.

Sample Code:

hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass c print ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},
{cN: "keyword", b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/},
{cN: "meta",b: /(^|\s*)\w+(?=:::?|$)/,r: 0 }, ]}});

hljs.initHighlightingOnLoad();
<html lang="en-us">
<head> <meta charset="utf-8"><link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head><body>

    <pre class="r"><code>library(officer)
doc_name &lt;-
    officer::read_docx() %&gt;% 
    flextable:::body_add_flextable(table_to_save) %&gt;% 
    print(target = &quot;word.docx&quot;)

.libPaths()
x = 4
c("a", "b")

package::function()$field
</code></pre>
<script src="https://cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>

</body></html>

Pretty close, but far from being perfect. The main hurdle here is that I struggle to fully understand how the parser interprets the patterns. Some of the results simply make no sense to me but still work.
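A possible explanation for the puzzling results, stated as an assumption about highlight.js 9.x internals: the highlighter first finds a candidate piece of text (a lexeme) in the full input, then re-tests each rule's begin pattern against that lexeme alone. A lookahead such as (?=\() can see the parenthesis in the first step but not in the second. The simplified model below reproduces this with plain regexes; the helper retest is hypothetical. It may be why the patterns above end in (?=\(|$): $ matches the end of the isolated lexeme.

```javascript
// Simplified model (assumption): find a lexeme in the full text, then
// re-test the begin pattern against the lexeme on its own.
const text = "x <- .libPaths()";

// Step 1: search the full text; the lookahead can see the "(".
const plain = /\.?\w+(?=\()/;
const lexeme = text.match(plain)[0];
console.log(lexeme); // .libPaths

// Step 2: re-test against the isolated lexeme (must match at index 0).
const retest = (re, s) => {
  const m = re.exec(s);
  return m !== null && m.index === 0;
};
// The "(" is no longer there, so the lookahead fails.
console.log(retest(plain, lexeme)); // false

// Allowing end of input as an alternative lets the re-test succeed.
const withEnd = /\.?\w+(?=\(|$)/;
console.log(retest(withEnd, lexeme)); // true
```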

Skyscape answered 1/7, 2018 at 2:49
I believe the suggested vanilla regex solution does not work because of issues with lookaheads in highlight.js. There is even an open pull request addressing this issue that never made it into the source. – Skyscape
PS: I would look into other JS highlighters if this becomes a major obstacle. For instance, rainbow.js does a good job of highlighting R and looks easier to adjust; setting colors is mostly a matter of identifying the detected style classes and setting their colors in theme.css (screenshot). – Skyscape
It's a pity that highlight.js has problems with lookaheads. Nevertheless, @wp78de, your answer and the comments were really helpful. – Trici
