How to configure remark.js to parse HTML embedded in Markdown?

I am using remark to get an AST for a Markdown document that includes HTML tags. When I run this:

const remark = require('remark')
const result = remark.parse('# Title\nbody')
console.log(JSON.stringify(result, null, 2))

I get an AST that includes the level-1 heading:

{
  "type": "root",
  "children": [
    {
      "type": "heading",
      "depth": 1,
      "children": [
        {
          "type": "text",
          "value": "Title",
          "position": {
            "start": {
              "line": 1,
              "column": 3,
              "offset": 2
            },
            "end": {
              "line": 1,
              "column": 8,
              "offset": 7
            }
          }
        }
      ],
      "position": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 8,
          "offset": 7
        }
      }
    },
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "body",
          "position": {
            "start": {
              "line": 2,
              "column": 1,
              "offset": 8
            },
            "end": {
              "line": 2,
              "column": 5,
              "offset": 12
            }
          }
        }
      ],
      "position": {
        "start": {
          "line": 2,
          "column": 1,
          "offset": 8
        },
        "end": {
          "line": 2,
          "column": 5,
          "offset": 12
        }
      }
    }
  ],
  "position": {
    "start": {
      "line": 1,
      "column": 1,
      "offset": 0
    },
    "end": {
      "line": 2,
      "column": 5,
      "offset": 12
    }
  }
}

But if I use an explicit h1 tag instead:

const remark = require('remark')
const result = remark.parse('<h1>Title</h1>\nbody') // <- note change
console.log(JSON.stringify(result, null, 2))

I get a node of type html containing the text of the tag and its contents:

{
  "type": "root",
  "children": [
    {
      "type": "html",
      "value": "<h1>Title</h1>\nbody",
      "position": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 2,
          "column": 5,
          "offset": 19
        }
      }
    }
  ],
  "position": {
    "start": {
      "line": 1,
      "column": 1,
      "offset": 0
    },
    "end": {
      "line": 2,
      "column": 5,
      "offset": 19
    }
  }
}

I would like to get the same AST in the second case as I do in the first, i.e., I would like remark to parse the HTML. I expected it to do this by default, since Markdown is allowed to contain HTML; if this is enabled by a parser configuration option, I haven't been able to find it. Pointers would be very welcome.
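To see quickly whether a tree still contains unparsed HTML, a small walker can list its html nodes. The sketch below is only an illustration: `collectNodes` is a hypothetical helper, not part of remark, and the two trees are trimmed copies of the output shown above with the position data left out.

```javascript
// Hypothetical helper (not part of remark): collect every node of a given
// type from an mdast-style tree made of plain objects.
function collectNodes(tree, type) {
  const found = []
  const visit = (node) => {
    if (node.type === type) found.push(node)
    // Only parent nodes (root, heading, paragraph, ...) carry children.
    for (const child of node.children || []) visit(child)
  }
  visit(tree)
  return found
}

// Trimmed copy of the tree produced from '# Title\nbody'.
const markdownTree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Title'}]},
    {type: 'paragraph', children: [{type: 'text', value: 'body'}]}
  ]
}

// Trimmed copy of the tree produced from '<h1>Title</h1>\nbody'.
const htmlTree = {
  type: 'root',
  children: [{type: 'html', value: '<h1>Title</h1>\nbody'}]
}

console.log(collectNodes(markdownTree, 'html').length) // 0
console.log(collectNodes(htmlTree, 'html').map((node) => node.value))
```

In the second tree the whole input, including the line that was meant as body text, ends up as the value of a single html node, which is the behaviour the question is about.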

Tuberculous answered 28/10, 2020 at 12:38 Comment(1)
How does it work if you separate the heading from the body with a blank line (<h1>Title</h1>\n\nbody <= note the two line breaks)? Just curious whether that makes a difference, as it does for some implementations; I'm not very familiar with remark specifically. Ardeliaardelis

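What you probably want is the rehype-raw plugin, which parses the HTML embedded in Markdown. Note that it works on the HTML syntax tree (hast) rather than on remark's Markdown tree (mdast), so as far as I know you first convert the tree with remark-rehype, passing allowDangerousHtml: true so that the raw html nodes survive the conversion, and then run rehype-raw on the result. Check a related discussion here.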

Babi answered 8/3, 2021 at 22:33 Comment(1)
This is the answer but, sadly, this question appears to be abandoned. Jaffe
