How do I convert a docutils document tree into an HTML string?

from docutils import utils
from docutils.frontend import OptionParser
from docutils.parsers.rst import Parser

# preamble
rst = '*NB:* just an example.'  # will actually have many sections
path = 'some.url.com'
settings = OptionParser(components=(Parser,)).get_default_values()

# step 1
document = utils.new_document(path, settings)
Parser().parse(rst, document)

# step 2
for node in document:
    do_something_with(node)

# step 3: Help!
for node in filtered(document):
    print(convert_to_html(node))

My problem was that I was trying to use the docutils package at too low a level. docutils.core provides a higher-level interface for exactly this sort of thing:

from docutils.core import publish_doctree, publish_from_doctree

rst = '*NB:* just an example.'

# step 1
tree = publish_doctree(rst)

# step 2
# do something with the tree

# step 3
html = publish_from_doctree(tree, writer_name='html').decode()
print(html)
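One possible way to fill in step 2 is to detach the nodes you want to handle separately before publishing. The sketch below removes every sidebar node from the tree. The sample reST and the `iter_nodes` helper are illustrative assumptions, not part of docutils. A small recursive walk keeps the sketch independent of whether your docutils version offers `Node.findall()` or only `Node.traverse()`. The `'output_encoding': 'unicode'` override asks docutils to return a `str` instead of bytes, so no `.decode()` is needed.

```python
from docutils import nodes
from docutils.core import publish_doctree, publish_from_doctree

rst = """\
Main text.

.. sidebar:: Aside

   Sidebar text.

More main text.
"""


def iter_nodes(node):
    """Yield node and all of its descendants, depth-first (illustrative helper)."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


# step 1
tree = publish_doctree(rst)

# step 2: collect the sidebars first, then detach them, so the tree
# is not modified while it is being walked
sidebars = [n for n in iter_nodes(tree) if isinstance(n, nodes.sidebar)]
for sidebar in sidebars:
    sidebar.parent.remove(sidebar)

# step 3: render what is left; 'unicode' output encoding returns str, not bytes
html = publish_from_doctree(
    tree,
    writer_name='html',
    settings_overrides={'output_encoding': 'unicode'},
)
print(html)
```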

Step one is now much simpler. Even so, I'm still slightly dissatisfied with the result: what I really want is a publish_node function that renders a single node to HTML. If you know a better way, please do post it.
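docutils has no publish_node, but one way to approximate it is to copy the node into a fresh, empty document and publish that document. The sketch below does this. `publish_node` is a hypothetical helper name. Cross-references such as footnotes that point outside the copied node will not resolve in the new document, so this is only suited to self-contained nodes.

```python
from docutils import nodes
from docutils.core import publish_doctree, publish_from_doctree
from docutils.utils import new_document


def publish_node(node, writer_name='html'):
    """Render one doctree node (with its children) as a standalone document.

    Hypothetical helper: docutils itself does not provide publish_node.
    """
    doc = new_document('<node>')  # fresh, empty document with default settings
    doc += node.deepcopy()        # copy, so the original tree is left untouched
    return publish_from_doctree(
        doc,
        writer_name=writer_name,
        settings_overrides={'output_encoding': 'unicode'},  # return str, not bytes
    )


tree = publish_doctree('*NB:* just an example.\n\nA second paragraph.')
paragraphs = [n for n in tree.children if isinstance(n, nodes.paragraph)]
first_html = publish_node(paragraphs[0])
print(first_html)
```

Because the node is deep-copied, the original tree can still be published in full afterwards.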

I should also note that I haven't managed to get this working with Python 3.

The real lesson

What I was actually trying to do was extract all of the sidebar elements from the doctree so they could be handled separately from the main body of the article. This is not a use case docutils was designed for, hence there is no publish_node function.

Once I realised this, the correct approach was simple enough:

1. Generate the HTML using docutils.
2. Extract the sidebar elements using BeautifulSoup.

Here's the code that got the job done:

from docutils.core import publish_parts
from bs4 import BeautifulSoup

rst = get_rst_string_from_somewhere()

# get just the body of an HTML document 
html = publish_parts(rst, writer_name='html')['html_body']
soup = BeautifulSoup(html, 'html.parser')

# docutils wraps the body in a div with the .document class
# we can just dispose of that div altogether
wrapper = soup.select('.document')[0]
wrapper.unwrap()

# knowing that docutils gives all sidebar elements the
# .sidebar class makes extracting those elements easy
sidebar = ''.join(tag.extract().prettify() for tag in soup.select('.sidebar'))

# leaving the non-sidebar elements as the document body
body = soup.prettify()
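For reference, the same approach can be wrapped in a self-contained function and run against a sample input. The function name `split_sidebars` and the sample reST are illustrative. The writer is named explicitly as `'html4css1'` on the assumption that the `.document` wrapper and `.sidebar` class relied on here come from that writer, which the `'html'` alias has historically pointed to. `str()` is used instead of `prettify()` only to keep the output compact.

```python
from bs4 import BeautifulSoup
from docutils.core import publish_parts


def split_sidebars(rst):
    """Return (body_html, sidebar_html) for a reST string (illustrative helper)."""
    html = publish_parts(rst, writer_name='html4css1')['html_body']
    soup = BeautifulSoup(html, 'html.parser')

    # drop the <div class="document"> wrapper if it is present
    wrapper = soup.select_one('.document')
    if wrapper is not None:
        wrapper.unwrap()

    # pull every sidebar out of the tree, keeping document order
    sidebar = ''.join(str(tag.extract()) for tag in soup.select('.sidebar'))
    return str(soup), sidebar


rst = """\
Main text.

.. sidebar:: Aside

   Sidebar text.

More main text.
"""

body, sidebar = split_sidebars(rst)
print(sidebar)
print(body)
```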
