How to document a single space character within a string in reST/Sphinx?

I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following:

    Starting character is the blank " " which has the value 0.

I tried writing this as an inline literal the following ways:

    Starting character is the blank `` `` which has the value 0.

or

    Starting character is the blank :literal:` ` which has the value 0.

but there are a few problems with how these end up working:

  1. reST does not accept whitespace immediately inside the literal markers, so the markup is not recognized as a literal.
  2. The above can be "fixed" with a non-breaking space character inside the literal, which looks correct in both the HTML and plaintext (" ") output. Technically this is a lie in our case, though: a user who copied the character would not be copying what they expect.
  3. The space can be wrapped in regular quotes, which allows the literal to be recognized. The HTML output is probably fine (" "), but the plaintext output ends up double-quoted as "" "".
  4. In both 2 and 3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it is at the start or end of a line.
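
The wrapping problem in point 4 can be reproduced with the standard library alone. The sketch below is only an approximation: it uses plain `textwrap.wrap` rather than the wrapper class Sphinx's text writer actually uses, and the width of 33 is an assumption picked so that the line boundary lands inside the quotes.

```python
import textwrap

# The sentence from the question, with the space wrapped in regular quotes.
sentence = 'Starting character is the blank " " which has the value 0.'

# Width 33 is chosen on purpose so the wrap boundary falls right after the
# opening quote; any width that ends a line there behaves the same way.
wrapped_lines = textwrap.wrap(sentence, width=33)

for line in wrapped_lines:
    print(repr(line))

# The quoted space only survives if both quotes stay on a single line.
space_survived = any('" "' in line for line in wrapped_lines)
print("quoted space survived:", space_survived)
```

Because textwrap treats the space between the quotes as an ordinary break point and drops whitespace at line edges, the documented character disappears from the output.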

I feel like I'm missing something; is there a good way to handle this?

Model answered 8/7, 2015 at 21:53 Comment(0)

I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (will sort out exactly what it should look like during our review process) but the basics are intact.

There are two main components to the approach:

  1. introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
  2. modify the text-wrapper Sphinx uses so that it won't break at the space.

Here's the code:

import re
import unicodedata

from docutils import nodes
from sphinx.writers.text import TextWrapper


class TextWrapperDeux(TextWrapper):
    # Whitespace adjacent to a backtick is not treated as a break point,
    # so the wrapper will not split inside a literal such as ` `.
    _wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not between backticks
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    @property
    def wordsep_re(self):
        return self._wordsep_re


def char_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Describe a character given by unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        # Look the character up by its Unicode name, e.g. "SPACE".
        character = unicodedata.lookup(text)
    except KeyError:
        msg = inliner.reporter.error(
            ':char: argument %s must be valid unicode name at line %d' % (text, lineno))
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    describe_char = "(U+%05X %s)" % (ord(character), text)
    # The character itself goes in a literal node; the description follows it.
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []


def setup(app):
    app.add_role('char', char_role)

The code above still lacks the glue needed to actually force the use of the new TextWrapper, among other things. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
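
To see what the modified word-separator pattern does without any Sphinx glue, here is a rough sketch against the standard library's `textwrap.TextWrapper`. The class name `BacktickAwareWrapper`, the sample text, and the width of 32 are assumptions made for illustration; Sphinx's own wrapper may differ in other respects.

```python
import re
import textwrap


class BacktickAwareWrapper(textwrap.TextWrapper):
    # Hypothetical stand-in for TextWrapperDeux, built on the stdlib wrapper.
    # The pattern is the one from the answer: whitespace next to a backtick
    # is not a break point.
    wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not between backticks
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash


text = 'Starting character is the char:` `(U+00020 SPACE) which has the value 0.'

# Width 32 is chosen so the default wrapper breaks right after the first backtick.
plain_lines = textwrap.TextWrapper(width=32).wrap(text)
aware_lines = BacktickAwareWrapper(width=32).wrap(text)

print("default wrapper:")
for line in plain_lines:
    print("  " + repr(line))

print("backtick-aware wrapper:")
for line in aware_lines:
    print("  " + repr(line))
```

The default wrapper splits between the backticks and loses the space, while the backtick-aware one moves the whole literal to the next line.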

Markup: Starting character is the :char:`SPACE` which has the value 0.

It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.

And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.

Rendered in a browser, the HTML output ends up looking roughly like: Starting character is the char: (U+00020 SPACE) which has the value 0.
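
The text the role produces can be checked without docutils or Sphinx. This is a minimal sketch that rebuilds only the plaintext form shown above; `describe_char` is a hypothetical helper name and not part of the role code.

```python
import unicodedata


def describe_char(unicode_name):
    """Return the plaintext form the :char: role is meant to produce."""
    # unicodedata.lookup raises KeyError for names it does not know.
    character = unicodedata.lookup(unicode_name)
    return "char:`%s`(U+%05X %s)" % (character, ord(character), unicode_name)


space_description = describe_char("SPACE")
print(space_description)

try:
    describe_char("NOT A REAL CHARACTER NAME")
except KeyError:
    print("unknown names are rejected, as in the role's error branch")
```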

Model answered 12/7, 2015 at 16:59 Comment(0)

Try using the unicode character codes. If I understand your question, this should work.

Here is a "|space|" and a non-breaking space (|nbspc|)

.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space

You should see:

Here is a “ ” and a non-breaking space ( )
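
As a quick check of what the two substitutions stand for, the following sketch prints the code point and Unicode name of each character. The variable names are only illustrative.

```python
import unicodedata

space = "\u0020"           # the character |space| stands for
no_break_space = "\u00a0"  # the character |nbspc| stands for

for ch in (space, no_break_space):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# The two render alike but are different characters, so copying one from
# the output does not give you the other.
looks_alike_but_differs = space != no_break_space
print(looks_alike_but_differs)
```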

Ballesteros answered 10/7, 2015 at 3:35 Comment(1)
Sorry for the delay. I could've been a bit more explicit about having tried unicode. It certainly adds some transparency to the source, but it doesn't solve wrapping the character in a literal or the wrapping issue in plaintext. Still upvoting--thanks for taking a swing at it. -- Model
