Why doesn't Python auto escape '\' in __doc__? [duplicate]

It seems that some escape sequences still matter inside a docstring. For example, running python foo.py (Python 2.7.10) on the file below fails with an error like ValueError: invalid \x escape.

def f():
    """
    do not deal with '\x0'
    """
    pass

In fact, it seems the correct docstring should be:

    """
    do not deal with '\\x0'
    """

It also breaks importing the module.

For Python 3.4.3+, the error message is:

  File "foo.py", line 4
    """
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 24-25: truncated \xXX escape

This seems a bit strange to me: I expected a bad escape to affect only __doc__, with no side effect on the module itself.

Why was it designed this way? Is it a flaw or bug in Python?
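A minimal sketch of why the whole module is affected, assuming Python 3: the docstring is an ordinary string literal, so its escapes are decoded while the source is compiled, before any of the module's code runs. The variable names `source` and `outcome` are only illustrative.

```python
# Source text of a tiny module whose docstring contains the truncated escape \x0.
# It is held in a raw literal so that this example file itself still compiles.
source = r'''
def f():
    """
    do not deal with '\x0'
    """
    pass
'''

try:
    # Compiling is the step that decodes escapes in string literals.
    compile(source, "foo.py", "exec")
    outcome = "compiled"
except (SyntaxError, ValueError) as exc:
    # Python 3 reports a SyntaxError; the question reports ValueError on 2.7.
    outcome = type(exc).__name__

print(outcome)
```

Because the failure happens at compile time, nothing in the file can be executed or imported, not just the function that owns the docstring.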

NOTE

I know what """ and raw literals mean; however, I think the Python interpreter should be able to treat docstrings specially, at least in theory.

Borlow answered 16/11, 2015 at 11:22 Comment(6)
Try a raw string: r"""do not deal with '\x0'""".Moron
You can put r at the beginning of the string: r"""\x0"""Hellbender
@PeterWood: Only one second later than my comment ;)Moron
@KevinGuan Why not treat every docstring as a raw string, then?Borlow
@HongxuChen Check my answer: a docstring is still a string.Moron
@ivan_pozdeev: No, check OP's edit.Moron

From PEP 257:

For consistency, always use """triple double quotes""" around docstrings. Use r"""raw triple double quotes""" if you use any backslashes in your docstrings. For Unicode docstrings, use u"""Unicode triple-quoted strings""" .

There are two forms of docstrings: one-liners and multi-line docstrings.
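As a small illustration of the PEP 257 advice (a sketch, with hypothetical function names `raw_doc` and `escaped_doc`), a raw docstring and one with a doubled backslash end up with the same `__doc__` text:

```python
def raw_doc():
    r"""do not deal with '\x0'"""


def escaped_doc():
    """do not deal with '\\x0'"""


# Both docstrings contain a single backslash followed by x0.
same_text = raw_doc.__doc__ == escaped_doc.__doc__
print(raw_doc.__doc__)
print(same_text)
```

The raw form is usually easier to read when the docstring contains several backslashes, for example Windows paths or regular expressions.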


Also from here:

There's no such python type as "raw string" -- there are raw string literals, which are just one syntax approach (out of many) to specify constants (i.e., literals) that are of string types.

So "getting" something "as a raw string" just makes no sense. You can write docstrings as raw string literals (i.e., with the prefix r -- that's exactly what denotes a raw string literal, the specific syntax that identifies such a constant to the python compiler), or else double up any backslashes in them (an alternative way to specify constant strings including backslash characters), but that has nothing to do with "getting" them one way or another.
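The point of the quote can be checked directly. In this sketch (variable names are illustrative), the `r` prefix changes only how the literal is written in the source, not the type or value of the resulting object:

```python
raw = r"do not deal with '\x0'"
escaped = "do not deal with '\\x0'"

# Both literals produce plain str objects with identical contents.
same_type = type(raw) is str and type(escaped) is str
same_value = raw == escaped
print(same_type, same_value)
```

Once the string exists there is nothing left that marks it as "raw", which is why the choice has to be made in the source code.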

Moron answered 16/11, 2015 at 11:31 Comment(8)
I still feel it's a bit strange; since the interpreter can locate docstrings, it should also be able to treat them as raw literals during parsing.Borlow
@HongxuChen: Just edited, check PEP 257 for more details.Moron
Yes, I noticed that. Give me a moment to accept your answer :) But I still believe it's a flaw in Python.Borlow
@HongxuChen: Well, edited again. It's a little hard to keep the formatting from a third-party site ;)Moron
So I think it may be due to the processing order: the Python interpreter first reads all the content, and at that stage it only treats literals prefixed with "r" specially; only later does it parse the result and locate the docstring. So by then there is no way to distinguish the docstring from other strings, right?Borlow
@HongxuChen Yeah. Although the string is a little special (it's __doc__), it's still a string. So unless Python used a different kind of object for __doc__, a string literal never automatically becomes a raw string.Moron
@HongxuChen You're saying a string literal should be parsed differently depending upon its context. That would be confusing.Hellbender
Docstrings already behave differently based on context. You cannot do """my docstring""".replace("my", "your") or f"""my docstring"""; it will no longer be a docstring.Alcuin
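The last comment can be verified with a short sketch (function names are hypothetical). Only a bare string literal as the first statement is picked up as the docstring; a method call on it or an f-string is just an expression statement:

```python
def plain():
    """my docstring"""


def called():
    # A call expression, not a bare string literal, so no docstring is set.
    """my docstring""".replace("my", "your")


def formatted():
    # f-strings are never used as docstrings, even without placeholders.
    f"""my docstring"""


print(plain.__doc__, called.__doc__, formatted.__doc__)
```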
