Why can't Python's raw string literals end with a single backslash?
B

14

274

Technically, any odd number of backslashes, as described in the documentation.

>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
  File "<stdin>", line 1
    r'\\\'
         ^
SyntaxError: EOL while scanning string literal

It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious.

Bebebebeerine answered 15/3, 2009 at 12:54 Comment(4)
looks like this is now a faq. might not have been when you asked the question. i know the docs you cited say pretty much the same thing, but i just thought i would add another source of documentation.Psychometrics
@Psychometrics And that doc clearly explains they were meant primarily for regular expressions (which shouldn't end with a backslash) not Windows paths, which should.Bakerman
See also: python: SyntaxError: EOL while scanning string literal for the related error message, and other common causes.Gurkha
It's something that should be fixed in Python 4.Piggin
P
166

The reason is explained in the part of that section which I highlighted in bold:

String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

So raw strings are not 100% raw, there is still some rudimentary backslash-processing.
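Both halves of the quoted rule can be checked directly. The snippet below is only an illustrative sketch; the helper name compiles is made up for this example and is not part of any library.

```python
# The backslash "escapes" the quote for the tokenizer but stays in the value.
s = r"\""
print(len(s))   # 2
print(list(s))  # ['\\', '"']


def compiles(source):
    """Hypothetical helper: True if `source` is a valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The sources are written as ordinary (non-raw) strings, so each "\\"
# below stands for one backslash in the code being compiled.
print(compiles("r'\\'"))      # source is r'\'   -> False
print(compiles("r'\\\\'"))    # source is r'\\'  -> True
print(compiles("r'\\\\\\'"))  # source is r'\\\' -> False
```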

Perreira answered 15/3, 2009 at 13:5 Comment(9)
Oh wow... that's weird. Nice catch. Makes sense that r'\'' == "\\'" but it's still strange that the escape character has an effect without disappearing.Bebebebeerine
You might as well use a forward slash to achieve the same purpose.. This worked on Windows 7 Python 2.7... root_path = r'P:/Temp/IT/' then use it to create subfolder like this: create_folder = root_path + sub_folderHammer
@Hammer this may work for file system paths, but there are other uses of the backslash. And for file system paths, don't hardcode the separator. Use 'os.path.sep', or better the higher level features of 'os.path'. (Or 'pathlib', when available)Perreira
Note: Workaround is to use adjacent literal concatenation. r"foo\bar\baz" "\\" (wrap in parens if ambiguous) will create a single literal at compile time, the first part of which is raw, and only the last tiny bit is non-raw, to allow the trailing backslash.Agneta
IMO this just restates the question (what is allowed/will work, and what not), without saying why it's designed this way. There's a FAQ entry that sort of explains the why (raw strings were designed for a specific purpose, and it makes sense in the context of that purpose).Marcel
What's the point of raw strings then? Seems like a shady implementation of the concept.Carlottacarlovingian
Interpreting "why" differently: the reason is that the sequence of a backslash followed by a text character is interpreted by the source code tokenizer, as part of the process of figuring out where the string ends, in a step that happens before determining the actual content of the string literal.Gurkha
@Agneta no wonder I abandoned Python for getting too complicated...Eaglestone
in C# you can use @"\" to represent single backslash. so I think this is not technical problem.Porterhouse
B
180

The common misconception about Python's raw strings is that a backslash within a raw string is just a regular character like any other. It is NOT. The key to understanding this is the following sentence from the Python documentation:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string

So any character following a backslash is part of the raw string. Once the parser enters a raw string (a non-Unicode one) and encounters a backslash, it knows there are two characters: the backslash and the character following it.

This way:

r'abc\d' comprises a, b, c, \, d

r'abc\'d' comprises a, b, c, \, ', d

r'abc\'' comprises a, b, c, \, '

and:

r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.

The last case shows that, according to the documentation, the parser cannot find the closing quote, because the last quote you see above is part of the string. In other words, a backslash cannot come last because it will 'devour' the string's closing character.
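The rule described above can be modelled with a few lines of code. This is a deliberately simplified sketch for single-line literals, not CPython's actual tokenizer; the function name find_closing_quote is invented for the illustration.

```python
def find_closing_quote(literal):
    """Return the index of the closing quote, or -1 if none is found.

    `literal` is the text after the r prefix, starting at the opening quote.
    Simplified model: a backslash always takes the next character with it.
    """
    quote = literal[0]
    i = 1
    while i < len(literal):
        ch = literal[i]
        if ch == "\\":
            i += 2  # the backslash and the following character stay together
            continue
        if ch == quote:
            return i
        i += 1
    return -1


# Each "\\" below is one backslash in the text being scanned.
print(find_closing_quote("'abc\\d'"))   # 'abc\d'   -> 6
print(find_closing_quote("'abc\\'d'"))  # 'abc\'d'  -> 7
print(find_closing_quote("'abc\\''"))   # 'abc\''   -> 6
print(find_closing_quote("'abc\\'"))    # 'abc\'    -> -1 (quote was consumed)
```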

Bushed answered 29/10, 2013 at 9:24 Comment(5)
So I understand the mechanics, but why? Why is it implemented this way? I do not see the rationale behind it. The explanation above tells us that a raw string essentially makes everything inside the quotation marks stand for itself, except that a backslash cannot appear as the last character. So why? To make sure that it cannot be used as a file path string?Preeminence
As I read further down the page, I found it has the purpose of having quotation mark in the string ,then again, why can't I put just a quotation mark but I have to put in a set with backslash in front of it? I figure there must be reasons for it, maybe related to regex expressions?Preeminence
I think that if it is not related to regular expressions, it is a design flaw, since there are other options, such as doubling quotation marks, i.e. using "" for " as in most .csv files. x = r"I have ""an apple""" would stand for I have "an apple". One problem is that Python allows something like a="a""b" or a="a" "b", resulting in a="ab". So to use doubled quotation marks, Python would need to ban the a="a""b" use case.Preeminence
I suggest to include one more: r'abc\\' comprises a, b, c, \, \Diwan
It's one of many shame of Python, that they don't understand this kind of misconception and, therefore, that they don't fix it. It is completely OK if we must write r'c:\\', but, of course, it's a bug if Python then creates this string: "c:\\"Lefthand
W
37

That's the way it is! I see it as one of those small defects in python!

I don't think there's a good reason for it, but it's definitely not parsing; it's really easy to parse raw strings with \ as a last character.

The catch is, if you allow \ to be the last character in a raw string then you won't be able to put " inside a raw string. It seems python went with allowing " instead of allowing \ as the last character.

However, this shouldn't cause any trouble.

If you're worried about not being able to easily write Windows folder paths such as c:\mypath\, worry not: you can represent them as r"C:\mypath". And if you need to append a subdirectory name, don't do it with string concatenation, which isn't the right way to do it anyway. Use os.path.join:

>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
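The output above is what os.path.join gives on Windows. As a sketch that behaves the same on any platform, the standard-library ntpath module (the Windows flavour of os.path) and pathlib.PureWindowsPath can be used instead; the folder names are just the ones from this answer.

```python
import ntpath
from pathlib import PureWindowsPath

base = r"C:\mypath"  # no trailing backslash needed in the literal

# ntpath applies Windows path rules regardless of the host OS.
print(ntpath.join(base, "subfolder"))  # C:\mypath\subfolder

# pathlib offers the same thing with the / operator.
print(str(PureWindowsPath(base) / "subfolder"))  # C:\mypath\subfolder

# Joining with an empty last component yields a trailing separator.
print(ntpath.join(base, ""))  # C:\mypath\
```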
Wore answered 15/3, 2009 at 13:17 Comment(11)
Good ancillary material. :-) Devil's advocate, though: sometimes you want to differentiate file paths from directory paths by appending the path separator. Nice thing about os.path.join is that it will collapse them: assert os.path.join('/home/cdleary/', 'foo/', 'bar/') == '/home/cdleary/foo/bar/'Bebebebeerine
It doesn't make a (technical) difference though! os.path.isdir will tell you whether a certain path is a directory (folder)Wore
Yep, it's just to indicate to someone reading the code whether you expect a path to be a directory or a file.Bebebebeerine
The convention on windows is that files have an extension, always. it's not likely at all (under normal circumstances) to have a text file with a path such as c:\path\dataWore
and btw, I did say that I consider this a defect in python! All I'm saying is that, despite my opinion, it practically doesn't really matterWore
..or you can represent them as "c:/mypath" and forget your backslash woes altogether :-)Faustofaustus
one correction, though, you can't put a quote in a raw string either...you have to do a similar trick to get quote to work, i.e.: r"raw" "\"" or r'raw"quote'Trickster
Of course Python devs didn't hear about Windows UNC paths that start with \\?\ and os.path.join does not support that.Scold
This needs to be promoted to the answer of how to end a windows path with a \ in python. Perfect answer.Herrle
"The catch is, if you allow \ to be the last character in a raw string then you won't be able to put " inside a raw string. It seems python went with allowing " instead of allowing \ as the last character." It's not really a one-for-one tradeoff; as is, you're still forced to include a literal backslash before the quote if you want the raw string to include a quote character.Gurkha
@Herrle better to use forward slashes anyway; see stackoverflow.com/questions/2953834.Gurkha
P
36

To end a raw string with a backslash, I suggest this trick:

>>> print(r"c:\test"'\\')
c:\test\

It uses the implicit concatenation of string literals in Python and concatenates one string delimited with double quotes with another that is delimited by single quotes. Ugly, but works.
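As a small script rather than an interactive session, the same trick might look like this. The variable names are only for illustration.

```python
# Adjacent literals are joined at compile time: a raw part plus a tiny
# non-raw part that supplies the trailing backslash.
path = r"c:\test" "\\"
print(path)       # c:\test\
print(len(path))  # 8

# Parentheses make the grouping explicit for readers.
upper = (r"c:\test" "\\").upper()
print(upper)      # C:\TEST\
```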

Pugnacious answered 29/4, 2011 at 8:57 Comment(0)
I
18

Another trick is to use chr(92) as it evaluates to "\".

I recently had to clean a string of backslashes and the following did the trick:

CleanString = DirtyString.replace(chr(92),'')

I realize that this does not take care of the "why" but the thread attracts many people looking for a solution to an immediate problem.
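For comparison, chr(92) and the ordinary escaped literal "\\" are the same one-character string, so either can be passed to replace. The sample value of dirty below is made up for the illustration.

```python
backslash = chr(92)
print(backslash == "\\")  # True
print(len(backslash))     # 1

dirty = r"a\b\c"          # hypothetical input containing backslashes
clean = dirty.replace("\\", "")
print(clean)              # abc
```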

Irriguous answered 2/11, 2011 at 19:54 Comment(3)
But what if the original string contains backslashes?Desdemona
chr(92) is awfully obscure, probably better to use "\\" (non-raw string with backslash)Trickster
For the entirely separate question of how to create a string with a single backslash, please see #19096296.Gurkha
B
9

Since \" is allowed inside a raw string, that quote can't be used to identify the end of the string literal.

Why not stop parsing the string literal when you encounter the first "?

If that was the case, then \" wouldn't be allowed inside the string literal. But it is.

Begum answered 15/3, 2009 at 16:59 Comment(1)
Exactly. Python designers likely evaluated the likelihood of the two alternatives: the two-character sequence \" anywhere within a double-quoted raw string, OR \ at end of double-quoted raw string. The usage statistics must favor the two character sequence anywhere vs. the one-character sequence at the end.Luann
M
4

The reason r'\' is syntactically incorrect is that, although the string expression is raw, the quote character used as the delimiter (single or double) always has to be escaped inside the string, since it would otherwise mark the end of the literal. So if you want to express a single quote inside a single-quoted string, there is no other way than using \'. The same applies to double quotes.

But you could use:

'\\'
Mestas answered 15/3, 2009 at 12:59 Comment(0)
B
2

Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).

I thought it was an interesting idea and am including it as community wiki for posterity.

Bebebebeerine answered 15/3, 2009 at 12:55 Comment(1)
But it might let you avoid having two separate string-literal-parser code paths.Bebebebeerine
S
2

Given the confusion around the arbitrary-seeming restriction against an odd number of backslashes at the end of a Python raw-string, it's fair to say that this is a design mistake or legacy issue originating in a desire to have a simpler parser.

While workarounds (such as r'C:\some\path' '\\' yielding (in Python notation:) 'C:\\some\\path\\' or (verbatim:) C:\some\path\) are simple, it's counterintuitive to be needing them. For comparison, let's have a look at C++ and Perl.


In C++, we can straightforwardly use raw string literal syntax

#include <iostream>

int main() {
    std::cout << R"(Hello World!)" << std::endl;
    std::cout << R"(Hello World!\)" << std::endl;
    std::cout << R"(Hello World!\\)" << std::endl;
    std::cout << R"(Hello World!\\\)" << std::endl;
}

to get the following output:

Hello World!
Hello World!\
Hello World!\\
Hello World!\\\

If we want to use the closing delimiter (above: )) within the string literal, we can even extend the syntax in an ad-hoc way to R"delimiterString(quotedMaterial)delimiterString". For example, R"asdf(some random delimiters: ( } [ ] { ) < > just for fun)asdf" produces the string some random delimiters: ( } [ ] { ) < > just for fun in the output. (Ain't that a good use of "asdf"!)


In Perl, this code

my $str = q{This is a test.\\};
print ($str);
print ("This is another test.\n");

will output the following: This is a test.\This is another test.

Replacing the first line by

my $str = q{This is a test.\};

would lead to an error message: Can't find string terminator "}" anywhere before EOF at main.pl line 1.

However, Perl treating a pre-delimiter \ as an escape character doesn't prevent the user from having an odd number of backslashes at the end of the resulting string; eg to place 3 backslashes \\\ into the end of $str, simply end the code with 6 backslashes: my $str = q{This is a test.\\\\\\};. Importantly, while we need to double the backslashes in the input, there is no Python-like inconsistent-seeming syntactic restriction.


Another way of looking at things is that these 3 languages use different ways to address the parsing issue of interaction between escape characters and closing delimiters:

  • Python: disallows an odd number of backslashes just before the closing delimiter; a simple workaround is r'stringWithoutFinalBackslash' '\\'
  • C++: allows essentially¹ everything between the delimiters
  • Perl: allows essentially² everything between the delimiters, but backslashes need to be consistently doubled

¹ The custom delimiterString itself cannot be more than 16 characters long, but that's hardly a limitation.

² If you need the delimiter itself, just escape it with \.

However, to be fair in a comparison to Python, we need to acknowledge that (1) C++ didn't have such string literals until C++11 and is famously hard to parse and (2) Perl is even harder to parse.

Stewart answered 24/2, 2023 at 14:28 Comment(0)
W
1

Naive raw strings

The naive idea of a raw string is

If I put an r in front of a pair of quotes, I can put whatever I want between the quotes and it will mean itself.

Unfortunately, this does not work, because if the whatever happens to contain a quote, the raw string would end at that point.

It is simply impossible that I can put "whatever I want" between fixed delimiters, because some of it could look like the terminating delimiter -- no matter what that delimiter is.

Real-world raw strings (variant 1)

One possible approach to this problem would be to say

If I put an r in front of a pair of quotes, I can put whatever I want between the quotes as long as it does not contain a quote and it will mean itself.

This restriction sounds harsh, until one recognizes that Python's large offering of quotes can accommodate most situations with this rule. The following are all valid Python quotes:

'
"
'''
"""

With this many possibilities for the delimiter, almost anything can be made to work. About the only exception would be if the string literal is supposed to contain a complete list of all allowed Python quotes.

Real-world raw strings (variant 2, as in Python)

Python, however, takes a different route using an extended version of the above rule. It effectively states

If I put an r in front of a pair of quotes, I can put whatever I want between the quotes as long as it does not contain a quote and it will mean itself. If I insist on including a quote, even that is allowed, but I have to put a backslash before it.

So the Python approach is, in a sense, even more liberal than variant 1 above -- but it has the side effect of "mis"interpreting the closing quote as part of the string if the last intended character of the string is a backslash.

Variant 2 is not helpful:

  • If I want the quote in my string, but not the backslash, the allowed version of my string literal will not be what I need.
    However, given the three different other kinds of quotes I have at my disposal, I will probably just pick one of those and my problem will be solved -- so this is not a problematic case.
  • The problematic case is this one: If I want my string to end with a backslash, I am at a loss. I need to resort to concatenating a non-raw string literal containing the backslash.
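The two cases in the list can be seen side by side in a short sketch. The sample strings are invented for the illustration.

```python
# Case 1: a quote inside the string. Picking a different delimiter avoids
# the backslash entirely; escaping the quote leaves the backslash in.
a = r'He said "hi"'
b = r"He said \"hi\""
print(a)  # He said "hi"
print(b)  # He said \"hi\"

# Triple quotes leave room for both kinds of quote.
c = r'''both ' and " fit'''
print(c)  # both ' and " fit

# Case 2: a trailing backslash. No choice of delimiter helps, so a
# non-raw literal is concatenated on.
d = r"C:\some\path" "\\"
print(d)  # C:\some\path\
```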

Conclusion

After writing this, I agree with several of the other posters that variant 1 would have been easier to understand and to accept and therefore more pythonic. That's life!

Woodrowwoodruff answered 4/8, 2022 at 10:10 Comment(2)
I agree with your analysis, but it doesn't really answer the question - please keep in mind that this is not a discussion forum.Gurkha
To me this was far and away the best answer on this subject. Crystal clear. Good job. 👍Speechmaking
J
0

Coming from C, it's pretty clear to me that a single \ works as an escape character, allowing you to put special characters such as newlines, tabs and quotes into strings.

That does indeed disallow \ as the last character, since it will escape the " and make the parser choke. But as pointed out earlier, \\ is legal.

Justificatory answered 15/3, 2009 at 17:14 Comment(1)
Yeah -- the heart of the issue was that raw strings treat \ as a literal instead of the start of an escape sequence. The strange thing is that it still has escape properties for quoting, despite being treated as a literal character.Bebebebeerine
O
0

some tips :

1) if you need to manipulate backslash for path then standard python module os.path is your friend. for example :

os.path.normpath('c:/folder1/')

2) if you want to build strings with backslash in it BUT without backslash at the END of your string then raw string is your friend (use 'r' prefix before your literal string). for example :

r'\one \two \three'

3) if you need to prefix a string in a variable X with a backslash then you can do this :

X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X  # X2 now contains \dummy

4) if you need to create a string with a backslash at the end then combine tip 2 and 3 :

voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name

now lilypond_statement contains "\DisplayLilyMusic \upper"
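As the comments on this answer suggest, tips 3 and 4 can also be written without creating a string just to slice it. A minimal sketch using the same variable names, with a non-raw '\\' supplying the backslash:

```python
X = "dummy"
X2 = "\\" + X  # one backslash followed by the contents of X
print(X2)      # \dummy

voice_name = "upper"
lilypond_statement = r"\DisplayLilyMusic " + "\\" + voice_name
print(lilypond_statement)  # \DisplayLilyMusic \upper
```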

long live python ! :)

n3on

Observant answered 15/3, 2009 at 22:22 Comment(4)
None of these answer the question of "why", but #3 and #4 should not be used. Slicing and adding strings is generally bad practice, and you should prefer r'\dummy' for #3 (which works fine) and ' '.join([r'\DisplayLilyMusic', r'\upper']) to #4.Bebebebeerine
Reason being that strings are immutable and each slice/concatenation creates a new immutable string object that is typically discarded. Better to accumulate them all and join them together in one step with str.join(components)Bebebebeerine
Oh, whoops -- misunderstood what you meant for #3. I think there a simple '\\' + X is preferred to creating a string just to slice it.Bebebebeerine
I just found that os.path.normpath removes the trailing backslash... Then how should I concatenate the filename onto the path?...Saturation
O
0

Despite its role, even a raw string cannot end in a single backslash, because the backslash escapes the following quote character—you still must escape the surrounding quote character to embed it in the string. That is, r"...\" is not a valid string literal—a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use two and slice off the second.
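A minimal sketch of that slicing trick, with a made-up path:

```python
# The raw literal ends in two backslashes (an even number, so it is valid);
# slicing drops the last one.
path = r"C:\some\dir\\"[:-1]
print(path)  # C:\some\dir\
```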

Oversold answered 30/12, 2017 at 6:14 Comment(2)
What are you quoting?Bakerman
Seems to be from apprize.best/python/learning_1/8.html without attribution.Kook
L
-3

I encountered this problem and found a partial solution which is good for some cases. Although Python cannot end a string literal with a single unescaped backslash, a string written as 'a string\\' is saved in a text file with a single backslash at the end. Therefore, if what you need is to save text ending in a single backslash on your computer, it is possible:

x = 'a string\\' 
x
'a string\\' 

# Now save it in a text file and it will appear with a single backslash:

with open("my_file.txt", 'w') as h:
    h.write(x)

BTW it is not working with json if you dump it using python's json library.
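A short check may help separate what the string contains from how it is displayed. This sketch assumes only the standard json module: the doubled backslash is part of the repr and of the JSON text, while the string itself holds a single one.

```python
import json

x = 'a string\\'
print(repr(x))  # 'a string\\'  (escaped form, as the interpreter echoes it)
print(x)        # a string\
print(len(x))   # 9

# JSON has its own escaping, so the encoded text doubles the backslash,
# but decoding gives the original string back.
encoded = json.dumps(x)
print(encoded)                   # "a string\\"
print(json.loads(encoded) == x)  # True
```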

Finally, I work with Spyder, and I noticed that if I open the variable in Spyder's text editor by double-clicking on its name in the Variable Explorer, it is presented with a single backslash and can be copied to the clipboard that way (not very helpful for most needs, but maybe for some).

Latrice answered 10/10, 2018 at 11:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.