Python 3.7.4: 're.error: bad escape \s at position 0'
S

6

14

My program looks something like this:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)
# "The\\ quick\\ brown\\ fox\\ jumped"
# Replace escaped space patterns with a generic white space pattern
spaced_pattern = re.sub(r"\\\s+", r"\s+", escaped_str)
# Raises error

The error is this:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/swfarnsworth/projects/medaCy/medacy/tools/converters/con_to_brat.py", line 255, in <module>
    content = convert_con_to_brat(full_file_path)
  File "/home/swfarnsworth/projects/my_file.py", line 191, in convert_con_to_brat
    start_ind = get_absolute_index(text_lines, d["start_ind"], d["data_item"])
  File "/home/swfarnsworth/projects/my_file.py", line 122, in get_absolute_index
    entity_pattern_spaced = re.sub(r"\\\s+", r"\s+", entity_pattern_escaped)
  File "/usr/local/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/usr/local/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/usr/local/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0

I get this error even if I remove the two backslashes before the '\s+' or if I make the raw string (r"\\\s+") into a regular string. I checked the Python 3.7 documentation, and it appears that \s is still the escape sequence for white space.

Squish answered 10/10, 2019 at 17:54 Comment(4)
When I use your code, with entity_pattern_escaped changed to escaped_str, then print(spaced_pattern) produces The\s+quick\s+brown\s+fox\s+jumped which looks like the desired result. - Odisodium
I couldn't reproduce in 3.6.3, but it fails at ideone.com which is 3.7.3. - Odisodium
There's apparently been a change in how the replacement string is processed in 3.7. - Odisodium
@SteeleFarnsworth: About your (deleted) question "How can I implement a new Python class within the cpython interpreter?": I noticed something missing in _collectionmodule.c. - Montane
S
23

Double the backslash in the replacement string so that the re module does not try to interpret \s as an escape sequence:

spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)

now

>>> spaced_pattern
'The\\s+quick\\s+brown\\s+fox\\s+jumped'
>>> print(spaced_pattern)
The\s+quick\s+brown\s+fox\s+jumped
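
Another way to sidestep the template parsing, sketched here as an alternative rather than as part of the fix above, is to pass a callable as the replacement. The re documentation states that the return value of a replacement function is inserted as-is, so its backslashes are not processed. The lambda is only illustrative:

```python
import re

my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)

# The string returned by a replacement function is inserted literally,
# so the backslash in r"\s+" is never parsed as a template escape.
spaced_pattern = re.sub(r"\\\s+", lambda match: r"\s+", escaped_str)
print(spaced_pattern)
```

The resulting pattern should then match the original words separated by any run of whitespace.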

But why?

It seems that the re module tries to interpret \s in the replacement the same way it interprets r"\n", instead of leaving it alone as Python normally does inside a raw string. For example:

re.sub(r"\\\s+", r"\n+", escaped_str)

yields:

The
+quick
+brown
+fox
+jumped

even if \n was used in a raw string.
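
To make the distinction concrete, here is a small sketch (the sample text "a b" is made up for illustration) comparing a known escape in the replacement template with a doubled backslash:

```python
import re

text = "a b"

# Known escape: the two characters \n in the template become a newline.
with_newline = re.sub(r" ", r"\n", text)

# Doubled backslash: the template yields a literal backslash followed by n.
with_literal = re.sub(r" ", r"\\n", text)

print(repr(with_newline))
print(repr(with_literal))
```

The raw string only stops Python's own string-literal parser from touching the backslash; re.sub still applies its own escape processing to the replacement afterwards.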

The change was introduced in Issue #27030: Unknown escapes consisting of '\' and ASCII letter in regular expressions now are errors.

The code that does the replacement is in sre_parse.py (python 3.7):

        else:
            try:
                this = chr(ESCAPES[this][1])
            except KeyError:
                if c in ASCIILETTERS:
                    raise s.error('bad escape %s' % this, len(this))

This code looks at what follows a literal \ and tries to replace the pair with the corresponding special character (\n becomes a newline, for instance). s is not in the ESCAPES dictionary, so the KeyError handler runs and, because s is an ASCII letter, raises the error you're getting.
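
If you want to see that behaviour from the calling side, a rough sketch like the following catches the exception instead of letting it propagate. It assumes Python 3.7 or later; on 3.6 and earlier the else branch would run instead:

```python
import re

escaped_str = re.escape("The quick brown fox jumped")

try:
    # r"\s+" as a replacement template contains the unknown escape \s.
    re.sub(r"\\\s+", r"\s+", escaped_str)
except re.error as exc:
    error_text = str(exc)
    print("re.sub rejected the template:", error_text)
else:
    error_text = None
    print("no error raised (expected on Python 3.6 and earlier)")
```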

On previous versions it just issued a warning:

import warnings
warnings.warn('bad escape %s' % this,
              DeprecationWarning, stacklevel=4)

It looks like we're not the only ones to suffer from the 3.6 to 3.7 upgrade: https://github.com/gi0baro/weppy/issues/227

Syck answered 10/10, 2019 at 18:3 Comment(2)
Thank you for the "why" portion... direly missing in the top voted regex module answer, although the top voted answer did solve the problem in a jiffy. - Merilee
Thanks. regex module is way more powerful, that's true, but that's also not provided with basic python install, so more people will use re, because it's standard. I know that regex package can do marvels with nested regexes. As long as I don't need that, I'll stick to re. - Octo
P
12

Just try import regex as re instead of import re.

Progressist answered 23/11, 2020 at 7:50 Comment(6)
still does not work! - Demob
@MohammadSadoughi can you paste your code gist? - Progressist
regex mentioned here is a third party module. - Amaryllis
This is a third-party module and shouldn't be the first resource to this issue. Also, there is no clarification as to why this solves the problem. - Nerty
This library is certainly much more powerful and should be preferred when working with complex patterns, but it does not actually solve the problem with backslashes in the replacement patterns. - Petaloid
On a vanilla Python installation, applying this answer results in ImportError. - Boccioni
Y
1

Here is my simple code, which uses the python-binance library and pandas. It works in one venv with Python 3.7, but when I created a new venv for another project (Python 3.7 as well), it threw the same regex errors:

import pandas as pd
from binance import Client

api_key = ''
api_secret = ''

client = Client(api_key, api_secret)

timeframe = '1h'
coin = 'ETHUSDT'


def GetOHLC(coin, timeframe):
    frame = pd.DataFrame(client.get_historical_klines(coin, timeframe, '01.01.2015'))
    frame = frame.loc[:, :5]
    frame.columns = ['date', 'open', 'high', 'low', 'close', 'volume']
    frame.set_index('date', inplace=True)        
    frame.to_csv(path_or_buf=(coin+timeframe))


GetOHLC(coin, timeframe)

I did some research but didn't find a suitable solution. Then I compared the version of the regex library in the working environment with the new one: the old one was from 2021 and the new one was from 2022. I uninstalled the 2022 version and installed the 2021 one, and it started to work without any exceptions. I hope this helps in some particular cases.

Yama answered 16/3, 2022 at 6:38 Comment(0)
P
0

I guess you might be trying to do:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
escaped_str = re.escape(my_str)
# re.escape also escapes the existing backslashes: The\\\ quick\\\ brown\\\ fox\\\ jumped
# Replace escaped space patterns with a generic white space pattern
print(re.sub(r"\\\\\\\s+", " ", escaped_str))

Output 1

The quick brown fox jumped

If you want a literal \s+ in the result instead, then try this answer or maybe:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
escaped_str = re.escape(my_str)
print(re.sub(r"\\\\\\\s+", re.escape(r"\s") + '+', escaped_str))

Output 2

The\s+quick\s+brown\s+fox\s+jumped

Or maybe:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
print(re.sub(r"\s+", "s+", my_str))

Output 3

The\s+quick\s+brown\s+fox\s+jumped
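
A different route to the same pattern, offered only as a sketch and not taken from any of the answers here, is to skip re.sub entirely: split the question's original string on whitespace, escape each word, and join the pieces with \s+. Note that this assumes leading and trailing whitespace in the input does not matter, since str.split() drops it:

```python
import re

my_str = "The quick brown fox jumped"

# Escape each word on its own, then join the words with the
# generic whitespace pattern. No replacement template is involved.
tokens = my_str.split()
spaced_pattern = r"\s+".join(re.escape(token) for token in tokens)
print(spaced_pattern)
```

Because no replacement template is parsed, the behaviour should be the same on 3.6 and 3.7.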

If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and shows how it would match against sample inputs.

Perceivable answered 10/10, 2019 at 18:0 Comment(1)
I think he wants to get a literal \s+ in the result. - Odisodium
M
0

In case you are trying to replace anything by a single backslash, both the re and regex packages of Python 3.8.5 cannot do it alone.

The solution I rely on is to split the task between re.sub and Python's replace:

import re
re.sub(r'([0-9.]+)\*([0-9.]+)',r'\1 XBACKSLASHXcdot \2'," 4*2").replace('XBACKSLASHX','\\')
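
For comparison, as far as I can tell the standard re module can also emit a single backslash without the placeholder trick: either double the backslash in the replacement template, or use a replacement function. A minimal sketch reusing the same sample input (the variable names are just for illustration):

```python
import re

expr = " 4*2"
pattern = r"([0-9.]+)\*([0-9.]+)"

# In a replacement template, \\ stands for one literal backslash.
via_template = re.sub(pattern, r"\1 \\cdot \2", expr)

# The return value of a replacement function is inserted as-is;
# "\\" in a normal string literal is a single backslash character.
via_function = re.sub(
    pattern,
    lambda m: m.group(1) + " \\cdot " + m.group(2),
    expr,
)

print(via_template)
print(via_function)
```

Both calls should give the same string as the placeholder approach above.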
Marchelle answered 4/3, 2021 at 9:55 Comment(0)
I
-3
pip uninstall regex

pip install regex==2022.3.2
Impudence answered 27/6, 2022 at 14:41 Comment(1)
A bit more explanation should be provided here as to why this is a good solution suggestion. - Spheno
