Find String Between Two Substrings in Python When There is A Space After First Substring

While there are several similar posts on StackOverflow, none of them covers the case where the target string comes one space after one of the substrings.

I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

I want to extract "I want this string." from the string above. The random letters will always change; however, the quote "I want this string." will always sit between [?] (with a space after the closing square bracket) and Reduced.

Right now, I can do the following to extract "I want this string".

target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])

This strips the ] and the space that always appear at the start of my extracted string, so only "I want this string." is printed. However, this solution seems ugly, and I'd rather have re.search() return the correct target string without any post-processing. How can I do this?

Snowman answered 31/3, 2018 at 19:0 Comment(4)
You need to escape [ and ?, and add \s* after ], '\[\?]\s*(.*?)Reduced'Extreme
I ended up using re.search('] (.*?)Reduced', example_string). Would my solution work too, or is yours more optimal?Snowman
If you do not care if there is [? before ], it will.Extreme
I would also advise to check if there is a match before accessing .group(1) to avoid errors if no match is found.Extreme
A
4

Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ chars other than line break chars, as few as possible up to the first Reduced substring. That [?] is a character class formed with unescaped brackets, and the ? inside a character class is a literal ? char. That is why your Group 1 contains the ] and a space.
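To see the character-class behaviour concretely, here is a minimal sketch that runs the unescaped and the escaped pattern side by side on the example string from the question. The variable names m_class and m_literal are only illustrative.

```python
import re

s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# Unescaped: [?] is a character class that matches the single char "?"
m_class = re.search('[?](.*?)Reduced', s)
print(repr(m_class.group(0)[0]))   # '?'
print(repr(m_class.group(1)))      # '] I want this string.'

# Escaped: \[\?\] matches the three literal chars "[?]"
m_literal = re.search(r'\[\?\](.*?)Reduced', s)
print(repr(m_literal.group(1)))    # ' I want this string.'
```

The escaped version still captures the leading space, which is why the pattern also needs to consume the whitespace after the bracket.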

To make your regex match the literal [?], escape [ and ? so that they are treated as literal chars. You also need to consume the space after ] so that it does not end up in Group 1. A more flexible option is \s* (0 or more whitespace chars) or \s+ (1 or more whitespace chars).

Use

re.search(r'\[\?]\s*(.*?)Reduced', example_string)

import re
rx = r"\[\?]\s*(.*?)Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)  # reuse the pattern defined above
if m:
    print(m.group(1))
# => I want this string.

Apiary answered 31/3, 2018 at 19:7 Comment(3)
Question - will this capture the target string if there is only one whitespace after the first substring?Snowman
Yes, since the \s* will match zero or more whitespace characters. Check out the Python docs if needed.Glycerin
@Roymunson If there is only one literal space, you may use a space. But if you are not sure, you may use any of the hints I added to the answer. BTW, " +" matches 1+ spaces, " *" matches 0 or more spaces. \s+ will match 1+ whitespace chars, \s* will match 0+.Extreme
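The difference between a literal space, \s+ and \s* discussed in these comments can be checked with a small sketch like the one below. The sample strings and the dictionary names are made up for illustration; only the three patterns come from the thread.

```python
import re

# Hypothetical inputs that differ only in the whitespace after "[?]"
samples = {
    "no space": "[?]I want this string.Reduced",
    "one space": "[?] I want this string.Reduced",
    "tab + spaces": "[?]\t  I want this string.Reduced",
}

patterns = {
    "literal space": r"\[\?\] (.*?)Reduced",
    r"\s+": r"\[\?\]\s+(.*?)Reduced",
    r"\s*": r"\[\?\]\s*(.*?)Reduced",
}

results = {}
for pname, pat in patterns.items():
    for sname, text in samples.items():
        m = re.search(pat, text)
        # Store None when the pattern does not match at all
        results[(pname, sname)] = m.group(1) if m else None
        print(pname, "|", sname, "->", repr(results[(pname, sname)]))
```

Only \s* matches all three inputs; the literal space matches just the one-space case, and \s+ fails when there is no whitespace at all.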
U
2

Regex may not be necessary for this, provided your string is in a consistent format:

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

res = mystr.split('Reduced')[0].split('] ')[1]

# 'I want this string.'
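If the format is not guaranteed, the chained split above raises an IndexError when '] ' is missing. A rough alternative, still without regex, is str.partition, which reports whether each marker was found. The helper name between is hypothetical and not part of the original answer.

```python
def between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None
    # strip() drops the space that follows the opening marker
    return middle.strip()

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
print(between(mystr, '[?]', 'Reduced'))              # I want this string.
print(between('no markers here', '[?]', 'Reduced'))  # None
```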
Unrelenting answered 31/3, 2018 at 19:27 Comment(0)
S
1

The solution turned out to be:

target_quote_object = re.search('] (.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text)

However, Wiktor's solution is better.

Snowman answered 31/3, 2018 at 19:2 Comment(0)
C
1

You could (or should) use a positive lookbehind, (?<=\[\?\]):

import re
pattern = r'(?<=\[\?\])(\s\w.+?)Reduced'

string_data = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

print(re.findall(pattern, string_data)[0].strip())

output:

I want this string.
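As a variation on the same idea, the lookbehind can be paired with a lookahead so that the whole match is the target text and neither a capture group nor strip() is needed. This is only a sketch and assumes exactly one space after [?], since Python's re module requires lookbehinds to have a fixed width.

```python
import re

string_data = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

# (?<=\[\?\] ) : preceded by "[?] " (fixed width, as re requires for lookbehind)
# (?=Reduced)  : followed by "Reduced", which is not consumed
pattern = r'(?<=\[\?\] ).*?(?=Reduced)'

m = re.search(pattern, string_data)
result = m.group(0) if m else None  # guard against a missing match
print(result)  # I want this string.
```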
Champerty answered 1/4, 2018 at 6:22 Comment(0)
G
0

Like the other answer, this may not be necessary, or may just be too long-winded for Python. This method uses find, one of the common string methods.

  • str.find(sub, start, end) returns the index of the first occurrence of sub within str[start:end], or -1 if it is not found.
  • In each iteration, the index of [?] is retrieved, followed by the index of Reduced, and the substring between them is printed.
  • Each time a [?]...Reduced pair is found, the index is advanced past it, and the search continues from there in the rest of the string.

Code

s = (' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced'
     '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>')


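idx = s.find('[?]')
while idx != -1:  # find() returns -1 when there is no further match
    end = s.find('Reduced', idx)
    if end == -1:  # opening marker without a closing one: stop
        break
    print(s[idx + len('[?]'):end].strip())
    idx = s.find('[?]', end)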

Output

$ python splmat.py
Nice to meet you.
Who are you?
I want this string.
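For comparison, the same multi-match extraction can probably be done in one call with re.findall, reusing the escaped pattern from the regex answer. This is a sketch rather than part of the original answer; the extra \s* before Reduced is an added assumption that trims trailing whitespace in place of strip().

```python
import re

s = (' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced'
     '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>')

# With a single capture group, findall returns a list of the captured texts
quotes = re.findall(r'\[\?\]\s*(.*?)\s*Reduced', s)
for quote in quotes:
    print(quote)
```

Note that . does not match line breaks by default, so text spanning several lines would need the re.DOTALL flag.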
Glycerin answered 31/3, 2018 at 20:34 Comment(0)
