How can I find the position of the list of substrings from the string?
Given a string:
"The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday."
And a list of substring:
['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
Desired output:
>>> s = "The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday."
>>> tokens = ['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
>>> find_offsets(tokens, s)
[(0, 3), (4, 9), (9, 10), (11, 16), (17, 20), (21, 23), (24, 34),
(34, 35), (36, 43), (44, 46), (47, 52), (52, 54), (55, 60), (61, 67),
(68, 72), (73, 75), (76, 83), (84, 89), (90, 98), (99, 103), (104, 109),
(110, 119), (120, 122), (123, 131), (131, 132)]
Explanation of the output, the first substring "The" can be found using the (start, end)
index by using the string s
. So from the desired output.
So if we loop through all the tuples of integers from the desired output we'll get back the list of substrings, i.e.
>>> [s[start:end] for start, end in out]
['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
I've tried:
def find_offset(tokens, s):
index = 0
offsets = []
for token in tokens:
start = s[index:].index(token) + index
index = start + len(token)
offsets.append((start, index))
return offsets
Is there another way to find the position of the list of substrings from the string?
.index()
twice. – Deanndeanna