How to find index of an exact word in a string in Python [duplicate]
E

5

21
word = 'laugh'
string = 'This is laughing laugh'
index = string.find(word)

index is 8, should be 17. I looked around hard, but could not find an answer.

Easting answered 15/8, 2016 at 13:45 Comment(4)
New to Python, re is too complicated for me to solve this yet!Easting
I found 194 questions on this site when I search for "how to find a word in a string". Are you saying none of those answers helped?Uranus
8 is the right answer, find returns the starting position of the first matching substringDenier
Does this answer your question? Finding the position of a word in a stringJanis
A
42

You should use a regex with word boundaries, because str.find returns the first occurrence of the substring, even when it sits inside a longer word. Then call the start() method of the match object to get the starting index.

import re

string = 'This is laughing laugh'

a = re.search(r'\b(laugh)\b', string)
print(a.start())
>> 17

You can find more info on how it works here.
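
If the word comes from a variable, as asked in the comments below, a minimal sketch could look like this. The helper name find_word is hypothetical, re.escape is added on the assumption that the word may contain regex metacharacters, and the function returns -1 when nothing matches (mirroring str.find) rather than failing on a None match.

```python
import re

def find_word(word, text):
    """Return the index of the first whole-word match of `word` in `text`, or -1."""
    # re.escape keeps characters such as '.' or '+' in `word` from being
    # interpreted as regex syntax.
    match = re.search(r'\b{}\b'.format(re.escape(word)), text)
    return match.start() if match else -1

print(find_word('laugh', 'This is laughing laugh'))  # 17
print(find_word('cry', 'This is laughing laugh'))    # -1
```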

Aerobe answered 15/8, 2016 at 13:50 Comment(4)
Great! Could you let me know how to use a variable in the re expression, i.e I want to use word instead of (laugh)?Easting
@Easting Like you would with any Python string. You can concat or use .format, ie word = 'laugh' ; re.search(r'\b({})\b'.format(word), string)Aerobe
This worked: re.compile(r'\b%s\b' % word, re.I). Not sure why re.search(r'\b({})\b'.format(word), string) didn't...Easting
Many Thanks! Spent a lot of time on this to find out (newbie!).Easting
M
8

try this:

word = 'laugh'    
string = 'This is laughing laugh'.split(" ")
index = string.index(word)

This makes a list containing all the words and then searches for the relevant word. To get the character index, add up the lengths of all the elements that come before that list index, plus one for each separating space:

position = 0
for i, w in enumerate(string):
    # stop before counting the matching word itself
    if i >= index:
        break
    position += 1 + len(w)  # word length plus the following space

print(position)

Hope this helps.

Makeup answered 15/8, 2016 at 14:0 Comment(0)
D
4

Here is one approach without regular expressions:

word = 'laugh'    
string = 'This is laughing laugh'
# we want to find this >>> -----
# index   0123456789012345678901     
words = string.split(' ')
word_index = words.index(word)
index = sum(len(x) + 1 for i, x in enumerate(words) 
            if i < word_index) 
# => 17

This splits the string into words, finds the list index of the matching word, and then sums the lengths of all the words before it, adding one per word for the blank separator.
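
Note that this relies on the words being separated by exactly the character passed to split. A short sketch of where that assumption breaks, using a made-up sentence with punctuation and the regex search for comparison (re.escape is an addition, assumed here for safety):

```python
import re

word = 'laugh'
text = 'Do not laugh, please'  # hypothetical input with punctuation

words = text.split(' ')
try:
    word_index = words.index(word)
    index = sum(len(x) + 1 for i, x in enumerate(words) if i < word_index)
except ValueError:
    # 'laugh,' is not equal to 'laugh', so list.index() raises ValueError
    index = -1

match = re.search(r'\b%s\b' % re.escape(word), text)
regex_index = match.start() if match else -1

print(index)        # -1
print(regex_index)  # 7
```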

Update: Another approach is the following one-liner:

index = string.center(len(string) + 2, ' ').find(word.center(len(word) + 2, ' '))

Here both the string and the word are padded with one blank on the left and one on the right, so that the full word is matched at any position in the string, including the very start and the very end.
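
To see why the returned index needs no adjustment, here is a small sketch that wraps the one-liner in a function (the name find_padded is hypothetical) and tries a word at the start, middle and end of the string:

```python
def find_padded(word, text):
    """Whole-word search by padding both strings with one space on each side."""
    padded_text = text.center(len(text) + 2, ' ')
    padded_word = word.center(len(word) + 2, ' ')
    # The leading pad shifts every character right by one, and the match
    # starts on the space before the word, so the two offsets cancel out.
    return padded_text.find(padded_word)

print(find_padded('laugh', 'This is laughing laugh'))  # 17 (end of string)
print(find_padded('This', 'This is laughing laugh'))   # 0  (start of string)
print(find_padded('is', 'This is laughing laugh'))     # 5  (middle)
print(find_padded('laug', 'This is laughing laugh'))   # -1 (not a whole word)
```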

You should of course use regular expressions for performance and convenience. The equivalent using the re module is as follows:

import re

# re.I makes the match case-insensitive
r = re.compile(r'\b%s\b' % word, re.I)
m = r.search(string)
index = m.start()

Here \b means word boundary; see the re documentation. Regex can be quite daunting. A great way to test and find regular expressions is regex101.com.

Denier answered 15/8, 2016 at 13:58 Comment(8)
downvote all you like but please add a comment so I can improve the answer.Denier
r = re.compile(r'\b%s\b' % word, re.I) worked like a charm. Your complete solution also works! Thanks a lot!Easting
The reason for the downvote is that this answer (both parts of it) already exist in very similar forms.Melindamelinde
@Melindamelinde I came up with these solutions and the whole answer by myself. Also if you look carefully this exact solution was not posted by anybody else.Denier
index = sum(len(x) + 1 for i, x in enumerate(words) if i < word_index) is not giving right char index.Ayer
@RashmiJain what index would you expect? It returns 17 which is the starting index for the word 'laugh' and is correct as per the stated expectation in the original question.Denier
index = sum(len(x) + 1 for i, x in enumerate(words) if i < word_index) does not work in general to get the character index from the word's index. Yes, but for the above question it works.Ayer
@RashmiJain can you give an example where it does not work? It works under the assumption that the word boundaries are spaces, more specifically the same as the sep argument to the split(sep) methodDenier
M
1

str.find does not treat spaces as word separators; it matches the substring wherever it occurs. If you want the match to be bounded by spaces, you must include the spaces in the string you are searching for. You may find it easier to split the string into words and then iterate, e.g.:

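text = "This is a laughing laugh"  # avoid naming a variable str, it shadows the built-in
words = text.split(" ")
for s_word in words:
    if s_word == "laugh":
        print("found:", s_word)  # replace with whatever you need to do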

As you iterate you can add the length of the current word to an index and when you find the word, break from the loop. Don't forget to account for the spaces!
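
A minimal sketch of that idea, assuming the words are separated by single spaces (the function name word_offset and its sep parameter are hypothetical):

```python
def word_offset(text, target, sep=" "):
    """Return the character index of the first item equal to `target`, or -1."""
    offset = 0
    for current in text.split(sep):
        if current == target:
            return offset
        # Skip past this word and the separator that followed it.
        offset += len(current) + len(sep)
    return -1

print(word_offset("This is a laughing laugh", "laugh"))  # 19
```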

Melindamelinde answered 15/8, 2016 at 13:48 Comment(2)
I can find that the word is in string, I want to know its index.Easting
My bad, you can add the length of each word as you iterate. It's probably less efficient than the regex method listed, but I try to avoid regex in Python where possible - I see it as a scripting language and as something to be kept easy to read over performant.Melindamelinde
W
0

I stumbled upon this. I hope by now you would have figured it out. If you haven't maybe this would help. I had the same dilemma as you, was trying to print out a word using index.

string = 'This is laughing laugh'
word = string.split(" ")
print(word[2])

This would print out laughing.

I hope this helps. This is my first time answering a question on this forum, so please pardon my syntax.

Thank you.

Weevil answered 18/4, 2020 at 15:36 Comment(1)
print(word[02]) This will fail in Python 3: "SyntaxError: leading zeros in decimal integer literals are not permitted"Experimentation
