Check if a word is in a string in Python
P

15

253

I'm working with Python, and I'm trying to find out if you can tell if a word is in a string.

I have found some information about identifying whether the word is in the string using .find, but is there a way to do this in an if statement? I would like to have something like the following:

if string.find(word):
    print("success")
Parkins answered 16/3, 2011 at 1:10 Comment(0)
O
455

What is wrong with:

if word in mystring: 
   print('success')
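
Regarding case sensitivity: the in operator compares characters exactly, so it is case-sensitive. A minimal sketch of a case-insensitive variant follows; the helper name contains_ignoring_case is made up for illustration, and it still matches any sequence of characters, not whole words.

```python
def contains_ignoring_case(text, fragment):
    # Hypothetical helper: casefold() is a more aggressive lower(),
    # intended for caseless comparisons.
    return fragment.casefold() in text.casefold()

print('Seek' in 'those who seek shall find')                        # False
print(contains_ignoring_case('those who seek shall find', 'Seek'))  # True
print(contains_ignoring_case('swordsmith', 'WORD'))                 # True (substring match)
```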
Octans answered 16/3, 2011 at 1:13 Comment(13)
Just as a caution: if you have the string "paratyphoid is bad" and you test if "typhoid" in "paratyphoid is bad", you will get True.Sterilant
Does anyone know how to overcome this problem?Gladiolus
@user2567857, regular expressions -- see Hugh Bothwell's answer.Epact
@fabrizioM what can I do if I want to check if two words are in my string?Minute
if (word1 in mystring and word2 in mystring)Kibitz
How is this the accepted answer?!! It just checks whether a sequence of characters (not a word) appears in a stringPhoton
This really shouldn't be the accepted answer as it will test wrong for a number of cases.Keir
What happens if the string is contained more than once?Mcfadden
This is not useful when you want to find the exact word in the sentenceHourihan
@pedrambashiri a word is a sequence of characters... the question is vague and as a result the answer is vague, there is nothing wrong with this being the accepted answer. Sure people might want to find if a word is in a sentence at which point this would fail, but the person asking the question didn't clarify their requirementsArmure
@Kevin, a sequence of characters is not a word. Yes, the question is vague. So, try to close it. Not close the mine! I want to help to code. Not to make points...Boxwood
Is using "in" case sensitive?Blazonry
I've added a function to this thread that solves all the problems of the word being at the beginning, end, case-sensitivity or next to punctuation.Lacroix
R
214
if 'seek' in 'those who seek shall find':
    print('Success!')

but keep in mind that this matches a sequence of characters, not necessarily a whole word - for example, 'word' in 'swordsmith' is True. If you only want to match whole words, you ought to use regular expressions:

import re

def findWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

findWholeWord('seek')('those who seek shall find')    # -> <match object>
findWholeWord('word')('swordsmith')                   # -> None
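
If the search word can contain regex metacharacters (for example a dot), the pattern above treats them as regex syntax. A sketch of the same idea with re.escape applied, assuming the word should always be matched literally; the name find_whole_word_escaped is illustrative only:

```python
import re

def find_whole_word_escaped(w):
    # re.escape() makes any regex metacharacters in w match literally
    pattern = r'\b({0})\b'.format(re.escape(w))
    return re.compile(pattern, flags=re.IGNORECASE).search

print(bool(find_whole_word_escaped('seek')('those who seek shall find')))  # True
print(bool(find_whole_word_escaped('word')('swordsmith')))                 # False
print(bool(find_whole_word_escaped('a.c')('abc')))                         # False
print(bool(find_whole_word_escaped('a.c')('see a.c here')))                # True
```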
Reconstruct answered 16/3, 2011 at 1:52 Comment(6)
Is there a really fast method of searching for multiple words, say a set of several thousand words, without having to construct a for loop going through each word? I have a million sentences, and a million terms to search through to see which sentence has which matching words. Currently it's taking me days to process, and I want to know if there's a faster way.Adanadana
@Adanadana try to use grep instead of python regexMavismavra
+1 for swordsmithPeri
How do you handle exceptions, e.g. when the word is not found in the string?Formality
@FaCoffee: if the string is not found, the function returns None (see last example above).Reconstruct
To be on the safe side of things, you should do .format(re.escape(w)). If you don't have that you open yourself up to string manipulation attacks. Of course, if you can trust your input, this is a non issue. However, if your list of words comes from another source (list found on the internet, database, user input), this is super critical.Voodoo
P
67

If you want to find out whether a whole word is in a space-separated list of words, simply use:

def contains_word(s, w):
    return (' ' + w + ' ') in (' ' + s + ' ')

contains_word('the quick brown fox', 'brown')  # True
contains_word('the quick brown fox', 'row')    # False

This elegant method is also the fastest. Compared to Hugh Bothwell's and daSong's approaches:

>python -m timeit -s "def contains_word(s, w): return (' ' + w + ' ') in (' ' + s + ' ')" "contains_word('the quick brown fox', 'brown')"
1000000 loops, best of 3: 0.351 usec per loop

>python -m timeit -s "import re" -s "def contains_word(s, w): return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search(s)" "contains_word('the quick brown fox', 'brown')"
100000 loops, best of 3: 2.38 usec per loop

>python -m timeit -s "def contains_word(s, w): return s.startswith(w + ' ') or s.endswith(' ' + w) or s.find(' ' + w + ' ') != -1" "contains_word('the quick brown fox', 'brown')"
1000000 loops, best of 3: 1.13 usec per loop

Edit: A slight variant on this idea for Python 3.6+, equally fast:

def contains_word(s, w):
    return f' {w} ' in f' {s} '
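
To check the "equally fast" claim on your own machine, the two variants can also be timed from inside Python with the timeit module. This is only a sketch: the function names are invented for the comparison, and the numbers will differ between machines and Python versions, so no output is shown.

```python
import timeit

def contains_word_concat(s, w):
    # Original variant: pad both strings with spaces via concatenation
    return (' ' + w + ' ') in (' ' + s + ' ')

def contains_word_fstring(s, w):
    # f-string variant (Python 3.6+)
    return f' {w} ' in f' {s} '

for func in (contains_word_concat, contains_word_fstring):
    seconds = timeit.timeit(
        lambda: func('the quick brown fox', 'brown'), number=100_000
    )
    print(f'{func.__name__}: {seconds:.4f} s for 100,000 calls')
```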
Pearman answered 11/4, 2016 at 20:32 Comment(6)
This has several problems: (1) Words at the end (2) Words at the beginning (3) words in between like contains_word("Simon says: Don't use this answer", "says")Kalat
@MartinThoma - As stated, this method is specifically for finding out "whether a whole word is in a space-separated list of words". In that situation, it works fine for: (1) Words at the end (2) Words at the beginning (3) words in between. Your example only fails because your list of words includes a colon.Pearman
Clever thinking. Thanks! :)Pinchpenny
This has a few problems. It assumes space is the only thing that separates one word from another. Try finding fox in "the quick brown fox!" or "the quick brown dog, fox, and chicken". The regex answer does not have this issue, as far as I can see. Though, tokenization is a hard problem, and for best results use spaCy or NLTK.Sea
@Sea Once again, this method is SPECIFICALLY for "If you want to find out whether a whole word is in a space-separated list of words", as the author clearly stated.Neoarsphenamine
def wordSearch(word, phrase): return word in [words.strip(',.?!') for words in phrase.split()]Boxwood
L
23

You can split the string into words and check the resulting list.

if word in string.split():
    print("success")
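
As a rough illustration of why this works: split() with no arguments breaks the string on runs of whitespace, and in on a list compares whole elements rather than substrings. The helper name has_word below is made up for the example.

```python
def has_word(sentence, word):
    # split() with no arguments splits on any run of whitespace;
    # membership in the resulting list compares whole elements
    return word in sentence.split()

print('the quick brown fox'.split())             # ['the', 'quick', 'brown', 'fox']
print(has_word('the quick brown fox', 'brown'))  # True
print(has_word('the quick brown fox', 'row'))    # False
print(has_word('the quick brown fox!', 'fox'))   # False: the last element is 'fox!'
```

The last call shows the limitation: punctuation stays attached to the neighbouring word.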
Loire answered 1/12, 2016 at 18:26 Comment(3)
Please use the edit link to explain how this code works and don’t just give the code, as an explanation is more likely to help future readers.Dougald
This should be the actual answer for matching the whole word.Stressful
We should think about punctuation too. Look here.Boxwood
N
22

find returns an integer representing the index of where the search item was found. If it isn't found, it returns -1.

haystack = 'asdf'
needle = 'a'  # the string to search for

haystack.find('a') # result: 0
haystack.find('s') # result: 1
haystack.find('g') # result: -1

if haystack.find(needle) >= 0:
  print('Needle found.')
else:
  print('Needle not found.')
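
This is also why a bare `if string.find(word):` does not do what the question hopes for. A short sketch of the truthiness pitfall:

```python
haystack = 'asdf'

# find() returns an index, and Python treats 0 as falsy and -1 as truthy
print(bool(haystack.find('a')))   # False: index 0, although 'a' is present
print(bool(haystack.find('g')))   # True: -1, although 'g' is absent

# Compare against -1 explicitly, or use the in operator
print(haystack.find('a') != -1)   # True
print('g' in haystack)            # False
```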
Nisi answered 16/3, 2011 at 1:13 Comment(0)
M
12

This small function looks for every search word in the given text. If all search words are found, it returns the number of search words; otherwise it returns False.

Also supports unicode string search.

def find_words(text, search):
    """Find exact words"""
    dText   = text.split()
    dSearch = search.split()

    found_word = 0

    # Count each search word at most once, even if it repeats in the text
    for search_word in dSearch:
        if search_word in dText:
            found_word += 1

    if found_word == len(dSearch):
        return len(dSearch)
    else:
        return False

usage:

find_words('çelik güray ankara', 'güray ankara')
Menstruation answered 22/6, 2012 at 22:51 Comment(0)
T
9

If matching a sequence of characters is not sufficient and you need to match whole words, here is a simple function that gets the job done. It basically appends spaces where necessary and searches for that in the string:

def smart_find(haystack, needle):
    # The whole string is exactly the word
    if haystack == needle:
        return True
    if haystack.startswith(needle+" "):
        return True
    if haystack.endswith(" "+needle):
        return True
    if haystack.find(" "+needle+" ") != -1:
        return True
    return False

This assumes that commas and other punctuation have already been stripped out.

Tridentum answered 15/6, 2012 at 7:23 Comment(1)
This solution worked best for my case as I am using tokenized space separated strings.Scottscotti
C
9

Using regex is a solution, but it is too complicated for that case.

You can simply split the text into a list of words using the split(separator, num) method. It returns a list of all the words in the string, using separator as the separator. If separator is unspecified, it splits on any whitespace (optionally, you can limit the number of splits to num).

list_of_words = mystring.split()
if word in list_of_words:
    print('success')

This will not work for string with commas etc. For example:

mystring = "One,two and three"
# will split into ["One,two", "and", "three"]

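If you also want to split on commas etc., note that str.split() treats its separator argument as one whole string, not as a set of characters. Use re.split() with a character class instead:

import re

# whitespace (space, tab, newline, return, formfeed) plus common punctuation
list_of_words = re.split(r"[ \t\n\r\f,.;!?'\"()]+", mystring)
if word in list_of_words:
    print('success')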
Commemorate answered 18/12, 2017 at 11:44 Comment(2)
This is a good solution, and similar to @Corvax, with the benefit of adding common characters to split on so that in a string like "First: there..", the word "First" could be found. Note that @Commemorate isn't including ":" in the additional chars. I would :). Also, if the search is case-insensitive, consider using .lower() on both the word and string before the split. mystring.lower().split() and word.lower() I think this is also faster than the regex example.Lavernalaverne
I think to use something like split( \t\n\r\f,.;!?'\"()") we need to import re. But it is a good solution too.Boxwood
K
7

As you are asking for a word and not for a string, I would like to present a solution which is not sensitive to prefixes / suffixes and ignores case:

#!/usr/bin/env python

import re


def is_word_in_text(word, text):
    """
    Check if a word is in a text.

    Parameters
    ----------
    word : str
    text : str

    Returns
    -------
    bool : True if word is in text, otherwise False.

    Examples
    --------
    >>> is_word_in_text("Python", "python is awesome.")
    True

    >>> is_word_in_text("Python", "camelCase is pythonic.")
    False

    >>> is_word_in_text("Python", "At the end is Python")
    True
    """
    pattern = r'(^|[^\w]){}([^\w]|$)'.format(word)
    pattern = re.compile(pattern, re.IGNORECASE)
    matches = re.search(pattern, text)
    return bool(matches)


if __name__ == '__main__':
    import doctest
    doctest.testmod()

If your words might contain regex special chars (such as +), then you need re.escape(word)
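
A sketch of that escaped variant, assuming the word should always be taken literally; the name is_word_in_text_escaped is made up for illustration:

```python
import re

def is_word_in_text_escaped(word, text):
    # re.escape() keeps characters such as '+' from being read as regex syntax
    pattern = r'(^|[^\w]){}([^\w]|$)'.format(re.escape(word))
    return bool(re.search(pattern, text, re.IGNORECASE))

print(is_word_in_text_escaped('C++', 'I like c++ a lot'))           # True
print(is_word_in_text_escaped('C++', 'I like C++11'))               # False
print(is_word_in_text_escaped('Python', 'camelCase is pythonic.'))  # False
```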

Kalat answered 9/8, 2017 at 10:11 Comment(0)
S
5

A more advanced way to check for an exact word in a long string:

import re
text = "This text was of edited by Rock"
#try this string also
#text = "This text was officially edited by Rock" 
# re.search returns None when there is no match, so the else branch can run
if re.search(r"\bof\b", text):
    print("Present")
else:
    print("Absent")
Snailfish answered 2/11, 2016 at 8:39 Comment(0)
B
3

What about splitting the string and stripping punctuation from the words?

w in [ws.strip(',.?!') for ws in p.split()]

If needed, pay attention to lower/upper case:

w.lower() in [ws.strip(',.?!') for ws in p.lower().split()]

Maybe like this:

def wcheck(word, phrase):
    # Adjust the punctuation to strip and the split characters as needed
    punctuation = ',.?!'
    return word.lower() in [words.strip(punctuation) for words in phrase.lower().split()]

Sample:

print(wcheck('CAr', 'I own a caR.'))

I didn't check performance...

Boxwood answered 26/12, 2020 at 5:18 Comment(0)
C
2

You could just add a space before and after "word".

x = input("Type your word: ")  # raw_input() in Python 2
if " word " in x:
    print("Yes")
else:
    print("Nope")

This way it looks for the space before and after "word".

Type your word: Swordsmith
Nope
Type your word:  word 
Yes
Cassondra answered 26/2, 2015 at 14:23 Comment(1)
But what if the word is at the beginning or the end of the sentence (no space)Electrophilic
L
0

I believe this answer is closer to what was initially asked: Find substring in string but only if whole words?

It is using a simple regex:

import re

if re.search(r"\b" + re.escape(word) + r"\b", string):
  print('success')
Lucan answered 25/8, 2021 at 13:25 Comment(0)
L
0

One of the solutions is to put a space at the beginning and end of the test word. This fails if the word is at the beginning or end of a sentence or is next to any punctuation. My solution is to write a function that replaces any punctuation in the test string with spaces, adds a space to the beginning and end of the test string and test word, then returns the number of occurrences. This is a simple solution that removes the need for any complex regular expression.

def countWords(word, sentence):
    testWord = ' ' + word.lower() + ' '
    testSentence = ' '

    for char in sentence:
        if char.isalpha():
            testSentence = testSentence + char.lower()
        else:
            testSentence = testSentence + ' '

    testSentence = testSentence + ' '

    return testSentence.count(testWord)

To count the number of occurrences of a word in a string:

sentence = "A Frenchman ate an apple"
print(countWords('a', sentence))

returns 1

sentence = "Is Oporto a 'port' in Portugal?"
print(countWords('port', sentence))

returns 1

Use the function in an 'if' to test if the word exists in a string

Lacroix answered 18/3, 2022 at 9:37 Comment(0)
E
0
def word_find(word, string):
    # Using str.find() method
    # It returns -1 if the word is not found, else returns the index of the first occurrence
    if string.find(word) != -1:
        return 'success'
    else:
        return 'word not found in string'
    

print(word_find('lo', 'Hello world')) ## success
Empirin answered 10/3 at 9:46 Comment(0)
