Similar to what monkut mentioned, one of the best ways to do this is to use the `.get()` method. Credit for this goes to Charles Severance and the Python for Everybody course.
For testing:

```python
# Pretend line is as follows.
# It can and does contain \n (newline) but not \t (tab).
line = """Your battle is my battle . We fight together . One team . One team .
Shining sun always comes with the rays of hope . The hope is there .
Our best days yet to come . Let the hope light the road .""".lower()
```
His code (with my notes):

```python
# make an empty dictionary
counts = dict()
# split `line` into a list of words; with no argument, split() splits on
# any run of whitespace (spaces, newlines, tabs, etc.)
words = line.split()
# iterate over the LIST of words (made from splitting the string)
for word in words:
    counts[word] = counts.get(word, 0) + 1
```
Your code:

```python
for words in word_list:
    if words in word_dict.keys():
        word_dict[words] += 1
    else:
        word_dict[words] = 1
```
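Both loops build the same dictionary. Here is a small sketch comparing them, with the if/else version testing membership on the dict itself rather than on `.keys()`. The helper names `count_with_membership` and `count_with_get` are hypothetical, used only for this comparison.

```python
def count_with_membership(word_list):
    # Your if/else approach, but testing membership on the dict directly
    # (a hash lookup) instead of on word_dict.keys().
    word_dict = {}
    for word in word_list:
        if word in word_dict:
            word_dict[word] += 1
        else:
            word_dict[word] = 1
    return word_dict


def count_with_get(word_list):
    # The .get() version, wrapped in a function for comparison.
    counts = {}
    for word in word_list:
        counts[word] = counts.get(word, 0) + 1
    return counts


words = "one team . one team .".split()
print(count_with_membership(words) == count_with_get(words))  # True
```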
`.get()` does this:

- Returns the VALUE in the dictionary associated with `word`.
- Otherwise (if `word` is not a key in the dictionary), returns the default passed as the second argument, `0`.
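A quick sketch of those two cases, plus what happens without a default and with plain indexing (the dictionary `sample` is made up for illustration):

```python
sample = {"hope": 3}

# Key present: .get() returns the stored value.
present = sample.get("hope", 0)     # 3

# Key missing: .get() returns the default we passed instead of raising.
missing = sample.get("sun", 0)      # 0

# No default given: a missing key returns None.
no_default = sample.get("sun")      # None

# Plain indexing with a missing key raises KeyError instead.
raised_key_error = False
try:
    sample["sun"]
except KeyError:
    raised_key_error = True
```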
No matter what is returned, we add 1 to it. This handles the base case of seeing a word for the first time: the missing key yields 0, and 0 + 1 gives 1.

We cannot use a dictionary comprehension here, because the variable the comprehension is assigned to doesn't exist yet while the comprehension is still building it. In other words, this:

`counts = {word: counts.get(word, 0) + 1 for word in words}`

is not possible: `counts` is being created and assigned at the same time, so it hasn't been defined yet when we reference it (again) to call `.get()` on it. Python raises a `NameError`, and if an older `counts` already exists, the comprehension reads from that old dictionary rather than the one being built, so the counts never accumulate.
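A short sketch showing both failure modes of the comprehension next to the loop that works (the sample `words` list is made up for illustration):

```python
words = "one team one team".split()

# Case 1: `counts` does not exist yet, so looking it up inside the
# comprehension fails before the new dict is ever assigned.
raised_name_error = False
try:
    counts = {word: counts.get(word, 0) + 1 for word in words}
except NameError:
    raised_name_error = True

# Case 2: an older `counts` already exists. The comprehension reads that
# old (empty) dict, not the one it is building, so every value is 1.
counts = {}
counts = {word: counts.get(word, 0) + 1 for word in words}
print(counts)  # {'one': 1, 'team': 1}

# The explicit loop reads from and writes to the same dict, so counts accumulate.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'one': 2, 'team': 2}
```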
Output:

```python
>>> counts
{'.': 8,
 'always': 1,
 'battle': 2,
 'best': 1,
 'come': 1,
 'comes': 1,
 'days': 1,
 'fight': 1,
 'hope': 3,
 'is': 2,
 'let': 1,
 'light': 1,
 'my': 1,
 'of': 1,
 'one': 2,
 'our': 1,
 'rays': 1,
 'road': 1,
 'shining': 1,
 'sun': 1,
 'team': 2,
 'the': 4,
 'there': 1,
 'to': 1,
 'together': 1,
 'we': 1,
 'with': 1,
 'yet': 1,
 'your': 1}
```
As an aside, here is a "loaded" use of `.get()` that I wrote as a way to solve the classic FizzBuzz question. I'm currently writing code for a similar situation, in which I will use modulus and a dictionary, but with a split string as input.
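A minimal sketch of the same idea (not the original snippet): FizzBuzz with modulus and a single dictionary lookup through `.get()`. The function name `fizzbuzz` and the tuple keys are my own illustrative choices.

```python
def fizzbuzz(n):
    # Hypothetical sketch. Keys are (divisible by 3, divisible by 5);
    # (False, False) is deliberately absent, so .get() falls back to the number.
    labels = {
        (True, False): "Fizz",
        (False, True): "Buzz",
        (True, True): "FizzBuzz",
    }
    return [labels.get((i % 3 == 0, i % 5 == 0), str(i)) for i in range(1, n + 1)]


print(fizzbuzz(15))
```

Leaving the "neither" case out of the dictionary is what makes `.get()` do the work: its default handles every plain number.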
`word_dict.keys()` gets all the keys as a list (in Python 2; in Python 3 it returns a view that supports fast membership tests), and checking membership in a list is an O(n) operation, while checking for membership in a hashmap is much faster. – Eboat

`collections.Counter` is available: hg.python.org/cpython/file/2.7/Lib/collections.py – Eboat
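For reference, a sketch of the same count using `collections.Counter` (available since Python 2.7 and 3.1), run on a shortened sample string:

```python
from collections import Counter

line = "One team . One team .".lower()

# Counter does the same counting as the .get() loop in one call.
counts = Counter(line.split())

# Counter is a dict subclass; a missing key returns 0 instead of raising
# KeyError, and the lookup does not add the key.
missing = counts["missing"]

print(dict(counts))  # {'one': 2, 'team': 2, '.': 2}
print(missing)       # 0
```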