I am trying to solve a difficult problem and am getting lost.
Here's what I'm supposed to do:
INPUT: file
OUTPUT: dictionary
Return a dictionary whose keys are all the words in the file (broken by
whitespace). The value for each word is a dictionary containing each word
that can follow the key and a count for the number of times it follows it.
You should lowercase everything.
Use strip and string.punctuation to strip the punctuation from the words.
Example:
>>> #example.txt is a file containing: "The cat chased the dog."
>>> with open('../data/example.txt') as f:
...     word_counts(f)
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
Here's all I have so far, in trying to at least pull out the correct words:
def word_counts(f):
    i = 0
    orgwordlist = f.split()
    for word in orgwordlist:
        if i<len(orgwordlist)-1:
            print orgwordlist[i]
            print orgwordlist[i+1]

with open('../data/example.txt') as f:
    word_counts(f)
I think I need to use the .count method somehow and eventually zip some dictionaries together, but I'm not sure how to count the second words for each first word.
I know I'm nowhere near solving the problem, but I'm trying to take it one step at a time. Any help is appreciated, even just tips pointing in the right direction.
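For the "count the second words for each first word" step, one possible direction is a nested dictionary built from adjacent word pairs. This is only a sketch, not necessarily the intended solution: it assumes the whole file fits in memory, it uses io.StringIO as a stand-in for the opened example.txt, and it drops tokens that consist only of punctuation (the assignment does not say what to do with those). Like the expected output above, a last word that never has a follower (here 'dog') does not become a key.

```python
import io
import string


def word_counts(f):
    """Map each word to a dict counting the words that directly follow it."""
    # A file object has no split(); read its contents into a string first.
    words = [w.strip(string.punctuation) for w in f.read().lower().split()]
    # Assumption: tokens that were nothing but punctuation are dropped.
    words = [w for w in words if w]

    counts = {}
    # zip(words, words[1:]) pairs every word with the word right after it.
    for first, second in zip(words, words[1:]):
        followers = counts.setdefault(first, {})
        followers[second] = followers.get(second, 0) + 1
    return counts


# io.StringIO stands in for the opened example.txt here.
sample = io.StringIO(u"The cat chased the dog.")
result = word_counts(sample)
```

With this approach neither .count nor zipping dictionaries together is needed; each pair simply increments one inner counter.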
f.split(). f is a file handler, not a string. – Diaspora
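To illustrate the point in this comment, here is a small sketch of the difference between the file object and its contents. io.StringIO is used as a stand-in for the object returned by open(), so the snippet runs without the actual file; that substitution is an assumption made for illustration only.

```python
import io

# io.StringIO is a stand-in for the object returned by open(...).
f = io.StringIO(u"The cat chased the dog.")

has_split = hasattr(f, "split")  # file-like objects do not define split()
text = f.read()                  # read() returns the contents as one string
tokens = text.split()            # a string can be split on whitespace
```

So the call in the question would need to be something like f.read().split() rather than f.split().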