python - find the occurrence of the word in a file
A

6

12

I am trying to count how many times each word occurs in a file. I have a text file (TEST.txt) with the following content:

ashwin programmer india
amith programmer india

The result I expect is:

{'ashwin': 1, 'programmer': 2, 'india': 2, 'amith': 1}

The code I am using is:

for line in open('TEST.txt', 'r'):
    word = Counter(line.split())
    print word

The result I get is:

Counter({'ashwin': 1, 'programmer': 1,'india':1})
Counter({'amith': 1, 'programmer': 1,'india':1})

Can anyone please help me? Thanks in advance.

Abaft answered 26/2, 2013 at 6:53 Comment(0)
L
18

Use the update method of Counter. Example:

from collections import Counter

data = '''\
ashwin programmer india
amith programmer india'''

c = Counter()
for line in data.splitlines():
    c.update(line.split())
print(c)

Output:

Counter({'india': 2, 'programmer': 2, 'amith': 1, 'ashwin': 1})
Limb answered 26/2, 2013 at 6:59 Comment(1)
+1 Just what I was going to post - this makes nice use of the specialised Counter.update method and doesn't require reading the entire file into memory...Unceasing
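To make the memory point in the comment concrete, here is a minimal sketch that applies Counter.update directly to a file, one line at a time. The helper name count_words and the temporary file are assumptions made for this illustration; they are not part of the answer above.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Count whitespace-separated words, reading the file one line at a time."""
    counts = Counter()
    with open(path, "r") as f:
        for line in f:  # the file object yields lines lazily
            counts.update(line.split())
    return counts


# Demo with a temporary file holding the question's sample data
# (a stand-in for TEST.txt, created only for this example).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ashwin programmer india\namith programmer india\n")

result = count_words(tmp.name)
os.remove(tmp.name)
print(dict(result))
```

Wrapping the result in dict() gives a plain dictionary, which is closer to the output format the question asks for.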
E
8
from collections import Counter

cnt = Counter()

# Iterate over the file line by line and count each word.
with open('TEST.txt', 'r') as f:
    for line in f:
        for word in line.split():
            cnt[word] += 1

print(cnt)
Elma answered 26/2, 2013 at 6:57 Comment(0)
B
5

You're iterating over every line and calling Counter each time. You want Counter to run over the entire file. Try:

from collections import Counter

with open("TEST.txt", "r") as f:
    # Read the whole file and split it into a list of words.
    contents = f.read().split()
print(Counter(contents))
Baroscope answered 26/2, 2013 at 6:55 Comment(4)
@jadkik94 If he's processing every line within that block either way, why would it make a difference?Baroscope
@Baroscope What happens if you have a 50 GB file that you want to count? (That just so happens to only have 3 unique words)....Unceasing
@JonClements I was about to say this too, even if here it's unlikely to be the case. But a best practice is a best practice...Zip
Yep, you guys are right actually. I was forgetting about the default generator behavior.Baroscope
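As a sketch of the point raised in these comments: a single Counter can still cover the whole file without f.read() if it is fed from a generator expression, so only one line is held in memory at a time. The function name count_words_streaming is hypothetical, and io.StringIO stands in for a real file here.

```python
import io
from collections import Counter


def count_words_streaming(lines):
    """Count words from any iterable of lines without building a full word list."""
    # The generator expression hands words to Counter one at a time,
    # so only the current line is held in memory.
    return Counter(word for line in lines for word in line.split())


# io.StringIO stands in for an open file; a file object returned by
# open("TEST.txt") can be iterated line by line in the same way.
fake_file = io.StringIO("ashwin programmer india\namith programmer india\n")
counts = count_words_streaming(fake_file)
print(counts.most_common(2))
```

The memory used then depends on the number of unique words and the longest line, not on the size of the file.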
F
1

Using a defaultdict:

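from collections import defaultdict

def read_file(fname):
    words_dict = defaultdict(int)

    # The with block closes the file once reading is done.
    with open(fname, 'r') as fp:
        for line in fp:
            # split() with no argument also strips the trailing newline,
            # so 'india\n' and 'india' are not counted separately.
            for word in line.split():
                words_dict[word] += 1

    return words_dict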
Firstly answered 19/1, 2014 at 2:2 Comment(0)
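One detail worth checking when splitting lines read from a file is the difference between split(' ') and split() with no argument. The short sketch below uses a made-up sample line to show it; with split(' '), the last word of a line keeps its newline and can end up counted under a separate key.

```python
line = "amith programmer india\n"

# Splitting on a single space keeps the trailing newline on the last word.
by_space = line.split(" ")

# Splitting with no argument splits on any run of whitespace and drops
# leading and trailing whitespace, including the newline.
by_whitespace = line.split()

print(by_space)
print(by_whitespace)
```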
H
0
FILE_NAME = 'file.txt'

wordCounter = {}

with open(FILE_NAME,'r') as fh:
  for line in fh:
    # Remove punctuation characters and convert the line to lowercase.
    # split() then breaks the line into a list of words.
    word_list = line.replace(',','').replace('\'','').replace('.','').lower().split()
    for word in word_list:
      # Add the word to the wordCounter dictionary.
      if word not in wordCounter:
        wordCounter[word] = 1
      else:
        # If the word is already in the dictionary, increment its count.
        wordCounter[word] = wordCounter[word] + 1

print('{:15}{:3}'.format('Word','Count'))
print('-' * 18)

# Print each word and its number of occurrences.
for word, occurrence in wordCounter.items():
  print('{:15}{:3}'.format(word, occurrence))
Headwards answered 20/2, 2017 at 15:42 Comment(0)
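The chained replace() calls above handle only commas, apostrophes, and periods. A possible alternative, sketched here under the assumption that a word is any run of letters, digits, and apostrophes, is to extract words with a regular expression. The pattern and the helper name normalized_words are illustrative choices, and unlike the answer above this version keeps apostrophes inside words.

```python
import re
from collections import Counter


def normalized_words(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    # Illustrative pattern, not the answer's exact rule: it drops most
    # punctuation but keeps apostrophes inside words.
    return re.findall(r"[a-z0-9']+", text.lower())


sample = "India, india. INDIA! It's a programmer's test."
counts = Counter(normalized_words(sample))
print(counts["india"])
```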
R
0
# Read the whole file, lowercase it, and split it into a list of words.
with open('input.txt', 'r') as f:
    list1 = f.read().lower().split()

# Count each word; dict.get returns 0 for a word not seen yet.
d = {}
for i in list1:
    d[i] = d.get(i, 0) + 1
print(d)
Riffe answered 21/12, 2018 at 14:52 Comment(1)
The code opens input.txt, lowercases its contents, and splits them into a list of all words. It then builds a dictionary that maps each distinct word to the number of times it occurs, and prints it. Sample file contents (input.txt): "Return all non-overlapping matches of pattern return pattern". Program output: {'non-overlapping': 1, 'of': 1, 'matches': 1, 'return': 2, 'pattern': 2, 'all': 1}Riffe
