Python - reading files from directory file not found in subdirectory (which is there)

I am convinced it is something simple and syntactic, but I cannot figure out why my code:

import os
from collections import Counter
d = {}
for filename in os.listdir('testfilefolder'):
    f = open(filename,'r')
    d = (f.read()).lower()
    freqs = Counter(d)
    print(freqs)

will not work. It apparently can see into the 'testfilefolder' folder and knows that the file is there, i.e. the error message says 'file2.txt' is not found. So it can find the file well enough to tell me that it is not found...

I can, however, get this piece of code to work:

from collections import Counter
d = {}
f = open("testfilefolder/file2.txt",'r')
d = (f.read()).lower()
freqs = Counter(d)
print(freqs)

Bonus: is this a good way of doing what I am trying to do (read from files and count the frequencies of words)? This is my first day with Python (although I have some programming experience).

I have to say that I am liking Python!

Thanks,

Brian

Footstalk answered 22/3, 2013 at 22:14 Comment(0)
W
2

As isedev pointed out, listdir() returns just the file names, not the full path (or relative paths). Another way to deal with this problem is to os.chdir() into the directory in question, then os.listdir('.').

Secondly, it seems your goal is to count the frequency of words, not letters (characters). For that, you will need to break up the contents of the files into words. I prefer to use a regular expression for this.

Thirdly, your solution counts word frequencies for each file separately. If you ever need to do it for all files together, create a Counter() object at the beginning, then call its update() method to tally the counts.

Without further ado, my solution:

import collections
import re
import os

all_files_frequency = collections.Counter()

previous_dir = os.getcwd()
os.chdir('testfilefolder')
for filename in os.listdir('.'):
    with open(filename) as f:
        file_contents = f.read().lower()

    words = re.findall(r"[a-zA-Z0-9']+", file_contents) # Breaks up into words
    frequency = collections.Counter(words)              # For this file only
    all_files_frequency.update(words)                   # For all files
    print(frequency)

os.chdir(previous_dir)

print('')
print(all_files_frequency)
Wherewith answered 22/3, 2013 at 22:46 Comment(1)
Thanks a lot guys! I see the problem with os.listdir now: it was indeed finding just the names and not accessing the files themselves. @Hai, I appreciate the solution you wrote. I am running into the problem that it also seems to load and count the temporary(?) files. So for example file3.txt~ in the output below:
file2.txt~ Counter({'e': 2, 'a': 1, 'c': 1, 'd': 1, 'i': 1, 'l': 1, 's': 1})
file2.txt Counter({'a': 1, 'b': 1})
This happens for all the files in the directory, and at the end it also tallies them. Thanks, Brian – Footstalk
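
One possible way to keep editor backup files such as file2.txt~ out of the tally is to process only names that end in .txt. The sketch below is an assumption about what is wanted, not part of the original answer: the function name count_words_in_txt_files and the ".txt only" rule are illustrative choices. It also builds each path with os.path.join() rather than calling os.chdir().

```python
import collections
import os
import re


def count_words_in_txt_files(folder):
    """Tally words across the *.txt files in folder (hypothetical helper)."""
    total = collections.Counter()
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # 'file2.txt~' does not end with '.txt', so backup files are skipped;
        # the isfile() check also skips any subdirectories.
        if not filename.endswith('.txt') or not os.path.isfile(path):
            continue
        with open(path) as f:
            words = re.findall(r"[a-zA-Z0-9']+", f.read().lower())
        total.update(words)
    return total


if __name__ == '__main__':
    folder = 'testfilefolder'  # folder name taken from the question
    if os.path.isdir(folder):
        print(count_words_in_txt_files(folder))
```

If other extensions should be counted too, the endswith() test could instead reject names ending in '~'.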
C
6

Change:

f = open(filename,'r')

To:

f = open(os.path.join('testfilefolder',filename),'r')

Which is effectively what you are doing in:

f = open("testfilefolder/file2.txt",'r')

Reason: you are listing the files in 'testfilefolder' (a subdirectory of your current directory) but then trying to open the file in your current directory.
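
Put together, the corrected loop might look like the sketch below. It assumes the folder is named 'testfilefolder' as in the question; the helper name count_chars_per_file is made up for illustration. Like the original code, it still counts characters rather than words.

```python
import os
from collections import Counter


def count_chars_per_file(folder):
    """Return {filename: Counter of characters} for each regular file in folder.

    Hypothetical helper for illustration; not part of the original question.
    """
    results = {}
    for filename in os.listdir(folder):
        # listdir() yields bare names, so join each one with the folder first.
        path = os.path.join(folder, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, 'r') as f:
            results[filename] = Counter(f.read().lower())
    return results


if __name__ == '__main__':
    folder = 'testfilefolder'  # folder name taken from the question
    if os.path.isdir(folder):
        for name, freqs in count_chars_per_file(folder).items():
            print(name, freqs)
```

Using os.path.join() rather than concatenating strings by hand keeps the path separator correct on both Windows and Unix-like systems.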

Cachepot answered 22/3, 2013 at 22:16 Comment(0)
