I'm trying to use a Python script to edit a large directory of .html files in a loop, and I'm having trouble looping through the filenames using os.walk(). This chunk of code just reads each HTML file into a string that I can work with, but the script never enters the loop, as if the files don't exist. It prints "point1" but never reaches "point2", and it ends without an error message. The directory is inside the folder called "amazon", which contains one level of 20 subfolders with 20 HTML files in each.
Oddly, the code works perfectly on a neighboring directory that only contains .txt files, but it doesn't seem to pick up my .html files. Is there something I don't understand about the structure of the for root, dirs, filenames in os.walk() loop? This is my first time using os.walk(), and I've looked at a number of other pages on this site to try to make it work.
import os
rootdir = 'C:\filepath\amazon'
print "point1"
for root, dirs, filenames in os.walk(rootdir):
    print "point2"
    for file in filenames:
        with open(os.path.join(root, file), 'r') as myfile:
            g = myfile.read()
        print g
Any help is much appreciated.
os.path.join. Interoperability is one of the best things about Python, so we may as well use the stdlib function included to join filenames together! :) – Gildea
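On the loop never being entered: one possible cause, assuming the real path is written the same way as the placeholder in the question, is that '\f' and '\a' inside an ordinary string literal are escape sequences (form feed and bell), so rootdir does not name the folder it appears to. By default os.walk() ignores listing errors, so a path that doesn't exist produces no iterations and no message, which would match the symptoms. A rough sketch of how to check this; walk_html is a hypothetical helper name, not something from the question:

```python
from __future__ import print_function  # lets the sketch run on Python 2 and 3
import os

# The literal from the question: '\f' is a form feed and '\a' is a bell
# character, so this string contains no backslashes at all.
broken = 'C:\filepath\amazon'
# A raw string keeps every backslash literally.
fixed = r'C:\filepath\amazon'

print(repr(broken))
print(repr(fixed))


def walk_html(rootdir):
    """Yield the full path of every .html file under rootdir (hypothetical helper)."""
    # os.walk() silently yields nothing for a missing directory,
    # so fail loudly here instead.
    if not os.path.isdir(rootdir):
        raise ValueError("not a directory: %r" % rootdir)
    for root, dirs, filenames in os.walk(rootdir):
        for name in filenames:
            if name.lower().endswith('.html'):
                yield os.path.join(root, name)
```

If that is the cause, writing the path as a raw string, doubling the backslashes, or using forward slashes should let the loop run. It might also explain why the neighboring directory works, if its name happens not to start with a letter that forms an escape sequence, though that is a guess without seeing the real paths.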