I’m having difficulty removing stop words from and tokenizing a .txt file using nltk. I keep getting the following error: AttributeError: 'list' object has no attribute 'lower'.
I can’t figure out what I’m doing wrong, though this is my first time doing something like this. My code is below. I’d appreciate any suggestions, thanks.
import nltk
from nltk.corpus import stopwords
s = open("C:\zircon\sinbo1.txt").read()
tokens = nltk.word_tokenize(s)
def cleanupDoc(s):
    stopset = set(stopwords.words('english'))
    tokens = nltk.word_tokenize(s)
    cleanup = [token.lower()for token in tokens.lower() not in stopset and len(token)>2]
    return cleanup
cleanupDoc(s)
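For reference, the error most likely comes from `tokens.lower()`: `nltk.word_tokenize` returns a list, and lists have no `lower` method. The comprehension also seems to be missing an `if` before the filter condition. Below is a minimal sketch of what the comprehension was probably meant to do, assuming the goal is to lowercase each token and drop stop words and tokens of two characters or fewer. The names `cleanup_tokens`, `cleanup_doc`, `sample_tokens`, and `sample_stopset` are hypothetical and introduced only for illustration.

```python
def cleanup_tokens(tokens, stopset):
    """Lowercase each token; drop stop words and tokens of 2 characters or fewer."""
    return [
        token.lower()                       # call lower() on each string, not on the list
        for token in tokens
        if token.lower() not in stopset and len(token) > 2
    ]


def cleanup_doc(text):
    """Tokenize text with NLTK, then filter it with cleanup_tokens."""
    # Imported here so cleanup_tokens can be tried without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    stopset = set(stopwords.words('english'))
    return cleanup_tokens(nltk.word_tokenize(text), stopset)


# Hand-made example (hypothetical tokens and stop set, not real NLTK output)
sample_tokens = ["The", "quick", "brown", "fox", "is", "on", "the", "hill"]
sample_stopset = {"the", "is", "on"}
result = cleanup_tokens(sample_tokens, sample_stopset)
print(result)  # ['quick', 'brown', 'fox', 'hill']
```

To run `cleanup_doc` on real text, NLTK's `stopwords` corpus and its tokenizer data need to be downloaded first with `nltk.download(...)`. Which tokenizer resource is required depends on the installed NLTK version.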