I have tried to remove words from a document that are considered to be named entities by spacy, so basically removing "Sweden" and "Nokia" from the string example. I could not find a way to work around the problem that entities are stored as a span. So when comparing them with single tokens from a spacy doc, it prompts an error.
In a later step, this process is supposed to be a function applied to several text documents stored in a pandas data frame.
I would appreciate any kind of help and advice on how to maybe better post questions as this is my first one here.
nlp = spacy.load('en')
text_data = u'This is a text document that speaks about entities like Sweden and Nokia'
document = nlp(text_data)
text_no_namedentities = []
for word in document:
if word not in document.ents:
text_no_namedentities.append(word)
return " ".join(text_no_namedentities)
It creates the following error:
TypeError: Argument 'other' has incorrect type (expected spacy.tokens.token.Token, got spacy.tokens.span.Span)