Using spaCy, I extract aspect-opinion pairs from a text, based on grammar rules that I defined. The rules are based on POS tags and dependency tags, which are obtained from token.pos_ and token.dep_. Below is an example of one of the grammar rules. If I pass the sentence Japan is cool, it returns [('Japan', 'cool', 0.3182)], where the value represents the polarity of cool.
However, I don't know how to make it recognise named entities. For example, if I pass Air France is cool, I want to get [('Air France', 'cool', 0.3182)], but what I currently get is [('France', 'cool', 0.3182)].
I checked the spaCy online documentation and I know how to extract named entities (doc.ents), but I want to know what a possible workaround is to make my extractor work. Please note that I don't want a forced measure such as concatenating strings (AirFrance, Air_France, etc.).
Thank you!
import spacy
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_lg-2.2.5")
# `sid` was not defined in the original snippet; it is assumed here to be
# NLTK's VADER analyzer (needs the vader_lexicon resource to be downloaded).
sid = SentimentIntensityAnalyzer()

review_body = "Air France is cool."
doc = nlp(review_body)

rule3_pairs = []
for token in doc:
    children = token.children
    A = "999999"  # placeholder: no aspect found yet
    M = "999999"  # placeholder: no opinion word found yet
    add_neg_pfx = False
    for child in children:
        if child.dep_ == "nsubj" and not child.is_stop:  # nsubj is nominal subject
            A = child.text
        if child.dep_ == "acomp" and not child.is_stop:  # acomp is adjectival complement
            M = child.text
        # example - 'this could have been better' -> (this, not better)
        if child.dep_ == "aux" and child.tag_ == "MD":  # MD is modal auxiliary
            neg_prefix = "not"
            add_neg_pfx = True
        if child.dep_ == "neg":  # neg is negation
            neg_prefix = child.text
            add_neg_pfx = True
    if add_neg_pfx and M != "999999":
        M = neg_prefix + " " + M
    if A != "999999" and M != "999999":
        rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
Result
rule3_pairs
>>> [('France', 'cool', 0.3182)]
Desired output
rule3_pairs
>>> [('Air France', 'cool', 0.3182)]
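
One direction that might work without concatenating strings is to look up whether the nsubj token falls inside one of the spans in doc.ents and, if so, use the span's text as the aspect. The sketch below is only an illustration under assumptions: aspect_text is a hypothetical helper name, and the stand-in objects mimic the attributes that spaCy's Token (.i, .text) and Span (.start, .end, .text) expose, so that it runs without a model. It also assumes the NER component actually tags "Air France" as an entity, which is not guaranteed.

```python
from types import SimpleNamespace


def aspect_text(token, ents):
    """Return the text of the entity span containing `token`, else token.text.

    `token` needs `.i` (its index in the document) and `.text`; each item in
    `ents` needs `.start`, `.end` (end-exclusive token indices) and `.text`.
    spaCy's Token objects and the Span objects in doc.ents have these attributes.
    """
    for ent in ents:
        if ent.start <= token.i < ent.end:
            return ent.text
    return token.text


# Stand-ins for "Air France is cool." so the sketch runs without a model.
# With spaCy, `ents` would be doc.ents and `token` would be the nsubj child.
ents = [SimpleNamespace(start=0, end=2, text="Air France")]
france = SimpleNamespace(i=1, text="France")
cool = SimpleNamespace(i=3, text="cool")

print(aspect_text(france, ents))  # Air France
print(aspect_text(cool, ents))    # cool
```

In the rule above, this would amount to replacing A = child.text with A = aspect_text(child, doc.ents). spaCy also ships a merge_entities pipeline component that retokenizes each entity into a single token, which may be an alternative if changing the tokenization is acceptable.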