Named Entity Recognition in aspect-opinion extraction using dependency rule matching

Using spaCy, I extract aspect-opinion pairs from a text based on grammar rules that I defined. The rules use POS tags and dependency tags, which I obtain from token.pos_ and token.dep_. Below is an example of one of the grammar rules. If I pass the sentence Japan is cool, it returns [('Japan', 'cool', 0.3182)], where the value is the VADER compound polarity score of cool.

However, I don't know how to make it recognise named entities. For example, if I pass Air France is cool, I want to get [('Air France', 'cool', 0.3182)], but what I currently get is [('France', 'cool', 0.3182)].
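The rule only reads child.text of the nsubj token, and in a typical parse of Air France is cool the word Air hangs off France with the compound relation, so it never reaches the output. One alternative that does not depend on NER at all is to extend the subject token leftwards over its contiguous compound dependents. The sketch below assumes spaCy v3 and builds a hand-annotated Doc instead of running en_core_web_lg, so it runs without a model download; with_compounds is a hypothetical helper name.

```python
import spacy
from spacy.tokens import Doc


def with_compounds(token):
    """Return token's text extended leftwards over adjacent 'compound' dependents."""
    start = token.i
    # token.lefts yields left children in document order; walk them nearest-first
    for left in reversed(list(token.lefts)):
        # stop at the first child that is not a compound or is not contiguous
        if left.dep_ != "compound" or left.right_edge.i != start - 1:
            break
        # left_edge also covers nested compounds (a compound of a compound)
        start = left.left_edge.i
    return token.doc[start : token.i + 1].text


# Hand-annotated parse of "Air France is cool ." (assumption: no model needed);
# heads are absolute token indices, which is the spaCy v3 convention.
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["Air", "France", "is", "cool", "."],
    heads=[1, 2, 2, 2, 2],
    deps=["compound", "nsubj", "ROOT", "acomp", "punct"],
)
print(with_compounds(doc[1]))  # Air France
```

In the rule, A = with_compounds(child) would replace A = child.text. This also catches non-entity aspects built from noun compounds, but it ignores adjectival modifiers, so a name such as Central African Republic may come back incomplete depending on how the parser labels Central; that is where doc.ents is more reliable.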

I checked the spaCy online documentation and I know how to extract named entities (doc.ents). But I want to know what a possible workaround is to make my extractor work. Please note that I don't want a forced measure such as concatenating the strings into AirFrance, Air_France, etc.
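spaCy also ships a merge_entities component that retokenizes every span in doc.ents into a single token, so child.text becomes Air France (with its space) while the input string is left untouched. A minimal sketch, assuming spaCy v3: it again uses a hand-annotated Doc in place of nlp("Air France is cool."), and subject_opinion_pairs is a hypothetical, simplified version of rule 3 without negation or polarity scoring.

```python
import spacy
from spacy.pipeline import merge_entities
from spacy.tokens import Doc, Span


def subject_opinion_pairs(doc):
    """Simplified (hypothetical) version of rule 3: no negation, no polarity score."""
    pairs = []
    for token in doc:
        aspect = opinion = None
        for child in token.children:
            if child.dep_ == "nsubj" and not child.is_stop:
                aspect = child.text
            if child.dep_ == "acomp" and not child.is_stop:
                opinion = child.text
        if aspect is not None and opinion is not None:
            pairs.append((aspect, opinion))
    return pairs


# Hand-annotated stand-in for nlp("Air France is cool.") using the spaCy v3 Doc API
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["Air", "France", "is", "cool", "."],
    heads=[1, 2, 2, 2, 2],
    deps=["compound", "nsubj", "ROOT", "acomp", "punct"],
)
doc.ents = [Span(doc, 0, 2, label="ORG")]

before = subject_opinion_pairs(doc)
# merge_entities merges each entity into one token that keeps the root's dep label
doc = merge_entities(doc)
after = subject_opinion_pairs(doc)
print(before)  # [('France', 'cool')]
print(after)   # [('Air France', 'cool')]
```

With a trained pipeline in spaCy v3 the component is added with nlp.add_pipe("merge_entities"), after which the original loop needs no changes. For spaCy v2.x, which the 2.2.5 model in the question belongs to, the equivalent should be nlp.add_pipe(nlp.create_pipe("merge_entities")). The trade-off is that merging changes token indices for everything downstream.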

Thank you!

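import spacy
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download('vader_lexicon') once

nlp = spacy.load("en_core_web_lg")
sid = SentimentIntensityAnalyzer()  # VADER analyser used below for the polarity score
review_body = "Air France is cool."
doc = nlp(review_body)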

rule3_pairs = []

for token in doc:

    children = token.children
    A = "999999"
    M = "999999"
    add_neg_pfx = False

    for child in children:
        if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
            A = child.text

        if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
            M = child.text

        # example - 'this could have been better' -> (this, not better)
        if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
            neg_prefix = "not"
            add_neg_pfx = True

        if(child.dep_ == "neg"): # neg is negation
            neg_prefix = child.text
            add_neg_pfx = True

    if (add_neg_pfx and M != "999999"):
        M = neg_prefix + " " + M

    if(A != "999999" and M != "999999"):
        rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))

Result

rule3_pairs
>>> [('France', 'cool', 0.3182)]

Desired output

rule3_pairs
>>> [('Air France', 'cool', 0.3182)]

One workaround is to index the entities by their root token: doc.ents gives the entity spans, and ent.root is the syntactic head of each span (France in Air France). When the nsubj child is the root of an entity, use the whole entity text instead of the single token.

# Jupyter/Colab shell command; in a terminal run: python -m spacy download en_core_web_lg
!python -m spacy download en_core_web_lg
import nltk
nltk.download('vader_lexicon')

import spacy
nlp = spacy.load("en_core_web_lg")

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

def find_sentiment(doc):
    # find roots of all entities in the text
    ner_heads = {ent.root.idx: ent for ent in doc.ents}
    rule3_pairs = []
    for token in doc:
        children = token.children
        A = "999999"
        M = "999999"
        add_neg_pfx = False
        for child in children:
            if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
                if child.idx in ner_heads:
                    A = ner_heads[child.idx].text
                else:
                    A = child.text
            if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
                M = child.text
            # example - 'this could have been better' -> (this, not better)
            if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
                neg_prefix = "not"
                add_neg_pfx = True
            if(child.dep_ == "neg"): # neg is negation
                neg_prefix = child.text
                add_neg_pfx = True
        if (add_neg_pfx and M != "999999"):
            M = neg_prefix + " " + M
        if(A != "999999" and M != "999999"):
            rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
    return rule3_pairs

print(find_sentiment(nlp("Air France is cool.")))
print(find_sentiment(nlp("I think Gabriel García Márquez is not boring.")))
print(find_sentiment(nlp("They say Central African Republic is really great. ")))
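The lookup in find_sentiment is keyed on ent.root.idx, so it only fires when the nsubj token is exactly the entity's syntactic root. If the parser and the entity recogniser disagree about which token heads the name, a slightly more forgiving variant maps every token inside an entity to its span. A sketch assuming spaCy v3; entity_lookup and subject_text are hypothetical helper names, and the Doc is built by hand rather than by en_core_web_lg.

```python
import spacy
from spacy.tokens import Doc, Span


def entity_lookup(doc):
    """Map the index of every token inside a named entity to that entity's Span."""
    return {tok.i: ent for ent in doc.ents for tok in ent}


def subject_text(child, lookup):
    # Span and Token both expose .text, so fall back to the token itself
    return lookup.get(child.i, child).text


# Hand-annotated stand-in for part of the answer's second example sentence
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["I", "think", "Gabriel", "García", "Márquez", "is", "not", "boring", "."],
)
doc.ents = [Span(doc, 2, 5, label="PERSON")]
lookup = entity_lookup(doc)
print(subject_text(doc[3], lookup))  # Gabriel García Márquez
print(subject_text(doc[7], lookup))  # boring
```

Inside find_sentiment, lookup = entity_lookup(doc) would replace ner_heads, and A = subject_text(child, lookup) would replace the if/else on child.idx.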
