I have a list of words and noun-verb phrases, and I want to:
- search for dependency patterns and words in a corpus of text
- identify the paragraph each match appears in
- extract the paragraph
- highlight the matched words in the paragraph
- create a snip/JPEG of the paragraph with the matched words highlighted
- save the image in an Excel file.
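For context, here is a rough sketch of how I picture the paragraph lookup and extraction steps in plain Python, without spaCy. It assumes paragraphs are separated by blank lines; `find_paragraphs` and the sample text are placeholders I made up for illustration, not part of my actual code.

```python
import re


def find_paragraphs(text, terms):
    """Return (paragraph, matched_terms) pairs for paragraphs containing any term."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Whole-word, case-insensitive pattern; longer terms are tried first.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b", re.IGNORECASE
    )
    results = []
    for paragraph in paragraphs:
        found = [m.group(0).lower() for m in pattern.finditer(paragraph)]
        if found:
            results.append((paragraph, found))
    return results


# Hypothetical three-paragraph corpus; only the first and last contain terms.
sample = (
    "I like bacon and chicken.\n\n"
    "Nothing relevant here.\n\n"
    "I only had an apple and a carrot."
)
terms = ["bacon", "chicken", "lamb", "hot dog", "apple", "carrot"]
hits = find_paragraphs(sample, terms)
for paragraph, found in hits:
    print(found, "->", paragraph)
```

This only does plain string matching, so it does not cover the dependency patterns.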
The MWE below covers only highlighting the matched words and displaying them with displaCy; I mention the rest of the task just for context. The problem is that the output does not render my custom entities in the custom colors.
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
good = ['bacon', 'chicken', 'lamb', 'hot dog']
bad = ['apple', 'carrot']
nlp = spacy.load('en_core_web_sm')
# Turn each term into a Doc so the PhraseMatcher can use it as a pattern
patterns1 = [nlp(text) for text in good]
patterns2 = [nlp(text) for text in bad]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('good', None, *patterns1)
matcher.add('bad', None, *patterns2)
doc = nlp("I like bacon and chicken but unfortunately I only had an apple and a carrot in the fridge")
matches = matcher(doc)
for match_id, start, end in matches:
    span = Span(doc, start, end, label=match_id)
    doc.ents = list(doc.ents) + [span]  # add span to doc.ents
print([(ent.text, ent.label_) for ent in doc.ents])
The code above produces this output:
[('bacon', 'good'), ('chicken', 'good'), ('apple', 'bad'), ('carrot', 'bad')]
But when I try to assign custom colors to the entities, the colors are not applied:
from spacy import displacy
colors = {'good': "#85C1E9", "bad": "#ff6961"}
options = {"ents": ['good', 'bad'], "colors": colors}
displacy.serve(doc, style='ent', options=options)
This is the output I get: