Spacy: How to get all words that describe a noun?

I am new to spaCy and to NLP in general.

To understand how spaCy works, I would like to create a function that takes a sentence and returns a dictionary, tuple, or list with each noun and the words describing it.

I know that spaCy builds a dependency tree of the sentence and knows the role of each word (as shown in displaCy).

But what's the right way to get from:

"A large room with two yellow dishwashers in it"

To:

{noun:"room",adj:"large"} {noun:"dishwasher",adj:"yellow",adv:"two"}

Or any other solution that gives me all related words in a usable bundle.

Thanks in advance!

Studdingsail answered 3/6, 2021 at 12:3 Comment(0)

This is a very straightforward use of the DependencyMatcher.

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")

pattern = [
  # Anchor node: any noun. "target" is an arbitrary name chosen here.
  {
    "RIGHT_ID": "target",
    "RIGHT_ATTRS": {"POS": "NOUN"}
  },
  # target -> modifier: a direct child of the noun labelled amod or nummod
  {
    "LEFT_ID": "target",
    "REL_OP": ">",
    "RIGHT_ID": "modifier",
    "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "nummod"]}}
  },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("NOUN_MODIFIER", [pattern])

text = "A large room with two yellow dishwashers in it"
doc = nlp(text)
for match_id, (target, modifier) in matcher(doc):
    print(doc[modifier], doc[target], sep="\t")

Output:

large   room
two     dishwashers
yellow  dishwashers

It should be easy to turn that into a dictionary or whatever you'd like. You might also want to modify it to take proper nouns as the target, or to support other kinds of dependency relations, but this should be a good start.
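
One way to turn those matches into dictionaries is sketched below. It is only an illustration: `triples_from_matches`, `group_modifiers`, and the output layout (a `noun` key plus one list per dependency label) are made-up names and choices, not part of spaCy. The grouping step is plain Python, so the demo feeds it the three matches printed above by hand instead of loading a model.

```python
def triples_from_matches(doc, matches):
    """Turn DependencyMatcher output into (noun_index, noun, dep, modifier) tuples.

    Assumes the two-node pattern above, so each match carries the token
    indices in pattern order: [target, modifier].
    """
    return [
        (target, doc[target].text, doc[modifier].dep_, doc[modifier].text)
        for _match_id, (target, modifier) in matches
    ]


def group_modifiers(triples):
    """Group modifiers by the noun token they attach to, one dict per noun."""
    grouped = {}
    for noun_index, noun, dep, modifier in triples:
        # Key on the token index so two occurrences of the same word stay apart.
        entry = grouped.setdefault(noun_index, {"noun": noun})
        entry.setdefault(dep, []).append(modifier)
    # Return the nouns in sentence order.
    return [grouped[i] for i in sorted(grouped)]


# Hand-written stand-in for triples_from_matches(doc, matcher(doc)),
# copied from the output above (token indices counted from 0).
example = [
    (2, "room", "amod", "large"),
    (6, "dishwashers", "nummod", "two"),
    (6, "dishwashers", "amod", "yellow"),
]
result = group_modifiers(example)
print(result)
```

With spaCy loaded, `group_modifiers(triples_from_matches(doc, matcher(doc)))` should give the same structure for the example sentence, provided the model parses it as shown in the output above.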

You may also want to look at the noun chunks feature.

Baikal answered 3/6, 2021 at 13:9 Comment(4)
Sorry for the late reply, but thank you, this produces exactly the output I was looking for! Even though I do not completely understand what's going on in your code :) But I think reading through the documentation you sent will help :) - Studdingsail
I hope you get a notification when I comment on this :) I was able to understand the attrs and the patterns themselves, but I can't find anything about the IDs: what possible IDs are there, and what do you mean by target and modifier? Also, I don't really understand what the relation operators are used for in this case. I would be glad if you could find the time to give me a short explanation or a link to the docs :) Thanks! - Studdingsail
The IDs are names you make up; they can be anything. The operators are from Semgrex. The documentation is already linked at the top of my answer and explains both of these things. - Baikal
What does RIGHT_ATTRS {"DEP": {"IN": ["amod", "nummod"]}} mean? - Woodworth
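
For the questions in the comments above: as far as the spaCy DependencyMatcher documentation describes it, `RIGHT_ID` is a name you choose for the node being defined, `LEFT_ID` refers back to a node that was already named, `"REL_OP": ">"` means the left node is the immediate head of the right node, and `{"DEP": {"IN": [...]}}` means the token's dependency label must be one of the listed values. The plain-Python sketch below mimics that logic. It is not how spaCy implements matching, and the token table is hand-written (an assumed parse of the example sentence), not real parser output.

```python
# Hand-written (text, POS, dependency label, head index) rows; an assumed
# parse of the example sentence, not produced by spaCy.
TOKENS = [
    ("A", "DET", "det", 2),
    ("large", "ADJ", "amod", 2),
    ("room", "NOUN", "ROOT", 2),
    ("with", "ADP", "prep", 2),
    ("two", "NUM", "nummod", 6),
    ("yellow", "ADJ", "amod", 6),
    ("dishwashers", "NOUN", "pobj", 3),
    ("in", "ADP", "prep", 6),
    ("it", "PRON", "pobj", 7),
]

ALLOWED_DEPS = {"amod", "nummod"}  # mirrors {"DEP": {"IN": ["amod", "nummod"]}}


def noun_modifier_pairs(tokens):
    """Return (modifier, noun) pairs the way the two-node pattern would."""
    pairs = []
    for text, _pos, dep, head in tokens:
        head_text, head_pos, _head_dep, _head_head = tokens[head]
        is_noun_head = head_pos == "NOUN"    # "target": {"POS": "NOUN"}
        is_wanted_dep = dep in ALLOWED_DEPS  # "modifier": DEP is in the list
        # REL_OP ">": the target is the immediate head of the modifier.
        if is_noun_head and is_wanted_dep:
            pairs.append((text, head_text))
    return pairs


pairs = noun_modifier_pairs(TOKENS)
for modifier, noun in pairs:
    print(modifier, noun, sep="\t")
```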

What you are describing is close to what spaCy calls "noun chunks":

import spacy
nlp = spacy.load('en_core_web_md')
txt = "A large room with two yellow dishwashers in it"
doc = nlp(txt)

chunks = []
for chunk in doc.noun_chunks:
    out = {}
    root = chunk.root
    out[root.pos_] = root
    for tok in chunk:
        if tok != root:
            out[tok.pos_] = tok
    chunks.append(out)
print(chunks)

[
 {'NOUN': room, 'DET': A, 'ADJ': large}, 
 {'NOUN': dishwashers, 'NUM': two, 'ADJ': yellow}, 
 {'PRON': it}
]

Note that the root of a noun chunk is not guaranteed to be a noun: the last chunk above is rooted in the pronoun "it". To restrict the results to nouns only:

chunks = []
for chunk in doc.noun_chunks:
    out = {}
    noun = chunk.root
    if noun.pos_ != 'NOUN':
        continue
    out['noun'] = noun
    for tok in chunk:
        if tok != noun:
            out[tok.pos_] = tok
    chunks.append(out)
    
print(chunks)

[
 {'noun': room, 'DET': A, 'ADJ': large}, 
 {'noun': dishwashers, 'NUM': two, 'ADJ': yellow}
]
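
One caveat with the loops above: `out[tok.pos_] = tok` keeps only the last token for each part of speech, so a chunk such as "two big yellow dishwashers" would lose "big". A possible variant that collects lists instead is sketched below. `chunk_to_dict`, `FakeChunk`, and `make_tok` are hypothetical names; the stand-in objects expose only the attributes the function reads (`root`, `i`, `pos_`, `text`) so the demo runs without a model. With spaCy you would pass the spans from `doc.noun_chunks` instead.

```python
from types import SimpleNamespace


def chunk_to_dict(chunk):
    """Return {'noun': text, POS: [texts, ...]} for a noun chunk, or None
    if the chunk's root is not a NOUN (e.g. a pronoun such as "it")."""
    root = chunk.root
    if root.pos_ != "NOUN":
        return None
    out = {"noun": root.text}
    for tok in chunk:
        if tok.i != root.i:  # compare by token index to skip the root itself
            out.setdefault(tok.pos_, []).append(tok.text)
    return out


class FakeChunk:
    """Demo-only stand-in for a spaCy Span: iterable, with a .root token."""

    def __init__(self, tokens, root):
        self.tokens = tokens
        self.root = root

    def __iter__(self):
        return iter(self.tokens)


def make_tok(i, text, pos):
    """Demo-only stand-in for a spaCy Token."""
    return SimpleNamespace(i=i, text=text, pos_=pos)


noun = make_tok(7, "dishwashers", "NOUN")
chunk = FakeChunk(
    [make_tok(4, "two", "NUM"), make_tok(5, "big", "ADJ"),
     make_tok(6, "yellow", "ADJ"), noun],
    noun,
)
result = chunk_to_dict(chunk)
print(result)

pronoun = make_tok(9, "it", "PRON")
skipped = chunk_to_dict(FakeChunk([pronoun], pronoun))
print(skipped)
```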
Criticize answered 5/6, 2021 at 10:10 Comment(0)
