Warning: [W108] The rule-based lemmatizer did not find POS annotation for the token 'This'

What is this message about? How do I remove this warning?

import scispacy
import spacy
import en_core_sci_lg
from spacy_langdetect import LanguageDetector
from spacy.language import Language
from spacy.tokens import Doc


def create_lang_detector(nlp, name):
    return LanguageDetector()


Language.factory("language_detector", func=create_lang_detector)
nlp = en_core_sci_lg.load(disable=["tagger", "ner"])
nlp.max_length = 2000000
nlp.add_pipe('language_detector', last=True)

doc = nlp('This is some English text. Das ist ein Haus. This is a house.')

Warning:

[W108] The rule-based lemmatizer did not find POS annotation for the token 'This'. Check that your pipeline includes components that assign token.pos, typically 'tagger'+'attribute_ruler' or 'morphologizer'.

[W108] The rule-based lemmatizer did not find POS annotation for the token 'is'. Check that your pipeline includes components that assign token.pos, typically 'tagger'+'attribute_ruler' or 'morphologizer'.

[W108] The rule-based lemmatizer did not find POS annotation for the token 'some'. Check that your pipeline includes components that assign token.pos, typically 'tagger'+'attribute_ruler' or 'morphologizer'.
. . . .

Grapnel answered 3/3, 2021 at 6:2

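In spaCy v3, the lemmatizer is a separate pipeline component from the tagger. The rule-based lemmatizer relies on the POS annotation (token.pos) that components such as the tagger and attribute_ruler normally assign, so disabling only the tagger leaves it without that input and it emits W108 for each token. Disable the lemmatizer along with the tagger to avoid these warnings: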

nlp = en_core_sci_lg.load(disable=["tagger", "ner", "lemmatizer"])
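
If the lemmatizer has to stay in the pipeline for some reason, another option is to hide the message with Python's standard warnings module. The sketch below is not the approach described above. It assumes the warning is emitted through the warnings module with a message that starts with "[W108]", as the output in the question suggests. The helper emit_sample_warnings is a hypothetical stand-in for the nlp(...) call so that the example runs without spaCy or the model installed.

```python
import warnings

# Text copied from the warning shown in the question.
W108_TEXT = (
    "[W108] The rule-based lemmatizer did not find POS annotation "
    "for the token 'This'."
)


def emit_sample_warnings():
    """Hypothetical stand-in for nlp(text): emits one W108-style
    warning and one unrelated warning."""
    warnings.warn(W108_TEXT, UserWarning)
    warnings.warn("some unrelated warning", UserWarning)


def run_with_w108_filtered():
    """Return the messages of the warnings that pass the filter."""
    with warnings.catch_warnings(record=True) as caught:
        # Start from "show everything" so the result does not
        # depend on filters set elsewhere.
        warnings.simplefilter("always")
        # The pattern is a regular expression matched against the
        # start of the message, so the brackets must be escaped.
        warnings.filterwarnings("ignore", message=r"\[W108\]")
        emit_sample_warnings()
    return [str(w.message) for w in caught]


remaining = run_with_w108_filtered()
print(remaining)
```

In a real script, only the warnings.filterwarnings(...) line would be needed, placed before the first nlp(...) call. This only silences the message: without POS annotation the lemmas may still be unreliable, so disabling the lemmatizer is the cleaner fix when lemmas are not needed.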
Billy answered 3/3, 2021 at 7:22