What do spaCy's part-of-speech and dependency tags mean?

spaCy tags up each of the Tokens in a Document with a part of speech (in two different formats, one stored in the pos and pos_ properties of the Token and the other stored in the tag and tag_ properties) and a syntactic dependency to its .head token (stored in the dep and dep_ properties).

Some of these tags are self-explanatory, even to somebody like me without a linguistics background:

>>> import spacy
>>> en_nlp = spacy.load('en')  # spaCy 1.x/2.x shortcut; spaCy 3 requires a full package name such as 'en_core_web_sm'
>>> document = en_nlp("I shot a man in Reno just to watch him die.")
>>> document[1]
shot
>>> document[1].pos_
'VERB'

Others... are not:

>>> document[1].tag_
'VBD'
>>> document[2].pos_
'DET'
>>> document[3].dep_
'dobj'

Worse, the official docs don't contain even a list of the possible tags for most of these properties, nor the meanings of any of them. They sometimes mention what tokenization standard they use, but these claims aren't currently entirely accurate and on top of that the standards are tricky to track down.

What are the possible values of the tag_, pos_, and dep_ properties, and what do they mean?

Pectinate answered 27/10, 2016 at 15:14 Comment(2)
There is documentation now, see spacy.io/api/annotation#pos-en and spacy.io/api/annotation#dependency-parsing-english - Traynor
@Traynor And the links are broken again. - Shelbashelbi

tl;dr answer

Just expand the lists at:

Longer answer

The docs have greatly improved since I first asked this question.

Part-of-speech tags

The pos and tag attributes are tabulated at https://spacy.io/api/annotation#pos-tagging, and the origin of those lists of values is described. At the time of this (January 2020) edit, the docs say of the pos attribute that:

spaCy maps all language-specific part-of-speech tags to a small, fixed set of word type tags following the Universal Dependencies scheme. The universal tags don’t code for any morphological features and only cover the word type. They’re available as the Token.pos and Token.pos_ attributes.

As for the tag attribute, the docs say:

The English part-of-speech tagger uses the OntoNotes 5 version of the Penn Treebank tag set. We also map the tags to the simpler Universal Dependencies v2 POS tag set.

and

The German part-of-speech tagger uses the TIGER Treebank annotation scheme. We also map the tags to the simpler Universal Dependencies v2 POS tag set.

You thus have a choice between using a coarse-grained tag set that is consistent across languages (.pos), or a fine-grained tag set (.tag) that is specific to a particular treebank, and hence a particular language.
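
To see the two tag sets side by side, a small helper can collect each token's text, coarse .pos_, fine-grained .tag_ and the spacy.explain() description of that tag. This is a sketch assuming spaCy v3; the helper name tag_table is hypothetical, and to keep it runnable without downloading a model it builds a Doc by hand with made-up annotations. With a trained pipeline you would pass nlp(text) instead.

```python
import spacy
from spacy.tokens import Doc


def tag_table(doc):
    """Return (text, coarse POS, fine-grained tag, tag description) per token.

    tag_table is a hypothetical helper; spacy.explain() returns None for
    labels it has no description for.
    """
    return [(t.text, t.pos_, t.tag_, spacy.explain(t.tag_)) for t in doc]


# Hand-annotated Doc so the sketch runs without a trained model; with a
# pipeline such as en_core_web_sm you would use nlp("I shot a man") instead.
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["I", "shot", "a", "man"],
    pos=["PRON", "VERB", "DET", "NOUN"],
    tags=["PRP", "VBD", "DT", "NN"],
)

for row in tag_table(doc):
    print(row)
```

The coarse column stays the same across languages, while the fine-grained column and its description depend on the treebank the model was trained on.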

.pos_ tag list

The docs list the following coarse-grained tags used for the pos and pos_ attributes:

  • ADJ: adjective, e.g. big, old, green, incomprehensible, first
  • ADP: adposition, e.g. in, to, during
  • ADV: adverb, e.g. very, tomorrow, down, where, there
  • AUX: auxiliary, e.g. is, has (done), will (do), should (do)
  • CONJ: conjunction, e.g. and, or, but
  • CCONJ: coordinating conjunction, e.g. and, or, but
  • DET: determiner, e.g. a, an, the
  • INTJ: interjection, e.g. psst, ouch, bravo, hello
  • NOUN: noun, e.g. girl, cat, tree, air, beauty
  • NUM: numeral, e.g. 1, 2017, one, seventy-seven, IV, MMXIV
  • PART: particle, e.g. ’s, not
  • PRON: pronoun, e.g. I, you, he, she, myself, themselves, somebody
  • PROPN: proper noun, e.g. Mary, John, London, NATO, HBO
  • PUNCT: punctuation, e.g. ., (, ), ?
  • SCONJ: subordinating conjunction, e.g. if, while, that
  • SYM: symbol, e.g. $, %, §, ©, +, −, ×, ÷, =, :), 😝
  • VERB: verb, e.g. run, runs, running, eat, ate, eating
  • X: other, e.g. sfpksdpsxmsa
  • SPACE: space, e.g.

Note that the docs are lying slightly when they say that this list follows the Universal Dependencies scheme; two of the tags listed above aren't part of that scheme.

One of those is CONJ, which used to exist in the Universal POS Tags scheme but has been split into CCONJ and SCONJ since spaCy was first written. Based on the mappings of tag->pos in the docs, it would seem that spaCy's current models don't actually use CONJ, but it still exists in spaCy's code and docs for some reason - perhaps backwards compatibility with old models.

The second is SPACE, which isn't part of the Universal POS Tags scheme (and never has been, as far as I know) and is used by spaCy for any spacing besides single normal ASCII spaces (which don't get their own token):

>>> document = en_nlp("This\nsentence\thas      some weird spaces in\n\n\n\n\t\t   it.")
>>> for token in document:
...   print('%r (%s)' % (str(token), token.pos_))
... 
'This' (DET)
'\n' (SPACE)
'sentence' (NOUN)
'\t' (SPACE)
'has' (VERB)
'     ' (SPACE)
'some' (DET)
'weird' (ADJ)
'spaces' (NOUN)
'in' (ADP)
'\n\n\n\n\t\t   ' (SPACE)
'it' (PRON)
'.' (PUNCT)
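
Because these whitespace tokens are created by the tokenizer itself, you can filter them out even without a tagger by checking Token.is_space, a lexical attribute, instead of comparing pos_ to "SPACE". A minimal sketch assuming spaCy v3: spacy.blank("en") gives a tokenizer-only pipeline (so pos_ would be empty here), and split_whitespace is a hypothetical helper name.

```python
import spacy


def split_whitespace(doc):
    """Separate whitespace-only tokens from the rest (hypothetical helper)."""
    words = [t.text for t in doc if not t.is_space]
    spaces = [t.text for t in doc if t.is_space]
    return words, spaces


# Tokenizer-only pipeline: whitespace tokens exist, but no POS tags are assigned
nlp = spacy.blank("en")
doc = nlp("This\nsentence\thas  some weird spaces in\n\n\t it.")
words, spaces = split_whitespace(doc)
print(words)
print([repr(s) for s in spaces])
```

With a trained pipeline loaded, token.pos_ == "SPACE" should pick out the same tokens, but is_space avoids depending on the tagger.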

I'll omit the full list of .tag_ tags (the finer-grained ones) from this answer, since they're numerous, well-documented now, different for English and German, and probably more likely to change between releases. Instead, look at the list in the docs (e.g. https://spacy.io/api/annotation#pos-en for English) which lists every possible tag, the .pos_ value it maps to, and a description of what it means.

Dependency tokens

There are now three different schemes that spaCy uses for dependency tagging: one for English, one for German, and one for everything else. Once again, the list of values is huge and I won't reproduce it in full here. Every dependency has a brief definition next to it, but unfortunately, many of them - like "appositional modifier" or "clausal complement" - are terms of art that are rather alien to an everyday programmer like me. If you're not a linguist, you'll simply have to research the meanings of those terms of art to make sense of them.
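
While researching those terms of art, it helps to print each token's label next to spacy.explain()'s short definition and the head the token attaches to. A sketch assuming spaCy v3; describe_dependencies is a hypothetical helper, and the Doc is annotated by hand (heads are absolute token indices, and the ROOT token points at itself) so that it runs without a model.

```python
import spacy
from spacy.tokens import Doc


def describe_dependencies(doc):
    """List (token, dep label, spaCy's description, head token) per token."""
    return [
        (t.text, t.dep_, spacy.explain(t.dep_), t.head.text)
        for t in doc
    ]


nlp = spacy.blank("en")
# Hand-made parse of "I shot a man in Reno"; a trained pipeline would
# produce these annotations itself via nlp(text).
doc = Doc(
    nlp.vocab,
    words=["I", "shot", "a", "man", "in", "Reno"],
    heads=[1, 1, 3, 1, 1, 4],
    deps=["nsubj", "ROOT", "det", "dobj", "prep", "pobj"],
)

for text, dep, meaning, head in describe_dependencies(doc):
    # !s guards against None for labels missing from the glossary
    print(f"{text:<6} {dep:<6} {meaning!s:<28} head={head}")
```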

I can at least provide a starting point for that research for people working with English text, though. If you'd like to see some examples of the CLEAR dependencies (used by the English model) in real sentences, check out the 2012 work of Jinho D. Choi: either his Optimization of Natural Language Processing Components for Robustness and Scalability or his Guidelines for the CLEAR Style Constituent to Dependency Conversion (which seems to just be a subsection of the former paper). Both list all the CLEAR dependency labels that existed in 2012 along with definitions and example sentences. (Unfortunately, the set of CLEAR dependency labels has changed a little since 2012, so some of the modern labels are not listed or exemplified in Choi's work - but it remains a useful resource despite being slightly outdated.)

Pectinate answered 27/10, 2016 at 15:14 Comment(5)
Another good reference for understanding the dependency tags is the Stanford dependency manual: nlp.stanford.edu/software/dependencies_manual.pdf - Nonviolence
@NicholasMorley That's a completely different dependency scheme, isn't it? I see stuff like npadvmod and mwe in there that aren't part of any of spaCy's three dependency schemes. - Pectinate
Documentation for labels was moved to each model's Label Scheme section, e.g.: spacy.io/models/en#en_core_web_trf-labels - Shelbashelbi
@Shelbashelbi Thank you! The answer should really be edited to show this change, I thought I was going crazy trying to find the list of tags and such. - Cantoris
One of the links is broken: Optimization of Natural Language Processing Components for Robustness and Scalability - Wigeon

Just a quick tip for getting the detailed meaning of the short forms: you can use the explain method like this:

import spacy
spacy.explain('pobj')

which will give you output like:

'object of preposition'
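
spacy.explain() also covers fine-grained tags, entity labels and labels from other languages (German TIGER tags, for example), and returns None when it has no entry. A small wrapper, hypothetically named explain_all, can look up several labels at once and flag the missing ones; only spacy.explain itself is spaCy API here.

```python
import spacy


def explain_all(labels, missing="(no description)"):
    """Map each label to spaCy's glossary description, or a placeholder."""
    result = {}
    for label in labels:
        description = spacy.explain(label)  # None if the glossary lacks it
        result[label] = description if description is not None else missing
    return result


print(explain_all(["pobj", "VBD", "GPE", "ADJA", "not_a_real_label"]))
```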
Galvanometer answered 8/1, 2018 at 3:17 Comment(1)
Now that my self-answer has (once again) fallen out of date, this is arguably the best answer on the page. I'll leave it to someone else to compile an up-to-date list of tags and definitions if they want to, but this answer, at least, should remain valuable even as the list of tags changes. - Pectinate

The official documentation now provides much more detail for all those annotations at https://spacy.io/api/annotation (and the list of other token attributes can be found at https://spacy.io/api/token).

As the documentation shows, spaCy's part-of-speech (POS) and dependency tags come in both a universal variant and language-specific variants. The explain() function is a very useful shortcut for getting a description of a tag's meaning without consulting the documentation, e.g.

spacy.explain("VBD")

which gives "verb, past tense".

Tevis answered 28/8, 2018 at 12:8 Comment(0)

Direct links (if you don't feel like going through endless spacy documentation to get the full tables):

Auten answered 25/2, 2021 at 13:16 Comment(0)

After the recent update of spaCy to v3, the links above no longer work.

You may visit this link to get the complete list.

[Image: Universal POS Tags]

[Image: English POS Tags]

Knox answered 13/7, 2021 at 15:58 Comment(1)
Works as of May 2024. The other answers do not. - Dunn

Retrieve the tags and their meaning directly from pipeline/model programmatically

An alternative to looking up the tags in the documentation is to retrieve these programmatically from nlp.pipe_labels.

This has the advantage that you get the actual labels that your trained pipeline (aka model) provides and you don't have to copy these manually.

The following example code uses model en_core_web_sm. Link to model card here. See Label Scheme at bottom. Adapt to the model of your choosing.

Note: The Universal Part-of-speech Tags aren't programmatically available (at least I couldn't find a way) and can be looked up here in the documentation.

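import spacy
from pprint import pprint  # display() only exists inside Jupyter/IPython

nlp = spacy.load("en_core_web_sm")

# Print every label each component can predict, with spaCy's description
for component in nlp.pipe_names:
    # pipe_labels only has entries for components that expose labels
    tags = nlp.pipe_labels.get(component, [])
    if tags:
        print(f"Label mapping for component: {component}")
        pprint({tag: spacy.explain(tag) for tag in tags})
        print()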

Output

Label mapping for component: tagger

{'$': 'symbol, currency',
 "''": 'closing quotation mark',
 ',': 'punctuation mark, comma',
 '-LRB-': 'left round bracket',
 '-RRB-': 'right round bracket',
 '.': 'punctuation mark, sentence closer',
 ':': 'punctuation mark, colon or ellipsis',
 'ADD': 'email',
 'AFX': 'affix',
 'CC': 'conjunction, coordinating',
 'CD': 'cardinal number',
 'DT': 'determiner',
 'EX': 'existential there',
 'FW': 'foreign word',
 'HYPH': 'punctuation mark, hyphen',
 'IN': 'conjunction, subordinating or preposition',
 'JJ': 'adjective (English), other noun-modifier (Chinese)',
 'JJR': 'adjective, comparative',
 'JJS': 'adjective, superlative',
 'LS': 'list item marker',
 'MD': 'verb, modal auxiliary',
 'NFP': 'superfluous punctuation',
 'NN': 'noun, singular or mass',
 'NNP': 'noun, proper singular',
 'NNPS': 'noun, proper plural',
 'NNS': 'noun, plural',
 'PDT': 'predeterminer',
 'POS': 'possessive ending',
 'PRP': 'pronoun, personal',
 'PRP$': 'pronoun, possessive',
 'RB': 'adverb',
 'RBR': 'adverb, comparative',
 'RBS': 'adverb, superlative',
 'RP': 'adverb, particle',
 'SYM': 'symbol',
 'TO': 'infinitival "to"',
 'UH': 'interjection',
 'VB': 'verb, base form',
 'VBD': 'verb, past tense',
 'VBG': 'verb, gerund or present participle',
 'VBN': 'verb, past participle',
 'VBP': 'verb, non-3rd person singular present',
 'VBZ': 'verb, 3rd person singular present',
 'WDT': 'wh-determiner',
 'WP': 'wh-pronoun, personal',
 'WP$': 'wh-pronoun, possessive',
 'WRB': 'wh-adverb',
 'XX': 'unknown',
 '_SP': 'whitespace',
 '``': 'opening quotation mark'}


Label mapping for component: parser

{'ROOT': 'root',
 'acl': 'clausal modifier of noun (adjectival clause)',
 'acomp': 'adjectival complement',
 'advcl': 'adverbial clause modifier',
 'advmod': 'adverbial modifier',
 'agent': 'agent',
 'amod': 'adjectival modifier',
 'appos': 'appositional modifier',
 'attr': 'attribute',
 'aux': 'auxiliary',
 'auxpass': 'auxiliary (passive)',
 'case': 'case marking',
 'cc': 'coordinating conjunction',
 'ccomp': 'clausal complement',
 'compound': 'compound',
 'conj': 'conjunct',
 'csubj': 'clausal subject',
 'csubjpass': 'clausal subject (passive)',
 'dative': 'dative',
 'dep': 'unclassified dependent',
 'det': 'determiner',
 'dobj': 'direct object',
 'expl': 'expletive',
 'intj': 'interjection',
 'mark': 'marker',
 'meta': 'meta modifier',
 'neg': 'negation modifier',
 'nmod': 'modifier of nominal',
 'npadvmod': 'noun phrase as adverbial modifier',
 'nsubj': 'nominal subject',
 'nsubjpass': 'nominal subject (passive)',
 'nummod': 'numeric modifier',
 'oprd': 'object predicate',
 'parataxis': 'parataxis',
 'pcomp': 'complement of preposition',
 'pobj': 'object of preposition',
 'poss': 'possession modifier',
 'preconj': 'pre-correlative conjunction',
 'predet': None,
 'prep': 'prepositional modifier',
 'prt': 'particle',
 'punct': 'punctuation',
 'quantmod': 'modifier of quantifier',
 'relcl': 'relative clause modifier',
 'xcomp': 'open clausal complement'}


Label mapping for component: ner

{'CARDINAL': 'Numerals that do not fall under another type',
 'DATE': 'Absolute or relative dates or periods',
 'EVENT': 'Named hurricanes, battles, wars, sports events, etc.',
 'FAC': 'Buildings, airports, highways, bridges, etc.',
 'GPE': 'Countries, cities, states',
 'LANGUAGE': 'Any named language',
 'LAW': 'Named documents made into laws.',
 'LOC': 'Non-GPE locations, mountain ranges, bodies of water',
 'MONEY': 'Monetary values, including unit',
 'NORP': 'Nationalities or religious or political groups',
 'ORDINAL': '"first", "second", etc.',
 'ORG': 'Companies, agencies, institutions, etc.',
 'PERCENT': 'Percentage, including "%"',
 'PERSON': 'People, including fictional',
 'PRODUCT': 'Objects, vehicles, foods, etc. (not services)',
 'QUANTITY': 'Measurements, as of weight or distance',
 'TIME': 'Times smaller than a day',
 'WORK_OF_ART': 'Titles of books, songs, etc.'}
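
As the parser output above shows, a few labels (predet here) have no glossary entry, so spacy.explain() returns None for them. A sketch that lists such gaps per component, assuming spaCy v3's Language.pipe_labels dict; unexplained_labels is a hypothetical helper, and it accepts any component-to-labels mapping so it can be tried without loading a model.

```python
import spacy


def unexplained_labels(pipe_labels):
    """Return {component: [labels spacy.explain() cannot describe]}."""
    gaps = {}
    for component, labels in pipe_labels.items():
        missing = [label for label in labels if spacy.explain(label) is None]
        if missing:
            gaps[component] = missing
    return gaps


# With a trained pipeline: unexplained_labels(spacy.load("en_core_web_sm").pipe_labels)
example = {"parser": ["nsubj", "dobj", "made_up_label"], "ner": ["GPE", "ORG"]}
print(unexplained_labels(example))
```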
Plyler answered 23/12, 2022 at 15:21 Comment(0)

At present, dependency parsing and tagging in spaCy appears to be implemented only at the word level, and not at the phrase (other than noun phrase) or clause level. This means spaCy can be used to identify things like nouns (NN, NNS), adjectives (JJ, JJR, JJS), and verbs (VB, VBD, VBG, etc.), but not adjective phrases (ADJP), adverbial phrases (ADVP), or questions (SBARQ, SQ).

For illustration, if you use spaCy to parse the sentence "Which way is the bus going?", you get the following tree.

By contrast, if you use the Stanford parser you get a much more deeply structured syntax tree.
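
A dependency parse has no labelled phrase nodes, but in a projective parse every head word governs a contiguous span, which you can read off with Token.left_edge and Token.right_edge. The sketch below approximates phrase spans that way; phrase_of is a hypothetical helper, the Doc is annotated by hand so no model is needed, and this is not a substitute for a constituency parser's labelled phrases such as ADJP or SBARQ.

```python
import spacy
from spacy.tokens import Doc


def phrase_of(token):
    """Span from the leftmost to the rightmost token in token's subtree."""
    doc = token.doc
    return doc[token.left_edge.i : token.right_edge.i + 1]


nlp = spacy.blank("en")
# Hand-made parse of "I shot a man in Reno" (absolute head indices)
doc = Doc(
    nlp.vocab,
    words=["I", "shot", "a", "man", "in", "Reno"],
    heads=[1, 1, 3, 1, 1, 4],
    deps=["nsubj", "ROOT", "det", "dobj", "prep", "pobj"],
)

for token in doc:
    if token.dep_ in ("dobj", "prep"):
        print(token.dep_, "->", phrase_of(token).text)
```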

Gaberlunzie answered 7/12, 2017 at 17:50 Comment(1)
-1; IMO, this doesn't answer the question that I asked (although the trees are interesting and a good illustration of the difference between the two parsers). FYI, what you describe here is the difference between a constituency parser, like Stanford, and a dependency parser, like spaCy. See also https://mcmap.net/q/174089/-difference-between-constituency-parser-and-dependency-parser. - Pectinate

2023 Update

There is a pip package (disclaimer: I wrote it) called spacysee that lets you explore the parse output of a spaCy document. I built it because I ran into this exact issue, not least because each model tends to use a different labelling scheme, so the documentation differs; for the most part it just links to the relevant section of Universal Dependencies.

[Image: screenshot of spacysee output]

Solifluction answered 6/4, 2023 at 15:3 Comment(0)

spaCy has a glossary here in its source code where it maps tag codes to tag labels, for its POS tags, syntactic categories, phrase types, dependency labels, etc.

It's quite extensive, covering multiple frameworks (e.g. Universal Dependencies, Penn Treebank, etc.) and multiple languages.

GLOSSARY = {
    # POS tags
    # Universal POS Tags
    # http://universaldependencies.org/u/pos/
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CONJ": "conjunction",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
    "EOL": "end of line",
    "SPACE": "space",
    # POS tags (English)
    # OntoNotes 5 / Penn Treebank
    # https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
    ".": "punctuation mark, sentence closer",
    ",": "punctuation mark, comma",
    "-LRB-": "left round bracket",
    "-RRB-": "right round bracket",
    "``": "opening quotation mark",
    '""': "closing quotation mark",
    "''": "closing quotation mark",
    ":": "punctuation mark, colon or ellipsis",
    "$": "symbol, currency",
    "#": "symbol, number sign",
    "AFX": "affix",
    "CC": "conjunction, coordinating",
    "CD": "cardinal number",
    "DT": "determiner",
    "EX": "existential there",
    "FW": "foreign word",
    "HYPH": "punctuation mark, hyphen",
    "IN": "conjunction, subordinating or preposition",
    "JJ": "adjective (English), other noun-modifier (Chinese)",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
    "LS": "list item marker",
    "MD": "verb, modal auxiliary",
    "NIL": "missing tag",
    "NN": "noun, singular or mass",
    "NNP": "noun, proper singular",
    "NNPS": "noun, proper plural",
    "NNS": "noun, plural",
    "PDT": "predeterminer",
    "POS": "possessive ending",
    "PRP": "pronoun, personal",
    "PRP$": "pronoun, possessive",
    "RB": "adverb",
    "RBR": "adverb, comparative",
    "RBS": "adverb, superlative",
    "RP": "adverb, particle",
    "TO": 'infinitival "to"',
    "UH": "interjection",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
    "WDT": "wh-determiner",
    "WP": "wh-pronoun, personal",
    "WP$": "wh-pronoun, possessive",
    "WRB": "wh-adverb",
    "SP": "space (English), sentence-final particle (Chinese)",
    "ADD": "email",
    "NFP": "superfluous punctuation",
    "GW": "additional word in multi-word expression",
    "XX": "unknown",
    "BES": 'auxiliary "be"',
    "HVS": 'forms of "have"',
    "_SP": "whitespace",
    # POS Tags (German)
    # TIGER Treebank
    # http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf
    "$(": "other sentence-internal punctuation mark",
    "$,": "comma",
    "$.": "sentence-final punctuation mark",
    "ADJA": "adjective, attributive",
    "ADJD": "adjective, adverbial or predicative",
    "APPO": "postposition",
    "APPR": "preposition; circumposition left",
    "APPRART": "preposition with article",
    "APZR": "circumposition right",
    "ART": "definite or indefinite article",
    "CARD": "cardinal number",
    "FM": "foreign language material",
    "ITJ": "interjection",
    "KOKOM": "comparative conjunction",
    "KON": "coordinate conjunction",
    "KOUI": 'subordinate conjunction with "zu" and infinitive',
    "KOUS": "subordinate conjunction with sentence",
    "NE": "proper noun",
    "NNE": "proper noun",
    "PAV": "pronominal adverb",
    "PROAV": "pronominal adverb",
    "PDAT": "attributive demonstrative pronoun",
    "PDS": "substituting demonstrative pronoun",
    "PIAT": "attributive indefinite pronoun without determiner",
    "PIDAT": "attributive indefinite pronoun with determiner",
    "PIS": "substituting indefinite pronoun",
    "PPER": "non-reflexive personal pronoun",
    "PPOSAT": "attributive possessive pronoun",
    "PPOSS": "substituting possessive pronoun",
    "PRELAT": "attributive relative pronoun",
    "PRELS": "substituting relative pronoun",
    "PRF": "reflexive personal pronoun",
    "PTKA": "particle with adjective or adverb",
    "PTKANT": "answer particle",
    "PTKNEG": "negative particle",
    "PTKVZ": "separable verbal particle",
    "PTKZU": '"zu" before infinitive',
    "PWAT": "attributive interrogative pronoun",
    "PWAV": "adverbial interrogative or relative pronoun",
    "PWS": "substituting interrogative pronoun",
    "TRUNC": "word remnant",
    "VAFIN": "finite verb, auxiliary",
    "VAIMP": "imperative, auxiliary",
    "VAINF": "infinitive, auxiliary",
    "VAPP": "perfect participle, auxiliary",
    "VMFIN": "finite verb, modal",
    "VMINF": "infinitive, modal",
    "VMPP": "perfect participle, modal",
    "VVFIN": "finite verb, full",
    "VVIMP": "imperative, full",
    "VVINF": "infinitive, full",
    "VVIZU": 'infinitive with "zu", full',
    "VVPP": "perfect participle, full",
    "XY": "non-word containing non-letter",
    # POS Tags (Chinese)
    # OntoNotes / Chinese Penn Treebank
    # https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports
    "AD": "adverb",
    "AS": "aspect marker",
    "BA": "把 in ba-construction",
    # "CD": "cardinal number",
    "CS": "subordinating conjunction",
    "DEC": "的 in a relative clause",
    "DEG": "associative 的",
    "DER": "得 in V-de const. and V-de-R",
    "DEV": "地 before VP",
    "ETC": "for words 等, 等等",
    # "FW": "foreign words"
    "IJ": "interjection",
    # "JJ": "other noun-modifier",
    "LB": "被 in long bei-const",
    "LC": "localizer",
    "M": "measure word",
    "MSP": "other particle",
    # "NN": "common noun",
    "NR": "proper noun",
    "NT": "temporal noun",
    "OD": "ordinal number",
    "ON": "onomatopoeia",
    "P": "preposition excluding 把 and 被",
    "PN": "pronoun",
    "PU": "punctuation",
    "SB": "被 in short bei-const",
    # "SP": "sentence-final particle",
    "VA": "predicative adjective",
    "VC": "是 (copula)",
    "VE": "有 as the main verb",
    "VV": "other verb",
    # Noun chunks
    "NP": "noun phrase",
    "PP": "prepositional phrase",
    "VP": "verb phrase",
    "ADVP": "adverb phrase",
    "ADJP": "adjective phrase",
    "SBAR": "subordinating conjunction",
    "PRT": "particle",
    "PNP": "prepositional noun phrase",
    # Dependency Labels (English)
    # ClearNLP / Universal Dependencies
    # https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
    "acl": "clausal modifier of noun (adjectival clause)",
    "acomp": "adjectival complement",
    "advcl": "adverbial clause modifier",
    "advmod": "adverbial modifier",
    "agent": "agent",
    "amod": "adjectival modifier",
    "appos": "appositional modifier",
    "attr": "attribute",
    "aux": "auxiliary",
    "auxpass": "auxiliary (passive)",
    "case": "case marking",
    "cc": "coordinating conjunction",
    "ccomp": "clausal complement",
    "clf": "classifier",
    "complm": "complementizer",
    "compound": "compound",
    "conj": "conjunct",
    "cop": "copula",
    "csubj": "clausal subject",
    "csubjpass": "clausal subject (passive)",
    "dative": "dative",
    "dep": "unclassified dependent",
    "det": "determiner",
    "discourse": "discourse element",
    "dislocated": "dislocated elements",
    "dobj": "direct object",
    "expl": "expletive",
    "fixed": "fixed multiword expression",
    "flat": "flat multiword expression",
    "goeswith": "goes with",
    "hmod": "modifier in hyphenation",
    "hyph": "hyphen",
    "infmod": "infinitival modifier",
    "intj": "interjection",
    "iobj": "indirect object",
    "list": "list",
    "mark": "marker",
    "meta": "meta modifier",
    "neg": "negation modifier",
    "nmod": "modifier of nominal",
    "nn": "noun compound modifier",
    "npadvmod": "noun phrase as adverbial modifier",
    "nsubj": "nominal subject",
    "nsubjpass": "nominal subject (passive)",
    "nounmod": "modifier of nominal",
    "npmod": "noun phrase as adverbial modifier",
    "num": "number modifier",
    "number": "number compound modifier",
    "nummod": "numeric modifier",
    "oprd": "object predicate",
    "obj": "object",
    "obl": "oblique nominal",
    "orphan": "orphan",
    "parataxis": "parataxis",
    "partmod": "participal modifier",
    "pcomp": "complement of preposition",
    "pobj": "object of preposition",
    "poss": "possession modifier",
    "possessive": "possessive modifier",
    "preconj": "pre-correlative conjunction",
    "prep": "prepositional modifier",
    "prt": "particle",
    "punct": "punctuation",
    "quantmod": "modifier of quantifier",
    "rcmod": "relative clause modifier",
    "relcl": "relative clause modifier",
    "reparandum": "overridden disfluency",
    "root": "root",
    "ROOT": "root",
    "vocative": "vocative",
    "xcomp": "open clausal complement",
    # Dependency labels (German)
    # TIGER Treebank
    # http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf
    # currently missing: 'cc' (comparative complement) because of conflict
    # with English labels
    "ac": "adpositional case marker",
    "adc": "adjective component",
    "ag": "genitive attribute",
    "ams": "measure argument of adjective",
    "app": "apposition",
    "avc": "adverbial phrase component",
    "cd": "coordinating conjunction",
    "cj": "conjunct",
    "cm": "comparative conjunction",
    "cp": "complementizer",
    "cvc": "collocational verb construction",
    "da": "dative",
    "dh": "discourse-level head",
    "dm": "discourse marker",
    "ep": "expletive es",
    "hd": "head",
    "ju": "junctor",
    "mnr": "postnominal modifier",
    "mo": "modifier",
    "ng": "negation",
    "nk": "noun kernel element",
    "nmc": "numerical component",
    "oa": "accusative object",
    "oc": "clausal object",
    "og": "genitive object",
    "op": "prepositional object",
    "par": "parenthetical element",
    "pd": "predicate",
    "pg": "phrasal genitive",
    "ph": "placeholder",
    "pm": "morphological particle",
    "pnc": "proper noun component",
    "rc": "relative clause",
    "re": "repeated element",
    "rs": "reported speech",
    "sb": "subject",
    "sbp": "passivized subject (PP)",
    "sp": "subject or predicate",
    "svp": "separable verb prefix",
    "uc": "unit component",
    "vo": "vocative",
    # Named Entity Recognition
    # OntoNotes 5
    # https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf
    "PERSON": "People, including fictional",
    "NORP": "Nationalities or religious or political groups",
    "FACILITY": "Buildings, airports, highways, bridges, etc.",
    "FAC": "Buildings, airports, highways, bridges, etc.",
    "ORG": "Companies, agencies, institutions, etc.",
    "GPE": "Countries, cities, states",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water",
    "PRODUCT": "Objects, vehicles, foods, etc. (not services)",
    "EVENT": "Named hurricanes, battles, wars, sports events, etc.",
    "WORK_OF_ART": "Titles of books, songs, etc.",
    "LAW": "Named documents made into laws.",
    "LANGUAGE": "Any named language",
    "DATE": "Absolute or relative dates or periods",
    "TIME": "Times smaller than a day",
    "PERCENT": 'Percentage, including "%"',
    "MONEY": "Monetary values, including unit",
    "QUANTITY": "Measurements, as of weight or distance",
    "ORDINAL": '"first", "second", etc.',
    "CARDINAL": "Numerals that do not fall under another type",
    # Named Entity Recognition
    # Wikipedia
    # http://www.sciencedirect.com/science/article/pii/S0004370212000276
    # https://pdfs.semanticscholar.org/5744/578cc243d92287f47448870bb426c66cc941.pdf
    "PER": "Named person or family.",
    "MISC": "Miscellaneous entities, e.g. events, nationalities, products or works of art",
    # https://github.com/ltgoslo/norne
    "EVT": "Festivals, cultural events, sports events, weather phenomena, wars, etc.",
    "PROD": "Product, i.e. artificially produced entities including speeches, radio shows, programming languages, contracts, laws and ideas",
    "DRV": "Words (and phrases?) that are dervied from a name, but not a name in themselves, e.g. 'Oslo-mannen' ('the man from Oslo')",
    "GPE_LOC": "Geo-political entity, with a locative sense, e.g. 'John lives in Spain'",
    "GPE_ORG": "Geo-political entity, with an organisation sense, e.g. 'Spain declined to meet with Belgium'",
}
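
Since spacy.explain() is a lookup into this dictionary, you can also search it in reverse, for example to find every label whose description mentions a given word. A sketch assuming the dictionary is importable as spacy.glossary.GLOSSARY (the module this answer quotes from); find_labels is a hypothetical helper.

```python
from spacy.glossary import GLOSSARY


def find_labels(keyword):
    """Return sorted labels whose glossary description contains keyword."""
    keyword = keyword.lower()
    return sorted(
        label
        for label, description in GLOSSARY.items()
        if keyword in description.lower()
    )


print(find_labels("passive"))
```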
Puddle answered 31/5, 2023 at 20:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.