spaCy token.tag_ full list
E

6

17

The official documentation of token.tag_ in spaCy is as follows:

A fine-grained, more detailed tag that represents the word-class and some basic morphological information for the token. These tags are primarily designed to be good features for subsequent models, particularly the syntactic parser. They are language and treebank dependent. The tagger is trained to predict these fine-grained tags, and then a mapping table is used to reduce them to the coarse-grained .pos tags.

But it doesn't list the full available tags and each tag's explanation. Where can I find it?

Eatage answered 3/6, 2016 at 9:46 Comment(0)
E
36

I finally found it in spaCy's source code: glossary.py. This link explains the meaning of the different tags.
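For a quick check without opening the source file, the same glossary can be queried from Python. This is only a minimal sketch that assumes spaCy is installed; `describe_tags` is a hypothetical helper name, while `spacy.explain` and `spacy.glossary.GLOSSARY` come from the library itself.

```python
import spacy
from spacy.glossary import GLOSSARY


def describe_tags(tags):
    """Return {tag: description}; tags missing from the glossary map to None."""
    # spacy.explain() returns None for unknown labels
    # (recent versions may also emit a warning in that case).
    return {tag: spacy.explain(tag) for tag in tags}


if __name__ == "__main__":
    for tag, description in describe_tags(["NN", "VBZ", "JJ"]).items():
        print(f"{tag}: {description}")
    # GLOSSARY is a plain dict, so the whole table can be iterated as well.
    print(len(GLOSSARY), "glossary entries")
```

The glossary covers tags from several schemes at once, so it describes labels but does not say which ones a particular model actually uses.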

Eatage answered 3/6, 2016 at 10:40 Comment(6)
Have you found a way to programmatically get this map from spaCy? - Asomatous
Answering my own comment: the tokenizer gives access to it via nlp.tokenizer.vocab.morphology.tag_map - Asomatous
@thuzhf Do . and X belong to any part of speech, or are they just treated as foreign or unknown? - Petrochemical
This is the latest link: github.com/explosion/spaCy/blob/master/spacy/glossary.py - Windywindzer
I know this is late, but you could just do: from spacy.glossary import GLOSSARY, then lookup_dict = GLOSSARY. - Gladiator
Hi there, I am in 2023 and spent a lot of time trying to find this glossary information! I was googling around like crazy and even tried to get GPT to answer it for me. Thanks! - Writhe
H
6

Available values for token.tag_ are language specific. By language I don't mean English or Portuguese; I mean 'en_core_web_sm' or 'pt_core_news_sm'. In other words, they are specific to the language model and are defined in the TAG_MAP, which is customizable and trainable. If you don't customize it, the default TAG_MAP for that language is used.

As of this writing, spacy.io/models lists all of the pretrained models and their labeling schemes.

Now, for the explanations. If you are working with English or German text, you're in luck! You can use spacy.explain() or consult the glossary on GitHub for the full list. If you are working with other languages, token.pos_ values are always those of Universal Dependencies and will work regardless.

To finish up, if you are working with other languages and need a full explanation of the tags, you will have to look for them in the sources listed on the models page for your model of interest. For instance, for Portuguese I had to track down the explanations for the tags in the Portuguese UD Bosque corpus used to train the model.
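To see which tags one specific pipeline can emit, its tagger component can be asked for its labels. The following is a minimal sketch under two assumptions: spaCy v3 is installed (where `nlp.get_pipe("tagger").labels` is available), and the pipeline in question actually contains a component named "tagger". `tag_inventory` is a hypothetical helper name, not part of spaCy.

```python
import spacy
from spacy.glossary import GLOSSARY


def tag_inventory(nlp):
    """Map every tag the pipeline's tagger can predict to its glossary text."""
    # .labels is a tuple of the fine-grained tag strings the tagger knows.
    labels = nlp.get_pipe("tagger").labels
    # Use dict.get so tags without a glossary entry simply map to None.
    return {label: GLOSSARY.get(label) for label in sorted(labels)}


# Usage (assumes the model was installed separately, e.g. with
# "python -m spacy download en_core_web_sm"):
#
#   nlp = spacy.load("en_core_web_sm")
#   for tag, description in tag_inventory(nlp).items():
#       print(tag, description)
```

Tags that come back with None are the ones whose explanation has to be looked up in the treebank documentation for that model.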

Heliotaxis answered 19/2, 2020 at 15:7 Comment(0)
E
4

Here is the list of tags:

TAG_MAP = [
    ".",
    ",",
    "-LRB-",
    "-RRB-",
    "``",
    "\"\"",
    "''",
    ":",
    "$",        
    "#",        
    "AFX",      
    "CC",       
    "CD",       
    "DT",       
    "EX",       
    "FW",       
    "HYPH",     
    "IN",       
    "JJ",       
    "JJR",      
    "JJS",      
    "LS",       
    "MD",       
    "NIL",      
    "NN",       
    "NNP",      
    "NNPS",     
    "NNS",   
    "PDT",   
    "POS",   
    "PRP",   
    "PRP$",  
    "RB",    
    "RBR",   
    "RBS",   
    "RP",    
    "SP",    
    "SYM",   
    "TO",    
    "UH",    
    "VB",    
    "VBD",  
    "VBG",  
    "VBN",  
    "VBP",  
    "VBZ",  
    "WDT",  
    "WP",   
    "WP$",  
    "WRB",  
    "ADD",  
    "NFP",   
    "GW",    
    "XX",    
    "BES",   
    "HVS",   
    "_SP",   
]
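
The list above gives only the tag names. Each fine-grained tag is also reduced to a coarse-grained Universal POS value, which is what token.pos_ reports. The sketch below hand-copies a small, illustrative subset of that reduction for English; `COARSE_POS`, `coarse_pos` and `group_by_coarse` are hypothetical names and not part of spaCy, and real pipelines may apply additional rules on top of a plain table like this.

```python
# Hand-copied, illustrative subset of the fine-to-coarse reduction for English.
# This is NOT the complete table, and the names here are not spaCy API.
COARSE_POS = {
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV",
    "IN": "ADP",
    "DT": "DET",
    "CD": "NUM",
    "UH": "INTJ",
    ".": "PUNCT", ",": "PUNCT",
    "FW": "X", "XX": "X",
}


def coarse_pos(tag):
    """Return the coarse POS for a fine-grained tag, or None if not listed here."""
    return COARSE_POS.get(tag)


def group_by_coarse(tags):
    """Group fine-grained tags under their coarse POS, preserving input order."""
    groups = {}
    for tag in tags:
        groups.setdefault(coarse_pos(tag), []).append(tag)
    return groups


if __name__ == "__main__":
    print(group_by_coarse(["NN", "NNS", "VBD", "FW", "XX"]))
```

In this subset, X is simply the coarse category that tags such as FW (foreign word) and XX (unknown) fall into.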
Eclosion answered 24/5, 2018 at 20:46 Comment(1)
Thanks for putting up the list here. Does X belong to any part of speech, or is it just treated as foreign or unknown? - Petrochemical
M
1

The link below lists the tags and POS values that spaCy uses.

https://spacy.io/api/annotation

  1. Universal parts of speech tags
  2. English
  3. German
Mucilaginous answered 8/9, 2020 at 9:42 Comment(0)
A
1

You can get an explanation using:

from spacy import glossary

tag_name = 'ADP'
# explain() returns the glossary description for the label, or None if unknown
print(glossary.explain(tag_name))

Version: 3.3.0

Source: https://github.com/explosion/spaCy/blob/master/spacy/glossary.py

August answered 23/5, 2022 at 0:29 Comment(0)
F
0

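You can use the following to list the coarse-grained POS symbols (along with the module's other attributes):

import spacy.parts_of_speech

print(dir(spacy.parts_of_speech))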

Foulard answered 11/10, 2022 at 15:19 Comment(0)
