How can I extract GPE(location) using NLTK ne_chunk?

Asked 7/2, 2018 at 9:46 Answered 27/12, 2018 at 11:47

Solved python geolocation nlp nltk named-entity-recognition

I am trying to write code that checks the weather for a particular area, using the OpenWeatherMap API together with NLTK named-entity recognition. However, I cannot work out how to pass the entity labelled GPE (the location; in this case, Chicago) to my API request. Please help me with the syntax. The code is given below.

Thank you for your assistance

import nltk
import requests
from nltk import word_tokenize
from nltk.corpus import stopwords

sentence = "What is the weather in Chicago today? "
tokens = word_tokenize(sentence)

stop_words = set(stopwords.words('english'))

# Drop stop words before tagging
clean_tokens = [w for w in tokens if w not in stop_words]

tagged = nltk.pos_tag(clean_tokens)

# Prints the chunk tree, which contains the named-entity subtrees
print(nltk.ne_chunk(tagged))

Quean answered 7/2, 2018 at 9:46

GPE is the label of a Tree object (a subtree) in the output of the pre-trained ne_chunk model.

>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> sent = "What is the weather in Chicago today?"
>>> ne_chunk(pos_tag(word_tokenize(sent)))
Tree('S', [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'), ('in', 'IN'), Tree('GPE', [('Chicago', 'NNP')]), ('today', 'NN'), ('?', '.')])

To traverse the tree, see How to Traverse an NLTK Tree object?
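As a rough sketch of that traversal: the snippet below rebuilds the tree printed above by hand (so it runs without the tagger or chunker models) and collects the text under every subtree with a given label. The helper name `entities_with_label` is made up for illustration; `Tree.subtrees(filter=...)` and `Tree.leaves()` are the NLTK calls doing the work.

```python
from nltk import Tree

# Hand-built copy of the ne_chunk output shown above, so no model data is needed.
chunked = Tree('S', [
    ('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'), ('in', 'IN'),
    Tree('GPE', [('Chicago', 'NNP')]),
    ('today', 'NN'), ('?', '.'),
])

def entities_with_label(tree, label):
    """Return the text of every subtree that carries the given label."""
    return [
        " ".join(token for token, pos in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]

locations = entities_with_label(chunked, "GPE")
print(locations)
```

With a real sentence you would pass the result of `ne_chunk(pos_tag(word_tokenize(sent)))` instead of the hand-built tree.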

Perhaps you're looking for a slight modification of the approach in NLTK Named Entity recognition to a Python list:

from nltk import word_tokenize, pos_tag, ne_chunk
from nltk import Tree

def get_continuous_chunks(text, label):
    chunked = ne_chunk(pos_tag(word_tokenize(text)))
    continuous_chunk = []

    for subtree in chunked:
        # Named entities are subtrees; ordinary tokens are (word, tag) tuples.
        if isinstance(subtree, Tree) and subtree.label() == label:
            named_entity = " ".join(token for token, pos in subtree.leaves())
            # Keep each entity once, in order of first appearance.
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)

    return continuous_chunk

[out]:

>>> sent = "What is the weather in New York today?"
>>> get_continuous_chunks(sent, 'GPE')
['New York']

>>> sent = "What is the weather in New York and Chicago today?"
>>> get_continuous_chunks(sent, 'GPE')
['New York', 'Chicago']

>>> sent = "What is the weather in New York"
>>> get_continuous_chunks(sent, 'GPE')
['New York']

>>> sent = "What is the weather in New York and Chicago"
>>> get_continuous_chunks(sent, 'GPE')
['New York', 'Chicago']

Naker answered 8/2, 2018 at 1:46

Here is the solution I would propose for your situation:

Step 1. Tokenize, POS-tag, and run named-entity recognition:

    import nltk
    from nltk import word_tokenize

    Xstring = "What is the weather in New York and Chicago today?"

    tokenized_doc = word_tokenize(Xstring)
    tagged_sentences = nltk.pos_tag(tokenized_doc)
    NE = nltk.ne_chunk(tagged_sentences)
    NE.draw()  # optional: opens a window showing the tree

Step 2. Extract all named entities from the chunk tree built above:

    named_entities = []
    for tagged_tree in NE:
        print(tagged_tree)
        # Only subtrees (named entities) have a label; plain tokens are tuples
        if hasattr(tagged_tree, 'label'):
            entity_name = ' '.join(c[0] for c in tagged_tree.leaves())  # join the entity's words
            entity_type = tagged_tree.label()  # get NE category
            named_entities.append((entity_name, entity_type))

    print(named_entities)  # all entities are printed; check the output at your end

Step 3. Now extract only the GPE entities:

    for tag in named_entities:
        if tag[1] == 'GPE':  # specify whichever label is required
            print(tag)

Here is my output:

  ('New York', 'GPE')
  ('Chicago', 'GPE')
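If you need the locations as a list you can hand to an API call rather than printed tuples, one option is to group the (name, type) pairs by type. This is just a sketch: `group_by_type` is a name made up here, and the sample list (including the PERSON entry) is hypothetical data in the same shape that Step 2 produces.

```python
from collections import defaultdict

# Hypothetical sample in the same (name, type) shape that Step 2 produces.
named_entities = [("New York", "GPE"), ("Chicago", "GPE"), ("Alice", "PERSON")]

def group_by_type(entities):
    """Map each entity type to its names, preserving order and skipping repeats."""
    grouped = defaultdict(list)
    for name, entity_type in entities:
        if name not in grouped[entity_type]:
            grouped[entity_type].append(name)
    return dict(grouped)

grouped = group_by_type(named_entities)
print(grouped.get("GPE", []))
```

Using `.get("GPE", [])` keeps the lookup safe for sentences in which no location was recognised.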

Inveteracy answered 27/12, 2018 at 11:47
