Unable to use Stanford NER in python module
I want to use the Python Stanford NER module but keep getting an error. I searched the internet but found nothing. Here is the basic usage together with the error.

import ner
tagger = ner.HttpNER(host='localhost', port=8080)
tagger.get_entities("University of California is located in California, United States")

Error

Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
tagger.get_entities("University of California is located in California, United States")
File "C:\Python27\lib\site-packages\ner\client.py", line 81, in get_entities
tagged_text = self.tag_text(text)
File "C:\Python27\lib\site-packages\ner\client.py", line 165, in tag_text
c.request('POST', self.location, params, headers)
File "C:\Python27\lib\httplib.py", line 1057, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1097, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1053, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 897, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 859, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 836, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
error: [Errno 10061] No connection could be made because the target machine actively refused it

Using Windows 10 with the latest Java installed.
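The `Errno 10061` at the bottom of the traceback means nothing was accepting TCP connections on localhost:8080, i.e. no server was running there. A quick way to verify this from Python before querying the tagger (a stdlib-only sketch; the host and port are the ones from the question):

```python
import socket

def server_listening(host, port, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError (Errno 10061 on Windows)
        return False

if not server_listening('localhost', 8080):
    print("Nothing is listening on localhost:8080 - start the NER server first.")
```

If this prints the warning, the problem is not the Python module but the missing server process.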

Mundford answered 16/4, 2016 at 18:54 Comment(11)
This may be a silly question, but you are running a web server serving port 80 on your computer, right? ...and it displays a web page when you type localhost into a browser, right?Excursionist
Yes, it could be; I'm just a newbie trying things out. Can you help me out?Mundford
Sorry, are you running a web server or not? Your program looks like it's trying to read data from the main page of a website at localhost (i.e. your computer). If you don't know whether you are running a web server, then you are (almost certainly) not. What did you think this program might do? What exactly are you trying to do?Excursionist
I want to use Stanford NER through Python to identify names and places in text. I followed its documentation, which had this same code. I am running all this code in Python IDLE.Mundford
Could you add a link to its documentation which has this code?Excursionist
pypi.python.org/pypi/nerMundford
github.com/dat/pynerMundford
I had this issue when I started with the NER too. As @Excursionist says, the Stanford NER is a separate service. You need to start that service up separately before you run your Python code.Makepeace
Here's how you go about running the Stanford NER: nlp.stanford.edu/software/CRF-NER.shtml. On Windows you should run the .bat file that is in the Stanford NER folder.Makepeace
@Craicerjack: It looks like you have a solution. You should probably collect your comments as an answer.Excursionist
@Excursionist, I am able to use the NER through that .bat file, but I need to use it via Python. I guess I have to find out how to start an NER server and try some other value for host and port. Thanks, guys.Mundford
  • The Python Stanford NER module is a wrapper around the Stanford NER that lets you run Python commands against the NER service.
  • The NER service is a separate entity from the Python module: it is a Java program. To access the service, via Python or any other way, you first need to start it.
  • Details on how to start the Java program/service can be found here - http://nlp.stanford.edu/software/CRF-NER.shtml
  • The NER comes with a .bat file for Windows and a .sh file for Unix/Linux. I think these files start the GUI.

  • To start the service without the GUI you should run a command similar to this:
    java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
    This runs the NER jar, sets the memory, and sets the classifier you want to use. (I think you'll have to be in the Stanford NER directory to run this.)

  • Once the NER program is running then you will be able to run your python code and query the NER.
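The `CRFClassifier` command above runs the tagger directly; to expose it as a network service that the Python `ner` module can query, the Stanford NER jar also includes `edu.stanford.nlp.ie.NERServer`. A sketch of starting it as a socket server (memory setting and classifier path are the same assumptions as above; run from inside the Stanford NER directory):

```shell
# Start Stanford NER as a socket server on port 8080
java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
    -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 8080
```

Note that `NERServer` speaks a plain socket protocol, not HTTP, so on the Python side you would query it with `ner.SocketNER(host='localhost', port=8080)` rather than `ner.HttpNER`.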

Makepeace answered 16/4, 2016 at 19:30 Comment(2)
Now I know what I have to do. Thanks a lot @MakepeaceMundford
@Zaibi: If this answers your question, you should mark it as acceptedExcursionist
  • This is a complete Stanford NER script in Python 3.x.

This code reads each text file from the "TextFilestoTest" folder, detects entities, and stores them in a data frame (Testing):

import os
import nltk
import pandas as pd
import collections

from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize


# Paths to the trained classifier and the Stanford NER jar
stanford_classifier = 'ner-trained-EvensTrain.ser.gz'
stanford_ner_path = 'stanford-ner.jar'

# Tell NLTK where to find Java before tagging (adjust this path for your machine)
java_path = "C:/Program Files (x86)/Java/jre1.8.0_191/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Creating Tagger Object
st = StanfordNERTagger(stanford_classifier, stanford_ner_path, encoding='utf-8')


def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":  # "O" (the letter) marks tokens outside any entity
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk

TestFiles = './TextFilestoTest/'
files_path = os.listdir(TestFiles)    
Test = {}

for i in files_path:
    p = TestFiles + i
    g = os.path.splitext(i)[0]
    with open(p, 'r') as f:  # close each file after reading
        Test[g] = f.read()

## Predict the labels of all words in the 200 text files and insert them into a dataframe
df_fin = pd.DataFrame(columns = ["filename","Word","Label"])
for i in Test:
    test_text = Test[i]
    test_text = test_text.replace("\n"," ")
    tokenized_text = test_text.split(" ")
    classified_text = st.tag(tokenized_text)
    ne_tagged_sent = classified_text
    named_entities = get_continuous_chunks(ne_tagged_sent)

    flat_list = [item for sublist in named_entities for item in sublist]

    for fl in flat_list:
        row = pd.DataFrame({"filename": [i], "Word": [fl[0]], "Label": [fl[1]]})
        df_fin = pd.concat([df_fin, row], ignore_index=True)  # DataFrame.append is deprecated

df_fin_vone = pd.DataFrame(columns = ["filename","Word","Label"])
test_files_len = list(set(df_fin['filename']))
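The chunking helper above can be checked without the Java tagger at all, using a hand-tagged sentence. This self-contained sketch repeats the helper's logic; note that Stanford NER's outside tag is the letter "O", not the digit zero:

```python
def get_continuous_chunks(tagged_sent):
    """Group consecutive (token, tag) pairs whose tag is not "O" into chunks."""
    continuous_chunk, current_chunk = [], []
    for token, tag in tagged_sent:
        if tag != "O":  # letter "O" marks tokens outside any entity
            current_chunk.append((token, tag))
        elif current_chunk:  # an "O" tag ends the current chunk
            continuous_chunk.append(current_chunk)
            current_chunk = []
    if current_chunk:  # flush a chunk that runs to the end of the sentence
        continuous_chunk.append(current_chunk)
    return continuous_chunk

tagged = [("University", "ORGANIZATION"), ("of", "ORGANIZATION"),
          ("California", "ORGANIZATION"), ("is", "O"), ("located", "O"),
          ("in", "O"), ("California", "LOCATION")]
print(get_continuous_chunks(tagged))
# → [[('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'),
#     ('California', 'ORGANIZATION')], [('California', 'LOCATION')]]
```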

If you have any questions, comment below and I will answer. Thank you.

Bavardage answered 7/5, 2019 at 5:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.