Unable to use Stanford NER in python module
I want to use the Python Stanford NER module but keep getting an error. I searched the internet but found nothing. Here is the basic usage together with the error.

import ner
tagger = ner.HttpNER(host='localhost', port=8080)
tagger.get_entities("University of California is located in California, United States")

Error

Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
tagger.get_entities("University of California is located in California, United States")
File "C:\Python27\lib\site-packages\ner\client.py", line 81, in get_entities
tagged_text = self.tag_text(text)
File "C:\Python27\lib\site-packages\ner\client.py", line 165, in tag_text
c.request('POST', self.location, params, headers)
File "C:\Python27\lib\httplib.py", line 1057, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1097, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1053, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 897, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 859, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 836, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
error: [Errno 10061] No connection could be made because the target machine actively refused it

Using Windows 10 with the latest Java installed.
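The `Errno 10061` at the bottom of the traceback means nothing was accepting TCP connections on localhost:8080, i.e. no server was running there. A quick way to verify this from Python before querying the tagger (a stdlib-only sketch; the host and port are the ones from the question):

```python
import socket

def server_listening(host, port, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError (Errno 10061 on Windows)
        return False

if not server_listening('localhost', 8080):
    print("Nothing is listening on localhost:8080 - start the NER server first.")
```

If this prints the warning, the problem is not the Python module but the missing server process.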

Mundford answered 16/4, 2016 at 18:54 Comment(11)
This may be a silly question, but you are running a web server serving port 80 on your computer, right? ...and it displays a web page when you type localhost into a browser, right?Excursionist
Yes, it could be; I'm just a newbie trying things out. Can you help me out?Mundford
Sorry, are you running a web server or not? Your program looks like it's trying to read data from the main page of a website at localhost (i.e. your computer). If you don't know whether you are running a web server, then you are (almost certainly) not. What did you think this program might do? What exactly are you trying to do?Excursionist
I want to use Stanford NER through Python to identify names and places in text. I followed its documentation, which had this same code. I am running all this code in Python IDLE.Mundford
Could you add a link to its documentation which has this code?Excursionist
pypi.python.org/pypi/nerMundford
github.com/dat/pynerMundford
I had this issue when I started with the NER too. As @Excursionist says, the Stanford NER is a separate service. You need to start that service up separately before you run your Python code.Makepeace
Here's how you go about running the Stanford NER: nlp.stanford.edu/software/CRF-NER.shtml. On Windows you should run the .bat file that is in the Stanford NER folder.Makepeace
@Craicerjack: It looks like you have a solution. You should probably collect your comments as an answer.Excursionist
@Excursionist, I am able to use the NER through that .bat file, but I need to use it via Python. I guess I have to find out how to start an NER server and try some other value for host and port. Thanks, guys.Mundford
  • The Python Stanford NER module is a wrapper around the Stanford NER that lets you run Python commands against the NER service.
  • The NER service is a separate entity from the Python module: it is a Java program. To access the service, via Python or any other way, you first need to start it.
  • Details on how to start the Java program/service can be found here - http://nlp.stanford.edu/software/CRF-NER.shtml
  • The NER comes with a .bat file for Windows and a .sh file for Unix/Linux. I think these files start the GUI.

  • To start the service without the GUI you should run a command similar to this:
    java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
    This runs the NER jar, sets the memory, and sets the classifier you want to use. (I think you'll have to be in the Stanford NER directory to run this.)

  • Once the NER program is running then you will be able to run your python code and query the NER.
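The `CRFClassifier` command above runs the tagger directly; to expose it as a network service that the Python `ner` module can query, the Stanford NER jar also includes `edu.stanford.nlp.ie.NERServer`. A sketch of starting it as a socket server (memory setting and classifier path are the same assumptions as above; run from inside the Stanford NER directory):

```shell
# Start Stanford NER as a socket server on port 8080
java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
    -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 8080
```

Note that `NERServer` speaks a plain socket protocol, not HTTP, so on the Python side you would query it with `ner.SocketNER(host='localhost', port=8080)` rather than `ner.HttpNER`.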

Makepeace answered 16/4, 2016 at 19:30 Comment(2)
Now I know what I have to do. Thanks a lot @MakepeaceMundford
@Zaibi: If this answers your question, you should mark it as acceptedExcursionist
  • This is a complete Stanford NER script in Python 3.x.

This code reads each text file from the "TextFilestoTest" folder, detects entities, and stores them in a data frame (Testing):

import os
import nltk
import pandas as pd
import collections

from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize


# Paths to the trained classifier and the Stanford NER jar
stanford_classifier = 'ner-trained-EvensTrain.ser.gz'
stanford_ner_path = 'stanford-ner.jar'

# Tell NLTK where to find Java before tagging (adjust this path for your machine)
java_path = "C:/Program Files (x86)/Java/jre1.8.0_191/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Creating Tagger Object
st = StanfordNERTagger(stanford_classifier, stanford_ner_path, encoding='utf-8')


def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":  # "O" (the letter) marks tokens outside any entity
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk

TestFiles = './TextFilestoTest/'
files_path = os.listdir(TestFiles)    
Test = {}

for i in files_path:
    p = TestFiles + i
    g = os.path.splitext(i)[0]
    with open(p, 'r') as f:  # close each file after reading
        Test[g] = f.read()

## Predict the labels of all words in the 200 text files and insert them into a dataframe
df_fin = pd.DataFrame(columns = ["filename","Word","Label"])
for i in Test:
    test_text = Test[i]
    test_text = test_text.replace("\n"," ")
    tokenized_text = test_text.split(" ")
    classified_text = st.tag(tokenized_text)
    ne_tagged_sent = classified_text
    named_entities = get_continuous_chunks(ne_tagged_sent)

    flat_list = [item for sublist in named_entities for item in sublist]

    for fl in flat_list:
        row = pd.DataFrame({"filename": [i], "Word": [fl[0]], "Label": [fl[1]]})
        df_fin = pd.concat([df_fin, row], ignore_index=True)  # DataFrame.append is deprecated

df_fin_vone = pd.DataFrame(columns = ["filename","Word","Label"])
test_files_len = list(set(df_fin['filename']))
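The chunking helper above can be checked without the Java tagger at all, using a hand-tagged sentence. This self-contained sketch repeats the helper's logic; note that Stanford NER's outside tag is the letter "O", not the digit zero:

```python
def get_continuous_chunks(tagged_sent):
    """Group consecutive (token, tag) pairs whose tag is not "O" into chunks."""
    continuous_chunk, current_chunk = [], []
    for token, tag in tagged_sent:
        if tag != "O":  # letter "O" marks tokens outside any entity
            current_chunk.append((token, tag))
        elif current_chunk:  # an "O" tag ends the current chunk
            continuous_chunk.append(current_chunk)
            current_chunk = []
    if current_chunk:  # flush a chunk that runs to the end of the sentence
        continuous_chunk.append(current_chunk)
    return continuous_chunk

tagged = [("University", "ORGANIZATION"), ("of", "ORGANIZATION"),
          ("California", "ORGANIZATION"), ("is", "O"), ("located", "O"),
          ("in", "O"), ("California", "LOCATION")]
print(get_continuous_chunks(tagged))
# → [[('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'),
#     ('California', 'ORGANIZATION')], [('California', 'LOCATION')]]
```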

If you have any questions, comment below and I will answer. Thank you.

Bavardage answered 7/5, 2019 at 5:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.