Search for job titles in an article using Spacy or NLTK

I'm new to NLP and have recently been playing with NLTK and spaCy. However, I could not find a way to search for job titles (e.g. product manager, chief marketing officer) in an article.

For example, I have 1000 articles and I want to get all the articles that mention the job titles I am interested in.

Also, what entity type do job titles fall under? I checked https://spacy.io/docs/usage/entity-recognition and did not see one there. Is there a plan to add it?

Thanks.

Dermatogen answered 30/12, 2016 at 18:27 Comment(2)
Yes, a job title in the limited context you mention is some type of named entity, but I believe you would have to know which words you are looking for, or which specific features you would like to capture. (Cointreau)
A job title is usually a noun phrase (NP), often related to an ORG entity. Basically, it sounds like you want a job title tagger. You may want to make a list of job titles, extract features for those titles, and then build a tagger yourself. That will work better for your domain. (Cointreau)
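
The list-based approach suggested in the comment above can be sketched with the standard library alone. This is a minimal sketch, not a full tagger: the `JOB_TITLES` list, the sample articles, and all function names are hypothetical and only there for illustration. It finds only titles that are on the list, matched case-insensitively as whole words.

```python
import re

# Hypothetical title list; replace with the titles relevant to your domain.
JOB_TITLES = ["product manager", "chief marketing officer", "software engineer"]


def build_title_pattern(titles):
    """Compile one case-insensitive regex matching any title as whole words."""
    # Longest first, so a longer title wins over a shorter one it starts with.
    ordered = sorted(titles, key=len, reverse=True)
    # Allow any run of whitespace between the words of a title.
    alternation = "|".join(
        r"\s+".join(re.escape(word) for word in title.split()) for title in ordered
    )
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_titles(text, pattern):
    """Return the distinct titles found in text, lowercased, in sorted order."""
    return sorted({" ".join(m.group(0).lower().split()) for m in pattern.finditer(text)})


def articles_with_titles(articles, titles):
    """Map article index -> titles found, keeping only articles with a hit."""
    pattern = build_title_pattern(titles)
    hits = {}
    for index, text in enumerate(articles):
        found = find_titles(text, pattern)
        if found:
            hits[index] = found
    return hits


# Made-up sample articles for illustration.
articles = [
    "Our new Chief Marketing Officer joins in May.",
    "The weather in Berlin was mild this week.",
    "She worked as a product manager before becoming a software engineer.",
]
result = articles_with_titles(articles, JOB_TITLES)
print(result)
```

If you are already using spaCy, its `PhraseMatcher` does the same kind of list lookup on tokens rather than raw characters, which tends to cope better with punctuation and tokenization edge cases.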

spaCy's NER does not support a "job title" entity out of the box, as Nathan also stated. But you can create a custom named entity for your use case. The official spaCy documentation has a step-by-step guide to training the NER.

You would need labeled data to train your NER. As a rough guide, you would need at least 4000-5000 examples for training and 2000 examples for testing. The more training data you have, the better the NER will perform.

Here is some sample training data.

TRAIN_DATA = [
    ('Who is Shaka Khan?', {
        'entities': [(7, 17, 'PERSON')]
    }),
    ('I like London and Berlin.', {
        'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]
    }),
    # Offsets are character positions: start inclusive, end exclusive.
    ('I work as software engineer.', {
        'entities': [(10, 27, 'JOBTITLE')]
    }),
]
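
Character offsets like these are easy to get wrong when counted by hand, so it can help to compute them from the phrase itself. The helper below is a minimal sketch under one assumption: each phrase is annotated at its first occurrence in the text. The function names are hypothetical and not part of spaCy.

```python
def entity_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    # End offset is exclusive, matching the (start, end, label) format above.
    return (start, start + len(phrase), label)


def make_example(text, annotations):
    """Build one (text, {'entities': [...]}) item from (phrase, label) pairs."""
    spans = [entity_span(text, phrase, label) for phrase, label in annotations]
    return (text, {"entities": spans})


def check_example(example):
    """Return the substring each span covers, so mistakes are easy to spot."""
    text, annotations = example
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


example = make_example(
    "I work as software engineer.",
    [("software engineer", "JOBTITLE")],
)
print(example)
print(check_example(example))
```

Running `check_example` over every item before training is a cheap way to catch spans that cut a word in half or include stray whitespace.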
Kropotkin answered 1/1, 2018 at 10:8 Comment(0)

Stanford NER (part of CoreNLP) supports a TITLE entity type for job titles, though it is not perfect. See the demo page at http://corenlp.run/

Petrol answered 12/7, 2018 at 22:59 Comment(0)