I have just started with Stanford CoreNLP, and I would like to build a custom NER model to find persons.
Unfortunately, I have not found a good NER model for Italian. I need to find these entities inside resume/CV documents.
The problem is that documents like these can have different structures. For example, I can have:
CASE 1
- Name: John
- Surname: Travolta
- Last name: Travolta
- Full name: John Travolta
(so there are many labels that can introduce the person entity I need to extract)
CASE 2
My name is John Travolta and I was born ...
Basically, I can have structured data (with different labels) or free text where I have to find these entities from the context.
What is the best approach for this kind of document? Can a maxent model work in this case?
EDIT @vihari-piratla
At the moment, my strategy is to look for a pattern that has something on the left and something on the right. With this method I find the entity in about 80-85% of cases.
Example:
Name: John
Birthdate: 2000-01-01
This means that I have "Name:" on the left of the pattern and a \n on the right (the match runs until it finds the \n). I can create a very long list of patterns like this one. I chose patterns because I do not need names that appear in "other" contexts.
For example, if the user mentions other names inside a job experience, I do not need them, because I am looking for the person's own name, not anyone else's. With this method I can reduce false positives, because I only look at specific patterns and not at names in general.
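To make the idea concrete, here is a minimal sketch of the pattern approach in Python. It is not my real code: the `PATTERNS` dictionary, the label names in it, and the `extract_person_fields` helper are made up for illustration, and my actual list is much longer and includes Italian labels. It only covers the labelled lines of CASE 1, not the free text of CASE 2.

```python
import re

# One regex per label: the label on the left, then everything up to the
# end of the line (the "\n on the right" part of the pattern).
# The labels below are illustrative assumptions, not a complete list.
FLAGS = re.MULTILINE | re.IGNORECASE
PATTERNS = {
    "name": re.compile(r"^[ \t-]*Name[ \t]*:[ \t]*(.+?)[ \t]*$", FLAGS),
    "surname": re.compile(r"^[ \t-]*(?:Surname|Last name)[ \t]*:[ \t]*(.+?)[ \t]*$", FLAGS),
    "full_name": re.compile(r"^[ \t-]*Full name[ \t]*:[ \t]*(.+?)[ \t]*$", FLAGS),
}


def extract_person_fields(text):
    """Return (field, value) pairs for every labelled line that matches."""
    found = []
    for field, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((field, match.group(1)))
    return found


cv = "Name: John\nBirthdate: 2000-01-01\nExperience: worked with Mario Rossi\n"
print(extract_person_fields(cv))
```

Because each pattern is anchored to the start of a line, a name mentioned inside a job experience is ignored, which is the behaviour I want.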
A problem with this method is that I have a big list of patterns (1 pattern = 1 regex), so it does not scale well when I add more.
It would be great if I could train a NER model with all those patterns, but I would need tons of documents to train it well.
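One idea I am considering, sketched below under several assumptions, is to use the patterns themselves to label documents automatically and produce training data. As far as I understand, the CoreNLP CRF trainer reads a column file with one token and its label per line, separated by a tab, with `O` for tokens outside any entity. The `NAME_LINE` pattern, the whitespace tokenization, and both helper functions are hypothetical and only stand in for my full pattern list; CoreNLP's own tokenizer would split the text differently.

```python
import re

# Hypothetical single pattern standing in for the whole pattern list.
NAME_LINE = re.compile(
    r"^[ \t-]*(?:Full name|Name)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def to_training_rows(text):
    """Label whitespace tokens PERSON if they lie inside a pattern match, else O."""
    spans = [m.span(1) for m in NAME_LINE.finditer(text)]
    rows = []
    for tok in re.finditer(r"\S+", text):
        inside = any(start <= tok.start() and tok.end() <= end
                     for start, end in spans)
        rows.append((tok.group(), "PERSON" if inside else "O"))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated token/label lines."""
    return "\n".join(f"{token}\t{label}" for token, label in rows)


print(to_tsv(to_training_rows("Name: John Travolta\nBirthdate: 2000-01-01\n")))
```

I do not know whether a model trained on data labelled this way would learn more than the patterns already capture, which is part of what I am asking.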