I'm currently working on a learning project to extract an individual's name from their CV/Resume.
So far I'm using Stanford NER and OpenNLP, which both perform with a degree of success out of the box but tend to struggle with "non-western" names (no offence intended towards anybody).
My question is: given the general lack of sentence structure or context around an individual's name in a CV/Resume, am I likely to gain any significant improvement in name identification by creating something akin to a CV corpus and training on it?
My initial thought is that I'd probably have more success by splitting the text into sentences or lines, removing obvious non-name text and applying a bit of logic to make a best guess at the individual's name.
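To make that idea concrete, here is a rough sketch of the kind of logic I have in mind. It is written in Python purely for brevity and is not code from my project: the stop list, the function names and the "first 15 lines" cut-off are all made-up assumptions for illustration.

```python
# Hypothetical stop list: words that mark headings or labels rather than names.
STOP_WORDS = {"curriculum", "vitae", "resume", "cv", "address", "email",
              "phone", "profile", "summary", "experience", "education"}


def looks_like_name_token(token):
    """True if the token is alphabetic (allowing - and ') and capitalised."""
    letters = token.replace("-", "").replace("'", "")
    return letters.isalpha() and token[0].isupper()


def guess_name(text, max_lines=15):
    """Return the first line near the top that looks like a personal name."""
    for line in text.splitlines()[:max_lines]:
        tokens = line.strip().split()
        # Assume a name is two to four words long.
        if not 2 <= len(tokens) <= 4:
            continue
        # Skip headings and labelled fields such as "Email: ...".
        if any(t.lower().strip(":") in STOP_WORDS for t in tokens):
            continue
        if all(looks_like_name_token(t) for t in tokens):
            return " ".join(tokens)
    return None


sample = "Curriculum Vitae\n\nAkbar Agho\nEmail: akbar@example.com\nPhone: 555 0100\n"
print(guess_name(sample))
```

Because this only looks at the shape and position of a line, it does not depend on whether the name is in any training data. The obvious weakness is that the capitalisation check fails for scripts without upper and lower case, and any two-word capitalised heading missing from the stop list would be a false positive.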
I can see how training would work if a name appears within a structured sentence. However, for a standalone entity without context ("Akbar Agho", for example) I suspect it will struggle regardless of the training.
Is there a level of AI that, given enough data, would begin to formulate a pattern for finding a name, or should I just go for logic-based string extraction?
I'd appreciate people's thoughts, opinions and suggestions.
Side note: I have been using PHP with Apache Tika to do the initial text extraction from DOC/PDF files, and am experimenting with Stanford NER and OpenNLP via PHP and the command line.
Chris