Recognize partial/complete address with NLP framework

How much work does it take to extract partial (without city) or complete postal addresses from unstructured text using an NLP framework? Are NLP frameworks effective for this? Also, how difficult is it to "train" Named Entity Recognition modules to recognize new locations?

Damp answered 16/11, 2014 at 8:50

As long as most addresses are correctly formatted and regular, i.e. they contain a contact name, street number, and street name, separated by commas, you may find rule-based frameworks sufficient.
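To give an idea of what the rule-based route looks like, here is a minimal sketch using a plain Python regular expression rather than any particular framework. The pattern, the group names and the `find_addresses` helper are all assumptions made up for illustration; it only handles the regular "name, number street, city" shape described above, and the city part is optional so that partial addresses still match.

```python
import re

# Hypothetical pattern for "<Contact Name>, <number> <Street Name>[, <City>]".
# It assumes capitalised names and comma separators; real data will need more rules.
ADDRESS_RE = re.compile(
    r"(?P<name>[A-Z][\w'-]*(?: [A-Z][\w'-]*)*),\s*"           # contact name
    r"(?P<number>\d+[A-Za-z]?),?\s+"                          # street number, e.g. 12 or 12b
    r"(?P<street>[A-Z][\w'-]*(?: [\w'-]+)*)"                  # street name, up to the next comma
    r"(?:,\s*(?P<city>[A-Z][\w'-]*(?: [A-Z][\w'-]*)*))?"      # optional city
)


def find_addresses(text):
    """Return one dict per match; 'city' is None for partial addresses."""
    return [m.groupdict() for m in ADDRESS_RE.finditer(text)]


if __name__ == "__main__":
    print(find_addresses("Please send it to John Smith, 221 Baker Street, London before Friday."))
    print(find_addresses("ship to Jane Doe, 12 Main Street, thanks"))
```

Note that the street group runs until the next comma, so this breaks down quickly once addresses appear in free-flowing sentences, which is where the statistical approach comes in.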

Unstructured or partially structured text will require more preprocessing and statistical modeling, e.g. morpho-syntactic analysis and CRFs (conditional random fields). The Stanford tools are the most popular for this purpose. It may also be worth searching for a corpus with finer-grained annotations: not only "LOC", but also "NUMBER", "STREETNAME", "CITY", etc., so that locations can be extracted even when they are incomplete. For this kind of annotation, you may want to look at tree-structured approaches.
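As a rough illustration of the fine-grained annotation idea, the sketch below shows (1) the kind of per-token feature dictionary that CRF toolkits typically consume and (2) how BIO-style tags over "NUMBER", "STREETNAME", "CITY" can be grouped into spans, so that an address without a city is still recovered. No model is trained here: the tags are written by hand and stand in for a tagger's output, and `token_features` and `collect_spans` are hypothetical helper names, not part of the Stanford tools.

```python
def token_features(tokens, i):
    """Simple morpho-syntactic cues for token i (an illustrative feature set only)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_title": tok.istitle(),
        "suffix3": tok[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def collect_spans(tokens, tags):
    """Group BIO tags (B-X, I-X, O) into a list of (label, text) spans."""
    spans, label, words = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            words.append(tok)  # continue the current span
            continue
        if label is not None:
            spans.append((label, " ".join(words)))  # close the previous span
        # Start a new span unless the token is outside any entity.
        label, words = (tag[2:], [tok]) if tag != "O" else (None, [])
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


if __name__ == "__main__":
    tokens = ["Meet", "me", "at", "12", "Main", "Street", "tomorrow"]
    # Hand-written tags standing in for a trained tagger's predictions.
    tags = ["O", "O", "O", "B-NUMBER", "B-STREETNAME", "I-STREETNAME", "O"]
    print(token_features(tokens, 3))
    print(collect_spans(tokens, tags))
```

With labels at this granularity, a partial address is simply a sequence of spans with no "CITY" among them, instead of a missed "LOC".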

So the amount of work mostly depends on how regular the expressions you are looking for are.

Evolute answered 19/11, 2014 at 14:39
