Named entity recognition using Freebase

I understand that DBpedia Spotlight does named entity recognition on a given document. To do that, it uses the downloaded DBpedia files stored in the file system. Refer to this URL: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Run-from-a-JAR.

What I need is an equivalent API, like Spotlight, for Freebase. As far as I have browsed, I could not find any such tool or API that operates on a Freebase triple store. Could someone help?

Westerly answered 27/12, 2013 at 6:55

There is currently no equivalent project for named entity recognition in Freebase. However, Freebase has links to DBpedia on sameAs.org, so you can use DBpedia Spotlight and then resolve the IDs back to Freebase (that data is also available in the Freebase RDF dumps).
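As a rough illustration of the resolution step, the Python sketch below builds a lookup table from DBpedia resource URIs to Freebase URIs out of `owl:sameAs` statements. It assumes the links come as one N-Triples statement per line; the sample Freebase URI (`m.0example`) is a hypothetical placeholder, and real dumps may use a different layout or namespace, so check the files you actually download.

```python
from typing import Dict, Iterable

OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def build_dbpedia_to_freebase(lines: Iterable[str]) -> Dict[str, str]:
    """Collect owl:sameAs links between DBpedia and Freebase URIs."""
    mapping: Dict[str, str] = {}
    for line in lines:
        # "<subj> <pred> <obj> ." -> ["<subj>", "<pred>", "<obj>"]
        parts = line.strip().rstrip(".").split()
        if len(parts) != 3 or parts[1] != OWL_SAME_AS:
            continue
        subj, obj = parts[0].strip("<>"), parts[2].strip("<>")
        # The link may be stated in either direction.
        if "dbpedia.org/resource/" in subj and "freebase.com" in obj:
            mapping[subj] = obj
        elif "dbpedia.org/resource/" in obj and "freebase.com" in subj:
            mapping[obj] = subj
    return mapping


# Hypothetical input: one statement per line, with a made-up Freebase mid.
sample = [
    "<http://dbpedia.org/resource/Berlin> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://rdf.freebase.com/ns/m.0example> .",
]
links = build_dbpedia_to_freebase(sample)
# Resolve an entity URI, e.g. one returned by DBpedia Spotlight.
print(links.get("http://dbpedia.org/resource/Berlin"))
# http://rdf.freebase.com/ns/m.0example
```

A plain dictionary keeps the lookup O(1) per annotation; for a full dump you would stream the file line by line rather than load it into a list.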

If you're looking for a coding project in this area, I think it should be possible to adapt the DBpedia Spotlight code so that you can train its models using Freebase data. The main benefit would be that Freebase covers a wider range of entities than DBpedia, so you'd get better recall. You may also be able to exploit other data in Freebase, like "notable types", to get better precision as well.
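One way the "notable types" idea could look in practice is a re-ranking step: each candidate topic for a mention carries a prior score (for example, how often that surface form links to it) plus its notable type, and candidates whose type fits what the context expects get a bonus. This is a hypothetical sketch; the `Candidate` class, the bonus weight and the mids are made up for illustration, and how you infer the expected type from context is left open.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Candidate:
    mid: str            # Freebase topic ID (hypothetical values below)
    prior: float        # e.g. share of links from this surface form to the topic
    notable_type: str   # the topic's notable type, e.g. "/people/person"


def rerank(candidates: List[Candidate], expected_types: Set[str],
           type_bonus: float = 0.5) -> List[Candidate]:
    """Order candidates by prior score plus a bonus for an expected notable type."""
    def score(c: Candidate) -> float:
        bonus = type_bonus if c.notable_type in expected_types else 0.0
        return c.prior + bonus
    return sorted(candidates, key=score, reverse=True)


# "Jordan" in a sentence about basketball: the country has the higher prior,
# but the context suggests a person is expected.
candidates = [
    Candidate("m.0hypo_country", 0.6, "/location/country"),
    Candidate("m.0hypo_player", 0.3, "/people/person"),
]
best = rerank(candidates, expected_types={"/people/person"})[0]
print(best.mid)  # m.0hypo_player
```

An additive bonus is the simplest choice; a trained model would learn how much weight type agreement deserves relative to the prior.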

You should be able to get a good set of "surface forms" for each entity by looking at the /type/object/name and /common/topic/alias properties in Freebase. Any Freebase entity that corresponds to a Wikipedia page will have one or more /type/object/key values in the /wikipedia/en namespace. These correspond to the Wikipedia page names (and redirects), which will allow you to parse the Wikipedia XML dumps and identify which links on each page correspond to Freebase topics. The Freebase key encoding scheme is described here.
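Here is a Python sketch of the key-to-link step. It assumes keys escape disallowed characters as `$` followed by four hex digits of the Unicode code point (for example `$0028` for `(`); check that against the key encoding documentation before relying on it. The wikitext parsing is a simplified regex, not a full MediaWiki parser, and the (key, mid) pairs are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumption: keys escape disallowed characters as "$" + 4 hex digits
# of the Unicode code point, e.g. "$0028" for "(".
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")
# Matches [[Target]], [[Target|anchor]] and [[Target#Section|anchor]].
# Nested links, templates and link trails are not handled.
_WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def decode_key(key: str) -> str:
    """Turn a /wikipedia/en key into a Wikipedia page title."""
    unescaped = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)
    return unescaped.replace("_", " ")


def normalize_title(title: str) -> str:
    """Use spaces for underscores and upper-case the first letter, as English Wikipedia does."""
    t = title.strip().replace("_", " ")
    return t[:1].upper() + t[1:]


def build_title_index(keys: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map normalized page titles to Freebase mids from (key, mid) pairs."""
    return {normalize_title(decode_key(k)): mid for k, mid in keys}


def link_occurrences(wikitext: str,
                     title_to_mid: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (anchor text, mid) for each wiki link that resolves to a known topic."""
    found = []
    for m in _WIKILINK.finditer(wikitext):
        target = normalize_title(m.group(1))
        anchor = m.group(2) if m.group(2) is not None else m.group(1)
        mid = title_to_mid.get(target)
        if mid is not None:
            found.append((anchor.strip(), mid))
    return found


# Hypothetical (key, mid) pairs, as they might come from /type/object/key values.
index = build_title_index([("Mercury_$0028planet$0029", "m.0hypo_planet"),
                           ("Berlin", "m.0hypo_city")])
page = "The [[Mercury (planet)|innermost planet]] is far from [[berlin]]."
print(link_occurrences(page, index))
# [('innermost planet', 'm.0hypo_planet'), ('berlin', 'm.0hypo_city')]
```

Because redirects also show up as keys, several titles can map to the same mid, so links that go through a redirect still resolve. Each (anchor, mid) pair is one training occurrence, and the anchor texts double as extra surface forms.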

You might also be interested in OpenCalais and AlchemyAPI, which provide named entity recognition as a service and include Freebase IDs in their API responses.

Cusk answered 27/12, 2013 at 7:37
Thanks so much for the response. It helps. Yes, I am looking for an API to do named entity recognition (NER) using any triple store that contains data from Freebase plus triples from other sources. From what I understand, NER is not just an RDF lookup; it has certain sub-steps (which I am not aware of, and I am not sure those steps are standard). If there are any standard steps, could you please highlight them? I've also started going through the Spotlight source code. - Westerly
According to the links github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/… and github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/bin/…, which explain how to use Spotlight with your own data, the final output is a Lucene index built from the (DBpedia) occurrences. In other words, Spotlight uses a Lucene index of DBpedia entity occurrences to annotate entities. Any comments? - Westerly
There are many possible steps in named entity recognition: tokenizing, tagging, looking up entity names, entity type disambiguation, coreference resolution, and entity identity reconciliation. I've added some more details to my answer, but you should really ask on the DBpedia mailing list if you need help running it with your own data. - Cusk
@ShawnSimister I thought your video about the API on the Freebase site was quite well done. Given your expertise in this subject, perhaps you might consider my recent question? - Paredes
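A few of the steps listed in the comments can be sketched as a toy Python pipeline: tokenizing, looking up entity names (greedy longest match against a dictionary built from names and aliases, such as /type/object/name and /common/topic/alias values), and a crude disambiguation by word overlap with each candidate's known occurrence contexts. Tagging, coreference resolution and identity reconciliation are left out. The overlap score is a simple stand-in for the Lucene occurrence index discussed above, not Spotlight's actual scoring, and the mids and context counts are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

Forms = Dict[Tuple[str, ...], Set[str]]


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens; a real system would use a proper tokenizer."""
    return re.findall(r"\w+", text.lower())


def build_surface_forms(names: Dict[str, List[str]]) -> Forms:
    """Map each tokenized name or alias to the set of topics it can refer to."""
    forms: Forms = {}
    for mid, labels in names.items():
        for label in labels:
            key = tuple(tokenize(label))
            if key:
                forms.setdefault(key, set()).add(mid)
    return forms


def spot(tokens: List[str], forms: Forms,
         max_len: int = 5) -> List[Tuple[int, int, Set[str]]]:
    """Greedy longest-match lookup of surface forms; returns (start, length, candidates)."""
    spots = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in forms:
                spots.append((i, n, forms[key]))
                i += n
                break
        else:  # no surface form starts at this token
            i += 1
    return spots


def disambiguate(tokens: List[str], candidates: Set[str],
                 contexts: Dict[str, Counter]) -> str:
    """Pick the candidate whose occurrence context shares the most words with the text."""
    doc = Counter(tokens)

    def overlap(mid: str) -> int:
        ctx = contexts.get(mid, Counter())
        return sum(min(doc[w], c) for w, c in ctx.items())

    # sorted() makes ties deterministic (alphabetically first mid wins).
    return max(sorted(candidates), key=overlap)


def annotate(text: str, names: Dict[str, List[str]],
             contexts: Dict[str, Counter]) -> List[Tuple[str, str]]:
    """Spot surface forms in the text and link each one to a single topic."""
    tokens = tokenize(text)
    forms = build_surface_forms(names)
    return [(" ".join(tokens[s:s + n]), disambiguate(tokens, cands, contexts))
            for s, n, cands in spot(tokens, forms)]


names = {"m.0hypo_paris_fr": ["Paris"],
         "m.0hypo_paris_tx": ["Paris, Texas", "Paris"]}
contexts = {"m.0hypo_paris_fr": Counter({"france": 3, "capital": 2}),
            "m.0hypo_paris_tx": Counter({"texas": 3, "lamar": 1})}
print(annotate("Paris is the capital of France.", names, contexts))
# [('paris', 'm.0hypo_paris_fr')]
print(annotate("Paris is in Lamar County, Texas.", names, contexts))
# [('paris', 'm.0hypo_paris_tx')]
```

Greedy longest match prefers "Paris, Texas" over "Paris" when both fit, which is a common default; it can miss overlapping readings that a lattice-based spotter would keep.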
