How to use DBPedia to extract Tags/Keywords from content?

2

20

I am exploring how I can use Wikipedia's taxonomy information to extract Tags/Keywords from my content.

I found articles about DBPedia. DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

Has anyone used their web services? Do you know how they work and how reliable they are?

Isidraisidro answered 20/1, 2011 at 13:58 Comment(0)
L
21

DBpedia is a fantastic, high quality resource. In order to turn your content into a set of relevant DBpedia concepts, however, you will need to accurately identify them in your text, which involves at least two steps:

  1. Identify DBpedia concepts in your content: This includes recognizing concept names (and alternate names) in text, and also disambiguating among all possible meanings of each phrase. The term "Sun" may refer to dozens of possible concepts according to its disambiguation page including a star, newspapers, person names, etc. This involves entity identification, classification, and linking.

  2. Identify which of those concepts are interesting: For example, do you want the concept "Definite article" showing up whenever the text includes the term "the" (the Wikipedia page "The" redirects to it)?
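
The two steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular tool works: the `CANDIDATES` gazetteer, its context words, the `UNINTERESTING` set, and the function names are all hypothetical, and the URIs are illustrative. A real system would build the candidate list from DBpedia labels, redirects, and disambiguation pages, and would use a much stronger disambiguation model than word overlap.

```python
import re

# Hypothetical mini-gazetteer: surface form -> candidate concepts, each with
# a few context words that hint at that meaning (illustrative data only).
CANDIDATES = {
    "sun": [
        ("http://dbpedia.org/resource/Sun",
         {"star", "solar", "planet", "light"}),
        ("http://dbpedia.org/resource/The_Sun_(United_Kingdom)",
         {"newspaper", "tabloid", "headline", "editor"}),
    ],
    "the": [
        ("http://dbpedia.org/resource/Definite_article", {"grammar", "noun"}),
    ],
}

# Hypothetical blocklist of concepts that are valid links but poor tags.
UNINTERESTING = {"http://dbpedia.org/resource/Definite_article"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_concepts(text):
    """Step 1: find known surface forms and disambiguate each one by
    choosing the candidate whose context words overlap most with the text."""
    tokens = tokenize(text)
    context = set(tokens)
    linked = []
    for token in tokens:
        candidates = CANDIDATES.get(token)
        if not candidates:
            continue
        best_uri, _ = max(candidates, key=lambda c: len(c[1] & context))
        if best_uri not in linked:
            linked.append(best_uri)
    return linked


def extract_tags(text):
    """Step 2: keep only the linked concepts that are worth showing as tags."""
    return [uri for uri in link_concepts(text) if uri not in UNINTERESTING]


if __name__ == "__main__":
    print(extract_tags("The Sun is the star at the centre of the solar system"))
    print(extract_tags("The Sun ran a tabloid headline about the election"))
```

The same word "Sun" resolves to different concepts depending on the surrounding words, and "the" is linked in step 1 but dropped in step 2.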

You may want to consider a preexisting text analytics library or service that supports entity linking to DBpedia. One great tool for topic indexing is Maui, which was developed by Alyona Medelyan during her PhD at the University of Waikato. Another great open source solution is Wikipedia Miner by David Milne, at the same university.

Two commercial services that provide linking to DBpedia concepts are Zemanta and Extractiv (both allow some level of free use). DBpedia Spotlight is another option. Others that may provide these capabilities are listed at: https://stackoverflow.com/questions/2119279/is-there-a-better-tool-than-opencalais
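
As a rough illustration of what calling such a service looks like, here is a minimal sketch against DBpedia Spotlight's REST interface using only the Python standard library. The endpoint URL, the `confidence` parameter value, and the `@surfaceForm` / `@URI` field names reflect the public demo service as I understand it and should be treated as assumptions; check the current Spotlight documentation before relying on them. The function names are mine.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public demo endpoint. It may move, change, or be rate
# limited, so treat this URL as a placeholder for your own deployment.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking Spotlight to annotate `text` as JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def parse_resources(payload):
    """Return (surface form, DBpedia URI) pairs from a Spotlight-style
    JSON payload; an absent "Resources" key means nothing was linked."""
    return [(r["@surfaceForm"], r["@URI"]) for r in payload.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Send `text` to the service and return the linked concepts.
    Requires network access."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=10) as resp:
        return parse_resources(json.load(resp))


if __name__ == "__main__":
    for surface, uri in annotate("The Sun is the star at the centre of the solar system"):
        print(surface, "->", uri)
```

Raising `confidence` should return fewer, more certain links, which is one simple way to address the "which concepts are interesting" problem described above.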

Disclosure: I [used to] work at Extractiv (defunct), which is powered by Language Computer Corporation's NLP.

Lumber answered 20/1, 2011 at 16:52 Comment(0)
P
4

You can use Apache Stanbol for this process. The Entityhub component of Apache Stanbol lets you build custom DBpedia indexes based on your needs. You can then use the Enhancer component to extract Place, Person, and Location entities from your text.

The following mail thread may be helpful:
http://markmail.org/message/52266yl5ohijxiof

You can access running demos of Apache Stanbol from the following link:
http://dev.iks-project.eu/

You can also send further questions to stanbol-dev AT incubator.apache.org.

Padua answered 26/10, 2011 at 20:26 Comment(0)
