Example Python script that uses DBpedia?

I am writing a python script to extract "Entity names" from a collection of thousands of news articles from a few countries and languages.

I would like to make use of the amazing structured knowledge in DBpedia, for example to look up the names of "artists in Egypt" and the names of "companies in Canada".

(If this information were in SQL form, I would have no problem.)

I would prefer to download the DBpedia content and use it offline. Any ideas on what is needed to do so, and how to query it locally from Python?

Skelton answered 20/9, 2011 at 15:34

DBpedia content is in RDF format. The dumps can be downloaded from the DBpedia website.
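Before loading anything into a triple store, you can peek at a dump file or pull a small subset out of it with plain Python. This is a minimal sketch, assuming the file is in N-Triples format (one "<subject> <predicate> object ." triple per line). The regex is a simplification: it skips blank nodes and does not unescape literals, and the helper names are hypothetical. It is fine for filtering one file, but it is not a substitute for a triple store on the full dataset.

```python
import re

# Simplified N-Triples line: <subject> <predicate> object .
# Only IRI subjects/predicates are matched; blank nodes are skipped.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iter_triples(lines):
    """Yield (subject, predicate, object) tuples from N-Triples lines.

    Subject and predicate are returned without angle brackets; the object
    is returned as written (an <IRI> or a "literal" with tag/datatype)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        m = TRIPLE.match(line)
        if m:
            yield m.groups()


def subjects_of_type(lines, type_uri):
    """Return subjects declared with rdf:type <type_uri>."""
    target = "<" + type_uri + ">"
    return [s for s, p, o in iter_triples(lines)
            if p == RDF_TYPE and o == target]
```

Because iter_triples accepts any iterable of lines, you can pass an open file object directly, e.g. subjects_of_type(f, "http://dbpedia.org/ontology/Company") inside a with open(..., encoding="utf-8") as f: block, and the file is streamed rather than read into memory at once.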

DBpedia is a large RDF dataset, and handling that amount of data requires triple store technology. For DBpedia you will need a native triple store; I recommend either Virtuoso or 4store. I personally prefer 4store.

Once your triple store is set up with DBpedia loaded, you can use SPARQL to query the DBpedia RDF triples. There are Python libraries that can help with that (for example SPARQLWrapper), but 4store and Virtuoso can both return results as JSON, so you can easily get by without any library.
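To relate this to the question, here is a hedged sketch of a SPARQL query for "companies in Canada". It assumes the DBpedia ontology class http://dbpedia.org/ontology/Company, the property http://dbpedia.org/ontology/country and the resource http://dbpedia.org/resource/Canada; the names_query helper is hypothetical, and the right class and property (for "artists in Egypt", say) depend on your DBpedia version, so check them against your data first.

```python
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def names_query(cls, prop, place, lang="en", limit=100):
    """Build a SPARQL query for the labels of resources of class `cls`
    whose property `prop` points at the resource `place`.

    cls, prop and place are DBpedia local names (assumed, not verified)."""
    # Full IRIs are used so the query needs no PREFIX declarations;
    # doubled braces are literal { } in the f-string.
    return f"""SELECT DISTINCT ?entity ?name WHERE {{
  ?entity a <{DBO}{cls}> ;
          <{DBO}{prop}> <{DBR}{place}> ;
          <{RDFS_LABEL}> ?name .
  FILTER (lang(?name) = "{lang}")
}}
LIMIT {limit}"""
```

The resulting string can be sent to your local endpoint with a function like query() below, e.g. query(names_query("Company", "country", "Canada"), endpoint_url).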

A simple urllib script (Python 2) like this ...

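# Python 2
import sys
import traceback
import urllib
import urllib2

def query(q, epr, f='application/json'):
    """Send SPARQL query q to endpoint epr via HTTP GET; return the raw body."""
    try:
        # URL-encode the query as ?query=...
        params = urllib.urlencode({'query': q})
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        # A Request without a data payload is sent as GET
        request = urllib2.Request(epr + '?' + params)
        # Ask the endpoint for results in MIME type f (JSON by default)
        request.add_header('Accept', f)
        url = opener.open(request)
        return url.read()
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise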

can help you run SPARQL queries. For instance:

>>> q1 = """
... select ?birthPlace where {
... <http://dbpedia.org/resource/Claude_Monet> <http://dbpedia.org/property/birthPlace> ?birthPlace .
...  }"""
>>> print query(q1,"http://dbpedia.org/sparql")

{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }
>>> 
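The JSON that comes back follows the W3C SPARQL Query Results JSON format: the variable names are listed under "head"/"vars" and each result row is an entry in "results"/"bindings". A minimal sketch of flattening that into plain dictionaries, using only the json module (the bindings helper name is my own):

```python
import json


def bindings(raw):
    """Turn a SPARQL JSON result string into a list of {variable: value} dicts."""
    data = json.loads(raw)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be absent from a row (e.g. unbound in an OPTIONAL),
        # so only the variables actually present are copied.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows
```

Applied to the output above, bindings(query(q1, "http://dbpedia.org/sparql")) gives [{'birthPlace': 'Paris, France'}]. The helper runs on Python 2.7 as well as Python 3.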

I hope this gives you an idea of how to start.

Aluminum answered 20/9, 2011 at 18:0
Thanks @msalvadores. This works fine with DBpedia.org. I still need to make it work locally on a Windows 7 machine, so it will definitely be Virtuoso (4store is Linux only). But I still could not find a good install tutorial for the Windows platform. (Skelton)
Even for Virtuoso you would be better off with Linux. If you want to stick with Virtuoso, see virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSUsageWindows for running it on Windows. Also bear in mind that if you want to load all of DBpedia you will need a reasonably powerful machine, maybe a commodity server. (Aluminum)
I can't upvote you because I don't have enough reputation yet, but I selected your answer as correct! (Skelton)

In Python 3, using the requests library, the function looks like this:

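import sys

import requests

def query(q, epr, f='application/json'):
    try:
        params = {'query': q}
        # requests URL-encodes params and sends them as ?query=...
        resp = requests.get(epr, params=params, headers={'Accept': f})
        # Raise on HTTP 4xx/5xx, as urllib2 does in the Python 2 version
        resp.raise_for_status()
        return resp.text
    except Exception as e:
        print(e, file=sys.stdout)
        raise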
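If you would rather avoid the requests dependency, the Python 3 standard library can do the same job. This is a minimal sketch; the build_request helper and the timeout parameter are my own additions, not part of the answers above. Splitting out build_request lets you inspect the URL and headers without hitting the network.

```python
import urllib.parse
import urllib.request


def build_request(q, epr, f="application/json"):
    """Build a GET request for SPARQL query q against endpoint epr."""
    url = epr + "?" + urllib.parse.urlencode({"query": q})
    # No data payload, so urllib sends this as GET
    return urllib.request.Request(url, headers={"Accept": f})


def query(q, epr, f="application/json", timeout=30):
    """Run the query and return the response body as text (assumes UTF-8)."""
    req = build_request(q, epr, f)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage is the same as before, e.g. print(query(q1, "http://dbpedia.org/sparql")) against the public endpoint or your local Virtuoso/4store endpoint.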
Aminopyrine answered 6/1, 2016 at 7:20