DBpedia content is in RDF format. The dumps can be downloaded from here.
DBpedia is a large RDF dataset, and to handle that amount of data you need a triple store. For DBpedia you will want one of the native triple stores; I recommend either Virtuoso or 4store. I personally prefer 4store.
Once you have your triple store set up with DBpedia loaded into it, you can use SPARQL to query the DBpedia RDF triples. There are Python libraries that can help you with that, but 4store and Virtuoso can return results as JSON, so you can easily get by without any libraries.
Some simple urllib script like ...
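# Python 3, standard library only
import sys
import traceback
import urllib.parse
import urllib.request

def query(q, epr, f='application/json'):
    """Send SPARQL query q to endpoint epr and return the response body as text."""
    try:
        # URL-encode the query as the 'query' parameter of a GET request
        params = urllib.parse.urlencode({'query': q})
        request = urllib.request.Request(epr + '?' + params)
        # Ask the endpoint for the given result format (JSON by default)
        request.add_header('Accept', f)
        response = urllib.request.urlopen(request)
        return response.read().decode('utf-8')
    except Exception:
        # Print the traceback, then re-raise so the caller still sees the error
        traceback.print_exc(file=sys.stdout)
        raise
can help you run SPARQL queries ... for instance
>>> q1 = """
... select ?birthPlace where {
... <http://dbpedia.org/resource/Claude_Monet> <http://dbpedia.org/property/birthPlace> ?birthPlace .
... }"""
>>> print(query(q1, "http://dbpedia.org/sparql"))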
{ "head": { "link": [], "vars": ["birthPlace"] },
"results": { "distinct": false, "ordered": true, "bindings": [
{ "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }
>>>
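The response follows the standard SPARQL JSON results layout, so you can unpack it with nothing more than the json module. A minimal sketch (Python 3): the helper name bindings_to_rows is only for illustration, and the sample string is the response shown above, standing in for a live call to query().

```python
import json

def bindings_to_rows(response_text):
    """Turn a SPARQL JSON result document into a list of {variable: value} dicts."""
    doc = json.loads(response_text)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each binding maps a variable name to a dict holding "type" and "value"
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows

# Sample response copied from the session above (assumption: a live
# call to query() would return text in the same layout)
sample = '''{ "head": { "link": [], "vars": ["birthPlace"] },
"results": { "distinct": false, "ordered": true, "bindings": [
{ "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }'''

for row in bindings_to_rows(sample):
    print(row["birthPlace"])  # prints: Paris, France
```

This keeps only the value of each binding; if you need the datatype or language tag, read the "type" and "xml:lang" keys from the same dict.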
I hope this gives you an idea of how to start.