Load DBpedia locally using Jena TDB?
I need to perform a query against DBpedia:

SELECT DISTINCT ?poi ?lat ?long ?photos ?template ?type ?label WHERE {
  ?poi  <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  ?poi <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
  ?poi <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
  ?poi <http://dbpedia.org/property/hasPhotoCollection> ?photos .                      
  OPTIONAL {?poi <http://dbpedia.org/property/wikiPageUsesTemplate> ?template } .
  OPTIONAL {?poi <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type } .
  FILTER ( ?lat > x && ?lat < y &&
           ?long > z && ?long < ω && 
           langMatches( lang(?label), "EN" ))
} 
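The FILTER above uses x, y, z and ω as placeholders for a bounding box. Here is a minimal sketch of filling them in from Java. It assumes x/y are the lower/upper latitude and z/ω are the lower/upper longitude. The class and method names (BoundingBoxQuery, build) are hypothetical.

```java
/**
 * Hypothetical helper that fills the x, y, z, omega placeholders of the query above.
 * Assumption: x/y bound the latitude and z/omega bound the longitude.
 */
public final class BoundingBoxQuery {

    private BoundingBoxQuery() {
    }

    /** Returns the SELECT query restricted to the given latitude/longitude box. */
    public static String build(double minLat, double maxLat, double minLong, double maxLong) {
        if (minLat >= maxLat || minLong >= maxLong) {
            throw new IllegalArgumentException("Bounding box is empty");
        }
        // %s formats each double via Double.toString, so the decimal separator is always '.'.
        return String.format(
                "SELECT DISTINCT ?poi ?lat ?long ?photos ?template ?type ?label WHERE {%n"
                + "  ?poi <http://www.w3.org/2000/01/rdf-schema#label> ?label .%n"
                + "  ?poi <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .%n"
                + "  ?poi <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .%n"
                + "  ?poi <http://dbpedia.org/property/hasPhotoCollection> ?photos .%n"
                + "  OPTIONAL { ?poi <http://dbpedia.org/property/wikiPageUsesTemplate> ?template }%n"
                + "  OPTIONAL { ?poi <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }%n"
                + "  FILTER ( ?lat > %s && ?lat < %s &&%n"
                + "           ?long > %s && ?long < %s &&%n"
                + "           langMatches( lang(?label), \"EN\" ))%n"
                + "}%n",
                minLat, maxLat, minLong, maxLong);
    }

    public static void main(String[] args) {
        System.out.println(build(40.0, 41.0, -74.5, -73.5));
    }
}
```

Generating the string in one place keeps the query text identical between runs, and only the four bounds change.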

I'm guessing this information is scattered among different dump (.nt) files, and the SPARQL endpoint somehow combines them into a single result set. I need to download these .nt files locally (not all of DBpedia), run my query only once, and store the results locally (I don't want to use the SPARQL endpoint).

  • What parts of Jena should I use for this one run?

I'm a bit confused by this post:

So, you can load the entire DBPedia data into a single TDB location on disk (i.e. a single directory). This way, you can run SPARQL queries over it.

  • How do I load DBpedia into a single TDB location, in Jena terms, if we have three .nt DBpedia files? How do we run the above query over those .nt files? (Any code would help.)

  • For example, is this wrong?

 String tdbDirectory = "C:\\TDB";
 String dbdump1 = "C:\\Users\\dump1_en.nt";
 String dbdump2 = "C:\\Users\\dump2_en.nt";
 String dbdump3 = "C:\\Users\\dump3_en.nt";
 Dataset dataset = TDBFactory.createDataset(tdbDirectory);
 Model tdb = dataset.getDefaultModel(); // <-- What is the default model? Should I care?
 //Model tdb = TDBFactory.createModel(tdbDirectory); // <-- Is this preferred?
 FileManager.get().readModel(tdb, dbdump1, "N-TRIPLES");
 FileManager.get().readModel(tdb, dbdump2, "N-TRIPLES");
 FileManager.get().readModel(tdb, dbdump3, "N-TRIPLES");
 String q = "my big fat query";
 Query query = QueryFactory.create(q);
 QueryExecution qexec = QueryExecutionFactory.create(query, tdb);
 ResultSet results = qexec.execSelect();
 while (results.hasNext()) {
     QuerySolution solution = results.nextSolution(); // advance the cursor, otherwise the loop never ends
     // do something significant with it
 }
 qexec.close();
 tdb.close();
 dataset.close();
  • In the above code we used dataset.getDefaultModel() (to get the default graph as a Jena Model). Is this statement valid? Do we need to create a dataset to perform the query, or should we go with TDBFactory.createModel(tdbDirectory)?
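On the getDefaultModel() question: for a TDB dataset, the default model is the dataset's default graph. By default, a SPARQL query run over the Dataset without a GRAPH clause reads that same graph. The small check below is a sketch that assumes Apache Jena 3.x/4.x with the TDB1 module (org.apache.jena.tdb.TDBFactory). The 2013-era Jena 2.x used com.hp.hpl.jena.* packages instead. The sketch uses an in-memory TDB dataset, so it needs no directory. The class and method names are hypothetical.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.vocabulary.RDFS;

public final class DefaultGraphDemo {

    private DefaultGraphDemo() {
    }

    /** Writes one rdfs:label through the default Model, then counts matches with a query over the Dataset. */
    public static int countLabelsVisibleToDatasetQuery(Dataset dataset) {
        dataset.begin(ReadWrite.WRITE);
        try {
            Model defaultModel = dataset.getDefaultModel();
            defaultModel.createResource("http://example.org/poi/1")
                        .addProperty(RDFS.label, "Example POI", "en");
            dataset.commit();
        } finally {
            dataset.end();
        }

        dataset.begin(ReadWrite.READ);
        try (QueryExecution qexec = QueryExecutionFactory.create(
                "SELECT ?poi ?label WHERE { ?poi <http://www.w3.org/2000/01/rdf-schema#label> ?label }",
                dataset)) {
            ResultSet results = qexec.execSelect();
            int rows = 0;
            while (results.hasNext()) {
                results.next(); // advance to the next solution
                rows++;
            }
            return rows;
        } finally {
            dataset.end();
        }
    }

    public static void main(String[] args) {
        // In-memory TDB dataset for the demo; TDBFactory.createDataset("C:\\TDB") would be disk-backed.
        Dataset dataset = TDBFactory.createDataset();
        System.out.println(countLabelsVisibleToDatasetQuery(dataset)); // prints 1
        dataset.close();
    }
}
```

The query sees what was written through getDefaultModel(). Loading through the default model and querying the Dataset are therefore two views of the same graph.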
Isologous answered 30/5, 2013 at 9:46 Comment(5)
Is it important to you that you do this all from within Java? You can actually use TDB and run SPARQL queries using the command line tools provided by Jena, all without writing any Java code. Is that an option for you? – Discrown
If, as I asked in the previous comment, using TDB locally without writing any Java code is an option, take a look at the second section of this answer, called “Using TDB locally.” If that looks suitable, we can use that as a starting point and then figure out which datasets would need to be downloaded locally. – Discrown
@GeorgePaptheodorou Did you end up making any progress with this? – Discrown
Hi, thanks for the reply. I tested the code and it works, at least for DBpedia, with two separate classes: one for loading the dumps (one at a time) and one for querying the TDB store. I also upgraded my machine to 8 GB of RAM, since the application I made eats RAM. I don't know if the 100 euros I paid was worth it, since there are alternatives... (yes, it was! I upgraded my PC) – Isologous
Since you found a solution, you should write it up in an answer and mark it as accepted, so that other users who find this question will get an answer, too. – Discrown
To let Jena index the data locally:

/** The Constant tdbDirectory. */
public static final String tdbDirectory = "C:\\TDBLoadGeoCoordinatesAndLabels"; 

/** The Constant dbdump0. */
public static final String dbdump0 = "C:\\Users\\Public\\Documents\\TDB\\dbpedia_3.8\\dbpedia_3.8.owl";

/** The Constant dbdump1. */
public static final String dbdump1 = "C:\\Users\\Public\\Documents\\TDB\\geo_coordinates_en\\geo_coordinates_en.nt";

 ...

Model tdbModel = TDBFactory.createModel(tdbDirectory);

/* Incrementally read data into the Model, once per run (RAM > 6 GB) */
FileManager.get().readModel( tdbModel, dbdump0);
FileManager.get().readModel( tdbModel, dbdump1, "N-TRIPLES");
FileManager.get().readModel( tdbModel, dbdump2, "N-TRIPLES");
FileManager.get().readModel( tdbModel, dbdump3, "N-TRIPLES");
FileManager.get().readModel( tdbModel, dbdump4, "N-TRIPLES");
FileManager.get().readModel( tdbModel, dbdump5, "N-TRIPLES");
FileManager.get().readModel( tdbModel, dbdump6, "N-TRIPLES");
tdbModel.close();
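The repeated readModel calls can also be written as a loop. The sketch below assumes Apache Jena 3.x/4.x with TDB1. It uses RDFDataMgr.read instead of FileManager, because RDFDataMgr picks the syntax from the file extension. It also commits one write transaction per dump file. DumpLoader and loadAll are hypothetical names. For DBpedia-sized dumps, Jena's command-line bulk loader (tdbloader), in line with the command-line suggestion in the comments, is likely to be faster.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

public final class DumpLoader {

    private DumpLoader() {
    }

    /** Reads each dump file into the dataset's default graph, committing after every file. */
    public static void loadAll(Dataset dataset, List<String> dumpFiles) {
        for (String file : dumpFiles) {
            dataset.begin(ReadWrite.WRITE);
            try {
                // Syntax is guessed from the extension: .nt -> N-Triples, .owl -> RDF/XML.
                RDFDataMgr.read(dataset.getDefaultModel(), file);
                dataset.commit();
            } catch (RuntimeException e) {
                // Discard the partial load of this file if parsing fails.
                dataset.abort();
                throw e;
            } finally {
                dataset.end();
            }
        }
    }

    public static void main(String[] args) {
        // Directory taken from the answer above; pass the dump file paths as arguments.
        Dataset dataset = TDBFactory.createDataset("C:\\TDBLoadGeoCoordinatesAndLabels");
        try {
            loadAll(dataset, Arrays.asList(args));
        } finally {
            dataset.close();
        }
    }
}
```

Committing per file keeps each transaction smaller. A failed file leaves the previously committed dumps in place.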

To query the TDB store:

String queryStr = "dbpedia query ";

Dataset dataset = TDBFactory.createDataset(tdbDirectory);
Model tdb = dataset.getDefaultModel();

Query query = QueryFactory.create(queryStr);
QueryExecution qexec = QueryExecutionFactory.create(query, tdb);

/*Execute the Query*/
ResultSet results = qexec.execSelect();

while (results.hasNext()) {
    QuerySolution solution = results.nextSolution(); // advance to the next row
    // Do something important
}

qexec.close();
tdb.close();
dataset.close();
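The question also asks to run the query only once and keep the results locally. One way, sketched under the same assumptions (Jena 3.x/4.x, TDB1), is to write the result table straight to a CSV file with ResultSetFormatter.outputAsCSV. QueryToCsv and run are hypothetical names, and the command-line arguments in main are illustrative.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.tdb.TDBFactory;

public final class QueryToCsv {

    private QueryToCsv() {
    }

    /** Runs a SELECT query once inside a read transaction and writes every row to a CSV file. */
    public static void run(Dataset dataset, String sparql, Path csvFile) throws IOException {
        dataset.begin(ReadWrite.READ);
        try (QueryExecution qexec = QueryExecutionFactory.create(sparql, dataset);
             OutputStream out = Files.newOutputStream(csvFile)) {
            // The first CSV line holds the variable names, then one line per solution.
            ResultSetFormatter.outputAsCSV(out, qexec.execSelect());
        } finally {
            dataset.end();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical arguments: TDB directory, SPARQL query text, output CSV path.
        Dataset dataset = TDBFactory.createDataset(args[0]);
        try {
            run(dataset, args[1], Paths.get(args[2]));
        } finally {
            dataset.close();
        }
    }
}
```

Once the CSV exists, later runs can read it instead of touching either the TDB store or the public endpoint.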
Isologous answered 1/10, 2013 at 13:10 Comment(1)
+1 for returning to a question and providing an answer that works for you. Thanks! You should also consider marking it as accepted. – Discrown
