Jena text queries slow down dramatically on a large dataset

I am querying an RDF dataset of 2.37 GB with approximately 17 million triples, and a Lucene index of the dataset is also maintained. I tried the text queries of the jena-text module, which search using the stored Lucene index, but performance is poor: a single search query takes 4 seconds or more.

However, when I use the Lucene index viewer Luke, the index seems to have no problem: searching it for a particular term takes only a few milliseconds.
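Before comparing Luke's timing with jena-text's, it may help to time the Jena side the same way on every run. The sketch below assumes the query is wrapped in a task that iterates the entire ResultSet: Jena largely produces solutions as the iterator is consumed, so timing only `execSelect()` could undercount. The hypothetical `QueryTimer.timeRuns` helper runs a task several times and records each duration, so a slow first run (for example cold TDB files or JIT warm-up, both assumptions here) can be told apart from warm runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical measurement helper, not part of Jena.
final class QueryTimer {

    // Runs `task` `runs` times and returns each run's wall-clock duration in ms.
    static List<Long> timeRuns(Supplier<?> task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get(); // in real use: execute the query AND iterate every row
            millis.add((System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in task; replace with code that runs the SPARQL query
        // and consumes the whole ResultSet.
        List<Long> times = timeRuns(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        }, 5);
        System.out.println("per-run ms: " + times);
    }
}
```

If only the first run is slow, the cost is more likely warm-up than the text search itself.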

So I am unable to figure out why the same search takes so much longer through jena-text.

Here is the SPARQL query:

SELECT ?subj ?status ?version ?label
WHERE {
    ?subj rdf:type ts:Valueset ;
          text:query 'cancer' ;
          ts:entityStatus ?status .
    OPTIONAL { ?subj ts:versionID ?version . }
    OPTIONAL { ?subj rdfs:label ?label . }
}
LIMIT <limit>
OFFSET <offset>
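One hypothesis worth checking (an assumption, not a confirmed diagnosis): the pattern lists `rdf:type ts:Valueset` before `text:query`, so if the engine evaluates it in that order as nested loops, with `?subj` already bound when the text search runs, it would perform one Lucene search per Valueset subject instead of one search in total. The toy model below is plain Java, not Jena's evaluator; it counts searches under both orders for a made-up set of 10,000 subjects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of two evaluation orders for the same pattern (not Jena code).
final class PatternOrderModel {

    // Number of calls made to the toy text index.
    static int searches = 0;

    // Toy "text index": term -> subjects whose indexed literals contain it.
    static Set<String> textSearch(Map<String, Set<String>> index, String term) {
        searches++;
        return index.getOrDefault(term, Set.of());
    }

    // Type first: for every Valueset subject, run the search and check membership.
    static List<String> typeFirst(Set<String> valuesets, Map<String, Set<String>> index, String term) {
        List<String> out = new ArrayList<>();
        for (String s : valuesets) {
            if (textSearch(index, term).contains(s)) {
                out.add(s);
            }
        }
        return out;
    }

    // Text first: run the search once, then keep only subjects that are Valuesets.
    static List<String> textFirst(Set<String> valuesets, Map<String, Set<String>> index, String term) {
        List<String> out = new ArrayList<>();
        for (String s : textSearch(index, term)) {
            if (valuesets.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> valuesets = new HashSet<>();
        for (int i = 0; i < 10_000; i++) {
            valuesets.add("vs" + i);
        }
        Map<String, Set<String>> index = Map.of("cancer", Set.of("vs1", "vs2", "doc9"));

        searches = 0;
        int typeFirstMatches = typeFirst(valuesets, index, "cancer").size();
        int typeFirstSearches = searches;

        searches = 0;
        int textFirstMatches = textFirst(valuesets, index, "cancer").size();
        int textFirstSearches = searches;

        System.out.println("type first: " + typeFirstMatches + " matches, " + typeFirstSearches + " searches");
        System.out.println("text first: " + textFirstMatches + " matches, " + textFirstSearches + " searches");
    }
}
```

Running `main` reports 2 matches for both orders, but 10000 searches for the type-first order versus 1 for the text-first order. Whether jena-text actually behaves this way for a bound subject would need to be confirmed against its documentation or source.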

Here is the Jena code:

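Dataset dataset = store.getDataset();   // assumed to return indexedDS, not baseDS
dataset.begin(ReadWrite.READ);
QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(queryStr), dataset);
try {
    ResultSet results = qexec.execSelect();
    while (results.hasNext()) {
        QuerySolution qs = results.next();
        // ... process each solution ...
    }
} finally {
    qexec.close();   // release the query execution
    dataset.end();   // always end the read transaction
}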
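To try a variant of the query with `text:query` as the first pattern, `queryStr` could be built as below. This is a minimal sketch assuming plain string building is acceptable; the hypothetical `ValuesetQuery.build` method fills the `<limit>`/`<offset>` placeholders and escapes the search term as a SPARQL string literal. The `text:` prefix is jena-text's namespace; the `ts:` IRI is not given in the question, so `http://example.org/ts#` is a placeholder to replace.

```java
// Hypothetical query builder; the ts: namespace below is a placeholder.
final class ValuesetQuery {

    static final String TS_NS = "http://example.org/ts#";

    static String build(String searchTerm, int limit, int offset) {
        if (limit < 0 || offset < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        // Escape backslash first, then quotes and line breaks, for a '...' literal
        String literal = searchTerm.replace("\\", "\\\\")
                                   .replace("'", "\\'")
                                   .replace("\n", "\\n")
                                   .replace("\r", "\\r");
        return String.join("\n",
            "PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX text: <http://jena.apache.org/text#>",
            "PREFIX ts:   <" + TS_NS + ">",
            "SELECT ?subj ?status ?version ?label",
            "WHERE {",
            "  ?subj text:query '" + literal + "' .",   // text search first
            "  ?subj rdf:type ts:Valueset ;",
            "        ts:entityStatus ?status .",
            "  OPTIONAL { ?subj ts:versionID ?version }",
            "  OPTIONAL { ?subj rdfs:label ?label }",
            "}",
            "LIMIT " + limit,
            "OFFSET " + offset);
    }

    public static void main(String[] args) {
        System.out.println(build("cancer", 20, 0));
    }
}
```

The returned string can be passed to `QueryFactory.create` in place of the original `queryStr`.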

And here is the code for creating the indexed dataset:

Dataset baseDS = TDBFactory.createDataset(storePath.trim());

// Define the index mapping: the subject URI goes into the "uri" field, and the
// literal values of all the listed predicates are indexed into the "property" field
EntityDefinition entityDef = new EntityDefinition("uri", "property", RDFS.label.asNode());
entityDef.set("property", TS.conceptCode.asNode());
entityDef.set("property", SKOS_XL.literalForm.asNode());
entityDef.set("property", SKOS.note.asNode());
entityDef.set("property", SKOS.definition.asNode());

// Open the on-disk Lucene index directory
File indexDir = new File(textIndexPath);
Directory luceneDir;
try {
    luceneDir = FSDirectory.open(indexDir);
} catch (IOException e) {
    // Fail fast instead of continuing with a null directory
    throw new RuntimeException("Cannot open Lucene index at " + indexDir, e);
}

// Join the TDB dataset and the Lucene index into one text-indexed dataset
Dataset indexedDS = TextDatasetFactory.createLucene(baseDS, luceneDir, entityDef);
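As a simplified mental model of the mapping above (an assumption, not jena-text's actual implementation): each triple whose predicate is one of the five listed produces an index entry with the subject in `uri` and the literal text in `property`, and `text:query` returns the `uri` values of matching entries. The toy `ToyTextIndex` class below mimics that with a HashMap-based inverted index; predicate names are shortened strings for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index, loosely mirroring the "uri"/"property" entity mapping.
final class ToyTextIndex {

    // Predicates whose literals are indexed into the single "property" field.
    private final Set<String> indexedPredicates;
    // Lower-cased token -> subject URIs (the "uri" field values).
    private final Map<String, Set<String>> postings = new HashMap<>();

    ToyTextIndex(Set<String> indexedPredicates) {
        this.indexedPredicates = indexedPredicates;
    }

    // Called once per added triple; literals of other predicates are ignored.
    void add(String subject, String predicate, String literal) {
        if (!indexedPredicates.contains(predicate)) {
            return;
        }
        for (String token : literal.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(subject);
            }
        }
    }

    // Rough analogue of `?s text:query 'term'`: subjects whose indexed text has the term.
    Set<String> query(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyTextIndex idx = new ToyTextIndex(Set.of(
            "rdfs:label", "ts:conceptCode", "skosxl:literalForm", "skos:note", "skos:definition"));
        idx.add("ex:vs1", "rdfs:label", "Breast Cancer Valueset");
        idx.add("ex:vs2", "skos:note", "Includes cancer staging codes");
        idx.add("ex:vs3", "ts:entityStatus", "cancer"); // not an indexed predicate
        System.out.println(idx.query("cancer"));
    }
}
```

Here `main` prints `[ex:vs1, ex:vs2]`: a single search on the shared field matches text from any of the mapped predicates, while `ex:vs3` is absent because its predicate is not mapped.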

Can anyone identify whether there is a problem with the code or with the way the indexed dataset is configured? Thanks.
