I am trying to build a graph-based view of UniProt data using Spark (GraphX) by leveraging the OWL/RDF format. I am trying to parse the data using Apache Jena, but I can't wrap my head around the structure of the RDF file. To better illustrate, here's an example of the type of file I'm trying to process: http://pastebin.com/iSeGs0RZ
For my needs, I have to store and manipulate elements such as, for instance:
<rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
By that I mean I need to save the token "seeAlso" (the predicate, in RDF terms) and the resource URI it points to, "http://purl.uniprot.org/string/9606.ENSP00000418960" (the object).
When I load a model in Java/Scala, print(model) displays most of the information, but I can't find a way to extract everything from the file.
This is what I'm using to read in the model:
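import org.apache.jena.rdf.model.ModelFactory // Jena 3+; Jena 2.x uses com.hp.hpl.jena.rdf.model
import org.apache.jena.util.FileManager       // Jena 3+; Jena 2.x uses com.hp.hpl.jena.util

object runner {
  val inputFileName = "dataset/test2.xml"

  def main(args: Array[String]): Unit = {
    val model = ModelFactory.createDefaultModel()
    // use the FileManager to find the input file
    val in = FileManager.get().open(inputFileName)
    if (in == null) {
      throw new IllegalArgumentException(
        "File: " + inputFileName + " not found")
    }
    // read(in, base, lang): the two-argument form treats its second
    // argument as the base URI, not as the syntax name
    model.read(in, null, "RDF/XML")
    // listObjects() yields only the object of each triple;
    // subjects and predicates are not part of this iteration
    val items = model.listObjects()
    var count = 0
    while (items.hasNext) {
      count += 1
      val node = items.next()
      println(node)
      println("\n\n")
    }
    println(count)
  }
}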
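
For reference, the kind of extraction described above (the "seeAlso" token together with the URI it points to) can be sketched by iterating over whole statements with listStatements() rather than over objects alone. This is only a minimal sketch under assumptions, not a confirmed solution: the inline sample document, the subject URI ending in EXAMPLE, and the names TripleDump and triples are hypothetical, and the import assumes Jena 3 or later.

```scala
import java.io.StringReader
import scala.collection.mutable.ListBuffer
import org.apache.jena.rdf.model.ModelFactory // assumes Jena 3+ package names

object TripleDump {
  // Hypothetical inline sample standing in for the real UniProt file;
  // the subject URI is made up for illustration.
  val sample: String =
    """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      |         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      |  <rdf:Description rdf:about="http://purl.uniprot.org/uniprot/EXAMPLE">
      |    <rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
      |  </rdf:Description>
      |</rdf:RDF>""".stripMargin

  // Returns (subject, predicate local name, object) for every statement.
  def triples(rdfXml: String): List[(String, String, String)] = {
    val model = ModelFactory.createDefaultModel()
    // Arguments are (reader, base URI, syntax name); no base URI is needed
    // here because every URI in the sample is absolute.
    model.read(new StringReader(rdfXml), null, "RDF/XML")
    val it = model.listStatements()
    val out = ListBuffer.empty[(String, String, String)]
    while (it.hasNext) {
      val st = it.next()
      out += ((st.getSubject.toString,        // resource the triple is about
               st.getPredicate.getLocalName,  // e.g. "seeAlso"
               st.getObject.toString))        // resource URI or literal
    }
    out.toList
  }

  def main(args: Array[String]): Unit =
    triples(sample).foreach { case (s, p, o) => println(s"$s  --$p-->  $o") }
}
```

Each tuple has the shape of a graph edge: subject and object could serve as vertices and the predicate name as the edge attribute, which is one possible way to feed the data into GraphX.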