Read ontology into GraphX from RDF model

I am trying to build a graph-based view of UniProt data with Spark (GraphX) by leveraging the OWL/RDF format. I am parsing the data with Apache Jena, but I can't wrap my head around the structure of the RDF file. To illustrate, here is an example of the kind of file I'm trying to process: http://pastebin.com/iSeGs0RZ

For my needs, I have to store and manipulate elements such as:
<rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
That is, I need to save the token "seeAlso" and the (predicate?) "http://purl.uniprot.org/string/9606.ENSP00000418960". When I load the model in Java/Scala, print(model) displays most of the information, but I can't find a way to extract everything from the file.

This is what I'm using to read in the model:

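// Jena 3+ package names (assumed); Jena 2.x uses com.hp.hpl.jena.* instead
import org.apache.jena.rdf.model.ModelFactory
import org.apache.jena.util.FileManager

object runner {
  val inputFileName = "dataset/test2.xml"

  def main(args: Array[String]): Unit = {
    val model = ModelFactory.createDefaultModel()

    // use the FileManager to find the input file
    val in = FileManager.get().open(inputFileName)
    if (in == null) {
      throw new IllegalArgumentException(
        "File: " + inputFileName + " not found")
    }
    // read(InputStream, base, lang): with only two arguments the string is taken as the base URI
    model.read(in, null, "RDF/XML")
    // listObjects() iterates over the object of each triple only
    val items = model.listObjects()
    var count = 0
    while (items.hasNext) {
      count += 1
      val node = items.next()
      println(node)
      println("\n\n")
    }
    println(count)
  }
}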
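
One possible way to get at every part of each triple, rather than only the objects, is to iterate over the model's statements. The sketch below is a minimal illustration under a few assumptions: it uses the Jena 3+ package names (`org.apache.jena`), it reads a tiny inline RDF/XML sample instead of the real UniProt file, and the subject URI `http://example.org/protein/P1` and the names `TripleSketch`, `sample` and `triples` are made up for the example.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import org.apache.jena.rdf.model.ModelFactory
import scala.collection.mutable.ListBuffer

object TripleSketch {
  // Tiny stand-in for the UniProt file; the subject URI is hypothetical.
  val sample: String =
    """<?xml version="1.0"?>
      |<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      |         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      |  <rdf:Description rdf:about="http://example.org/protein/P1">
      |    <rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
      |  </rdf:Description>
      |</rdf:RDF>""".stripMargin

  // Returns (subject, predicate local name, object) for every statement
  // found in the given RDF/XML text.
  def triples(rdfXml: String): List[(String, String, String)] = {
    val model = ModelFactory.createDefaultModel()
    val in = new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8))
    model.read(in, null, "RDF/XML")

    val out = ListBuffer.empty[(String, String, String)]
    val it = model.listStatements()
    while (it.hasNext) {
      val st = it.nextStatement()
      out += ((
        st.getSubject.toString,         // resource the statement is about
        st.getPredicate.getLocalName,   // e.g. "seeAlso"
        st.getObject.toString           // resource URI or literal value
      ))
    }
    out.toList
  }

  def main(args: Array[String]): Unit =
    triples(sample).foreach { case (s, p, o) => println(s"$s --$p--> $o") }
}
```

In this reading, "seeAlso" is the predicate and the `rdf:resource` URI is the object of the triple. Each (subject, predicate, object) tuple could then serve as one edge for GraphX, with the subject and object as vertices and the predicate name as the edge attribute; since GraphX vertex ids are `Long` values, the URIs would first need to be mapped to numeric ids.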
