Read ontology into GraphX from RDF model

I am trying to build a graph-based view of UniProt data using Spark (GraphX) by leveraging the OWL/RDF format. I am trying to parse the data with Apache Jena, but I can't wrap my head around the structure of the RDF file. To illustrate, here's an example of the type of file I'm trying to process: http://pastebin.com/iSeGs0RZ

For my needs, I have to store and manipulate elements such as
<rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
That is, I need to save the token "seeAlso" and the ?predicate? "http://purl.uniprot.org/string/9606.ENSP00000418960". When I load a model in Java/Scala, print(model) displays most of the information, but I can't find a way to extract everything from the file.

This is what I'm using to read in the model:

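// Jena 3.x package names; Jena 2.x uses com.hp.hpl.jena.* instead
import org.apache.jena.rdf.model.ModelFactory
import org.apache.jena.util.FileManager

object runner {
  val inputFileName = "dataset/test2.xml"

  def main(args: Array[String]): Unit = {
    val model = ModelFactory.createDefaultModel()

    // use the FileManager to find the input file
    val in = FileManager.get().open(inputFileName)
    if (in == null) {
      throw new IllegalArgumentException(
        "File: " + inputFileName + " not found")
    }
    // read(InputStream, base, lang): the two-argument form would treat
    // "RDF/XML" as the base URI rather than the syntax name
    model.read(in, null, "RDF/XML")
    // listObjects() yields only the object of each triple,
    // not its subject or predicate
    val items = model.listObjects()
    var count = 0
    while (items.hasNext) {
      count += 1
      val node = items.next()
      println(node)
      println("\n\n")
    }
    println(count)
  }
}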
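One possible reason the loop above misses information is that `listObjects()` returns only the object node of each triple. A minimal sketch of iterating over whole statements with `listStatements()` follows. It assumes Jena 3.x package names, and the subject URI `http://example.org/protein/P1` and the names `ListTriples`, `sampleXml` and `triples` are hypothetical, made up for illustration. In RDF terms the element `rdfs:seeAlso` is the predicate of the triple and the `rdf:resource` URI is its object.

```scala
import java.io.StringReader
import org.apache.jena.rdf.model.ModelFactory
import scala.collection.mutable.ListBuffer

object ListTriples {
  // Tiny RDF/XML document; the subject URI is a made-up placeholder.
  val sampleXml: String =
    """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      |         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      |  <rdf:Description rdf:about="http://example.org/protein/P1">
      |    <rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
      |  </rdf:Description>
      |</rdf:RDF>""".stripMargin

  // Returns (subject, predicate local name, object) for every statement in the model.
  def triples(rdfXml: String): List[(String, String, String)] = {
    val model = ModelFactory.createDefaultModel()
    // Arguments are (reader, base URI, syntax name); no base URI is needed here.
    model.read(new StringReader(rdfXml), null, "RDF/XML")
    val out = ListBuffer.empty[(String, String, String)]
    val it = model.listStatements()
    while (it.hasNext) {
      val st = it.nextStatement()
      out += ((st.getSubject.toString, st.getPredicate.getLocalName, st.getObject.toString))
    }
    out.toList
  }

  def main(args: Array[String]): Unit =
    triples(sampleXml).foreach { case (s, p, o) => println(s"$s  $p  $o") }
}
```

Each tuple carries the token ("seeAlso") together with the URI it points to, which is the pair the question asks for. Use `getPredicate.getURI` instead of `getLocalName` if the full predicate URI is needed.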
Germaun answered 22/12, 2015 at 11:52
Comment (Germaun): I managed to solve this problem by separating the RDF/XML parsing from the rest of the code. Step 1: parse the RDF/XML into N-Triples/N-Quads. Step 2: parse the N-Triples with Spark and dump the result as vertex and edge object files.
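The second step in the comment could look roughly like the sketch below. It is not the commenter's actual code. It uses plain Scala collections in place of Spark RDDs so that it runs without a cluster, and the names `NTriplesToGraph`, `Triple`, `parseLine` and `toGraphParts`, as well as the `example.org` subject and the label literal, are hypothetical. For the first step, Jena's `model.write(out, "N-TRIPLES")` is one way to produce the input lines.

```scala
object NTriplesToGraph {
  // One N-Triples statement, kept as raw tokens (angle brackets and quotes included).
  final case class Triple(subject: String, predicate: String, obj: String)

  // Subject and predicate contain no whitespace; the object is everything
  // up to the terminating " ." and may be a literal with spaces.
  private val Line = """^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$""".r

  // Returns None for comments, blank lines and anything that does not look like a triple.
  def parseLine(line: String): Option[Triple] = line.trim match {
    case l if l.startsWith("#") => None
    case Line(s, p, o)          => Some(Triple(s, p, o))
    case _                      => None
  }

  // Assigns a numeric id to every distinct subject/object (vertices) and
  // turns every triple into (sourceId, targetId, predicate) (edges).
  def toGraphParts(triples: Seq[Triple]): (Map[String, Long], Seq[(Long, Long, String)]) = {
    val ids: Map[String, Long] = triples
      .flatMap(t => Seq(t.subject, t.obj))
      .distinct
      .zipWithIndex
      .map { case (node, i) => node -> i.toLong }
      .toMap
    val edges = triples.map(t => (ids(t.subject), ids(t.obj), t.predicate))
    (ids, edges)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input; only the seeAlso target comes from the question.
    val lines = Seq(
      "<http://example.org/protein/P1> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://purl.uniprot.org/string/9606.ENSP00000418960> .",
      """<http://example.org/protein/P1> <http://www.w3.org/2000/01/rdf-schema#label> "example label" ."""
    )
    val (vertices, edges) = toGraphParts(lines.flatMap(l => parseLine(l)))
    vertices.toSeq.sortBy(_._2).foreach { case (node, id) => println(s"$id\t$node") }
    edges.foreach(println)
  }
}
```

In a Spark job the `Seq` of lines would presumably come from `sc.textFile(...)`, the `(id, node)` pairs and `(src, dst, predicate)` tuples would become the vertex RDD and GraphX `Edge` objects, and both could be written out with `saveAsObjectFile`. Ids would also need a scheme that works across partitions, such as hashing the URI, rather than the in-memory index used here.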
