How to read from zipped xml files in Scala code?

3

9

How do I access XML data files directly from a zipped file in my Scala program? Are there any direct ways to programmatically unzip and read contents in my Scala code?

Parthenogenesis answered 1/3, 2011 at 10:59 Comment(0)
T
17

Here are a couple of ways of doing it in Scala 2.8.1. First, create a sample archive from the shell:

cat > root.xml << EOF
<ROOT>
<id>123</id>
</ROOT>
EOF
zip root root.xml

and then in the REPL:

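import collection.JavaConverters._

// Open the archive and view its java.util.Enumeration of entries as a Scala Iterator
val rootzip = new java.util.zip.ZipFile("root.zip")
val entries = rootzip.entries.asScala
entries foreach { e =>
  // Parse each entry's input stream as XML and print the result
  val x = scala.xml.XML.load(rootzip.getInputStream(e))
  println(x)
}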

or something like:

val rootzip = new java.util.zip.ZipFile("root.zip")
// JavaConversions converts the Enumeration implicitly (deprecated in Scala 2.12, removed in 2.13)
import scala.collection.JavaConversions._
rootzip.entries.
  filter (_.getName.endsWith(".xml")).
  foreach { e => println(scala.xml.XML.load(rootzip.getInputStream(e))) }
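
The snippets above never close the archive, and the converter imports they rely on have changed across Scala versions. As a rough sketch only (not the answerer's code), the same idea can be written against the raw java.util.Enumeration with explicit cleanup. The object name ZipXmlSketch and the helpers writeSampleZip and rootElements are hypothetical, and the JDK's DOM parser is swapped in for scala.xml on the assumption that the scala-xml module (a separate library since Scala 2.11) may not be on the classpath:

```scala
import java.io.{File, FileOutputStream}
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}
import javax.xml.parsers.DocumentBuilderFactory

object ZipXmlSketch {
  // Hypothetical helper: writes a one-entry archive so the sketch is self-contained.
  def writeSampleZip(target: File): Unit = {
    val out = new ZipOutputStream(new FileOutputStream(target))
    try {
      out.putNextEntry(new ZipEntry("root.xml"))
      out.write("<ROOT><id>123</id></ROOT>".getBytes("UTF-8"))
      out.closeEntry()
    } finally out.close()
  }

  // Returns (entry name, root element name) for every .xml entry in the archive.
  def rootElements(zip: File): List[(String, String)] = {
    val zf = new ZipFile(zip)
    try {
      val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      val entries = zf.entries()
      var found = List.empty[(String, String)]
      // Walk the Java Enumeration directly, so no converter import is needed.
      while (entries.hasMoreElements) {
        val e = entries.nextElement()
        if (!e.isDirectory && e.getName.endsWith(".xml")) {
          val in = zf.getInputStream(e)
          try {
            val doc = builder.parse(in)
            found = (e.getName, doc.getDocumentElement.getTagName) :: found
          } finally in.close()
        }
      }
      found.reverse
    } finally zf.close() // closing the ZipFile releases the underlying file handle
  }
}
```

If scala-xml is available, builder.parse(in) could be replaced with scala.xml.XML.load(in) to get a scala.xml.Elem as in the snippets above.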
Tyro answered 8/3, 2011 at 13:56 Comment(1)
Thanks a lot. This one really helped the most. I had pasted in implicit method code for converting the Java Enumeration to a Scala list; collection.JavaConverters._ and asScala() helped reduce the code complexity. There are a lot of really helpful examples here for both XML and ZIP file reading in Scala. Thanks a ton. - Parthenogenesis
M
5

You can use the Java package java.util.zip: http://download.oracle.com/javase/6/docs/api/java/util/zip/package-summary.html
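
For the gzip case, a minimal sketch of what using this package might look like follows. It is an illustration under stated assumptions, not code from the answer: the object GzipXmlSketch and its helpers gzip and firstId are hypothetical names, the sample document is compressed in memory so the example is self-contained, and the JDK's DOM parser is used so that nothing beyond the standard library is required.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import javax.xml.parsers.DocumentBuilderFactory

object GzipXmlSketch {
  // Hypothetical helper: gzip a string in memory to stand in for a .xml.gz file.
  def gzip(text: String): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(buf)
    try gz.write(text.getBytes("UTF-8")) finally gz.close()
    buf.toByteArray
  }

  // Text content of the first <id> element in a gzip-compressed XML stream.
  def firstId(compressed: InputStream): String = {
    // GZIPInputStream decompresses on the fly as the parser reads from it.
    val in = new GZIPInputStream(compressed)
    try {
      val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
      doc.getElementsByTagName("id").item(0).getTextContent
    } finally in.close()
  }
}
```

To read from disk instead, a java.io.FileInputStream on the compressed file could be passed to firstId in place of the in-memory stream.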

Metallic answered 1/3, 2011 at 11:5 Comment(3)
Which does have gzip (though not tar.gz) support, as the OP's tag requests. - Rectangular
I see no 'tar' here, just 'gzip'. :) A GZIPInputStream should be just what the doctor ordered. Or, if it's actually a PKZIP file, something else in the same package will work (with an extra helping of accidental complexity). - Jamilla
Hmm... any quick sample code to look at? I'm just being a little lazy, that's it. - Parthenogenesis
C
4

I personally prefer TrueZIP. It lets you treat archive files as a virtual file system, providing the same interface as standard Java file I/O.

Corenda answered 2/3, 2011 at 19:36 Comment(0)
