How to read a file as a byte array in Scala
S

8

84

I can find tons of examples but they seem to either rely mostly on Java libraries or just read characters/lines/etc.

I just want to read in some file and get a byte array using Scala libraries. Can someone help me with that?

Sabbat answered 29/9, 2011 at 13:36 Comment(9)
I think relying on Java libraries is what (almost?) everyone would do, the Scala library included. See for instance the source code of scala.io.Source.Diphyllous
I know Scala relies on Java. But what is the point of a language where I cannot even do simple file I/O without using a different language?Sabbat
You're not using a different language, just a standard JVM API that has proved good enough not to need replacing!Everyplace
Hm yeah, you are probably right... Still, it feels like cheating. :)Sabbat
Well, how do you think the Java classes are implemented? Deep down, somewhere, there is a native method: it has just a signature, no Java implementation, and relies on an OS-specific C implementation. Isn't that cheating too? :)Diphyllous
It should be said that Scala on .Net does make this a more pressing issue.Everyplace
@Duncan McGregor: Good point, guess the transition isn't as smooth there...Sabbat
@Philippe: Sure, and using C is only cheating on assembly :P... What I meant is just, that the border between languages is usually rather clearly defined, Scala and Java sort of melt into each other.Sabbat
possible duplicate of What is the proper way to code a read-while loop in Scala?Paramo
L
143

Java 7:

import java.nio.file.{Files, Paths}

val byteArray = Files.readAllBytes(Paths.get("/path/to/file"))

I believe this is the simplest way possible. Just leveraging existing tools here. NIO.2 is wonderful.
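For readers who want to see the call in context, here is a minimal sketch that wraps it in `Try`, so a missing file surfaces as a `Failure` rather than an exception. The object name `ReadAllBytesSketch` and the helpers `readBytes` and `roundTrip` are hypothetical and only serve the illustration.

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object ReadAllBytesSketch {
  // Returns a Failure (for example NoSuchFileException) instead of throwing.
  def readBytes(path: Path): Try[Array[Byte]] = Try(Files.readAllBytes(path))

  // Hypothetical demo: writes three bytes to a temp file and reads them back.
  def roundTrip(): Array[Byte] = {
    val tmp = Files.createTempFile("bytes-demo", ".bin")
    try {
      Files.write(tmp, Array[Byte](1, 2, -22))
      readBytes(tmp).get
    } finally Files.deleteIfExists(tmp)
  }
}
```

Note that `Files.readAllBytes` loads the whole file into memory, so it is best suited to files that comfortably fit in the heap.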

Liberati answered 21/1, 2014 at 17:5 Comment(1)
I think that anyone not bound to jvm < 7 should use this.Neysa
A
47

This should work (Scala 2.8):

val bis = new BufferedInputStream(new FileInputStream(fileName))
val bArray = Stream.continually(bis.read).takeWhile(_ != -1).map(_.toByte).toArray
Ashliashlie answered 29/9, 2011 at 20:42 Comment(6)
I think this is a great example of wrapping a Java API function to get Stream semantics. Much appreciated.Gallery
val bis = new java.io.BufferedInputStream(new java.io.FileInputStream(fileName)); if you do not have the java paths importedBecky
Using this approach, is closing the file also needed or is it implicit?Brisance
You need to close it yourselfCatholicity
This approach is slow, since it needs to process each and every byte. Ideally, I/O operations should be block-based.Aldwin
I benchmarked it against a buffered approach; it's about 500 times slower in my test. (test config: compute CRC32 of a 14 MB file, which is repeatedly re-read from SSD in RAID-0 - so it's in system file cache; Intel Core i7 2nd gen; 16GB RAM).Tufa
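The comments above raise two points: the stream has to be closed explicitly, and reading one byte at a time is slow. A minimal sketch of a block-based alternative follows. The 8 KB buffer size and the names `ChunkedRead`, `readAll` and `readFile` are assumptions chosen for illustration, not part of the original answer.

```scala
import java.io.{ByteArrayOutputStream, FileInputStream, InputStream}

object ChunkedRead {
  // Copies the stream into memory in 8 KB blocks rather than byte by byte.
  def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n = in.read(buf)
    while (n != -1) {
      out.write(buf, 0, n) // only the first n bytes of buf are valid
      n = in.read(buf)
    }
    out.toByteArray
  }

  // Opens the file, reads it fully, and always closes the stream.
  def readFile(fileName: String): Array[Byte] = {
    val in = new FileInputStream(fileName)
    try readAll(in) finally in.close()
  }
}
```

The `try`/`finally` ensures the file handle is released even if a read fails partway through.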
G
6

The library scala.io.Source is problematic: DON'T USE IT for reading binary files.

The error can be reproduced as instructed here: https://github.com/liufengyun/scala-bug

The file data.bin contains the hexadecimal value 0xea, which is 11101010 in binary and should be converted to 234 in decimal.

The main.scala file contains two ways to read the file:

import scala.io._
import java.io._

object Main {
  def main(args: Array[String]): Unit = {
    val ss = Source.fromFile("data.bin")
    println("Scala:" + ss.next.toInt)
    ss.close

    val bis = new BufferedInputStream(new FileInputStream("data.bin"))
    println("Java:" + bis.read)
    bis.close
  }
}

When I run scala main.scala, the program outputs the following:

Scala:205
Java:234

The Java library produces the correct output, while the Scala library does not.

Greysun answered 21/1, 2014 at 15:27 Comment(2)
If I set the encoding to Source.fromFile("data.bin", "ISO8859-1"), it works well.Greysun
Maybe it's helpful, but really, this isn't an answer. Introducing a new problem in an answer is not constructive and belongs somewhere else.Monadism
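The mismatch comes from scala.io.Source decoding bytes into characters with a charset, which is why naming the encoding explicitly changes the result. A small sketch of the ISO-8859-1 variant mentioned in the comment above, using a temporary file in place of data.bin. The object and method names are hypothetical.

```scala
import java.nio.file.Files
import scala.io.Source

object SourceEncodingDemo {
  // Writes the single byte 0xEA to a temp file, then reads it back
  // through Source with an explicit ISO-8859-1 encoding.
  def firstByteViaLatin1(): Int = {
    val tmp = Files.createTempFile("data", ".bin")
    try {
      Files.write(tmp, Array(0xEA.toByte))
      val src = Source.fromFile(tmp.toFile, "ISO-8859-1")
      // ISO-8859-1 maps each byte to the character with the same code point.
      try src.next().toInt finally src.close()
    } finally Files.deleteIfExists(tmp)
  }
}
```

This works because ISO-8859-1 is a one-to-one mapping from bytes to characters. For binary data, a byte-oriented API is still the more direct choice.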
U
5
val is = new FileInputStream(fileName)
val cnt = is.available
val bytes = Array.ofDim[Byte](cnt)
is.read(bytes)
is.close()
Unripe answered 31/3, 2012 at 13:2 Comment(1)
It is not a valid solution. From javadoc of InputStream.available: Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.Pitchdark
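One way to avoid relying on `available`, sketched here under the assumption that the input is a regular file smaller than 2 GB, is to size the array from `File.length` and fill it with `DataInputStream.readFully`, which keeps reading until the array is full. The names `ReadFullyDemo` and `readFile` are hypothetical.

```scala
import java.io.{DataInputStream, File, FileInputStream}

object ReadFullyDemo {
  def readFile(file: File): Array[Byte] = {
    // Assumption: a regular file under 2 GB, so its length fits in an Int.
    val bytes = new Array[Byte](file.length.toInt)
    val in = new DataInputStream(new FileInputStream(file))
    try {
      in.readFully(bytes) // throws EOFException if the file is shorter than expected
      bytes
    } finally in.close()
  }
}
```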
T
4

You might also consider using scalax.io:

scalax.io.Resource.fromFile(fileName).byteArray
Theodolite answered 11/6, 2013 at 8:31 Comment(1)
I noticed that the last activity on that repository was 6 years ago. Is it still relevant?Sapajou
G
2

You can use the Apache Commons Compress IOUtils

import java.io.{File, FileInputStream}
import org.apache.commons.compress.utils.IOUtils

val file = new File("data.bin")
IOUtils.toByteArray(new FileInputStream(file))
Gamy answered 3/8, 2017 at 19:3 Comment(1)
I had to use import org.apache.commons.io.IOUtils instead of the suggested import.Norvol
R
0

Asynchronous File reading using Scala Future and Java NIO2

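  import java.nio.ByteBuffer
  import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
  import java.nio.file.{Path, StandardOpenOption}
  import scala.concurrent.{ExecutionContext, Future, Promise}
  import scala.util.Try

  def readFile(path: Path)(implicit ec: ExecutionContext): Future[Array[Byte]] = {
    val p = Promise[Array[Byte]]()
    try {
      val channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ)
      // Assumes the file is small enough for its size to fit in an Int.
      val buffer = ByteBuffer.allocate(channel.size().toInt)
      channel.read(buffer, 0L, buffer, onComplete(channel, p))
    }
    catch {
      case t: Exception => p.failure(t)
    }
    p.future
  }

  private def onComplete(channel: AsynchronousFileChannel, p: Promise[Array[Byte]]) = {
    new CompletionHandler[Integer, ByteBuffer]() {
      def completed(res: Integer, buffer: ByteBuffer): Unit = {
        p.complete(Try {
          channel.close() // release the file handle once the read has finished
          buffer.array()
        })
      }

      def failed(t: Throwable, buffer: ByteBuffer): Unit = {
        Try(channel.close()) // best-effort close; the read error is what gets reported
        p.failure(t)
      }
    }
  }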
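When true non-blocking I/O is not required, a simpler variant is to run the blocking `Files.readAllBytes` call inside a `Future`. This is a different technique from the `AsynchronousFileChannel` approach above: it still occupies a thread while reading. The names `FutureRead` and `readFileBlocking` are hypothetical.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{ExecutionContext, Future, blocking}

object FutureRead {
  // Runs the blocking read on the given ExecutionContext; `blocking` hints
  // to the pool that this task may hold a thread for a while.
  def readFileBlocking(path: Path)(implicit ec: ExecutionContext): Future[Array[Byte]] =
    Future(blocking(Files.readAllBytes(path)))
}
```

Any I/O exception ends up as a failed `Future`, so callers can handle it with `recover` or `onComplete`.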
Rainy answered 12/3, 2021 at 10:57 Comment(0)
D
-2

I have used the code below to read a CSV file.

import scala.io.Source.fromFile

readFile("C:/users/xxxx/Downloads/", "39025968_ccccc_1009.csv")

def readFile(loc: String, filenm: String): Unit = {

  val flnm = fromFile(s"$loc$filenm") // fromFile is imported from scala.io.Source

  println("Files testing")
  /*for (line <- flnm.getLines()) {
    printf("%4d %s\n", line.length, line)
  }*/
  flnm.getLines().foreach(println) // getLines() is a method of Source; it reads text lines, not bytes
  flnm.close()
}
Dilution answered 26/10, 2020 at 23:26 Comment(2)
With a question this old (asked over 9 years ago), and with so many answers already submitted, it is helpful to point out how your new answer is different from the previous answers. (And including code that's been commented out just looks sloppy.)Palmation
yeah.. the other answers clearly show a byte array being returned. this is really not clearSlangy
