How can I read a file to an InputStream then write it into an OutputStream in Scala?

5

23

I'm trying to use basic Java code in Scala to read from a file and write to an OutputStream, but the usual while ( ... != -1 ) idiom gives me a warning in Scala: "comparing types of Unit and Int with != will always yield true".

The code is as follows:

    val file = this.cache.get(imageFileEntry).getValue().asInstanceOf[File]
    response.setContentType( "image/%s".format( imageDescription.getFormat() ) )

    val input = new BufferedInputStream( new FileInputStream( file ) )
    val output = response.getOutputStream()

    var read : Int = -1

    while ( ( read = input.read ) != -1 ) {
        output.write( read )
    }

    input.close()
    output.flush()

How am I supposed to write from an input stream to an output stream in Scala?

I'm mostly interested in a Scala-like solution.

Huck answered 3/8, 2011 at 14:12 Comment(2)
Performance-wise, it might be a good idea to use an intermediate buffer instead of reading and writing one byte at a time.Kiri
That's why there is a BufferedInputStream there.Premises
45

You could do this:

Iterator 
.continually (input.read)
.takeWhile (-1 !=)
.foreach (output.write)
Hersey answered 3/8, 2011 at 14:35 Comment(4)
Ok, this looks like a real Scala solution, but does this continually call load the whole file into memory, or does it call the function as the foreach loop runs? Also, can you elaborate a bit on this takeWhile method? Why didn't you have to use the _ parameter or define a parameter yourself?Premises
@Maurício It's an iterator, so everything is done only on-demand. Until foreach, nothing really happens -- you just get new Iterator objects that do some pre-processing before next or hasNext. On foreach, an output.write is executed for each input.read, and then its value is promptly forgotten and garbage collected.Hersey
It would be nice to have a version using the scala-io incubator projectLifelike
It should be noted that for the maximum performance, both input and output stream should have been buffered (using Buffered{Input,Output}Stream). It will still be significantly slower than the non-Scala-like wayHowse
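The on-demand behaviour described in the comments can be checked with a small sketch. It assumes an in-memory ByteArrayInputStream in place of the file (so the snippet runs on its own), and the names LazyIteratorDemo and demo are made up for illustration. It writes the predicate as `_ != -1`, which is equivalent to `x => -1 != x`, the function that the shorthand `(-1 !=)` stands for.

```scala
import java.io.ByteArrayInputStream

object LazyIteratorDemo {
  // Returns: bytes still unread after building the iterator,
  // bytes still unread after pulling one element, and the remaining elements.
  def demo(): (Int, Int, List[Int]) = {
    val input = new ByteArrayInputStream(Array[Byte](10, 20, 30))

    // Building the pipeline does not call input.read() yet.
    val bytes = Iterator.continually(input.read()).takeWhile(_ != -1)
    val unreadBefore = input.available()

    // Pulling one element triggers exactly one read.
    bytes.next()
    val unreadAfterOne = input.available()

    // Draining the iterator reads the rest, stopping at the -1 end marker.
    (unreadBefore, unreadAfterOne, bytes.toList)
  }
}
```

Each element is produced only when the consumer asks for it, so the whole file is never held in memory.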
19

If this is slow:

Iterator 
.continually (input.read)
.takeWhile (-1 !=)
.foreach (output.write)

you can expand it:

val bytes = new Array[Byte](1024) //1024 bytes - Buffer size
Iterator
.continually (input.read(bytes))
.takeWhile (-1 !=)
.foreach (read=>output.write(bytes,0,read))
output.close()
Dispensation answered 30/5, 2013 at 23:52 Comment(1)
This is assuming that you're ok just reading 1024 bytes though. What if I don't know how much I need to read until I reach something like a delimiter?Far
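On the buffer-size question: in the snippet above, 1024 is only the chunk size, and the loop keeps reading until read returns -1. A minimal sketch of the same idea wrapped in a helper follows. The names BufferedCopy, copy and bufferSize are hypothetical, and in-memory streams would stand in for real ones when trying it out. It does not cover stopping at a delimiter.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

object BufferedCopy {
  // Copies everything from input to output in chunks of at most bufferSize bytes
  // and returns the total number of bytes copied. Closing is left to the caller.
  def copy(input: InputStream, output: OutputStream, bufferSize: Int = 1024): Long = {
    val bytes = new Array[Byte](bufferSize)
    Iterator
      .continually(input.read(bytes))   // number of bytes read, or -1 at end of stream
      .takeWhile(_ != -1)
      .foldLeft(0L) { (total, n) =>
        output.write(bytes, 0, n)       // write only the n bytes that were actually read
        total + n
      }
  }
}
```

For example, copying a 3000-byte array with the default buffer takes three chunks (1024, 1024 and 952 bytes) and returns 3000.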
7

Assignment statements always return Unit in Scala, so read = input.read returns Unit, which never equals -1. You can do it like this:

while ({read = input.read; read != -1}) {
  output.write(read)
}
Marvelmarvella answered 3/8, 2011 at 14:18 Comment(3)
You can't have more than one statement in a while/if/for clause. This code yields a compilation error. But thanks for the assignment thing, I thought it would behave like Java.Premises
Hi Mauricio! Yes the {} are the key point in his example. Let's call them a block. Everything inside will be executed and the result is eventually returned having the type of the last expression.Deuteranope
sorry, added the braces a little late;)Marvelmarvella
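To make the block-as-condition point concrete, here is a minimal self-contained sketch. It assumes in-memory streams in place of the file and the response stream, and the names BlockConditionDemo, copy and demo are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

object BlockConditionDemo {
  // Copies byte by byte. The block in the condition performs the assignment
  // and then evaluates to the Boolean on its last line.
  def copy(input: InputStream, output: OutputStream): Unit = {
    var read = -1
    while ({ read = input.read(); read != -1 }) {
      output.write(read)
    }
  }

  // In-memory streams stand in for the file and response streams (assumption).
  def demo(): String = {
    val input  = new ByteArrayInputStream("hello".getBytes("UTF-8"))
    val output = new ByteArrayOutputStream()
    copy(input, output)
    output.toString("UTF-8")
  }
}
```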
5
def stream(inputStream: InputStream, outputStream: OutputStream) =
{
  val buffer = new Array[Byte](16384)

  def doStream(total: Int = 0): Int = {
    val n = inputStream.read(buffer)
    if (n == -1)
      total
    else {
      outputStream.write(buffer, 0, n)
      doStream(total + n)
    }
  }

  doStream()
}
Kiri answered 3/8, 2011 at 15:14 Comment(1)
This is a better solution. The above solutions are horribly slow. One call per byte read of overhead? Really? How is that acceptable in a bulk data mover?Danczyk
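A possible variant of the function above, sketched with an explicit @tailrec annotation so the compiler checks that the recursion is compiled to a loop. The wrapper object StreamCopy is hypothetical, and in-memory streams would stand in for real ones when trying it out.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import scala.annotation.tailrec

object StreamCopy {
  // Copies input to output through a 16 KiB buffer and returns the byte count.
  def stream(inputStream: InputStream, outputStream: OutputStream): Int = {
    val buffer = new Array[Byte](16384)

    @tailrec // compilation fails if the recursive call is not in tail position
    def doStream(total: Int): Int = {
      val n = inputStream.read(buffer)
      if (n == -1) total
      else {
        outputStream.write(buffer, 0, n)
        doStream(total + n)
      }
    }

    doStream(0)
  }
}
```

Since the total is an Int, this sketch is only suitable for streams smaller than 2 GiB.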
0

We can copy an input stream to an output stream in a generic and type-safe manner using typeclasses. A typeclass is one approach to polymorphism. Strictly speaking it is ad hoc polymorphism: the behavior for each type is supplied by a separate instance, and in Scala those instances are passed as implicit parameters. In our case, the typeclasses will be Scala traits with a generic type parameter.

Let's make Reader[I] and Writer[O] traits, where I and O are input and output stream types, respectively.

trait Reader[I] {
  def read(input: I, buffer: Array[Byte]): Int
}

trait Writer[O] {
  def write(output: O, buffer: Array[Byte], startAt: Int, nBytesToWrite: Int): Unit
}

We can now make a generic copy method that can operate on things that subscribe to these interfaces.

object CopyStreams {

  type Bytes = Int

  def apply[I, O](input: I, output: O, chunkSize: Bytes = 1024)(implicit r: Reader[I], w: Writer[O]): Unit = {
    val buffer = Array.ofDim[Byte](chunkSize)
    var count = -1

    while ({count = r.read(input, buffer); count > 0})
      w.write(output, buffer, 0, count)
  }
}

Note the implicit r and w parameters here. Essentially, we're saying that CopyStreams.apply[I, O] will compile only if a Reader[I] and a Writer[O] value are available in implicit scope. This lets us call CopyStreams(input, output) seamlessly.

Importantly, however, note that this implementation is generic. It operates on types that are independent of actual stream implementations.

In my particular use case, I needed to copy S3 objects to local files. So I made the following implicit values.

object Reader {

  implicit val s3ObjectISReader = new Reader[S3ObjectInputStream] {
    @inline override def read(input: S3ObjectInputStream, buffer: Array[Byte]): Int =
      input.read(buffer)
  }
}


object Writer {

  implicit val fileOSWriter = new Writer[FileOutputStream] {
    @inline override def write(output: FileOutputStream,
                               buffer: Array[Byte],
                               startAt: Int,
                               nBytesToWrite: Int): Unit =
      output.write(buffer, startAt, nBytesToWrite)
  }
}

So now I can do the following:

val input: S3ObjectInputStream = ...
val output = new FileOutputStream(new File(...))
import Reader._
import Writer._
CopyStreams(input, output)
// close and such...

And if we ever need to copy different stream types, we only need to write a new Reader or Writer implicit value. We can use the CopyStreams code without changing it!
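As an illustration of that last point, here is a self-contained sketch that adds instances for in-memory streams. The traits and CopyStreams are repeated so the snippet compiles on its own. InMemoryInstances, TypeclassDemo and the instance names are hypothetical and not part of the code above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

trait Reader[I] {
  def read(input: I, buffer: Array[Byte]): Int
}

trait Writer[O] {
  def write(output: O, buffer: Array[Byte], startAt: Int, nBytesToWrite: Int): Unit
}

object CopyStreams {
  def apply[I, O](input: I, output: O, chunkSize: Int = 1024)(implicit r: Reader[I], w: Writer[O]): Unit = {
    val buffer = Array.ofDim[Byte](chunkSize)
    var count = -1
    while ({ count = r.read(input, buffer); count > 0 })
      w.write(output, buffer, 0, count)
  }
}

// New instances only; CopyStreams itself is untouched.
object InMemoryInstances {
  implicit val byteArrayISReader: Reader[ByteArrayInputStream] =
    new Reader[ByteArrayInputStream] {
      override def read(input: ByteArrayInputStream, buffer: Array[Byte]): Int =
        input.read(buffer)
    }

  implicit val byteArrayOSWriter: Writer[ByteArrayOutputStream] =
    new Writer[ByteArrayOutputStream] {
      override def write(output: ByteArrayOutputStream, buffer: Array[Byte],
                         startAt: Int, nBytesToWrite: Int): Unit =
        output.write(buffer, startAt, nBytesToWrite)
    }
}

object TypeclassDemo {
  import InMemoryInstances._

  def demo(): String = {
    val input  = new ByteArrayInputStream("typeclass".getBytes("UTF-8"))
    val output = new ByteArrayOutputStream()
    CopyStreams(input, output, chunkSize = 4) // small chunk size to force several reads
    output.toString("UTF-8")
  }
}
```

Because Reader and Writer are invariant in their type parameter, the instance is looked up for the exact static type of the argument, which is why the instances here are declared for ByteArrayInputStream and ByteArrayOutputStream specifically.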

Intratelluric answered 13/4, 2015 at 22:22 Comment(0)
