Play 2.x : Reactive file upload with Iteratees
I will start with the question: how do you use the Scala API's Iteratee to upload a file to cloud storage (Azure Blob Storage in my case, though I don't think that's the most important part right now)?

Background:

I need to chunk the input into blocks of about 1 MB to store large media files (300 MB+) as Azure BlockBlobs. Unfortunately, my Scala knowledge is still poor (my project is Java based, and the only use for Scala in it will be an upload controller).

I tried the code from Why makes calling error or done in a BodyParser's Iteratee the request hang in Play Framework 2.0? (as an input Iteratee). It works quite well, but each chunk it receives is only 8192 bytes, which is too small for sending files of several hundred megabytes to the cloud.

I must say this is quite a new approach to me, and most probably I've misunderstood something (I don't want to say that I've misunderstood everything ;> ).

I will appreciate any hint or link that helps with this topic. If there is any sample of similar usage, that would be the best way for me to get the idea.

Surveying answered 11/8, 2012 at 19:13 Comment(1)
Are you looking for rechunking input into bigger chunks? – Ledford

Basically, what you need first is to rechunk the input into bigger chunks of 1024 * 1024 bytes.

First, let's have an Iteratee that will consume up to 1 MB of bytes (it's OK for the last chunk to be smaller):

val consumeAMB = 
  Traversable.takeUpTo[Array[Byte]](1024*1024) &>> Iteratee.consume()

Using that, we can construct an Enumeratee (adapter) that will regroup chunks, using an API called grouped:

val rechunkAdapter:Enumeratee[Array[Byte],Array[Byte]] =
  Enumeratee.grouped(consumeAMB)

Here, grouped uses an Iteratee to determine how much to put in each chunk; it uses our consumeAMB for that. This means the result is an Enumeratee that rechunks the input into 1 MB Array[Byte] chunks.
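To convince yourself the adapter behaves as described, here is a small check of my own (not from the original answer; it assumes the play-iteratees API of Play 2.1+ and an implicit ExecutionContext in scope):

// Push 3000 chunks of 1 KB through the adapter and collect the output sizes.
val source = Enumerator(Seq.fill(3000)(Array.fill[Byte](1024)(0)): _*)
val sizes = source &> rechunkAdapter |>>> Iteratee.getChunks.map(_.map(_.length))
// sizes: Future[List[Int]], eventually List(1048576, 1048576, 974848),
// i.e. two full 1 MB chunks plus a smaller final chunk.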

Now we need to write the BodyParser, which will use the Iteratee.foldM method to send each chunk of bytes:

// ConnectionHandle is a placeholder name for whatever state your storage
// client threads through the fold (e.g. a block list plus the blob reference).
val writeToStore: Iteratee[Array[Byte], ConnectionHandle] =
  Iteratee.foldM[Array[Byte], ConnectionHandle](connectionHandle) { (c, bytes) =>
    // write bytes using c and return the next handle, probably in a Future;
    // Future.successful(c) is just a stand-in for the real storage call
    Future.successful(c)
  }

foldM passes a state along and feeds it to the function you supply, of type (S, Array[Byte]) => Future[S], which returns a new Future of the state. foldM will not call the function again until that Future has completed and there is an available chunk of input.
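As a toy illustration of these semantics (my sketch, not the storage code), here is a foldM Iteratee that threads a running byte count through a Future on every chunk:

// State S = Long (bytes seen so far); each step returns Future[S],
// and foldM waits for it before consuming the next chunk.
val countBytes: Iteratee[Array[Byte], Long] =
  Iteratee.foldM[Array[Byte], Long](0L) { (total, bytes) =>
    Future.successful(total + bytes.length)
  }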

And the body parser will be rechunking input and pushing it into the store:

BodyParser( rh => (rechunkAdapter &>> writeToStore).map(Right(_)))

Returning a Right indicates that body parsing succeeded and that you are returning a body at the end of parsing (which here happens to be the final handle).
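For completeness, a hypothetical controller action using this parser might look like the sketch below; the action name and response text are my inventions, and request.body carries the final state produced by writeToStore:

def upload = Action(BodyParser(rh => (rechunkAdapter &>> writeToStore).map(Right(_)))) { request =>
  // request.body is the final handle returned by writeToStore
  Ok("Upload finished, final handle: " + request.body)
}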

Ledford answered 11/8, 2012 at 19:13 Comment(5)
Nice explanation. Two questions: (1) What does Iteratee.foldM do? I can't find it in the API docs here: playframework.org/documentation/api/2.0.2/scala/… (2) Why is map(Right(_)) needed? It would be great if you could add something about these to your post. – Farland
Thank you Sadek, I need some time to test it. – Surveying
@Sadache: It seems that Iteratee.foldM[E,A] is in master but not in 2.0.3, is that true? We're going to use the stable version for this in production. Do you plan a new release soon? – Surveying
Yes, but you can also copy the code of the foldM method for the time being. – Ledford
FYI there's more discussion about this answer over at #12609951. – Rambunctious

If your goal is to stream to S3, here is a helper that I have implemented and tested:

import java.io.ByteArrayInputStream

import com.amazonaws.services.s3.model._
import play.api.libs.iteratee._
import scala.concurrent.{ExecutionContext, Future}

// Assumes an AmazonS3 client instance named `s3` in scope.
def uploadStream(bucket: String, key: String, enum: Enumerator[Array[Byte]])
                (implicit ec: ExecutionContext): Future[CompleteMultipartUploadResult] = {
  import scala.collection.JavaConversions._

  // Start the multipart upload and keep its id for every part request.
  val initRequest = new InitiateMultipartUploadRequest(bucket, key)
  val initResponse = s3.initiateMultipartUpload(initRequest)
  val uploadId = initResponse.getUploadId

  // S3 multipart parts must be at least 5 MB (except the last one).
  val rechunker: Enumeratee[Array[Byte], Array[Byte]] = Enumeratee.grouped {
    Traversable.takeUpTo[Array[Byte]](5 * 1024 * 1024) &>> Iteratee.consume()
  }

  // Upload each part, accumulating the ETags needed to complete the upload.
  val uploader = Iteratee.foldM[Array[Byte], Seq[PartETag]](Seq.empty) { case (etags, bytes) =>
    val uploadRequest = new UploadPartRequest()
      .withBucketName(bucket)
      .withKey(key)
      .withPartNumber(etags.length + 1)
      .withUploadId(uploadId)
      .withInputStream(new ByteArrayInputStream(bytes))
      .withPartSize(bytes.length)

    val etag = Future { s3.uploadPart(uploadRequest).getPartETag }
    etag.map(etags :+ _)
  }

  val futETags = enum &> rechunker |>>> uploader

  // Complete on success; abort the multipart upload on any failure.
  futETags.map { etags =>
    val compRequest = new CompleteMultipartUploadRequest(bucket, key, uploadId, etags.toBuffer[PartETag])
    s3.completeMultipartUpload(compRequest)
  }.recoverWith { case e: Exception =>
    s3.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId))
    Future.failed(e)
  }
}
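A minimal usage sketch of mine (the bucket, key, file path, and execution context are assumptions, not part of the tested helper):

import java.io.File
import scala.concurrent.ExecutionContext.Implicits.global

val result = uploadStream("my-bucket", "videos/big.mp4",
  Enumerator.fromFile(new File("/tmp/big.mp4")))
result.foreach(res => println("Uploaded, ETag: " + res.getETag))

In a controller you would build the Enumerator from the request body instead of a file, for example by combining it with a rechunking body parser as in the accepted answer.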
Anachronism answered 19/9, 2014 at 7:57 Comment(1)
How do you use this method with a controller? – Minor

For those who are also trying to figure out a solution to this streaming problem, instead of writing a whole new BodyParser you can also reuse what has already been implemented in parse.multipartFormData. You can implement something like the code below, overriding the default handler handleFilePartAsTemporaryFile.

// rechunkAdapter and writeToS3 are the rechunking Enumeratee and the
// foldM-based uploader from the accepted answer; the ... parts are elided here.
def handleFilePartAsS3FileUpload: PartHandler[FilePart[String]] = {
  handleFilePart {
    case FileInfo(partName, filename, contentType) =>
      (rechunkAdapter &>> writeToS3).map { _ =>
        val compRequest = new CompleteMultipartUploadRequest(...)
        amazonS3Client.completeMultipartUpload(compRequest)
        ...
      }
  }
}

def multipartFormDataS3: BodyParser[MultipartFormData[String]] = multipartFormData(handleFilePartAsS3FileUpload)
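To call it from a controller, a hedged sketch (the part name "file" and the responses are my assumptions):

def uploadToS3 = Action(multipartFormDataS3) { request =>
  request.body.file("file").map { part =>
    // part.ref is the String produced by handleFilePartAsS3FileUpload
    Ok("Uploaded: " + part.ref)
  }.getOrElse(BadRequest("Missing file part"))
}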

I was able to make this work, but I am still not sure whether the whole upload process is streamed. I tried some large files, and it seems the S3 upload only starts once the whole file has been sent from the client side.

I looked at the above parser implementation, and I think everything is connected using Iteratees, so the file should be streamed. If someone has some insight on this, that would be very helpful.

Gitlow answered 13/8, 2014 at 22:10 Comment(0)

Add the following to your config file; it raises the maximum request body size that Play's default body parsers will buffer in memory:

play.http.parser.maxMemoryBuffer=256K

Rivard answered 11/1, 2016 at 23:2 Comment(0)
