How to upload a Java OutputStream to AWS S3

3

20

I create PDF docs in memory as OutputStreams. These should be uploaded to S3. My problem is that it's not possible to create a PutObjectRequest from an OutputStream directly (according to this thread in the AWS dev forum). I use aws-java-sdk-s3 v1.10.8 in a Dropwizard app.

The two workarounds I can see so far are:

  1. Copy the OutputStream to an InputStream and accept that twice the amount of RAM is used.
  2. Pipe the OutputStream to an InputStream and accept the overhead of an extra thread (see this answer)
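For reference, the two workarounds could look roughly like the sketch below. It only uses java.io; `StreamWorkarounds`, `StreamWriter`, `copy` and `pipe` are made-up names for illustration, and the error handling in the producer thread is deliberately simplistic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical names throughout; only the java.io classes are real API.
class StreamWorkarounds {

    // Hypothetical callback standing in for whatever generates the PDF.
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Workaround 1: toByteArray() returns a copy of the internal buffer,
    // so the document is held in memory twice while both are reachable.
    static InputStream copy(ByteArrayOutputStream out) {
        return new ByteArrayInputStream(out.toByteArray());
    }

    // Workaround 2: a second thread writes into the pipe while the caller
    // reads from the returned stream; only the pipe's small buffer is shared.
    static InputStream pipe(StreamWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread producer = new Thread(() -> {
            // Closing the pipe signals end-of-stream to the reader.
            try (PipedOutputStream sink = out) {
                writer.writeTo(sink);
            } catch (IOException e) {
                // Simplification: a real application should surface this to the reader.
                e.printStackTrace();
            }
        });
        producer.start();
        return in;
    }
}
```

With an existing ByteArrayOutputStream `baos`, the second variant could be called as `StreamWorkarounds.pipe(baos::writeTo)`, although the pipe pays off mainly when the generator writes straight into it instead of into a buffer first.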

If I don't find a better solution I'll go with #1, because it looks as if I could afford the extra memory more easily than threads/CPU in my setup.

Is there any other, possibly more efficient way to achieve this that I have overlooked so far?

Edit: My OutputStreams are ByteArrayOutputStreams

Daphnedaphnis answered 4/8, 2015 at 9:27 Comment(5)
"I create PDF docs in memory as OutputStreams" - ?? an OutputStream does not store data (possibly except for ByteArrayOutputStream, but then you'd say you created it in memory as a byte array)Parenteau
I use ByteArrayOutputStream. Sorry for the confusion.Daphnedaphnis
I have a similar question - #40268820 . Were you able to find a solution for this? If not, how did you go about doing #1 in your case?Seguidilla
@Omnipresent, you can find what I did in my own answer below.Daphnedaphnis
See https://mcmap.net/q/664397/-how-to-create-a-java-outputstream-for-an-s3-object-and-write-value-to-it for a solution which allows you to stream directly to S3 without being forced to store the entire stream in a byte-array. Automatically uses multi-part transfer if the stream gets too large.Custody
11

I solved this by subclassing ByteArrayOutputStream:

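import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Creates an InputStream over the internal buffer without copying it.
    // Only the first `count` bytes are valid; the rest of `buf` is unused capacity.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}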
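A usage sketch, assuming the class above. With aws-java-sdk-s3 1.x the resulting stream would typically be passed to `new PutObjectRequest(bucket, key, inputStream, metadata)` after calling `metadata.setContentLength(out.size())`; that call is left out here so the example runs with the JDK alone. `ConvertibleOutputStreamDemo` and `drain` are hypothetical stand-ins for the upload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Wraps the internal buffer directly; no copy is made.
    InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

class ConvertibleOutputStreamDemo {

    // Hypothetical stand-in for the upload: reads the stream to the end
    // and returns the number of bytes seen.
    static long drain(InputStream in) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ConvertibleOutputStream out = new ConvertibleOutputStream();
        byte[] pdf = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        out.write(pdf, 0, pdf.length);

        // size() is the value to report as the content length of the upload.
        long contentLength = out.size();
        long uploaded = drain(out.toInputStream());
        System.out.println(uploaded == contentLength);
    }
}
```

The returned stream shares the buffer with the output stream, so it is probably safest to finish writing before calling toInputStream() and not to reuse the output stream until the upload is done.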
Daphnedaphnis answered 4/8, 2015 at 12:17 Comment(1)
This needs to be changed to return new ByteArrayInputStream(buf, 0, count);, otherwise unallocated data in buf may be regarded as actual data in the InputStream.Handgrip
2

What's the actual type of your OutputStream? Since it's an abstract class, there's no saying where the data actually goes (or if it even goes anywhere).

But let's assume that you're talking about a ByteArrayOutputStream since it at least keeps the data in memory (unlike many many others).

If you create a ByteArrayInputStream out of its buffer, there's no duplicated memory. That's the whole idea of streaming.
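A small sketch of the difference, using only java.io (`BufferSharingDemo` and its method names are invented for illustration): ByteArrayInputStream reads from the array it is given without copying it, whereas ByteArrayOutputStream.toByteArray() hands out a fresh copy on every call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class BufferSharingDemo {

    // toByteArray() allocates a new array each time, so two calls
    // return two distinct arrays with the same contents.
    static boolean toByteArrayCopies() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('A');
        byte[] first = out.toByteArray();
        byte[] second = out.toByteArray();
        return first != second;
    }

    // ByteArrayInputStream keeps a reference to the array it was given,
    // so a later change to the array is visible through the stream.
    static int readAfterMutation() {
        byte[] data = {'A'};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        data[0] = 'B';
        return in.read();
    }

    public static void main(String[] args) {
        System.out.println(toByteArrayCopies());
        System.out.println((char) readAfterMutation());
    }
}
```

So the duplication comes from toByteArray(), not from the input stream; avoiding it means getting at the protected `buf` field some other way.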

Mortmain answered 4/8, 2015 at 9:35 Comment(5)
OK, and how would you suggest I should access the buffer? Would you recommend creating a subclass and providing a public getter for the protected field buf from the ByteArrayOutputStream?Daphnedaphnis
Eh, I didn't realize that BAOS makes a copy of the buffer with toByteArray. Yeah, you should go for the subclass route.Mortmain
Exactly, hence the subclass idea.Daphnedaphnis
There are also several libraries that already have a similar class (ByteArrayBuffer seems to be a common name for them) which will give you an InputStream directly. Jackson at least has one.Mortmain
Thanks for your input! I added my own answer to make the subclass solution more transparent.Daphnedaphnis
0

Another workaround is to use the presigned URL feature of S3. Since a presigned URL allows you to upload files to S3 with HTTP PUT or POST, it is possible to write your data straight to the output stream of an HttpURLConnection. Amazon provides sample code for this.
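A minimal sketch of the PUT variant, assuming the presigned URL has already been generated elsewhere (for example with the AWS SDK) and that the data sits in a ByteArrayOutputStream as in the question. `PresignedUpload` and `put` are hypothetical names; any headers that were part of the signature, such as Content-Type, would have to be set to matching values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class PresignedUpload {

    // Sends the buffered bytes with HTTP PUT and returns the response status code.
    static int put(URL presignedUrl, ByteArrayOutputStream data) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) presignedUrl.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        // A known length lets the connection stream the body
        // instead of buffering it a second time.
        conn.setFixedLengthStreamingMode(data.size());
        try (OutputStream body = conn.getOutputStream()) {
            // writeTo() writes from the internal buffer without copying it first.
            data.writeTo(body);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

A 200 status would normally indicate that S3 accepted the object; anything else should be treated as a failed upload.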

Daylong answered 19/9, 2021 at 1:57 Comment(0)
