How to upload a Java OutputStream to AWS S3

Asked 4/8, 2015 at 9:27 Answered 19/9, 2021 at 1:57

Solved java amazon-web-services amazon-s3 inputstream outputstream

I create PDF docs in memory as OutputStreams. These should be uploaded to S3. My problem is that it's not possible to create a PutObjectRequest from an OutputStream directly (according to this thread in the AWS dev forum). I use aws-java-sdk-s3 v1.10.8 in a Dropwizard app.

The two workarounds I can see so far are:

1. Copy the OutputStream to an InputStream and accept that twice the amount of RAM is used.
2. Pipe the OutputStream to an InputStream and accept the overhead of an extra thread (see this answer).

If I don't find a better solution I'll go with #1, because it looks as if I could afford the extra memory more easily than threads/CPU in my setup.
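For reference, the two workarounds can be sketched with nothing but `java.io`. This is a minimal illustration, not code from my app: `StreamWorkarounds` and `Producer` are made-up names, and `Producer` stands in for whatever renders the PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class; all names here are illustrative only.
class StreamWorkarounds {

    /** Stand-in for the code that renders a document into an OutputStream. */
    interface Producer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Workaround 1: buffer everything, then copy. toByteArray() duplicates the
    // internal buffer, so peak memory is roughly twice the document size.
    static InputStream copyToInputStream(Producer producer) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        producer.writeTo(out);
        return new ByteArrayInputStream(out.toByteArray());
    }

    // Workaround 2: connect a pipe and let a second thread do the writing.
    // Only the pipe's small buffer is held in memory, at the cost of a thread.
    static InputStream pipeToInputStream(Producer producer) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread writer = new Thread(() -> {
            // Closing the pipe signals end-of-stream to the reader.
            try (OutputStream o = out) {
                producer.writeTo(o);
            } catch (IOException e) {
                // The reader does not see this failure; real code should report it.
                throw new UncheckedIOException(e);
            }
        });
        writer.setDaemon(true);
        writer.start();
        return in;
    }
}
```

Either method returns an InputStream that could be handed to the upload. Note that with the piped variant the total length is not known up front, which matters if the upload needs a content length.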

Is there any other, possibly more efficient way to achieve this that I have overlooked so far?

Edit: My OutputStreams are ByteArrayOutputStreams

Daphnedaphnis answered 4/8, 2015 at 9:27 Comment(5)

"I create PDF docs in memory as OutputStreams" - ?? an OutputStream does not store data (possibly except for ByteArrayOutputStream, but then you'd say you created it in memory as a byte array) – Parenteau 4/8, 2015 at 9:35

I use ByteArrayOutputStream. Sorry for the confusion. – Daphnedaphnis 4/8, 2015 at 9:48

I have a similar question - #40268820 . Were you able to find a solution for this? If not, how did you go about doing #1 in your case? – Seguidilla 26/10, 2016 at 17:23

@Omnipresent, you can find what I did in my own answer below. – Daphnedaphnis 28/10, 2016 at 8:44

See https://mcmap.net/q/664397/-how-to-create-a-java-outputstream-for-an-s3-object-and-write-value-to-it for a solution which allows you to stream directly to S3 without being forced to store the entire stream in a byte-array. Automatically uses multi-part transfer if the stream gets too large. – Custody 23/10, 2020 at 22:53

I solved this by subclassing ByteArrayOutputStream as ConvertibleOutputStream:

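import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Creates an InputStream over the internal buffer without copying it,
    // so no extra memory is used. Only the first `count` bytes are valid.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}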
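A minimal usage sketch follows. The class is repeated so the snippet compiles on its own, and `ConvertibleOutputStreamDemo` and `roundTrip` are hypothetical names. The S3 call appears only as a comment because the SDK is not on the classpath here; `s3`, `bucket`, and `key` are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Same class as in the answer above, repeated so this sketch is self-contained.
class ConvertibleOutputStream extends ByteArrayOutputStream {
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

// Hypothetical demo class; the names are illustrative only.
class ConvertibleOutputStreamDemo {

    static byte[] roundTrip(byte[] document) throws IOException {
        ConvertibleOutputStream out = new ConvertibleOutputStream();
        out.write(document);                 // in the real app, the PDF library writes here

        long contentLength = out.size();     // number of valid bytes, not buf.length
        InputStream in = out.toInputStream();

        // With aws-java-sdk-s3 v1 the upload would look roughly like this:
        //   ObjectMetadata metadata = new ObjectMetadata();
        //   metadata.setContentLength(contentLength);
        //   s3.putObject(new PutObjectRequest(bucket, key, in, metadata));

        // Instead, drain the stream to show it yields exactly the written bytes.
        ByteArrayOutputStream check = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; ) {
            check.write(chunk, 0, n);
        }
        if (check.size() != contentLength) {
            throw new IllegalStateException("stream length differs from size()");
        }
        return check.toByteArray();
    }
}
```

The InputStream reflects the buffer at the moment `toInputStream()` is called, so finish writing before creating it; later writes may not be visible through the stream.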

Daphnedaphnis answered 4/8, 2015 at 12:17 Comment(1)

This needs to be changed to return new ByteArrayInputStream(buf, 0, count);, otherwise unallocated data in buf may be regarded as actual data in the InputStream. – Handgrip 28/9, 2015 at 16:2

What's the actual type of your OutputStream? Since it's an abstract class, there's no saying where the data actually goes (or if it even goes anywhere).

But let's assume that you're talking about a ByteArrayOutputStream since it at least keeps the data in memory (unlike many many others).

If you create a ByteArrayInputStream out of its buffer, there's no duplicated memory. That's the whole idea of streaming.

Mortmain answered 4/8, 2015 at 9:35 Comment(5)

OK, and how would you suggest I should access the buffer? Would you recommend creating a subclass and providing a public getter for the protected field buf from the ByteArrayOutputStream? – Daphnedaphnis 4/8, 2015 at 9:55

Eh, I didn't realize that BAOS makes a copy of the buffer with toByteArray. Yeah, you should go for the subclass route. – Mortmain 4/8, 2015 at 9:59

Exactly, hence the subclass idea. – Daphnedaphnis 4/8, 2015 at 10:1

There are also several libraries that already have a similar class (ByteArrayBuffer seems to be a common name for them) which will give you an InputStream directly. Jackson at least has one. – Mortmain 4/8, 2015 at 10:5

Thanks for your input! I added my own answer to make the subclass solution more transparent. – Daphnedaphnis 4/8, 2015 at 12:18

Another workaround is to use the presigned URL feature of S3. Since a presigned URL lets you upload files to S3 with an HTTP PUT or POST, you can write your output stream's contents to an HttpURLConnection. Amazon provides sample code for this.
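A rough sketch of the upload half, assuming the presigned URL has already been generated elsewhere (for example with the AWS SDK) and that the data sits in a ByteArrayOutputStream as in the question. `PresignedUrlUpload` and `put` are hypothetical names, not part of any AWS API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; only java.net and java.io are used.
class PresignedUrlUpload {

    /** PUTs the buffered bytes to a presigned URL and returns the HTTP status code. */
    static int put(URL presignedUrl, ByteArrayOutputStream document) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) presignedUrl.openConnection();
        try {
            connection.setDoOutput(true);
            connection.setRequestMethod("PUT");
            // A known length lets the connection stream the body instead of buffering it again.
            connection.setFixedLengthStreamingMode(document.size());
            try (OutputStream body = connection.getOutputStream()) {
                // writeTo() writes the valid part of the internal buffer without copying it.
                document.writeTo(body);
            }
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

A 2xx status indicates the PUT was accepted. If the URL was signed with specific headers such as a content type, the request most likely has to send matching headers; that part is not shown here.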

Daylong answered 19/9, 2021 at 1:57 Comment(0)
