I have a server-side application that runs through a large number of image URLs and uploads the images from these URLs to S3.
The files are served over HTTP. I download them using InputStream
I get from an HttpURLConnection
using the getInputStream
method. I hand the InputStream to AWS S3 Client putObject
method (AWS Java SDK v1) to upload the stream to S3. So far so good.
I am trying to introduce a new external image data source. The problem with this data source is that the HTTP server serving these images does not return a Content-Length
HTTP header. This means I cannot tell how many bytes the image will be, which is a number required by the AWS S3 client to validate the image was correctly uploaded from the stream to S3.
The only ways I can think of dealing with this issue is to either get the server owner to add Content-Length
HTTP header to their response (unlikely), or to download the file to a memory buffer first and then upload it to S3 from there.
These are not big files, but I have many of them.
When considering downloading the file first, I am worried about the memory footprint and concurrency implications (not being able to upload and download chunks of the same file at the same time).
Since I am dealing with many small files, I suspect that concurrency issues might be "resolved" if I focus on the concurrency of the multiple files instead of a single file. So instead of concurrently downloading and uploading chunks of the same file, I will use my IO effectively downloading one file while uploading another.
I would love your ideas on how to do this, best practices, pitfalls or any other thought on how to best tackle this issue.
putObject
javadoc states that "if the caller doesn't provide [the content length], the library will make a best effort to compute the content length by buffer the contents of the input stream into the memory". Did you try not specify the length? – CaracallaContent-Length
header by settingAccept-Encoding: identity
in your request headers. Whether that works depends on why they aren't sending it now, but it's a valid request, so it wouldn't hurt to try. – Sissy