Download multiple files in parallel to a zip-file from S3 using Java

I'd like to download multiple files from an external source such as S3, create a single zip file containing all of them, and present the user with a link to download that zip file.

Obviously, I can process the files sequentially, reading the input stream of each one and writing it to a ZipOutputStream.

How can I read all the input streams in parallel and write them to a single output stream, so that I can present a download link to the user without making them wait until the zip file is fully written?

My current code:

String realpath = getServletContext().getRealPath("/");
response.setContentType("application/zip");
response.setHeader("Content-Disposition", "attachment; filename=" + fi.replace('/', '-') + "_" + ff.replace('/', '-') + ".zip");

ZipOutputStream zipfile = null;

try
{
    List<Object[]> cfdis = /*my hibernate criteria source, your Database?*/
    zipfile = new ZipOutputStream(response.getOutputStream());
    byte[] bytes = new byte[FILEBUFFERSIZE];
    for (Object[] cfdi : cfdis)
    {
        // cfdi[1] is the entry name, cfdi[0] the file path relative to the web app root
        zipfile.putNextEntry(new ZipEntry(cfdi[1].toString() + ".xml"));
        try (InputStream in = new FileInputStream(new File(realpath + cfdi[0].toString())))
        {
            int bytesRead;
            while ((bytesRead = in.read(bytes)) != -1)
            {
                zipfile.write(bytes, 0, bytesRead);
            }
        }
        zipfile.closeEntry();
    }
}
finally
{
    if (zipfile != null)
    {
        zipfile.close(); // writes the zip central directory and closes the servlet stream
    }
}
Imprudent asked 26/10/2015 at 20:30 · Comments (3)
Iinden: @Muhammadismail Instead of putting the code in a comment, please edit your question and include the code there.
Deputize: Hopefully I've fixed the question.
Imprudent: Thanks a lot @wOxxOm

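Read the stream for the first file and start writing it to the output stream immediately, then continue with the next file. Because the ZipOutputStream wraps the servlet's output stream, the download starts right away instead of waiting for the whole archive to be built.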

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    // Set the headers before anything is written to the response
    response.setContentType("application/zip");
    response.setHeader("Content-Disposition", "attachment; filename=java-s3-download.ZIP");

    S3ServiceWrapper s3Service = new S3ServiceWrapper();
    byte[] bytes = new byte[4096];

    // Entries are written straight to the servlet output stream, so the client starts
    // receiving the zip before the later files have been downloaded
    try (ZipOutputStream zos = new ZipOutputStream(response.getOutputStream())) {
        for (String objectKey : objectKeys) {
            // Entry name = part of the key after the last '/', without the slash itself
            String name = objectKey.substring(objectKey.lastIndexOf('/') + 1);
            log.info("Start writing file::" + name);

            try (InputStream in = s3Service.downloadFileAsStream(bucketName, objectKey)) {
                if (in == null) { // downloadFileAsStream returns null on failure
                    log.error("Skipping " + objectKey + ": download failed");
                    continue;
                }
                zos.putNextEntry(new ZipEntry(name));
                int bytesRead;
                while ((bytesRead = in.read(bytes)) != -1) {
                    zos.write(bytes, 0, bytesRead);
                }
                zos.closeEntry();
            }

            log.info("Finished writing file::" + name);
        }
    } // closing zos writes the zip central directory and closes the servlet stream
}

// Method of S3ServiceWrapper; s3Service here is the wrapped AmazonS3 client (AWS SDK for Java v1)
public InputStream downloadFileAsStream(String bucketName, String objectKey) {
    if (s3Service == null) {
        return null;
    }

    try {
        GetObjectRequest s3ObjectReq = new GetObjectRequest(bucketName, objectKey);
        log.info("Downloading file having key = " + objectKey);
        long startTime = System.currentTimeMillis();
        S3Object downloadedObject = s3Service.getObject(s3ObjectReq);
        // Measures only the time until the object's stream is available, not the full download
        log.info("Time to load stream is " + (System.currentTimeMillis() - startTime) + " ms");
        // The caller must close this stream; the content is streamed directly from S3
        return downloadedObject.getObjectContent();
    } catch (Exception e) {
        log.error("EXCEPTION = " + e.getMessage(), e);
    }
    return null;
}
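Stripped of the servlet and S3 specifics, the same sequential streaming approach can be sketched with only the JDK (Java 9+ for `InputStream.transferTo`). This is a minimal sketch, not the answer's exact code: `SequentialZipStreamer`, `streamZip` and the map of stream suppliers are hypothetical names, and each supplier stands in for a call such as `s3Service.downloadFileAsStream(bucketName, objectKey)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.function.Supplier;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class SequentialZipStreamer {

    /**
     * Copies each source into its own zip entry, writing straight through to {@code out}.
     * Each (hypothetical) supplier opens its source lazily and must not return null.
     * Closing the ZipOutputStream also closes {@code out}.
     */
    static void streamZip(Map<String, Supplier<InputStream>> sources, OutputStream out)
            throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, Supplier<InputStream>> source : sources.entrySet()) {
                // Open one source at a time and close it as soon as its entry is written
                try (InputStream in = source.getValue().get()) {
                    zos.putNextEntry(new ZipEntry(source.getKey()));
                    in.transferTo(zos); // copies until the end of the source stream
                    zos.closeEntry();
                }
            }
        } // close() writes the zip central directory, then closes 'out'
    }
}
```

Passing a `LinkedHashMap` keeps the entries in insertion order; in a servlet, `out` would be `response.getOutputStream()`.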
Imprudent answered 29/10/2015 at 17:06 · Comments (0)

There is no advantage in reading the files in parallel, because only one thread can write to the zip file at a time.

You can easily stream the zip file to the client while it is being written. See this blog post for the full code: https://purushramrajblog.wordpress.com/2015/11/22/dynamically-construct-a-zip-file-and-stream-it-with-jersey/
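When a framework wants an `InputStream` to send (for example Spring's `InputStreamResource`) instead of handing you the response `OutputStream`, one JDK-only way to stream the archive while it is still being written is a `PipedInputStream`/`PipedOutputStream` pair with the zip built on a background thread. This is a different technique from the Jersey-based one in the linked post; `PipedZipStreamer`, `zipInBackground` and the supplier map are hypothetical names, and the sketch assumes Java 9+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Map;
import java.util.function.Supplier;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class PipedZipStreamer {

    /**
     * Starts building the zip on a background thread and immediately returns the read end
     * of a pipe, so the caller can begin sending bytes before the archive is complete.
     */
    static InputStream zipInBackground(Map<String, Supplier<InputStream>> sources)
            throws IOException {
        PipedInputStream pipeIn = new PipedInputStream(64 * 1024); // pipe buffer size
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);

        Thread producer = new Thread(() -> {
            // Both resources are closed even on failure; closing pipeOut gives the reader EOF
            try (PipedOutputStream out = pipeOut;
                 ZipOutputStream zos = new ZipOutputStream(out)) {
                for (Map.Entry<String, Supplier<InputStream>> source : sources.entrySet()) {
                    try (InputStream in = source.getValue().get()) {
                        zos.putNextEntry(new ZipEntry(source.getKey()));
                        in.transferTo(zos);
                        zos.closeEntry();
                    }
                }
            } catch (IOException e) {
                // The reader may receive an incomplete archive; real code should log this
                System.err.println("zip producer failed: " + e);
            }
        }, "zip-producer");
        producer.setDaemon(true);
        producer.start();
        return pipeIn;
    }
}
```

The pipe's buffer bounds memory use: the producer blocks whenever the reader falls 64 KB behind. If the reader closes the returned stream early, the producer's next write fails and the background thread ends.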

Korney answered 22/11/2015 at 3:15 · Comments (1)
Burhans: This is inaccurate. You can generally write to disk faster than you can make a round trip over the network. Having several files fetched from S3 in parallel CAN in fact speed up the process.
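The comment's point can be sketched as follows, assuming the files are small enough to buffer in memory: downloads run concurrently on a thread pool, while a single thread writes the entries, because a ZipOutputStream must be written one entry at a time. `ParallelFetchZip`, `zipWithParallelFetch` and the `fetch` function are hypothetical names; in practice `fetch` would read an S3 object (for example, the stream returned by `getObjectContent()`) fully into a byte array.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ParallelFetchZip {

    // 'fetch' is a hypothetical downloader that returns the whole file as a byte array
    static void zipWithParallelFetch(List<String> keys, Function<String, byte[]> fetch,
                                     int threads, OutputStream out)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Start all downloads up front; up to 'threads' of them run at the same time
            List<Future<byte[]>> pending = new ArrayList<>();
            for (String key : keys) {
                pending.add(pool.submit(() -> fetch.apply(key)));
            }
            // Only this thread touches the ZipOutputStream, writing entries in key order
            try (ZipOutputStream zos = new ZipOutputStream(out)) {
                for (int i = 0; i < keys.size(); i++) {
                    byte[] data = pending.get(i).get(); // waits only until this file has arrived
                    zos.putNextEntry(new ZipEntry(keys.get(i)));
                    zos.write(data);
                    zos.closeEntry();
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because every download is submitted at once, completed files can pile up in memory while an earlier, slower one is still in flight; a production version would cap how many buffered files are outstanding.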

You can download a file from S3 like this (AWS SDK for Java v1, returned through a Spring ResponseEntity):

    final BasicAWSCredentials awsCredentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
    AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().withRegion("ap-south-1")
            .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
            .build();
    final S3Object s3Object = amazonS3.getObject(bucketname, mojo_key_name);
    final S3ObjectInputStream stream = s3Object.getObjectContent();
    InputStreamResource isr = new InputStreamResource(stream);
    return ResponseEntity
            .ok()
            // The body is a zip archive, not JSON
            .contentType(MediaType.parseMediaType("application/zip"))
            .header(HttpHeaders.CONTENT_DISPOSITION, "attachment;filename=" + fileName + ".zip")
            .body(isr);
Blade answered 14/5/2020 at 4:05 · Comments (0)

© 2022 - 2024 — McMap. All rights reserved.