How to upload a stream to S3 with AWS SDK v3

I have to transfer a file from an API endpoint to two different buckets. The original upload is made using:

curl -X PUT -F "data=@sample" "http://localhost:3000/upload/1/1"

The endpoint where the file is uploaded:

const PassThrough = require('stream').PassThrough;

async function uploadFile (req, res) {
  try {
    const firstS3Stream = new PassThrough();
    const secondS3Stream = new PassThrough();
    req.pipe(firstS3Stream);
    req.pipe(secondS3Stream);

    await Promise.all([
      uploadToFirstS3(firstS3Stream),
      uploadToSecondS3(secondS3Stream),
    ]);
    return res.end();
  } catch (err) {
    console.log(err)
    return res.status(500).send({ error: 'Unexpected error during file upload' });
  }
}

As you can see, I use two PassThrough streams in order to duplicate the request stream into two readable streams, as suggested in this SO thread.

This piece of code remains unchanged; what is interesting here are the uploadToFirstS3 and uploadToSecondS3 functions. In this minimal example both do exactly the same thing with a different configuration, so I will expand on only one of them here.

What Works Well:

const aws = require('aws-sdk');

const s3 = new aws.S3({
  accessKeyId: S3_API_KEY,
  secretAccessKey: S3_API_SECRET,
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.upload(uploadParams, (err) => {
    if (err) reject(err);
    resolve(true);
  });
}));

This piece of code (based on the aws-sdk package) works fine. My issue here is that I want it to run with the @aws-sdk/client-s3 package in order to reduce the size of the project.

What doesn't work:

I first tried to use S3Client.send(PutObjectCommand):

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key:'some-key',
    Body: stream,
  };
  s3.send(new PutObjectCommand(uploadParams), (err) => {
    if (err) reject(err);
    resolve(true);
  });
}));

Then I tried S3.putObject(PutObjectCommandInput):

const { S3 } = require('@aws-sdk/client-s3');

const s3 = new S3({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key:'some-key',
    Body: stream,
  };
  s3.putObject(uploadParams, (err) => {
    if (err) reject(err);
    resolve(true);
  });
}));

The last two examples both give me a 501 - Not Implemented error mentioning the Transfer-Encoding header. I checked req.headers and there is no Transfer-Encoding in it, so I guess the SDK adds it to the request it makes to S3?

Since the first example (based on aws-sdk) works fine, I'm sure the error is not due to an empty body in the request as suggested in this SO thread.

Still, I thought maybe the stream wasn't readable yet when triggering the upload, so I wrapped the calls to uploadToFirstS3 and uploadToSecondS3 in a callback triggered by the req.on('readable', callback) event, but nothing changed.

I would like to process the files in memory without ever storing them on disk. Is there a way to achieve this using the @aws-sdk/client-s3 package?

Zipah asked 8/11, 2021 at 14:14

In v3 you can use the Upload class from @aws-sdk/lib-storage to do multipart uploads. Unfortunately, there seems to be no mention of this on the docs site for @aws-sdk/client-s3.

It's mentioned in the upgrade guide here: https://github.com/aws/aws-sdk-js-v3/blob/main/UPGRADING.md#s3-multipart-upload

Here's a corrected version of the example provided in https://github.com/aws/aws-sdk-js-v3/tree/main/lib/lib-storage:

  import { Upload } from "@aws-sdk/lib-storage";
  import { S3Client } from "@aws-sdk/client-s3";

  const target = { Bucket, Key, Body };
  try {
    const parallelUploads3 = new Upload({
      client: new S3Client({}),
      tags: [], // optional tags
      queueSize: 4, // optional concurrency configuration
      leavePartsOnError: false, // optional manually handle dropped parts
      params: target,
    });

    parallelUploads3.on("httpUploadProgress", (progress) => {
      console.log(progress);
    });

    await parallelUploads3.done();
  } catch (e) {
    console.log(e);
  }
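
Applied to the question's setup, the uploadToFirstS3 helper could look roughly like this. It's only a minimal, untested sketch, assuming the same S3_* constants and PassThrough stream from the question:

  import { S3Client } from "@aws-sdk/client-s3";
  import { Upload } from "@aws-sdk/lib-storage";

  const s3 = new S3Client({
    credentials: {
      accessKeyId: S3_API_KEY,
      secretAccessKey: S3_API_SECRET,
    },
    region: S3_REGION,
  });

  // Upload streams the PassThrough body as a multipart upload, so the total
  // Content-Length does not need to be known up front.
  const uploadToFirstS3 = (stream) =>
    new Upload({
      client: s3,
      params: {
        Bucket: S3_BUCKET_NAME,
        Key: "some-key",
        Body: stream,
      },
    }).done();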

At the time of writing, the following Body types are supported:

  • string
  • Uint8Array
  • Buffer
  • Blob (hence also File)
  • Node Readable
  • ReadableStream

(according to https://github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/src/chunker.ts)

However, if the Body object comes from a polyfill or a separate realm and thus isn't strictly an instanceof one of these types, you will get an error. You can work around such a case by cloning the Uint8Array/Buffer or by piping the stream through a PassThrough. For example, if you are using archiver to upload a .zip or .tar archive, you can't pass the archiver stream directly because it's a userland Readable implementation (at the time of writing), so you must use Body: archive.pipe(new PassThrough()), as in the sketch below.
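
A minimal sketch of the archiver case (the bucket name, key, and archive contents below are made up for illustration):

  import { PassThrough } from "stream";
  import archiver from "archiver";
  import { S3Client } from "@aws-sdk/client-s3";
  import { Upload } from "@aws-sdk/lib-storage";

  const archive = archiver("zip");

  const upload = new Upload({
    client: new S3Client({}),
    params: {
      Bucket: "my-bucket", // hypothetical bucket
      Key: "archive.zip",  // hypothetical key
      // archiver returns a userland Readable, so pipe it through a native
      // PassThrough before handing it to Upload.
      Body: archive.pipe(new PassThrough()),
    },
  });

  archive.append("hello world", { name: "hello.txt" });
  archive.finalize();

  await upload.done();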

Thorsten answered 29/11, 2021 at 18:11 Comment(12)
Hey, thanks for sharing, why new S3({}) || new S3Client({})? – Hambrick
Good question, I just copied their example verbatim. That's bizarre... I got it to work using S3Client in my code so I'll just update the example to use that. – Thorsten
😄 yeah I'm also successfully just using S3Client, but hoped you might have an answer to their code 😅 – Hambrick
Nope. It's illogical because new S3({}) is always truthy. Maybe they were trying to illustrate that you can use either one (not sure if you can?), but that would be a semantically weird way to do so – Thorsten
without the parallelUploads3.on("httpUploadProgress", ...) line, the stream upload never starts / finishes; How can I start or finish the stream upload without listening to httpUploadProgress and without printing the progress? – Investigation
seems strange, I don't recall having that problem myself so I don't know...all I can say is if you're sure you can consistently reproduce that, you could file an issue at github.com/aws/aws-sdk-js-v3 – Thorsten
Hi @Andy, what should the Body be? Can it be a JavaScript File object? I want to use it directly in the browser. – Tijuana
@huanfeng that would be worth asking as a separate question, I don't have experience using the AWS SDK in the browser – Thorsten
I am getting a "Region is missing" error in Vue 3 – Inhalator
@Thorsten Will this also work with NodeJS.ReadableStream ? – Eliott
@Eliott yes, you can see what is supported here: github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/src/…. But a note of caution: it won't work if instanceof ReadableStream is false for the stream you pass in. This can happen if you get a stream from a 3rd party library that uses a polyfill or in certain environments like jsdom. – Thorsten
@Inhalator you need to set the AWS region then, there are various ways to do it, new S3Client({ region: 'us-west-2' }), environment variable AWS_REGION=us-west-2 etc – Thorsten

I came across the same error that you faced. It seems they have a known issue that they haven't documented accurately yet:

The error is indeed caused by stream length remaining unknown. We need to improve the error message and the documentation

To fix this issue, you just need to specify the ContentLength property in the PutObjectCommand input.

Here is the updated snippet:

const { S3 } = require('@aws-sdk/client-s3');

const s3 = new S3({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (passThroughStream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key:'some-key',
    Body: passThroughStream,
    ContentLength: passThroughStream.readableLength, // include this new field!!
  };
  s3.putObject(uploadParams, (err) => {
    if (err) reject(err);
    resolve(true);
  });
}));

Hope it helps!

Obstreperous answered 12/7, 2023 at 18:42 Comment(2)
stream doesn't exist where it's being assigned to the Body property. Should this be passThroughStream instead? – Kylix
yes @devklick, I edited the code snippet accordingly, thanks! – Obstreperous
