Append data to an S3 object

Let's say that I have a machine that I want to be able to write to a certain log file stored in an S3 bucket.

So the machine needs write access to that bucket, but I don't want it to be able to overwrite or delete any files in that bucket (including the one I want it to write to).

So basically, I want my machine to be able to only append data to that log file, without overwriting it or downloading it.

Is there a way to configure S3 to work like that? Maybe there's some IAM policy I can attach so it works the way I want?

Crabber answered 21/1, 2017 at 20:4 Comment(5)
You can't modify objects in S3. Could you just append a new log file? That would be a better model and would support multiple, simultaneous clients.Aerophagia
@Aerophagia Yeah, I thought about that, but the problem is that if an attacker succeeds in accessing my server, he'll have the ability to delete the local file stored on it before it is sent to the S3 bucket (which, let's say, happens at the end of the day).Crabber
You might also want to take a look at CloudWatch logs. Let it manage the complexity of collecting and storing your logs, provide searching facilities, retention policies, and allow you to generate alerts based on metrics that you can customize for your logs.Aerophagia
You might also take a look at Google BigQuery. You can use it to solve your problem.Salverform
If your application can handle the change, you can also convert your "append to log file" approach to "add log file to prefix". In other words, each append will technically be a new file, but whatever reads your logs can combine the contents of all these log files by reading everything under the shared prefix.Clench

Unfortunately, you can't.

S3 doesn't have an "append" operation.* Once an object has been uploaded, there is no way to modify it in place; your only option is to upload a new object to replace it, which doesn't meet your requirements.

*: Yes, I know this post is a couple of years old. It's still accurate, though.

Yellowweed answered 21/1, 2017 at 20:15 Comment(5)
May I know, can we achieve this by using Multipart Upload?Shortstop
Multipart Upload will allow you to get the data in to S3 without downloading the original object, but it wouldn't allow you to overwrite the original object directly. See e.g. docs.aws.amazon.com/AmazonS3/latest/API/… You could then delete the old object/rename the new one. This, however, is not what the question is asking.Boatright
I think that using Multipart Upload may actually work. All your parts are sequential segments of the same file. If a part uploads successfully, you can eventually commit the upload to be able to read the file. So, as long as you don't need to read the contents of the file, you could keep appending using the same multipart upload.Refugiorefulgence
@Refugiorefulgence I still don't think it meets the OP's requirements. There is no way I'm aware of to restrict an S3 user to performing multipart uploads which append to an object -- if they can perform a multipart upload, they can upload any content they want.Yellowweed
It is possible to provide an "append interface", as s3fs has done, but only via "no-upload-copy + partial upload + rewrite original", as mentioned by @duskwuff-inactiveRecurrence

As the accepted answer states, you can't. The best solution I'm aware of is to use:

AWS Kinesis Firehose

https://aws.amazon.com/kinesis/firehose/

Their code sample looks complicated but yours can be really simple. You keep performing PUT (or BATCH PUT) operations onto a Kinesis Firehose delivery stream in your application (using the AWS SDK), and you configure the Kinesis Firehose delivery stream to send your streamed data to an AWS S3 bucket of your choice (in the AWS Kinesis Firehose console).


It's still not as convenient as >> on the Linux command line, because once you've created a file on S3 you again have to deal with downloading, appending, and uploading the new file. But you only have to do that once per batch of lines rather than for every line of data, so you don't need to worry about huge charges from the volume of append operations. Maybe a true append can be done, but I can't see how to do it from the console.
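
For illustration, here is a minimal sketch of the producer side using Python's boto3 (the stream name and region are placeholders, and the delivery stream is assumed to already be configured with your S3 bucket as its destination):

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")  # placeholder region

    def append_log_line(line):
        # Each record becomes part of an object that Firehose periodically
        # flushes to the S3 bucket configured on the delivery stream.
        firehose.put_record(
            DeliveryStreamName="my-log-delivery-stream",  # placeholder stream name
            Record={"Data": (line + "\n").encode("utf-8")},
        )

    append_log_line("machine booted")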

Quinta answered 19/8, 2017 at 2:20 Comment(4)
Note that there is either a max time (900 seconds since file creation) or a max size (128 MB file size) on doing this - meaning, Kinesis Firehose will append to the same S3 file until it reaches either of those limits: docs.aws.amazon.com/firehose/latest/dev/create-configure.htmlEpic
Can you use a single S3 file as output on the Firehose? It sounds a bit messy having to merge multiple files in a S3 bucket.Harangue
Unfortunately no. I too wish there was a better solution.Quinta
Yeah it's unfortunate. I'm mostly concerned about race condition if I manually download & append records to a single S3 object. I've been thinking about adding the records to SQS and then using some logic with SNS + Lambda to poll the SQS and then write the new entries to the S3 object.Harangue

Objects on S3 are not appendable. You have two options in this case:

  1. Copy all the S3 object's data, append the new content, and write it back to S3.

    var AWS = require('aws-sdk');
    var s3 = new AWS.S3();

    function writeToS3(input) {
        var getParams = {
            Bucket: 'myBucket',
            Key: "myKey"
        };

        s3.getObject(getParams, function(err, data) {
            if (err) console.log(err, err.stack);
            else {
                // read the existing object, append a new line, and write it all back
                var content = Buffer.from(data.Body).toString("utf8");
                content = content + '\n' + new Date() + '\t' + input;
                var putParams = {
                    Body: content,
                    Bucket: 'myBucket',
                    Key: "myKey",
                    ACL: "public-read"
                };

                s3.putObject(putParams, function(err, data) {
                    if (err) console.log(err, err.stack); // an error occurred
                    else {
                        console.log(data);               // successful response
                    }
                });
            }
        });
    }
    
  2. Use Kinesis Firehose. This is fairly straightforward: create your Firehose delivery stream and set its destination to your S3 bucket. That's it!

    var AWS = require('aws-sdk');
    var firehose = new AWS.Firehose();

    function writeToS3(input) {
        var content = "\n" + new Date() + "\t" + input;
        var params = {
            DeliveryStreamName: 'myDeliveryStream', /* required */
            Record: { /* required */
                Data: Buffer.from(content) /* Strings will be Base-64 encoded on your behalf */
            }
        };

        firehose.putRecord(params, function(err, data) {
            if (err) console.log(err, err.stack); // an error occurred
            else     console.log(data);           // successful response
        });
    }
    
Occasionalism answered 7/11, 2018 at 16:35 Comment(1)
Can you use a single S3 file as output?Harangue

You can:

  1. Set up Multipart Upload
  2. Call UploadPartCopy specifying the existing S3 object as a source
  3. Call UploadPart with the data you want to append
  4. Complete the Multipart Upload.

There are a number of limitations; for example, your existing object must be larger than 5 MB (however, if it is smaller, copying it to the client first should be fast enough for most cases).

It is not as nice as a straight append, but at least you do not need to copy the data back and forth between AWS and the local machine.
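
A minimal sketch of that flow with Python's boto3 might look like this (the bucket and key names are placeholders; error handling and the 5 MB minimum-part-size check are left out):

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "my-log.txt"  # placeholder names

    def append_via_multipart(data: bytes):
        # 1. Start a multipart upload targeting the same key
        mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
        upload_id = mpu["UploadId"]

        # 2. Copy the existing object in as part 1 (it must be at least 5 MB)
        part1 = s3.upload_part_copy(
            Bucket=bucket, Key=key, UploadId=upload_id, PartNumber=1,
            CopySource={"Bucket": bucket, "Key": key},
        )

        # 3. Upload the data to append as part 2
        part2 = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=upload_id, PartNumber=2, Body=data,
        )

        # 4. Complete the upload; the object is replaced with old + new content
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": [
                {"PartNumber": 1, "ETag": part1["CopyPartResult"]["ETag"]},
                {"PartNumber": 2, "ETag": part2["ETag"]},
            ]},
        )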

Knossos answered 22/10, 2021 at 15:56 Comment(0)

In case anyone wants to append data to an object with an S3-like service, the Alibaba Cloud OSS (Object Storage Service) supports this natively.

OSS provides append upload (through the AppendObject API), which allows you to directly append content to the end of an object. Objects uploaded by using this method are appendable objects, whereas objects uploaded by using other methods are normal objects. The appended data is instantly readable.
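
For example, with the Python oss2 SDK an append might look roughly like this (the endpoint, credentials, bucket, and key below are placeholders):

    import oss2

    auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-bucket")

    # Each call appends at the given position; the result carries the
    # next position to use for the following append.
    result = bucket.append_object("my-log.txt", 0, "first line\n")
    result = bucket.append_object("my-log.txt", result.next_position, "second line\n")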

Hypsometer answered 13/12, 2019 at 0:27 Comment(0)

The problem we were facing was creating a several-gigabyte S3 file without ever loading the entirety of it into RAM. The approach below combines several files by appending them to the end of each other, so depending on your needs, this could be a viable solution.

The solution we came up with was:

  1. Upload the file in chunks into an AWS S3 folder
  2. Use AWS Athena to define a table based on that S3 folder by running
CREATE EXTERNAL TABLE IF NOT EXISTS `TrainingDB`.`TrainingTable` (`Data` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('collection.delim' = '\n')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-bucket-name/TrainingTesting/';

  3. Generate a combination of all the results in that table by running
UNLOAD (SELECT * FROM "TrainingDB"."TrainingTable") 
TO 's3://your-bucket/TrainingResults/results5' 
WITH ( format = 'TEXTFILE', compression='none' )

This will append all the files one after another and provide you with one file containing all the chunks you were trying to append. It is overkill if you're just trying to combine a few small files, in which case just pulling the original file down and writing to the end will probably be better (as the other answers suggest).

Modify answered 25/10, 2022 at 10:24 Comment(0)

As others have stated previously, S3 objects are not append-able.
However, another solution would be to write out to CloudWatch Logs and then export the logs you want to S3. This would also prevent any attacker who accesses your server from deleting from your S3 bucket, since the machine writing the logs wouldn't need any S3 permissions.
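
As an illustration, a minimal writer using Python's boto3 could look like this (the log group and stream names are placeholders; older SDK/API versions also required a sequence token on put_log_events, which is omitted here):

    import time
    import boto3

    logs = boto3.client("logs")
    group, stream = "machine-logs", "my-machine"  # placeholder names

    # One-time setup; in real code, ignore "already exists" errors
    logs.create_log_group(logGroupName=group)
    logs.create_log_stream(logGroupName=group, logStreamName=stream)

    def append_log_line(message):
        logs.put_log_events(
            logGroupName=group,
            logStreamName=stream,
            logEvents=[{"timestamp": int(time.time() * 1000), "message": message}],
        )

    append_log_line("something happened")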

Toggle answered 25/7, 2019 at 20:20 Comment(1)
This is a good solution to the original problem. We shouldn't ask "I can't get Y to solve X, how do I get Y to work?" but rather "How can I solve X?" which I think this does in a better way.Synaesthesia

I had a similar issue where I had to write errors to a log file in S3 during a long-running process (a couple of hours). So I didn't have a file locally to create a one-time stream from; I had to append the errors to a file at runtime.

So what you can do is keep an open upload stream to a specific file and write to it whenever you want:

const { S3 } = require('aws-sdk')
const { PassThrough } = require('stream')

// append to open connection
const append = (stream, data ) => new Promise(resolve => {
  stream.write(`${data}\n`, resolve)
})

const openConnectionWithS3 = async () => {
  const s3 = new S3({
    credentials: {
      accessKeyId: process.env.AWS_ACCESS_KEY_ID,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    },
    endpoint: process.env.AWS_S3_ENDPOINT,
    region: process.env.AWS_DEFAULT_REGION,
  })
  const fileName = 'test.log'
  const bucketName = 'my-bucket'
  // create pass through stream. This stream we use to write data to
  // but this stream we also use to pass the same data to aws
  const pass = new PassThrough()

  // don't resolve the promise yet; keep it open and await the result when the long-running process is done
  const promise = s3
    .upload({
      Bucket: bucketName,
      Key: fileName,
      // pass the stream as body, aws will handle the stream from now
      Body: pass,
    })
    .promise()

  // write data to our open connection.
  // we can even write to it from different places
  for (let i = 0; i < 100000; i++) {
    await append(pass, `foo${i}`)
  }

  // here we resolve the promise and close the connection
  await Promise.all([
    // push null to the stream; it now knows that after the
    // 100000 foo's it should stop writing
    pass.push(null),
    promise,
  ])
}

openConnectionWithS3()

It will append items to a file in S3 and resolve when it is done.

Craven answered 8/11, 2022 at 10:26 Comment(1)
The problem in the above approach is that this is not real streaming. Your program first writes everything to the stream and only then uploads it to S3, so memory has to be as big as the file is, and if it fails in the middle it has to start from the beginning. I checked memory and it keeps growing until the entire stream is filled.Loquacity

Yes you can, with s3fs.

import s3fs

s3 = s3fs.S3FileSystem(anon=False)

# Create a file just like you do on a local system
path_to_your_file = "s3://my-bucket/my-key/my_file.txt"

with s3.open(path_to_your_file, 'w') as f:
    f.write("This is a new QA file!\n")

# Now append to the file just like you do on a local system.
with s3.open(path_to_your_file, 'a') as f:
    f.write("----------------------------------------------------------!\n")

If you check the file on S3 you will see the dotted line appended. You must configure s3fs with your local AWS credentials (the same ones the CLI tools use).

I hope it helps!

Krishnakrishnah answered 18/10, 2023 at 20:18 Comment(0)

An S3 bucket does not allow you to append to existing objects. The way to do this is to first use the GET method to fetch the data from the S3 bucket, add the new data you want to append to it locally, and then push it back to the S3 bucket.

Since it is not possible to append to an existing S3 object, you will need to replace it with a new object with the data appended to it. This means that you would need to upload the entire object (log file) each time a new entry is appended to it, which won't be very efficient.

You could have log entries sent to an SQS queue and, when the queue size reaches a set number, have the log messages batched together and added as an object in your S3 bucket. This still won't satisfy your requirement of appending to a single object.
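
A rough sketch of that batching idea with Python's boto3 (the queue URL, bucket, and object key are placeholders) could look like this:

    import boto3

    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/<account-id>/log-entries"  # placeholder
    BUCKET = "my-log-bucket"  # placeholder

    def flush_queue_to_s3(object_key):
        # Drain pending log entries from SQS and write them as one new S3 object.
        lines = []
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
            messages = resp.get("Messages", [])
            if not messages:
                break
            for msg in messages:
                lines.append(msg["Body"])
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        if lines:
            s3.put_object(Bucket=BUCKET, Key=object_key,
                          Body="\n".join(lines).encode("utf-8"))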

Amador answered 12/2, 2021 at 8:53 Comment(0)

I had a similar issue, and this is what I had asked:

how to Append data in file using AWS Lambda

Here's what I came up with to solve the above problem:

Use getObject to retrieve the existing file:

   // Inside the Lambda handler: `event` is the new entry to add
   var AWS = require('aws-sdk');
   var s3 = new AWS.S3();
   var bucketPath = "my-bucket"; // your bucket name
   var projects = [];
   var getParams = { Bucket: bucketPath, Key: "projects.json" };

   s3.getObject(getParams, function(err, data) {
       if (err) console.log(err, err.stack); // an error occurred
       else {
           console.log(data);                // successful response
           var s3Projects = JSON.parse(data.Body);
           console.log('s3 data==>', s3Projects);
           if (s3Projects.length > 0) {
               projects = s3Projects;
           }
       }
       projects.push(event);
       writeToS3(); // calling the function below to write the data back
   });

Write a function to write the updated data back to the file:

   function writeToS3() {
       var putParams = {
           Body: JSON.stringify(projects),
           Bucket: bucketPath,
           Key: "projects.json",
           ACL: "public-read"
       };

       s3.putObject(putParams, function(err, data) {
           if (err) console.log(err, err.stack); // an error occurred
           else     console.log(data);           // successful response
           // `callback` is the Lambda handler's callback
           callback(null, 'Hello from Lambda');
       });
   }

Hope this helps!

Eucaine answered 7/9, 2017 at 9:27 Comment(5)
Your writeToS3 function will overwrite a file, not append to it.Yellowweed
@duskwuff-inactive- agreed, and also it suffers from race conditions if two methods try to work on the same object, but this is not really different from languages that have immutable strings or types -- you simulate an append by returning/overwriting with a new object.Bobette
This is useful because it has the advantage of not consuming additional bandwidth if your app that appends data is outside of the AWS network.Horseflesh
this is not appendKnossos
Like others have said, this is not appending. You're just downloading the entire file, modifying it, and then re-uploading the entire thing with Lambda. And even this probably won't scale due to Lambda's performance constraints. If the file's too large, the Lambda function will time out or run out of memory, which is presumably one of the reasons why the OP wants to append to the file in-place.Internationalist
