Processing AWS Lambda messages in Batches

I am wondering about something, and I really can't find information about it. Maybe it is not the way to go, but I would just like to know.

It is about Lambda working in batches. I know I can set up Lambda to consume messages in batches. In my Lambda function I iterate over each message, and if one fails, Lambda exits and the cycle starts again.

I am wondering about a slightly different approach. Let's assume I have three messages: A, B, and C. I take them in batches as well. Now if message B fails (e.g. an API call fails), I return message B to SQS and keep processing message C.

Is it possible? If it is, is it a good approach? I ask because I can see that it would require implementing some extra complexity in the Lambda function.

Thanks

Inurbane answered 8/2, 2019 at 15:59 Comment(0)

There's an excellent article here. The relevant parts for you are...

  • Using a batchSize of 1, so that messages succeed or fail on their own.
  • Making sure your processing is idempotent, so reprocessing a message isn't harmful, outside of the extra processing cost.
  • Handle errors within your function code, perhaps by catching them and sending the message to a dead letter queue for further processing.
  • Calling the DeleteMessage API manually within your function after successfully processing a message.

The last bullet point is how I've managed to deal with the same problem. Instead of returning errors immediately, store them or note that an error has occurred, but then continue to handle the rest of the messages in the batch. At the end of processing, return or raise an error so that the SQS -> lambda trigger knows not to delete the failed messages. All successful messages will have already been deleted by your lambda handler.

import logging

import boto3

logger = logging.getLogger(__name__)
sqs = boto3.client('sqs')

# The queue URL is passed in through configuration (see the comments below).
QUEUE_URL = '<queue_url>'

def handler(event, context):
    failed = False

    for msg in event['Records']:
        try:
            # Do something with the message.
            handle_message(msg)
        except Exception:
            # Ok, it failed, but allow the loop to finish.
            logger.exception('Failed to handle message')
            failed = True
        else:
            # The message was handled successfully. We can delete it now.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg['receiptHandle'],
            )

    # It doesn't matter what the error is. You just want to raise here
    # to ensure the trigger doesn't delete any of the failed messages.
    if failed:
        raise RuntimeError('Failed to process one or more messages')

def handle_message(msg):
    ...
Cowpox answered 9/2, 2019 at 23:33 Comment(5)
Thanks for the reply. Do you maybe have an example of your implementation? I will have messages that connect to a DB for sure. Any example is appreciated.Inurbane
@VedranMaricevic I added an example as requested.Cowpox
Awesome. Thank you.Inurbane
Super interesting! One question that might make this code even better: Is there a way, from within that lambda, to know the queue_url? That way, the delete_message would be even more autonomous?Woodruff
Good question. It's been a few years but from what I remember we simply passed the queue url through configuration.Cowpox

For Node.js, check out https://www.npmjs.com/package/@middy/sqs-partial-batch-failure.

const middy = require('@middy/core')
const sqsBatch = require('@middy/sqs-partial-batch-failure')

const originalHandler = (event, context, cb) => {
  const recordPromises = event.Records.map(async (record, index) => { /* Custom message processing logic */ })
  return Promise.allSettled(recordPromises)
}

const handler = middy(originalHandler)
  .use(sqsBatch())

Check out https://medium.com/@brettandrews/handling-sqs-partial-batch-failures-in-aws-lambda-d9d6940a17aa for more details.

Riflery answered 7/2, 2020 at 6:48 Comment(0)

As of Nov 2019, AWS has introduced the concept of Bisect On Function Error, along with a configurable maximum number of retries. If your function is idempotent, this can be used.

In this approach you throw an error from the function even if only one item in the batch is failing. AWS will split the batch in two and retry each half. One half should then pass successfully; for the other half, the process continues until the bad record is isolated.
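
Roughly, the relevant settings on a SAM event source would look something like this (a sketch only; the stream ARN and values are placeholders, and per the comment below these settings apply to Kinesis and DynamoDB stream sources, not SQS):

Events:
  Stream:
    Type: Kinesis
    Properties:
      Stream: my-stream-arn
      StartingPosition: LATEST
      BatchSize: 10
      BisectBatchOnFunctionError: true
      MaximumRetryAttempts: 3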

Danille answered 25/3, 2020 at 18:56 Comment(1)
According to the docs this is only for Kinesis and Dynamo streams. Unfortunately, I don't see this same functionality for SQS triggers. docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/…Cowpox

Like all architecture decisions, it depends on your goal and what you are willing to trade for more complexity. Using SQS will allow you to process messages out of order so that retries don't block other messages. Whether or not that is worth the complexity depends on why you are worried about messages getting blocked.

I suggest reading about Lambda retry behavior and Dead Letter Queues.
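
As a sketch of the dead letter queue side, an SQS queue can be given a redrive policy so that messages that fail repeatedly are moved to a DLQ automatically (CloudFormation, with placeholder resource names):

MyQueue:
  Type: AWS::SQS::Queue
  Properties:
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt MyDeadLetterQueue.Arn
      maxReceiveCount: 3

MyDeadLetterQueue:
  Type: AWS::SQS::Queue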

Tertian answered 8/2, 2019 at 16:55 Comment(0)

If you want to retry only the failed messages out of a batch, it is totally doable, but it does add slight complexity.

A possible approach is to iterate through the list of your events (e.g. [eventA, eventB, eventC]) and, for each one, append it to a list of failed events if it failed. Then, at the end, check whether the list of failed events has anything in it, and if it does, manually send those messages back to SQS (using the SQS SendMessageBatch API), as in the sketch below.
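
A minimal Python sketch of this approach, assuming the queue URL comes from a hypothetical QUEUE_URL environment variable and process_event stands in for your processing logic:

import os

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = os.environ['QUEUE_URL']  # hypothetical; supply via configuration

def handler(event, context):
    failed_entries = []

    for i, record in enumerate(event['Records']):
        try:
            process_event(record)  # your processing logic
        except Exception:
            # Collect the failed message so it can be requeued at the end.
            failed_entries.append({
                'Id': str(i),
                'MessageBody': record['body'],
            })

    if failed_entries:
        # Manually send the failed messages back to SQS. Note that
        # SendMessageBatch accepts at most 10 entries per call.
        sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=failed_entries)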

However, you should note that this puts the events to the end of the queue, since you are manually inserting them back.

Anything can be a "good approach" if it solves a problem you are having without much complexity, and in this case, the issue of having to re-execute successful events is definitely a problem that you can solve in this manner.

Ligon answered 8/2, 2019 at 17:40 Comment(1)
In case of failure, if you keep adding messages back to the queue, they will likely fail again. The best approach is to delete the successful messages, raise an exception if there was any failure, and then let the DLQ do its thing.Tody

SQS/Lambda supports reporting batch failures. Within each batch iteration, you catch all exceptions, and if that iteration fails you add its messageId to an SQSBatchResponse. At the end, when all SQS messages have been processed, you return the batch response.

Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting

To use this feature, your function must gracefully handle errors. Have your function logic catch all exceptions and report the messages that result in failure in batchItemFailures in your function response. If your function throws an exception, the entire batch is considered a complete failure.
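
A minimal handler sketch using this feature (assuming a process_event helper for your processing logic; the response shape follows the docs linked above):

def handler(event, context):
    batch_item_failures = []

    for record in event['Records']:
        try:
            process_event(record)  # your processing logic
        except Exception:
            # Report only this message as failed; successful ones are deleted.
            batch_item_failures.append({'itemIdentifier': record['messageId']})

    # Lambda retries only the messages reported here.
    return {'batchItemFailures': batch_item_failures}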

Worms answered 18/7, 2022 at 20:40 Comment(0)

To add to the answer by David:

SQS/Lambda supports reporting batch failures. How it works is within each batch iteration, you catch all exceptions, and if that iteration fails add that messageId to an SQSBatchResponse. At the end when all SQS messages have been processed, you return the batch response.

Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting

I implemented this, but a batch of A, B and C, with B failing, would still mark all three as complete. It turns out you need to explicitly configure the Lambda event source mapping to expect a batch failure to be returned. This can be done by adding the key FunctionResponseTypes with a value of a list containing ReportBatchItemFailures. Here are the relevant docs: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting

My SAM template event source looks like this after adding that key:

Type: SQS
Properties:
    Queue: my-queue-arn
    BatchSize: 10
    Enabled: true
    FunctionResponseTypes:
        - ReportBatchItemFailures

Newsstand answered 30/9, 2022 at 8:4 Comment(0)
