Making SQS message visible again using partial batch response

Assuming:

  • there is an AWS Lambda function that processes messages in batches of 10 from an AWS SQS FIFO queue with 25 available message group IDs (assigned at random)
  • the processing of a single message takes ~30-60 sec
  • SQS message visibility timeout is set to 10 min
  • the Lambda-SQS trigger has 'Partial batch response' enabled

When Lambda returns a partial batch response with the IDs of the failed messages, e.g.:

{
    "batchItemFailures": [
        {
            "itemIdentifier": "d4c7c57f-c12c-4639-abe3-3a0d37690790"
        }
    ]
}

The successfully processed messages are deleted from the queue. However, the failed messages from the batch still wait for the visibility timeout to expire instead of becoming visible to consumers immediately. Is this behaviour for failed messages correct? Or is the only way to make them visible immediately to call the SQS API and set their visibility timeout to 0?

Protuberate answered 12/3, 2022 at 11:31 Comment(0)

While I can't speak for a FIFO SQS queue, I can say that I am seeing this behavior with a Standard SQS queue.

I assumed that items returned in the batchItemFailures would immediately be set to visible, but they do indeed seem to remain in flight until their VisibilityTimeout expires.

My solution was to manually change the message visibility timeout to 0 via the SQS API before returning the batchItemFailures.
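A minimal sketch of that workaround in Python with boto3, not a definitive implementation. The names `handle_batch`, `process_message` and `queue_url_from_arn` are hypothetical, and building the queue URL from the record's `eventSourceARN` assumes the standard `https://sqs.<region>.amazonaws.com/<account>/<queue>` form (passing the URL in through configuration would work just as well). The record fields `messageId`, `receiptHandle`, `body` and `eventSourceARN` come from the SQS event that Lambda delivers.

```python
def queue_url_from_arn(arn: str) -> str:
    """Build the queue URL from an ARN like arn:aws:sqs:region:account:name.

    Assumption: the queue lives in the standard AWS partition.
    """
    _, _, _, region, account_id, queue_name = arn.split(":")
    return f"https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}"


def process_message(body: str) -> None:
    """Hypothetical placeholder for the real per-message work."""


def handle_batch(event, sqs, process):
    """Process each record; make failed ones visible again right away."""
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Reset the visibility timeout so the message does not stay
            # in flight until the queue's timeout expires.
            sqs.change_message_visibility(
                QueueUrl=queue_url_from_arn(record["eventSourceARN"]),
                ReceiptHandle=record["receiptHandle"],
                VisibilityTimeout=0,
            )
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def lambda_handler(event, context):
    # Imported here so handle_batch can be exercised without AWS access.
    import boto3

    return handle_batch(event, boto3.client("sqs"), process_message)
```

The function's execution role would also need the `sqs:ChangeMessageVisibility` permission on the queue. For a FIFO queue, the Lambda documentation additionally recommends stopping at the first failure and reporting all failed and unprocessed messages, which this sketch does not do.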

Pathan answered 3/11, 2022 at 1:36 Comment(2)
I don't see anything in the documentation that suggests that failed items will become visible immediately. And, given that the behavior historically was to remain in flight until the visibility timeout expired, I would not be surprised if the batched failure behavior is the same wrt visibility. - Pontificate
@Pontificate Correct. There is nothing in the documentation suggesting failed batch items would become immediately visible. My assumption was based on what I thought made logical sense. It seems odd to me that something that is known to have failed would have to wait for its VisibilityTimeout to expire before being added back into the queue. However, with testing this proved to be the case. It is certainly helpful to know the historical behavior of SQS messages. - Pathan
