Amazon SQS Long Polling not returning all messages

I have a requirement to read all messages in my Amazon SQS queue in a single read, sort them by created timestamp, and then run business logic on them.

To make sure all the SQS hosts are checked for messages, I enabled long polling by setting the queue's default wait time to 10 seconds. (Any value greater than 0 enables long polling.)

However, when I tried to read the queue, it still did not give me all the messages, and I had to do multiple reads to get them all. I also enabled long polling in code, per receive request, and that still did not work. Below is the code I am using.

AmazonSQSClient sqsClient = new AmazonSQSClient(new ClasspathPropertiesFileCredentialsProvider());
sqsClient.setEndpoint("sqs.us-west-1.amazonaws.com");
String queueUrl = "https://sqs.us-west-1.amazonaws.com/12345/queueName";
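ReceiveMessageRequest receiveRequest = new ReceiveMessageRequest()
        .withQueueUrl(queueUrl)
        .withMaxNumberOfMessages(10)   // 10 is the most a single receive can return
        .withWaitTimeSeconds(20);      // per-request long polling; 20 seconds is the maximum wait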
List<Message> messages = sqsClient.receiveMessage(receiveRequest).getMessages();

I have 3 messages in the queue, and each time I run the code I get a different result: sometimes all 3 messages, sometimes just 1. I set the visibility timeout to 2 seconds to rule out messages becoming invisible as the reason for not seeing them in the read. This is the expected behavior for short polling, but long polling is supposed to eliminate multiple polls. Is there anything I am doing wrong here?

Thanks

Incendiarism answered 13/1, 2014 at 19:41 Comment(0)
D
29

"Long polling is supposed to eliminate multiple polls"

No, long polling is supposed to eliminate a large number of empty polls and false empty responses when messages are actually available. A long poll in SQS won't sit and wait for the maximum amount of wait time just looking for more things to return, or keep searching once it's found something. A long poll in SQS only waits long enough to find something:

Long polling allows the Amazon SQS service to wait until a message is available in the queue before sending a response. So unless the connection times out, the response to the ReceiveMessage request will contain at least one of the available messages (if any) and up to the maximum number requested in the ReceiveMessage call.

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html (emphasis added)

So, the “something” that SQS finds and returns may be all of the messages (up to your max), or a subset of the messages, because, as has been mentioned, SQS is a distributed system. There was likely an architectural decision to be made between "return as quickly as possible once we've found something" and "search the entire system for everything possible up to the maximum number of messages the client will accept" ... and, given those alternatives, it seems reasonable that most applications would prefer the faster response of "give me whatever you can, as quickly as you can."

You don't know that you've actually drained a queue until you get back an empty response from a long poll.
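
As a rough illustration of that rule, the loop below keeps polling until one poll comes back empty. It is only a sketch: `QueueDrainSketch`, `drain`, and `maxPolls` are hypothetical names, and the `poll` supplier stands in for one long-poll receive, such as `() -> sqsClient.receiveMessage(receiveRequest).getMessages()` from the question. It also assumes each message is deleted (or stays invisible) once received; with a 2-second visibility timeout, an undeleted message can come back on a later poll.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class QueueDrainSketch {

    /**
     * Calls poll repeatedly and accumulates the results until a call
     * returns an empty (or null) batch, or maxPolls calls have been made.
     */
    static <M> List<M> drain(Supplier<List<M>> poll, int maxPolls) {
        List<M> all = new ArrayList<>();
        for (int i = 0; i < maxPolls; i++) {
            List<M> batch = poll.get();
            if (batch == null || batch.isEmpty()) {
                break; // empty long-poll response: treat the queue as drained
            }
            all.addAll(batch);
        }
        return all;
    }
}
```

The `maxPolls` cap is there so that a queue that keeps receiving (or redelivering) messages cannot keep the loop running forever.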

Dentoid answered 14/1, 2014 at 1:4 Comment(3)
Getting an empty return from a long poll is not a guarantee that the queue is empty. A better indicator of emptiness is getting 0 back when you ask for the number of messages in the queue, but that also isn't guaranteed. - Corolla
@Corolla I have never encountered a condition where a long poll returned 0 messages in spite of there being messages in the queue, although it's theoretically possible -- however, this does contradict the documented assertion that a long poll response "will contain at least one of the available messages" if there are any. If that occurs, it should be the exception. - Dentoid
What should be the wait time configured for long polling? How do we know that the set time interval is enough for SQS to guarantee that there are no more messages? - Pheidippides
D
3

As pointed out by Michael - sqlbot, SQS does not guarantee returning all (or the requested number of) messages even with long polling. Long polling just ensures that you do not get false empty responses, i.e. read requests that return no messages even though there are messages in the queue.

I did some experiments around this and found that the number of messages returned in a response approaches the number requested as the number of messages in the queue grows. Typically, with 1000+ messages in the queue, my reads returned 10 messages (which is, by the way, the maximum that can be returned for a read request) every time. In fact, this behavior was observed for short polling as well. With only 100+ messages, the number returned was not 10 all the time, although a good percentage of those requests did return 10. Obviously, this is not guaranteed, but it is what you would typically see.

I documented the findings from these experiments in a blog post; the link is below in case you would like to see more details.

http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/

Doxia answered 23/11, 2017 at 6:36 Comment(1)
@Mortiz I have modified my post with more details. Can you guys review it again and revote as necessary? - Doxia
Y
1

Because SQS is, on the back-end, a distributed system, there is no guarantee that any particular request will be able to return the maximum number of messages that are being polled for.

You just have to keep calling until you are confident that you have as many items as you expect, or that the queue has been emptied.
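
For the use case in the question, "keep calling" might look something like the sketch below: poll until an expected number of distinct messages has arrived (or a poll budget runs out), then sort by sent time. `CollectAndSortSketch`, `Msg`, `expected`, and `maxPolls` are all hypothetical names. `Msg` is a stand-in for the SDK's `Message`, whose sent time would, as far as I know, come from the `SentTimestamp` attribute when that attribute is requested on the receive call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not part of the AWS SDK.
final class CollectAndSortSketch {

    // Stand-in for an SQS message: an id plus the sent time in epoch milliseconds.
    static final class Msg {
        final String id;
        final long sentMillis;

        Msg(String id, long sentMillis) {
            this.id = id;
            this.sentMillis = sentMillis;
        }
    }

    /**
     * Polls until `expected` distinct messages are collected or `maxPolls`
     * polls have been made, then returns them ordered by sent time.
     */
    static List<Msg> collectSorted(Supplier<List<Msg>> poll, int expected, int maxPolls) {
        // Keyed by id so a message that is redelivered is only kept once.
        Map<String, Msg> byId = new LinkedHashMap<>();
        for (int i = 0; i < maxPolls && byId.size() < expected; i++) {
            List<Msg> batch = poll.get();
            if (batch == null) {
                continue; // nothing returned by this poll
            }
            for (Msg m : batch) {
                byId.putIfAbsent(m.id, m);
            }
        }
        List<Msg> sorted = new ArrayList<>(byId.values());
        sorted.sort(Comparator.comparingLong((Msg m) -> m.sentMillis));
        return sorted;
    }
}
```

Keying by id matters here because a short visibility timeout, like the 2 seconds in the question, lets the same message show up in more than one poll.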

Yestreen answered 13/1, 2014 at 20:20 Comment(2)
Then long polling doesn't really help in this use case. Even though it looks at all the servers hosting messages, and therefore knows about all the messages in the queue, it can still return only a subset of them. Something just doesn't sound right. - Incendiarism
Kaputabo, long polling is not designed to help this use case. That is the truth. In fact, it sounds like SQS wasn't designed for your use case. Maybe you should look into using DynamoDB? - Corolla
O
0

Set the execution timeout to a value greater than 0. I set the execution timeout to 2 seconds, and it now returns all 9 messages available in the queue.

Overkill answered 9/3, 2022 at 4:27 Comment(0)
