How is a gRPC queue managed? Is there a size limit for a gRPC queue?

I am trying to understand how gRPC queues are managed and whether there are any limits on their size.

According to this SO post, requests are queued:

If your server is already processing maximum_concurrent_rpcs number of requests concurrently, and yet another request is received, the request will be rejected immediately.

If the ThreadPoolExecutor's max_workers is less than maximum_concurrent_rpcs then after all the threads get busy processing requests, the next request will be queued and will be processed when a thread finishes its processing.
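To check my reading of the quote above, here is a toy, stdlib-only model. As far as I know, in grpcio the two knobs are passed as grpc.server(futures.ThreadPoolExecutor(max_workers=...), maximum_concurrent_rpcs=...). The AdmissionModel class is hypothetical and only mimics the described behaviour (reject once the limit is hit, otherwise wait for a free thread); it is not how grpcio is actually implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AdmissionModel:
    """Toy model of maximum_concurrent_rpcs on top of a thread pool.

    NOT grpcio's implementation; it only mimics the quoted behaviour:
    reject when the limit is reached, otherwise queue for a worker thread.
    """

    def __init__(self, max_workers, maximum_concurrent_rpcs=None):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._limit = maximum_concurrent_rpcs  # None means "no limit"
        self._active = 0  # admitted RPCs: running on a thread or queued for one
        self._lock = threading.Lock()

    def handle(self, work):
        """Return a Future for `work`, or None if it was rejected outright."""
        with self._lock:
            if self._limit is not None and self._active >= self._limit:
                # Assumption: real grpcio replies with RESOURCE_EXHAUSTED here.
                return None
            self._active += 1
        # If every worker thread is busy, the job waits in the pool's queue.
        future = self._pool.submit(work)
        future.add_done_callback(self._release)
        return future

    def _release(self, _future):
        with self._lock:
            self._active -= 1

    def shutdown(self):
        self._pool.shutdown(wait=True)


gate = threading.Event()  # keeps every fake RPC busy until it is set
server = AdmissionModel(max_workers=2, maximum_concurrent_rpcs=4)
futures = [server.handle(gate.wait) for _ in range(6)]
print(sum(f is not None for f in futures), "admitted,", futures.count(None), "rejected")
gate.set()
server.shutdown()
```

With max_workers=2 and maximum_concurrent_rpcs=4, two fake RPCs run, two wait in the executor's queue and the last two are rejected, so in this model at most maximum_concurrent_rpcs - max_workers requests can ever be waiting.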

According to this GitHub post, the queue is managed by the gRPC server:

So maximum_concurrent_rpcs gives you a way to set an upper bound on the number of RPCs waiting in the server's queue to be serviced by a thread.

But this Microsoft post confused me, because it says requests are queued on the client:

When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent.
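My understanding is that the "connection stream limit" is HTTP/2's maximum number of concurrent streams on one connection. The following toy model (a hypothetical StreamLimitedConnection class, not a gRPC API) uses an asyncio.Semaphore to stand in for that limit: calls beyond max_streams wait on the client until an active call finishes, which is how I read the Microsoft description.

```python
import asyncio


class StreamLimitedConnection:
    """Toy model of one HTTP/2 connection with a concurrent-stream limit.

    Hypothetical class, not a gRPC API: calls beyond `max_streams` wait in a
    client-side queue (the semaphore's waiters) until a stream frees up.
    """

    def __init__(self, max_streams):
        self._streams = asyncio.Semaphore(max_streams)
        self.in_flight = 0
        self.peak_in_flight = 0

    async def call(self, duration):
        async with self._streams:  # queued here while all streams are busy
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                await asyncio.sleep(duration)  # stands in for the RPC itself
            finally:
                self.in_flight -= 1
        return duration


async def main():
    conn = StreamLimitedConnection(max_streams=3)
    await asyncio.gather(*(conn.call(0.01) for _ in range(10)))
    return conn.peak_in_flight


print(asyncio.run(main()))
```

Ten calls are issued at once, but never more than three are "on the wire" at the same time; the other seven wait on the client side, not on the server.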

Note, though, that Microsoft is talking here about the connection stream limit: when that limit is reached, a queue forms on the client.

Are there two types of queues? One that is created on the server (the gRPC queue) when the limits mentioned above are reached, and another that is created on the client when the connection stream limit is reached?

And what is the size limit of a gRPC queue? Is it limited only by the underlying hardware (RAM)?
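My assumption (not confirmed by the posts above) is that when maximum_concurrent_rpcs is left at its default of None, grpcio hands every incoming RPC to the executor, and concurrent.futures.ThreadPoolExecutor keeps submitted work in an in-memory queue with no maximum size. If that is right, the server-side backlog is bounded only by RAM. This stdlib-only check covers just the executor part; count_queued is a hypothetical helper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_queued(n_requests):
    """Submit n_requests blocking jobs to a 1-thread pool and count how many
    are left waiting. submit() never blocks or refuses work, so the backlog
    is bounded only by available memory."""
    if n_requests <= 0:
        return 0
    started = threading.Event()
    gate = threading.Event()

    def fake_rpc():
        started.set()  # signals that the single worker thread is now busy
        gate.wait()

    pool = ThreadPoolExecutor(max_workers=1)
    futures = [pool.submit(fake_rpc) for _ in range(n_requests)]
    started.wait()  # make sure the first job occupies the only worker
    # Jobs that are neither running nor finished are still in the queue.
    waiting = sum(not f.running() and not f.done() for f in futures)
    gate.set()  # release everything and let the pool drain
    pool.shutdown(wait=True)
    return waiting


print(count_queued(10_000))
```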

Could a huge queue cause the server to fail? Is it possible to limit the size of this queue?

And if there are two different queues, can we manage and limit the one on the client too?
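One application-level way to bound the client side, independent of any built-in gRPC setting, would be to wrap calls so that only a fixed number run and only a fixed number may wait. BoundedClientQueue is a hypothetical wrapper, not a grpcio feature; `rpc` is any zero-argument coroutine function, for example something like `lambda: stub.SomeMethod(request)` with a grpc.aio stub (SomeMethod is a placeholder name).

```python
import asyncio


class BoundedClientQueue:
    """Hypothetical application-level wrapper (not a grpcio feature).

    At most `max_in_flight` calls run at once, at most `max_waiting` more may
    wait for a slot, and any call beyond that fails immediately.
    """

    def __init__(self, max_in_flight, max_waiting):
        self._slots = asyncio.Semaphore(max_in_flight)
        self._capacity = max_in_flight + max_waiting
        self._admitted = 0  # calls currently running or waiting for a slot

    async def call(self, rpc):
        if self._admitted >= self._capacity:
            raise RuntimeError("client-side queue is full")  # fail fast
        self._admitted += 1
        try:
            async with self._slots:  # waits here if all slots are busy
                return await rpc()
        finally:
            self._admitted -= 1


async def demo():
    queue = BoundedClientQueue(max_in_flight=2, max_waiting=3)

    async def fake_rpc():  # stands in for an awaited gRPC call
        await asyncio.sleep(0.01)
        return "ok"

    return await asyncio.gather(
        *(queue.call(fake_rpc) for _ in range(8)), return_exceptions=True
    )


print(asyncio.run(demo()))
```

Of the eight calls, two run, three wait, and the remaining three are rejected with RuntimeError instead of growing the client-side queue.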

I am especially interested in Python's point of view.

Thanks!

P.S. I am assuming that when people talk about gRPC queues, they mean the queue created on the server.
