gRPC client failing with "CANCELLED: io.grpc.Context was cancelled without error"
I have a gRPC server written in C++ and a client written in Java. Everything was working fine using a blocking stub. Then I decided that I want to change one of the calls to be asynchronous, so I created an additional stub in my client, this one is created with newStub(channel) as opposed to newBlockingStub(channel). I didn't make any changes on the server side. This is a simple unary RPC call.

So I changed

Empty response = blockingStub.callMethod(request);

to

asyncStub.callMethod(request, new StreamObserver<Empty>() {
    @Override
    public void onNext(Empty response) {
       logInfo("asyncStub.callMethod.onNext");
    }

    @Override
    public void onError(Throwable throwable) {
       logError("asyncStub.callMethod.onError " + throwable.getMessage());
    }

    @Override
    public void onCompleted() {
        logInfo("asyncStub.callMethod.onCompleted");
    }
});

Ever since then, onError is called when I use this RPC (most of the time), and the error it gives is "CANCELLED: io.grpc.Context was cancelled without error".

I read about forking Context objects when making an RPC call from within another RPC call, but that's not the case here. Also, the Context seems to be a server-side object, so I don't see how it relates to the client. Is this a server-side error propagating back to the client? On the server side everything seems to complete successfully, so I'm at a loss as to why this is happening. Inserting a 1 ms sleep after calling asyncStub.callMethod seems to make the issue go away, but that defeats the purpose. Any and all help in understanding this would be greatly appreciated.

Some notes:

  1. The processing time on the server side is around 1 microsecond.
  2. Until now, the round-trip time for the blocking call was several hundred microseconds (this is the time I'm trying to cut down; the call is essentially a void function, so I don't need to wait for a response).
  3. This method is called multiple times in a row; previously each call waited until the previous one finished, but now they just fire off one after the other.
  4. Some snippets from the proto file:
service EventHandler {
  rpc callMethod(Msg) returns (Empty) {}
}

message Msg {
  uint64 fieldA = 1;
  int32 fieldB = 2;
  string fieldC = 3;
  string fieldD = 4;
}

message Empty {

}
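Tangential to the cancellation itself, but relevant to note 2: even with fire-and-forget semantics, outstanding async RPCs still need to be drained before the channel is shut down, or in-flight calls get cancelled. A minimal stdlib sketch of that pattern, with the RPC replaced by a hypothetical submitted task (in the real client, the StreamObserver's onCompleted()/onError() would call done.countDown()):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DrainDemo {
    public static void main(String[] args) throws InterruptedException {
        int calls = 5;
        CountDownLatch done = new CountDownLatch(calls);
        ExecutorService pool = Executors.newFixedThreadPool(2);

        for (int i = 0; i < calls; i++) {
            // Stand-in for asyncStub.callMethod(...); in the real code the
            // observer's onCompleted()/onError() would count the latch down.
            pool.submit(done::countDown);
        }

        // Before channel.shutdown(): wait for all outstanding calls to finish.
        boolean drained = done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        System.out.println("drained: " + drained); // prints "drained: true"
    }
}
```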
Luxemburg answered 14/10, 2020 at 14:49 Comment(2)
Some updates: I changed the server code to be an async server and the problem still persists. I also moved to a FutureStub in the client, but I have the same issue. I then added a semaphore that only allows 2 concurrent RPC calls at a time, and the issue still occurs. Even when I limit the semaphore to 1 concurrent RPC call at a time I can still reproduce this, but if I explicitly call future.get() after calling the RPC, the issue disappears.Luxemburg
There seems to be some issue with requests happening at the same time: when onSuccess is called for a successful RPC, the semaphore is released and the next RPC starts, and only after that does the first RPC leave its onSuccess callback, which must be what kills the second RPC. Any ideas?Luxemburg

So it turns out that I was wrong: the Context object is used by the client too. The solution was to do the following:

Context newContext = Context.current().fork(); // detach from the current (cancellable) context
Context origContext = newContext.attach();
try {
    // TODO: Call async RPC here
} finally {
    newContext.detach(origContext); // always restore the original context
}

Hopefully this can help someone else in the future.
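The reason fork() helps can be illustrated without gRPC at all. An outgoing RPC implicitly inherits the current Context, and cancellation flows from parent to child, so a call started from inside another call's (cancellable) context dies when that context is cancelled; a forked context does not inherit the parent's cancellation. A toy model of just that behavior (MiniContext is a hypothetical stand-in, not gRPC's real io.grpc.Context):

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for io.grpc.Context: cancellation flows parent -> child,
// unless the child hangs off a fork()ed context.
final class MiniContext {
    private boolean cancelled = false;
    private final List<MiniContext> children = new ArrayList<>();

    // Child that inherits cancellation (like a new RPC implicitly
    // picking up Context.current()).
    MiniContext newChild() {
        MiniContext c = new MiniContext();
        children.add(c);
        return c;
    }

    // Detached child that ignores this context's cancellation
    // (like Context.fork()).
    MiniContext fork() {
        return new MiniContext();
    }

    void cancel() {
        cancelled = true;
        for (MiniContext c : children) c.cancel();
    }

    boolean isCancelled() { return cancelled; }
}

public class ForkDemo {
    public static void main(String[] args) {
        MiniContext rpc1 = new MiniContext();

        // RPC started from inside rpc1's callback without forking:
        MiniContext rpc2 = rpc1.newChild();
        // RPC started after forking first:
        MiniContext rpc3 = rpc1.fork().newChild();

        rpc1.cancel(); // rpc1 completes and its context is cancelled

        System.out.println("rpc2 cancelled: " + rpc2.isCancelled()); // prints "rpc2 cancelled: true"
        System.out.println("rpc3 cancelled: " + rpc3.isCancelled()); // prints "rpc3 cancelled: false"
    }
}
```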

Luxemburg answered 18/10, 2020 at 9:48 Comment(0)

I've seen a lot of older answers that mention using attach()/detach(), but the gRPC docs explicitly recommend using run instead.

This in fact solved the problem for me:

Context.current().fork().run {
    doSend(request)
}
Widower answered 28/3, 2023 at 16:23 Comment(0)
