Python ProcessPoolExecutor process inside gRPC server dies without throwing any errors

I have a gRPC server that uses a ProcessPoolExecutor with max_workers=1 to run a script. I had a specific reason to use a ProcessPoolExecutor with max_workers=1 inside run_brain_application. Below is a simplified version of the server script.

I've added two print statements to mark the beginning and end of the process.

The issue is that sometimes, for some requests, the process starts but never finishes: it prints START, but STOP never appears. The same request works the next time, so the issue is not with the request itself. Also, if an exception is raised inside run_application, it propagates without any issues.

I've checked the related questions on Stack Overflow, but most of them (1, 2) are about processes not throwing exceptions, which is not my case.

Any thoughts are greatly appreciated!

Thanks

from concurrent import futures

import grpc

# MessageToDict, dict_to_protobuf, run_application, _log_fatal, res_type,
# GRPC_PORT, and the generated brain_pb2_grpc / result_pb2 modules are
# defined elsewhere in the real script.

class BrainServer(brain_pb2_grpc.BrainServiceServicer):

    def run_brain_application(self, request, context):
        try:
            request_dict = MessageToDict(
                request,
                preserving_proto_field_name=True,
                keep_null=True
            )

            print("START")
            with futures.ProcessPoolExecutor(max_workers=1) as executor:
                result = executor.submit(run_application, request_dict).result()
            res = result.get('data')
            print("STOP")
            report = dict_to_protobuf(result_pb2.Result, {res_type: res}, strict=False)

            return report

        except Exception as e:
            _log_fatal(msg=str(e))
            raise e


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=16))
    brain_pb2_grpc.add_BrainServiceServicer_to_server(BrainServer(), server)

    server.add_insecure_port(f"[::]:{GRPC_PORT}")
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()
Jonathanjonathon answered 29/11/2020 at 19:03
Regarding the reasoning, did you run into that issue yourself? It's been 7 years after all. You could also try using joblib instead of ProcessPoolExecutor. Its manual claims it is better at handling errors. – Chalky
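
For reference, the joblib alternative mentioned in the comment could be sketched like this (hypothetical usage, with run_application and request_dict from the question; note that with n_jobs=1 joblib runs the call inline in the parent process, so n_jobs=2 is used here to force a separate worker process):

from joblib import Parallel, delayed

# The default loky backend runs tasks in separate worker processes and
# re-raises worker errors in the parent with a full traceback.
results = Parallel(n_jobs=2)([delayed(run_application)(request_dict)])
result = results[0]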

This is a hard one to pinpoint.

I would first recommend adding timeouts to your executors.
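
A hung child then surfaces as an error in the server instead of blocking the handler forever. A minimal sketch, assuming the run_application and request_dict from the question (the 300-second limit is an arbitrary example value):

with futures.ProcessPoolExecutor(max_workers=1) as executor:
    future = executor.submit(run_application, request_dict)
    try:
        # Raises concurrent.futures.TimeoutError if the worker exceeds the limit
        result = future.result(timeout=300)
    except futures.TimeoutError:
        future.cancel()  # best effort; an already-running task cannot be cancelled
        raise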

I would also recommend switching to a ThreadPoolExecutor alone and seeing whether the issue goes away; a minimal swap is sketched below. The Python logger, and possibly some other components, do not behave nicely when used in a multiprocess environment. My guess is that you are using Python resources that are not supposed to be shared across processes, but are (the logger is my immediate suspect).
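
A sketch of that diagnostic swap, again assuming the run_application and request_dict from the question; threads share the parent process, so if START/STOP then always appear, the hang is fork-related:

with futures.ThreadPoolExecutor(max_workers=1) as executor:
    result = executor.submit(run_application, request_dict).result()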

Cycle answered 11/7/2021 at 8:59

Creating a child process with os.fork while the parent process is multi-threaded is problematic.
The multiprocessing module, which concurrent.futures.ProcessPoolExecutor uses internally, provides more than one way to create a child process. These ways are called start methods, and they are named fork, spawn, and forkserver.

The forkserver method is recommended when a multi-threaded parent needs to create children: a single-threaded server process is started, and the multi-threaded parent asks that server to fork the child. Since the server is single-threaded, calling os.fork in it is not problematic.

concurrent.futures.ProcessPoolExecutor takes an mp_context parameter, which it uses to set the start method for the processes in the pool.
So you should pass mp_context:

import multiprocessing as mp
from concurrent import futures

ctx = mp.get_context('forkserver')
with futures.ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
    result = executor.submit(run_application, request_dict).result()
res = result.get('data')

See the Python documentation: Contexts and start methods

Candlestick answered 18/7/2021 at 18:07
