Where is one supposed to call torch.distributed.destroy_process_group() in PyTorch?

I noticed that the docs do not document that function, so it's unclear where one should call it. Does one have to:

  1. call it at the end of each worker code (i.e. inside of mp.spawn)
  2. or call it outside of mp.spawn i.e. by the main process

Note there is a GitHub issue requesting that this function be added to the docs: https://github.com/pytorch/pytorch/issues/48203

This is an example of option 2:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def test_setup():
    print('test_setup')
    if torch.cuda.is_available():
        world_size = torch.cuda.device_count()
    else:
        world_size = 4
    master_port = find_free_port()  # helper defined elsewhere in my code
    mp.spawn(setup_process, args=(world_size, master_port), nprocs=world_size)
    dist.destroy_process_group()  # option 2: called by the main process, after spawn
    print('successful test_setup!')
Multifaceted answered 29/3, 2021 at 16:19

You can see that in the PyTorch tutorial, the function cleanup() is called at the end of each process (i.e. inside mp.spawn):

def cleanup():
    dist.destroy_process_group()
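To make the pattern concrete, here is a minimal, CPU-only sketch using the gloo backend. Each spawned worker initializes and then destroys its own process group; the parent process never joins the group, so it does not call destroy_process_group() itself. The find_free_port helper is written for this sketch (it is not part of PyTorch's API), and the training code is elided.

```python
# Minimal sketch (gloo backend, CPU only): each spawned worker creates
# and destroys its own process group. The parent process never calls
# init_process_group, so it must not call destroy_process_group.
import os
import socket

import torch.distributed as dist
import torch.multiprocessing as mp


def find_free_port():
    # Small helper written for this sketch (not part of PyTorch's API):
    # bind to port 0 and let the OS pick a free port.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def worker(rank, world_size, port):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # ... training / test code would go here ...
    dist.destroy_process_group()  # cleanup at the end of each worker


def main():
    world_size = 2
    mp.spawn(worker, args=(world_size, find_free_port()),
             nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
    print("all workers cleaned up")
```

With this layout, option 1 from the question is what actually runs: the call sits at the end of the worker function, inside the processes that mp.spawn creates.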
Sherilyn answered 27/4, 2023 at 13:4
