I was looking at CycleGAN's official PyTorch implementation, and there the author chains the parameters of both networks together and hands them to a single optimizer. How does this work? Is it better than using two separate optimizers, one per network?
import torch
from itertools import chain

# One flat iterable over both networks' parameters
all_params = chain(module_a.parameters(), module_b.parameters())
optimizer = torch.optim.Adam(all_params)
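
For reference, here is a minimal self-contained version of the pattern I'm asking about (the toy Linear modules, shapes, and combined loss are my own stand-ins, not code from the CycleGAN repo):

import torch
from itertools import chain

# Toy stand-ins for the two networks (hypothetical, not CycleGAN's generators)
module_a = torch.nn.Linear(8, 8)
module_b = torch.nn.Linear(8, 8)

# Adam materializes the chained iterable into one parameter group, so both
# networks live inside the same optimizer with shared hyperparameters
optimizer = torch.optim.Adam(
    chain(module_a.parameters(), module_b.parameters()), lr=2e-4
)

x = torch.randn(4, 8)
# A loss that depends on both networks, like CycleGAN's joint generator loss
loss = module_b(module_a(x)).pow(2).mean()

optimizer.zero_grad()
loss.backward()   # gradients accumulate in both networks' parameters
optimizer.step()  # a single step updates both networks at once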