Why do some people chain the parameters of two different networks and train them with the same optimizer?

I was looking at CycleGAN's official PyTorch implementation, and there the author chains the parameters of both networks and uses a single optimizer for both of them. How does this work? Is it better than using two different optimizers for the two networks?

import torch
from itertools import chain
all_params = chain(module_a.parameters(), module_b.parameters())
optimizer = torch.optim.Adam(all_params)
Haddock answered 17/5, 2020 at 3:10 Comment(0)

From the itertools.chain documentation: https://docs.python.org/3/library/itertools.html#itertools.chain

itertools.chain(*iterables)

    Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted.

Since parameters() gives you an iterable, chaining them lets a single optimizer simultaneously optimize the parameters of both networks. The same optimizer (and its state handling) is then used for both models (Modules); if you use two different optimizers, the parameters are optimized separately.

If you have a composite network, it becomes necessary to optimize the parameters of all its parts at the same time, so using a single optimizer for all of them is the way to go.
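
A minimal sketch of what a training step looks like with chained parameters (the Linear modules and the combined loss below are hypothetical stand-ins, not the actual CycleGAN networks):

import torch
from itertools import chain

module_a = torch.nn.Linear(10, 10)   # hypothetical stand-in for network A
module_b = torch.nn.Linear(10, 10)   # hypothetical stand-in for network B

optimizer = torch.optim.Adam(chain(module_a.parameters(), module_b.parameters()))

x = torch.randn(4, 10)
loss = module_a(x).mean() + module_b(x).mean()   # dummy loss that touches both modules

optimizer.zero_grad()   # clears the gradients of both modules at once
loss.backward()         # fills .grad for every chained parameter
optimizer.step()        # a single call updates both networks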

Comedienne answered 17/5, 2020 at 3:18 Comment(6)
This seems like a restatement of OP's question. The OP seems to understand that the same optimizer is used to optimize parameters of both networks but is asking how that works and why not use two optimizers.Hydranth
so if we use a single optimizer, do the loss values of model B affect the changes in model A?Haddock
Yes, if you use the same optimizer for both models. It also depends on how the loss is defined.Comedienne
won't that sabotage the training of the generator networks in CycleGAN?Haddock
self.loss_G = self.loss_G_A + self.loss_G_B + self.loss_cycle_A + self.loss_cycle_B + self.loss_idt_A + self.loss_idt_B; self.loss_G.backward(). Here it sums all the losses of both generators and optimizes them together. I feel that different optimizers should be used and the losses should not be added.Haddock
Sorry, I'm not actually very familiar with the CycleGAN implementation; I'll be back after looking into that.Comedienne

It makes sense to optimize both generators together (and to add both losses) because of the "cycle": the cycle loss uses both generators, G_B(G_A(A)) and G_A(G_B(B)). I think that if you used separate optimizers, you would need to call backward() on both losses before calling step() on either optimizer to achieve the same effect (this does not have to be true for all optimization algorithms).
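
A hedged sketch of that point, with tiny stand-in generators and only the cycle terms (the real CycleGAN loss also has adversarial and identity terms): gradients from each cycle term reach both generators, so backward() must run on the summed loss before either optimizer steps.

import torch

G_A = torch.nn.Linear(8, 8)   # hypothetical stand-in for generator A -> B
G_B = torch.nn.Linear(8, 8)   # hypothetical stand-in for generator B -> A

opt_A = torch.optim.Adam(G_A.parameters())
opt_B = torch.optim.Adam(G_B.parameters())

real_A = torch.randn(4, 8)
real_B = torch.randn(4, 8)

# Cycle losses A -> B -> A and B -> A -> B; each term involves both generators.
loss_cycle_A = (G_B(G_A(real_A)) - real_A).abs().mean()
loss_cycle_B = (G_A(G_B(real_B)) - real_B).abs().mean()
loss_G = loss_cycle_A + loss_cycle_B

opt_A.zero_grad()
opt_B.zero_grad()
loss_G.backward()   # gradients reach both generators through both cycle terms
opt_A.step()        # each optimizer updates only its own parameters
opt_B.step()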

In the official code, the parameters of the discriminators are also chained, but there you could easily use separate optimizers (again, this does not have to be true for other optimization algorithms), because the loss of D_A does not depend on D_B.
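
For contrast, a small hypothetical sketch of the discriminator case: since each placeholder loss touches only one discriminator, fully separate backward()/step() calls work here.

import torch

D_A = torch.nn.Linear(8, 1)   # hypothetical stand-in for discriminator A
D_B = torch.nn.Linear(8, 1)   # hypothetical stand-in for discriminator B

opt_D_A = torch.optim.Adam(D_A.parameters())
opt_D_B = torch.optim.Adam(D_B.parameters())

fake_A = torch.randn(4, 8)   # would come from the generators in the real model
fake_B = torch.randn(4, 8)

opt_D_A.zero_grad()
D_A(fake_A).mean().backward()   # placeholder loss; the real code uses a GAN loss
opt_D_A.step()

opt_D_B.zero_grad()
D_B(fake_B).mean().backward()   # D_B's loss never involves D_A's parameters
opt_D_B.step()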

Mattson answered 13/5, 2021 at 16:16 Comment(0)
