Libtorch/PyTorch dilemma when combining static libraries into one STATIC library

I have around 26 static libraries, such as liba.a, libb.a, libc.a, ..., libz.a. There are two catches:

1) there are circular dependencies between some of them, for example between liba.a and libb.a;

2) some of the lib*.a archives contain static global registration code that is never referenced, but must NOT be stripped away.
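To make catch 2) concrete, here is a minimal sketch of the static global registration pattern. All names (registry, Registrar, cpu_registrar, is_registered) are hypothetical and are not PyTorch's actual API. In a single translation unit the registrar always runs. The trouble only starts once the registrar lives in its own object file inside a static library and nothing references it.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry, for illustration only (not PyTorch's actual API).
using Factory = std::function<std::string()>;

// A function-local static sidesteps the static initialization order problem:
// the map is constructed the first time registry() is called.
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> instance;
    return instance;
}

// Constructing a Registrar inserts a factory into the registry as a side effect.
struct Registrar {
    Registrar(const std::string& name, Factory factory) {
        registry()[name] = std::move(factory);
    }
};

// In a real project this definition would sit in its own .cpp inside a lib*.a.
// No other code refers to cpu_registrar by name, so a linker that only extracts
// archive members needed to resolve undefined symbols has no reason to pull in
// that object file, and the registration silently never happens.
static Registrar cpu_registrar("cpu", [] { return std::string("cpu backend"); });

// Lookup helper used by the rest of the program.
bool is_registered(const std::string& name) {
    return registry().count(name) != 0;
}
```

Compiled as one file, is_registered("cpu") holds once static initialization has finished, because the registrar is part of the program unconditionally. The same code split into an archive member behaves differently unless something forces that member to be linked.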

Thanks to Stack Overflow, I managed to solve both problems with the linker options -Wl,--whole-archive -la -lb -lc -ld -le ...(omitted)... -lz -Wl,--no-whole-archive -lpthread -lm -ldl -lrt -fopenmp, and the resulting executable works! The option is also explained in "ld linker question: the --whole-archive option".

Now I need to combine all 26 lib*.a files into one static library, liball.a. Also thanks to Stack Overflow, the MRI script below (script.mri) successfully produces liball.a with the command ar -M < script.mri

create liball.a
addlib liba.a
addlib libb.a
addlib libc.a
addlib libd.a
... //omitted
addlib libz.a
save
end

However, here comes the issue when linking with newly combined static library liball.a:

1) With the options -Wl,--whole-archive -lall -Wl,--no-whole-archive -lpthread -lm -ldl -lrt -fopenmp, linking FAILS. It generates thousands of multiple-definition or undefined-symbol errors.

2) Without -Wl,--whole-archive, the link command with -lall -lpthread -lm -ldl -lrt -fopenmp successfully produces the executable. However, the binary fails at run time, complaining about a device registration error. I understand that this is caused by some CPU initialization code being pruned away at link time. The detailed error is below:

 p INTERNAL ASSERT FAILED at ../c10/core/impl/DeviceGuardImplInterface.h:132, please report a bug to PyTorch. DeviceGuardImpl for cpu is not available (getDeviceGuardImpl at ../c10/core/impl/DeviceGuardImplInterface.h:132)
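One general way to keep such registration code without --whole-archive is an "anchor" function: an externally visible symbol defined in the same object file as the registrar and referenced from the application. This is a sketch of that generic technique, not PyTorch's own mechanism, and all names in it (registered_devices, CpuRegistrar, force_link_cpu_backend, device_available) are hypothetical. I have not verified that it is practical for libtorch, which may have many object files that would each need an anchor.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical names throughout; a sketch of the "anchor function" idea.
std::vector<std::string>& registered_devices() {
    static std::vector<std::string> devices;
    return devices;
}

// --- the part below would live in, say, cpu_backend.cpp inside liball.a ---
namespace {
struct CpuRegistrar {
    CpuRegistrar() { registered_devices().push_back("cpu"); }
};
CpuRegistrar cpu_registrar;  // unreferenced static registration object
}  // namespace

// Anchor: an externally visible symbol in the same object file as the registrar.
// A reference to it from the executable makes the linker extract this archive
// member, and the static registrar comes along with it.
int force_link_cpu_backend() { return 0; }

// --- the part below would live in the application ---
// Calling the anchor creates the undefined reference the linker has to resolve.
static const int cpu_backend_anchor = force_link_cpu_backend();

// Lookup helper: true if a device with this name has been registered.
bool device_available(const std::string& name) {
    const std::vector<std::string>& devices = registered_devices();
    return std::find(devices.begin(), devices.end(), name) != devices.end();
}
```

GNU ld's -u SYMBOL option (passed as -Wl,-u,SYMBOL through gcc) creates the same kind of undefined reference from the command line, so the application source does not have to mention the anchor at all.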

This issue is also mentioned here: https://github.com/pytorch/pytorch/issues/14367

One way to solve it is to use selective registration. Could anyone share more details on this?

This question is NOT a duplicate of "How to merge two "ar" static libraries into one?". The MRI script method comes from the highest-voted answer in that link, and it does not work here. Please remove the duplicate mark so that people can contribute. Thanks.

Adversity answered 27/5, 2019 at 9:24 Comment(7)
What does "complaining some Device Registration code error..." mean? – Unsaddle
@Unsaddle Thank you for asking. Please refer to the updated post. – Adversity
Possible duplicate of How to merge two "ar" static libraries into one? – Suctorial
@Suctorial Thanks for your comment. However, I do not think it is a duplicate. The method mentioned in the link you provided is used here, but does not work. – Adversity
So you tried it? You unpacked all the libraries into .o files and regenerated the ar library? You get duplicated symbols as expected, hm. Maybe unpack all libraries to .o files, and then re-compile with gcc? – Suctorial
Those lib*.a files have duplicate symbols, so the -x approach cannot be used. The ar MRI script is the native way to do it. – Adversity
@Suctorial Could you please remove the duplicate mark? The MRI script used here is from the highest-voted answer in the link you provided. Thanks. – Adversity
