Why are deep learning libraries so huge?

I've recently downloaded all packages from PyPI. One interesting observation was that, of the 15 biggest packages, all except one are deep learning packages.

I looked at mxnet-cu90. It has exactly one huge file: libmxnet.so (936.7MB). What does this file contain? Is there any way to make it smaller?

I'm especially astonished that those libraries are so huge considering that one usually uses them on top of CUDA + cuDNN, which I thought would do the heavy lifting.

For comparison, here are the sizes of some related libraries that can also be used to build deep learning libraries:

  • numpy: 6 MB
  • sympy: 6 MB
  • pycuda: 3.6 MB
  • tensorflow-cpu: 116 MB (so the GPU version needs 241 MB more, i.e. around 3x the size!)
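A .whl file is an ordinary ZIP archive, so numbers like these (and the compressed vs. uncompressed distinction discussed in the comments) can be reproduced with Python's standard zipfile module. The sketch below is only an illustration; the function names largest_entries, total_sizes and print_report are hypothetical helpers, not part of any library.

```python
import zipfile
from typing import List, Tuple


def largest_entries(wheel_path: str, top: int = 5) -> List[Tuple[str, int, int]]:
    """Return (name, compressed_bytes, uncompressed_bytes) for the largest files in a wheel."""
    with zipfile.ZipFile(wheel_path) as whl:
        # Skip directory entries; only real files contribute to the size.
        files = [info for info in whl.infolist() if not info.is_dir()]
    # Sort by uncompressed (installed) size, biggest first.
    files.sort(key=lambda info: info.file_size, reverse=True)
    return [(info.filename, info.compress_size, info.file_size) for info in files[:top]]


def total_sizes(wheel_path: str) -> Tuple[int, int]:
    """Return (compressed_bytes, uncompressed_bytes) summed over all files in a wheel."""
    with zipfile.ZipFile(wheel_path) as whl:
        files = [info for info in whl.infolist() if not info.is_dir()]
    return (sum(info.compress_size for info in files),
            sum(info.file_size for info in files))


def print_report(wheel_path: str, top: int = 5) -> None:
    """Print a small size report in megabytes (1 MB = 10**6 bytes)."""
    packed, unpacked = total_sizes(wheel_path)
    print(f"total: {packed / 1e6:.1f} MB compressed, {unpacked / 1e6:.1f} MB uncompressed")
    for name, packed_size, unpacked_size in largest_entries(wheel_path, top):
        print(f"{unpacked_size / 1e6:10.1f} MB  ({packed_size / 1e6:.1f} MB compressed)  {name}")
```

Calling print_report("path/to/package.whl") on a downloaded wheel (the path is a placeholder) should put any single dominant file, such as a large shared library, at the top of the list. The compressed total is roughly the download size; the uncompressed total is roughly the installed size.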
Courtund asked 13/1, 2020 at 17:13 Comments (8):
Not sure I follow the question. Why do you think that they should be smaller than they are? What do you mean by "huge"? – Isleana
They are by far the biggest ones on PyPI. mxnet-cu90 is 600 MB. Except for a pure-data package, the next biggest package is less than 350 MB (I need to check how much less). Also, numpy is ~6 MB, so about 1% of mxnet. sympy is 6 MB as well. pycuda is 3.6 MB. – Courtund
@EJoshuaS-ReinstateMonica I've added a couple of numbers for comparison. – Courtund
Yeah, those numbers do seem weirdly high now that you include those details. I'll vote to reopen. – Isleana
Can you clarify those sizes? For mxnet-cu90 you list both 940 MB and 600 MB, while PyPI lists 490 MB. – Notwithstanding
I looked at this version of mxnet-cu90. 600 MB is the compressed size and 940 MB is the uncompressed size. All other sizes refer to the compressed size. – Courtund
It is large because they package cuDNN inside. – Exorcist
@Exorcist do you want to post this as an answer? Maybe give some more details about why it adds so much and why they package cuDNN inside in the first place? – Courtund

Deep learning frameworks are large because they package cuDNN from NVIDIA into their wheels. This is done for the convenience of downstream users, who then do not have to install a matching cuDNN version themselves.

cuDNN is NVIDIA's library of primitives that the frameworks call to execute highly optimised neural network operations (e.g. convolutions and LSTMs).

The unzipped cuDNN distribution for Windows 10 alone is 435 MB.
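One rough way to check this for a given wheel is to look for bundled files whose names mention cuDNN and see what share of the unpacked size they account for. The sketch below uses a hypothetical helper name (bundled_matches) and only works when cuDNN ships as a separate file inside the wheel. If it is statically linked into one big library such as libmxnet.so, the file names will not reveal it and the function simply returns an empty list.

```python
import zipfile
from typing import List, Tuple


def bundled_matches(wheel_path: str, needle: str = "cudnn") -> Tuple[List[str], float]:
    """Return the wheel entries whose file name contains `needle` (case-insensitive)
    and the fraction of the wheel's uncompressed bytes they account for."""
    with zipfile.ZipFile(wheel_path) as whl:
        files = [info for info in whl.infolist() if not info.is_dir()]
    total = sum(info.file_size for info in files)
    hits = [info for info in files if needle.lower() in info.filename.lower()]
    matched = sum(info.file_size for info in hits)
    # Guard against an empty archive to avoid dividing by zero.
    share = matched / total if total else 0.0
    return [info.filename for info in hits], share
```

The same helper can be pointed at other bundled components by changing needle; a name-based check like this is only a first pass, not proof of what a binary contains.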

Exorcist answered 18/1, 2020 at 13:42
