How to see sizes of installed pip packages?
Asked Answered
C

16

84

On Linux Debian, how can I list all installed python pip packages and the size (amount of disk space used) that each one takes up?

Charlottecharlottenburg answered 14/12, 2015 at 11:40 Comment(0)
A
17

Go to the package site to find the size e.g. https://pypi.python.org/pypi/pip/json

Then expand releases, find the version, and look up the size (in bytes).
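
The lookup described above can be scripted. The following is a minimal sketch, assuming the JSON document keeps its `releases` mapping of version to a list of files, each carrying `filename` and `size`. The helper names and the sample data are made up for illustration. Note that these are the sizes of the downloadable archives, not the space a package takes once installed.

```python
import json
import urllib.request


def release_file_sizes(pypi_json, version):
    """Return {filename: size_in_bytes} for one release of a PyPI JSON document."""
    files = pypi_json.get("releases", {}).get(version, [])
    return {f["filename"]: f["size"] for f in files}


def fetch_pypi_json(project):
    """Download the JSON metadata for a project (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Hand-written sample in the same shape as the real document; the numbers are invented.
sample = {
    "releases": {
        "1.0": [
            {"filename": "demo-1.0-py3-none-any.whl", "size": 2048},
            {"filename": "demo-1.0.tar.gz", "size": 4096},
        ]
    }
}

sizes = release_file_sizes(sample, "1.0")
for filename, size in sizes.items():
    print(f"{size:>10} bytes  {filename}")
```

With network access, `release_file_sizes(fetch_pypi_json("pip"), "<version>")` would do the same for a real release.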

Ancell answered 14/12, 2015 at 11:44 Comment(6)
I'm aware of that, but I want to list all packages installed and the actual size on disk.Charlottecharlottenburg
Sorry, the pip command line tool can't do that. If you want to list everything you have installed then use 'pip freeze'Ancell
Then from there you can find each package (probably in /usr/local/bin/) and use 'du -sh directoryName' on it to find the sizeAncell
what about listing the pip directory with sizes?Charlottecharlottenburg
Also displayed on the site: pypi.org/project/pip/#filesBarthelemy
How is this marked as the answer when it clearly doesn't answer the question?Pippas
M
78

Modified for pip version 18 and above:

pip list \
  | tail -n +3 \
  | awk '{print $1}' \
  | xargs pip show \
  | grep -E 'Location:|Name:' \
  | cut -d ' ' -f 2 \
  | paste -d ' ' - - \
  | awk '{print $2 "/" tolower($1)}' \
  | xargs du -sh 2> /dev/null \
  | sort -hr

This command lists pip packages sorted by size, largest first.

Mason answered 25/3, 2020 at 14:25 Comment(7)
Just add LANG=C at the very beginning if your terminal isn't originally in English, because "Location:|Name:" wouldn't match otherwise... Thus LANG=C pip list | tail -n +3 | awk '{print $1}' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -hr and voilà!Bullheaded
this is the correct answer for current pip/python3. (answer marked as correct isn't very representative of total size of pkg)Coyne
I noticed that some packages would not appear in the results because their physical directory names differ from their claimed project names (for example, beautifulsoup4 would be installed as bs4.) Looks like currently we don't have a perfect solution unless we do a deep and serious scan (of dist-info or something like that).Recursion
Command chaining at its finest! :)Cesta
nice answer - just beware that this skips some packages because they aren't necessarily installed inside site-packages according to the name provided in the pip list description (often to do with hyphens and underscores)Sunderland
and i supposed i should add that if you want to ensure that you include all packages, you can do so by navigating to the directory above site-packages e.g. env/lib/python3.10 and then running the command: du -sh ./site-packages/* | sort -hrSunderland
I would suggest adding a caret symbol (^) on the grep ... part, because there are some packages with edge cases, like scipy, where Name: matches more than one line, and that inserts wrong lines all along the pipeline.Beauharnais
S
39

Could you please try this one (a bit long, though; maybe there are better solutions):

$ pip list \
  | xargs pip show \
  | grep -E 'Location:|Name:' \
  | cut -d ' ' -f 2 \
  | paste -d ' ' - - \
  | awk '{print $2 "/" tolower($1)}' \
  | xargs du -sh \
  2> /dev/null

the output should look like this:

80K     /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/blinker
3.8M    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/docutils
296K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/ecdsa
340K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/execnet
564K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/fabric
1.4M    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/flask
316K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/httplib2
1.9M    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/jinja2
...

This should work if the package is installed in Location/Name (Location and Name are taken from pip show <package>).


pip show <package> will show you the location:

---
Metadata-Version: 2.0
Name: Flask
Version: 0.10.1
Summary: A microframework based on Werkzeug, Jinja2 and good intentions
Home-page: http://github.com/mitsuhiko/flask/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD
Location: /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages
Requires: itsdangerous, Werkzeug, Jinja2

We take the Name and Location fields and join them to get the package's directory, then use du -sh to get the package size.
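
The same pairing of Name and Location can be sketched in Python. This is only an illustration, assuming the `pip show` output keeps the `Name:` and `Location:` lines seen above; `parse_pip_show` is a made-up helper name. Matching only at the start of a line avoids picking up a stray "Name:" inside a summary or license text.

```python
def parse_pip_show(text):
    """Pair each 'Name:' line with the next 'Location:' line.

    Returns a list of (name, guessed_directory) tuples, where the directory
    is Location + "/" + lower-cased name, as in the shell pipeline.
    """
    pairs = []
    name = None
    for line in text.splitlines():
        # Only match at the start of the line.
        if line.startswith("Name: "):
            name = line[len("Name: "):].strip()
        elif line.startswith("Location: ") and name is not None:
            location = line[len("Location: "):].strip()
            pairs.append((name, f"{location}/{name.lower()}"))
            name = None
    return pairs


sample = """\
Name: Flask
Version: 0.10.1
Summary: A microframework based on Werkzeug, Jinja2 and good intentions
Location: /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages
Requires: itsdangerous, Werkzeug, Jinja2
"""

pairs = parse_pip_show(sample)
print(pairs)
```

The guessed directory is still only a guess: it is wrong whenever the installed folder is not named after the project.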

Semicolon answered 14/12, 2015 at 12:48 Comment(7)
works great. to sort by size, we can add: | sort -h to the above pip list | xargs pip show.... commandLaris
or gsort on Mac OS X from homebrew, because standard sort on Mac does not have the -h flagVedette
i corrected this command for last python version on my answerMortensen
everything here mostly worked for me. I'm using pip 18.0 which outputs a header, so I added in a tail -n +3 | awk '{print $1}' in between the pip list and pip showMonkhmer
I replaced both pip commands with pip3 as I'm on a Mac where pip is used for Python 2 and pip3 for Python 3; then (similar to what @Monkhmer did) I used | sed '1,2d' between pip3 list and xargs pip3 show to remove the 2 header rows in the pip3 list output; then to chop off the full path, I added | sed -E 's/\/Library\/Frameworks\/Python.framework\/Versions\/3.7\/lib\/python3.7\/site-packages\///g'; then for reverse sort and size in bytes I added | sed -E 's/([0-9]).([0-9])M/\1\200000/g ; s/ +([0-9]+)M/\1000000/g ; s/([0-9]).([0-9])K/\1\200/g ; s/ +([0-9]+)K/\1000/g' | sort -rnAlberta
I just noticed that this ignores some packages like PyAudio, although I haven't figured out what causes this yet.Alberta
I just noticed that the original command string unfortunately filters out installed packages for which there is just a .py file installed by pip and no directory, like PyAudio.Alberta
P
28

New version for new pip list format:

pip2 list --format freeze \
   |awk -F = {'print $1'} \
   | xargs pip2 show \
   | grep -E 'Location:|Name:' \
   | cut -d ' ' -f 2 \
   | paste -d ' ' - - \
   | awk '{print $2 "/" tolower($1)}' \
   | xargs du -sh \
   2> /dev/null \
  |sort -h
Primary answered 28/7, 2018 at 12:11 Comment(1)
This also works with pip3: pip3 list --format freeze|awk -F = {'print $1'}| xargs pip3 show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null|sort -hMunafo
C
22

There is a simple Pythonic way to find it out though.

Here is the code. Let's call this file pipsize.py.

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size



dists = [d for d in pkg_resources.working_set]

for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size/1000 > 1.0:
            print (f"{dist}: {size/1000} KB")
            print("-"*40)
    except OSError:
        # report the problem instead of silently discarding the message
        print('{} no longer exists'.format(dist.project_name))

When run with python pipsize.py this will print out something like,

pip 21.1.2: 8651.906 KB
----------------------------------------
numpy 1.20.3: 25892.871 KB
----------------------------------------
numexpr 2.7.3: 1627.361 KB
----------------------------------------
zict 2.0.0: 48.54 KB
----------------------------------------
yarl 1.6.3: 1395.888 KB
----------------------------------------
widgetsnbextension 3.5.1: 4609.962 KB
----------------------------------------
webencodings 0.5.1: 54.768 KB
----------------------------------------
wcwidth 0.2.5: 452.214 KB
----------------------------------------
uvicorn 0.14.0: 257.515 KB
----------------------------------------
tzlocal 2.1: 67.11 KB
----------------------------------------
traitlets 5.0.5: 800.71 KB
----------------------------------------
tqdm 4.61.0: 289.412 KB
----------------------------------------
tornado 6.1: 2898.264 KB

Commit answered 10/6, 2021 at 4:3 Comment(3)
I like this. I did some modifications for mine(eg. KB to MB, sort by alphabet), it was a lot of help.Dogma
Here's the modified code for showing MB instead of KB and sort by size in descending order: gist.github.com/AnsonH/fd634ba4298376f2abd8e00f99b01be8Scandal
This is actually really faster than other alternatives suggested above.Ostracoderm
F
8

None of the above solutions list packages with dashes in their names, because pip converts the dashes to underscores in the folder names:

pip list --format freeze | awk -F = {'print $1'} | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{gsub("-","_",$1); print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -h

And for Mac users:

pip3 list --format freeze | awk -F = {'print $1'} | xargs pip3 show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{gsub("-","_",$1); print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -h
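
The name conversion done by the awk step can be sketched in Python as a list of directory names worth trying. This is a rough illustration with a made-up helper name; it covers case and dashes only and does not help when the folder has an unrelated name.

```python
def candidate_dir_names(project_name):
    """Return directory names to try for a project, most literal first."""
    lowered = project_name.lower()
    candidates = [
        project_name,               # name exactly as pip reports it
        lowered,                    # Flask -> flask
        lowered.replace("-", "_"),  # django-q -> django_q
    ]
    # Drop duplicates while keeping the order.
    return list(dict.fromkeys(candidates))


print(candidate_dir_names("django-q"))
print(candidate_dir_names("Flask"))
```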
Flay answered 23/12, 2020 at 11:47 Comment(0)
M
5

Here's how,

  1. pip3 show numpy | grep "Location:"
  2. this will return path/to/all/packages
  3. du -h path/to/all/packages
  4. the last line shows the total size of all packages, in human-readable units

Note: You may put any package name in place of numpy

Messere answered 11/5, 2020 at 14:34 Comment(0)
R
5

How

 $ du -h -d 1 "$(pip -V | cut -d ' ' -f 4 | sed 's/pip//g')" | grep -vE "dist-info|_distutils_hack|__pycache__" | sort -h

Pros

No need to convert these:
case (Django:django)
hyphen (django-q:django_q)
naming (djangorestframework-gis:rest_framework_gis)

Cons

Dependencies and some unknown directories revealed as well...
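
For comparison, here is a rough Python sketch of the same idea: measure every top-level entry of a site-packages directory and skip the metadata folders. The function names are made up, and the demo runs on a throwaway directory rather than a real installation. Unlike du, it sums apparent file sizes rather than allocated disk blocks.

```python
import os
import tempfile

SKIP_NAMES = {"__pycache__", "_distutils_hack"}


def tree_size(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def child_sizes(site_packages):
    """Return {entry_name: bytes} for each top-level entry, like du -d 1."""
    sizes = {}
    for entry in os.scandir(site_packages):
        if entry.name in SKIP_NAMES or entry.name.endswith(".dist-info"):
            continue
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
    return sizes


# Demo on a temporary directory that mimics a tiny site-packages.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "demo_pkg"))
    os.makedirs(os.path.join(root, "demo_pkg-1.0.dist-info"))
    with open(os.path.join(root, "demo_pkg", "__init__.py"), "wb") as fh:
        fh.write(b"x = 1\n")
    with open(os.path.join(root, "single_module.py"), "wb") as fh:
        fh.write(b"y = 2\n")
    demo = child_sizes(root)

for name, size in sorted(demo.items(), key=lambda item: item[1], reverse=True):
    print(f"{size:>8} bytes  {name}")
```

Single-file modules are included as well, which the directory-based pipelines miss.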

Radloff answered 31/3, 2021 at 10:0 Comment(0)
M
3

History :

There is currently no command or application built for this purpose, so we need to check manually.

Manual Method I :

du /usr/lib/python3.5/ --max-depth=2 | sort -h
du /usr/lib64/python3.5/ --max-depth=2 | sort -h

This does not include packages or files installed outside those directories, but these two simple commands should cover about 95% of cases.

If you have another version of Python installed, adapt the directory accordingly.

Manual Method II :

pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | while read -r name location; do find "$location" -maxdepth 1 -iname "$name"; done | xargs du -sh | sort -h

This searches the install directory for the package name, case-insensitively.

Manual Method II Alternative I :

pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - -| awk '{print $2 "/" tolower($1)}' | xargs du -sh | sort -h

This looks in the install directory for the lower-cased package name.

Manual Method II Alternative II :

pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - -| awk '{print $2 "/" $1}' | xargs du -sh | sort -h

This looks in the install directory for the package name exactly as pip reports it.

Note :

For the methods using du, output lines starting with "du: cannot access" need to be checked manually. The command appends the package name to the install directory, but sometimes the package name and the directory name differ.
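
When the package name and the directory name differ, Python itself can report which top-level names belong to which distribution. The sketch below assumes Python 3.10 or newer for `importlib.metadata.packages_distributions()`; the `invert_mapping` helper and the sample data are only for illustration.

```python
import importlib.metadata
import sys


def invert_mapping(import_to_dists):
    """Turn {import_name: [dist, ...]} into {dist: [import_name, ...]}."""
    result = {}
    for import_name, dist_names in import_to_dists.items():
        for dist_name in dist_names:
            result.setdefault(dist_name, []).append(import_name)
    return {dist: sorted(names) for dist, names in result.items()}


# Hand-written sample in the shape returned by packages_distributions().
sample = {
    "bs4": ["beautifulsoup4"],
    "rest_framework_gis": ["djangorestframework-gis"],
}
inverted = invert_mapping(sample)
print(inverted)

# On Python 3.10+, the same inversion works on the live environment.
if sys.version_info >= (3, 10):
    by_dist = invert_mapping(importlib.metadata.packages_distributions())
    for dist_name in sorted(by_dist)[:5]:
        print(dist_name, "->", by_dist[dist_name])
```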

Make it simple :

  • Use the first method, then
  • Use the second method and manually check any packages outside the standard Python directories
Mortensen answered 15/4, 2018 at 6:2 Comment(0)
H
2

On Mac, I navigate to the site-packages folder and do

du -h -d 1 | sort -rh | grep -v "dist-info"   

On Linux, older versions of du may need --max-depth=1 instead of -d 1; recent GNU du accepts both.

Homoousian answered 12/7, 2023 at 16:30 Comment(0)
K
1

You can also run step 1 by itself: python tool-size.py totals the sizes of all currently installed packages for you.

If you want to know the exact size of a particular pip package including all its dependencies, I've created a little bash and Python combo to achieve this

( based off the excellent package walking code answer above https://mcmap.net/q/183476/-how-to-see-sizes-of-installed-pip-packages )

Steps :

  1. create a python script to check all currently installed pip packages
  2. create a shell script to create a brand new python environment and install package to test, and run the script from step 1
  3. run shell script
  4. profit :)

Step 1

create a python script called tool-size.py

#!/usr/bin/env python

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

def calc_installed_sizes():
    dists = [d for d in pkg_resources.working_set]

    total_size = 0
    print (f"Size of Dependencies")
    print("-"*40)
    for dist in dists:
        # ignore pre-installed pip and setuptools
        if dist.project_name in ["pip", "setuptools"]:
            continue
        try:
            path = os.path.join(dist.location, dist.project_name)
            size = calc_container(path)
            total_size += size
            if size/1000 > 1.0:
                print (f"{dist}: {size/1000} KB")
                print("-"*40)
        except OSError:
            # report the problem instead of silently discarding the message
            print('{} no longer exists'.format(dist.project_name))

    print (f"Total Size (including dependencies): {total_size/1000} KB")

if __name__ == "__main__":
    calc_installed_sizes()

Step 2

create a bash script called tool-size.sh

#!/usr/bin/env bash

# uncomment to debug
# set -x

rm -rf ~/.virtualenvs/tool-size-tester
python -m venv ~/.virtualenvs/tool-size-tester
# Scripts/activate is the Windows (e.g. Git Bash) layout; on Linux and macOS use bin/activate
source ~/.virtualenvs/tool-size-tester/Scripts/activate
pip install -q "$1"
python tool-size.py
deactivate

Step 3

run script with package you want to get the size of

tool-size.sh xxx

say for truffleHog3

$ ./tool-size.sh truffleHog3

Size of Dependencies
----------------------------------------
truffleHog3 2.0.6: 56.46 KB
----------------------------------------
smmap 4.0.0: 108.808 KB
----------------------------------------
MarkupSafe 2.0.1: 40.911 KB
----------------------------------------
Jinja2 3.0.1: 917.551 KB
----------------------------------------
gitdb 4.0.7: 320.08 KB
----------------------------------------
Total Size (including dependencies): 1443.81 KB

Keeper answered 15/6, 2021 at 15:30 Comment(0)
O
1

Starting with Python 3.10 you can get the on-disk sizes of installed Python packages using a script like

import importlib.metadata
for d in importlib.metadata.distributions():
    print(sum(f.locate().stat().st_blocks*512 for f in d.files), d.name)

Or from command line on single line

python -c 'for d in __import__("importlib.metadata").metadata.distributions(): print(sum(f.locate().stat().st_blocks*512 for f in d.files), d.name)'

It also works from Python 3.8 onwards if you replace d.name with d.metadata['Name'].

Offcenter answered 23/11, 2023 at 7:46 Comment(1)
Great approach! But I had to modify it quite a bit to make it work in my case..Joesphjoete
J
1

A modified version of Marko Kohtala's answer:
One-liner:

python -c "for d in __import__('importlib.metadata').metadata.distributions(): print('{:>12.3f} KiB   {}'.format(sum(0 if not f.locate().is_file() else f.locate().stat().st_size for f in d.files) / 1024, d.name))"

The same, but more readable:

import importlib.metadata
for d in importlib.metadata.distributions():
    d_size = 0
    for f in d.files:
        if f.locate().is_file():
            d_size += f.locate().stat().st_size
    print('{:>12.3f} KiB   {}'.format(d_size/1024, d.name))

Example output:

      60.752 KiB   multipledispatch
     318.895 KiB   natsort
   64329.371 KiB   numpy
     288.076 KiB   packaging
   54892.789 KiB   pandas
      28.006 KiB   pandas-flavor
    7185.510 KiB   pip
   77101.011 KiB   pyarrow
    1088.491 KiB   pyjanitor      
     644.466 KiB   python-dateutil
    1033.665 KiB   pytz
  147559.953 KiB   scipy
    3810.577 KiB   setuptools
      64.252 KiB   six       
     303.010 KiB   tabulate  
     572.733 KiB   tzdata
     523.449 KiB   wheel
    9488.667 KiB   xarray

Motivation for this modification:

  1. uses st_size (size in bytes) instead of st_blocks (size taken on disk)
  2. hence works on both Windows and Linux (Python 3.10)
  3. resilient to missing files (personally, I run into them a lot)
  4. slightly better formatting
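
To illustrate the first two points, here is a small sketch comparing the two numbers for a one-byte temporary file. The allocated figure depends on the filesystem, and `st_blocks` does not exist on Windows, so the code falls back to `None` there.

```python
import os
import tempfile

# Create a one-byte file to inspect.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"x")
    path = fh.name

try:
    info = os.stat(path)
    apparent = info.st_size  # bytes of content
    # st_blocks counts 512-byte blocks; it is missing on Windows.
    blocks = getattr(info, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
finally:
    os.remove(path)

print("apparent size:    ", apparent, "bytes")
print("allocated on disk:", allocated, "bytes")
```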
Joesphjoete answered 17/2, 2024 at 23:33 Comment(0)
C
0

Building on @Tirtha and @AnsonH answers, here is my version:

It features:

  • a line showing the total space,
  • a line showing the space taken by all the small libraries,
  • a table-like formatting to display everything in decreasing order.
# Run `python pipsize.py` in Terminal to show size of pip packages
# Credits: https://mcmap.net/q/183476/-how-to-see-sizes-of-installed-pip-packages
# Credits: https://gist.github.com/AnsonH/fd634ba4298376f2abd8e00f99b01be8

import os
import pkg_resources

sort_in_descending = True   # Show packages in descending order


def calc_container(path):
    total_size = 0
    for dirpath, _, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size


total_size = 0
max_size = 0
max_dist_length = 0
dists = [d for d in pkg_resources.working_set]
# (size, dist) pairs; a dict keyed by size would silently drop packages of equal size
dists_with_size = []

for dist in dists:
    try:
        max_dist_length = max(max_dist_length, len(str(dist)))
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        total_size += size
        max_size = max(max_size, size)
        dists_with_size.append((size, dist))
    except OSError:
        print(f"{dist.project_name} no longer exists")

# Sort packages by size
dists_with_size.sort(key=lambda pair: pair[0], reverse=sort_in_descending)


def str_spacer(name: str, max_len: int = max_dist_length) -> str:
    n_spaces = max_len - len(str(name))
    return f"{n_spaces * ' '}"


def human_readable_size(size: int, decimal_places: int = 2, max_unit: str = "PiB"):
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']

    if max_unit not in units:
        raise ValueError(f"specified max unit not in available units. Available units: {units}")

    for unit in units:
        if size < 1024.0 or unit == max_unit:
            break
        size /= 1024.0

    return f"{size:.{decimal_places}f} {unit}"


def table_printer(text: str, size: int):
    print(f"{text} {str_spacer(text)}{human_readable_size(size, max_unit='MiB')}")


# print total statement
table_printer("TOTAL", total_size)
max_size_text = human_readable_size(max_size, max_unit="MiB")
print("=" * (1 + max_dist_length + len(max_size_text)))

# print size for each distro
count_small_libs = 0
small_lib_size = 0
for size, dist in dists_with_size:
    if size/1000000 > 1.0:
        table_printer(dist, size)
    else:
        count_small_libs += 1
        small_lib_size += size

# print remaining size for small distros
small_lib_text = f"{count_small_libs} libs smaller than 1.0 MB"
print()
table_printer(small_lib_text, small_lib_size)

Running the script in python outputs:

TOTAL                           1341.58 MiB
==========================================
kaleido 0.2.1                   253.34 MiB
torch 1.13.0                    232.95 MiB
scipy 1.8.1                     93.77 MiB
pyarrow 10.0.0                  81.60 MiB
safetensors 0.4.1               1.14 MiB
fsspec 2023.12.2                1.08 MiB
coverage 7.4.0                  1.05 MiB
pyod 1.1.2                      1.03 MiB
pycparser 2.21                  1001.23 KiB

92 libs smaller than 1.0 MB     27.70 MiB
Cresol answered 5/1, 2024 at 10:45 Comment(0)
P
0

I like @Tirtha's solution. Here's my upgraded version that takes the path to a requirements.txt as an optional argument and only shows the sizes of the packages contained therein.

Useful if you want to know the size of dependencies for a specific project.

import os
import sys
import pkg_resources


# Usage:  python3 pipsize.py [requirements.txt]
if len(sys.argv) == 2:
    with open(sys.argv[1], 'r') as file:
        requirements = file.read().splitlines()
else:
    requirements = []


def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size


dists = [d for d in pkg_resources.working_set]


for dist in dists:

    if requirements:
        if dist.project_name not in requirements:
            continue

    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size/1000 > 1.0:
            print (f"{dist}: {size/1000} KB")
            print("-"*40)
    except OSError:
        print(f"{dist.project_name} no longer exists")
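
One caveat: the script compares project names against raw lines of the requirements file, so a line such as Flask==0.10.1 never matches. Below is a rough sketch of how the lines could be reduced to bare names first. The helper names are made up, and the parsing is a simplification that ignores URLs and environment markers.

```python
import re


def requirement_names(lines):
    """Extract bare project names from requirements.txt lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r / -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def normalize(name):
    """Normalize as in PEP 503: lower-case, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


sample = [
    "Flask==0.10.1",
    "django-q>=1.3  # task queue",
    "",
    "# a comment line",
    "-r other.txt",
    "requests[socks]",
]
wanted = {normalize(n) for n in requirement_names(sample)}
print(sorted(wanted))
```

In the script, the membership test would then compare `normalize(dist.project_name)` against this set.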
Pippas answered 20/2, 2024 at 6:53 Comment(0)
V
-1

Here is code that prints the total size of installed Python packages in MB, along with individual package sizes:

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

dists = [d for d in pkg_resources.working_set]
total_size = 0

for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        total_size += size
        if size / (1024*1024) > 1.0:
            print(f"{dist}: {size / (1024*1024):.2f} MB")
            print("-" * 40)
    except OSError:
        print(f"{dist.project_name} no longer exists")

print("Total size of installed packages:", total_size / (1024*1024), "MB")
Vituperation answered 3/4, 2024 at 4:26 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.