I install a lot of the same packages in different virtualenv environments. Is there a way that I can download a package once and then have pip install from a local cache?
This would reduce download bandwidth and time.
According to the pip documentation:
Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.
Therefore, the updated answer is to just use pip with its defaults if you want a download cache.
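With a sufficiently new pip (the pip cache subcommand was added around pip 20.1; treat the exact version as an assumption about your setup), you can inspect and manage that built-in cache directly:
$ pip cache dir      # show where the cache lives
$ pip cache list     # list cached wheels
$ pip cache purge    # empty the cache
$ pip install --no-cache-dir requests   # bypass the cache for one install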
From the pip news, version 0.1.4:
Added support for an environmental variable $PIP_DOWNLOAD_CACHE which will cache package downloads, so future installations won’t require large downloads. Network access is still required, but just some downloads will be avoided when using this.
To take advantage of this, I've added the following to my ~/.bash_profile:
export PIP_DOWNLOAD_CACHE=$HOME/.pip_download_cache
or, if you are on a Mac:
export PIP_DOWNLOAD_CACHE=$HOME/Library/Caches/pip-downloads
Pip will then save its downloads in the PIP_DOWNLOAD_CACHE directory. For instance, I now have quite a few Django packages cached, so new virtualenvs can reuse them. It won't help on the airplane (network access is still required), but it's still great.
On Linux, you can instead follow the XDG convention for the cache location:
export PIP_DOWNLOAD_CACHE="${XDG_CACHE_HOME:-$HOME/.cache}/pip"
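A minimal sketch of how the cache then gets reused across environments (Django is just an example package, and this assumes a pip version old enough to honor PIP_DOWNLOAD_CACHE; as noted further down, newer pips removed it):
$ mkdir -p "$PIP_DOWNLOAD_CACHE"
$ pip install Django                      # first install: downloads into the cache
$ virtualenv ~/envs/other
$ ~/envs/other/bin/pip install Django     # reuses the cached download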
In my opinion, pip2pi is a much more elegant and reliable solution for this problem.
From the docs:
pip2pi builds a PyPI-compatible package repository from pip requirements
pip2pi allows you to create your own PyPI index by using two simple commands. To mirror a package and all of its requirements, use pip2tgz:
$ cd /tmp/; mkdir packages/
$ pip2tgz packages/ httpie==0.2
...
$ ls packages/
Pygments-1.5.tar.gz
httpie-0.2.0.tar.gz
requests-0.14.0.tar.gz
To build a package index from the previous directory:
$ dir2pi packages/
$ find packages/
packages/
packages/httpie-0.2.0.tar.gz
packages/Pygments-1.5.tar.gz
packages/requests-0.14.0.tar.gz
packages/simple
packages/simple/httpie
packages/simple/httpie/httpie-0.2.0.tar.gz
packages/simple/Pygments
packages/simple/Pygments/Pygments-1.5.tar.gz
packages/simple/requests
packages/simple/requests/requests-0.14.0.tar.gz
To install from the index you just built, you can simply use:
pip install --index-url=file:///tmp/packages/simple/ httpie==0.2
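If you plan to use the local index often, you can make it pip's default instead of passing --index-url every time. A minimal sketch of a ~/.pip/pip.conf for this (the path is just the example directory built above):
[global]
index-url = file:///tmp/packages/simple/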
You can even mirror your own index to a remote host with pip2pi.
Note that pip2tgz detects if you have already downloaded a package to the designated directory, so if you run the same install line, or several install lines that have overlapping dependencies, it will only download each package once.
Newer pip versions now cache downloads by default; see this documentation:
https://pip.pypa.io/en/stable/topics/caching/
Create a configuration file named ~/.pip/pip.conf, and add the following contents:
[global]
download_cache = ~/.cache/pip
On OS X, a better path to choose would be ~/Library/Caches/pip
since it follows the convention other OS X programs use.
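For example, the equivalent configuration with the OS X path would be:
[global]
download_cache = ~/Library/Caches/pip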
Several users on one machine can also share a cache by giving each user's pip.conf a download_cache setting that points to the same system-wide directory.
PIP_DOWNLOAD_CACHE has some serious problems. Most importantly, it encodes the hostname of the download into the cache, so using mirrors becomes impossible.
The better way to manage a cache of pip downloads is to separate the "download the package" step from the "install the package" step. The downloaded files are commonly referred to as "sdist files" (source distributions) and I'm going to store them in a directory $SDIST_CACHE.
The two steps end up being:
pip install --no-install --use-mirrors -I --download=$SDIST_CACHE <package name>
This will download the package and place it in the directory pointed to by $SDIST_CACHE; it will not install the package. Then you run:
pip install --find-links=file://$SDIST_CACHE --no-index --index-url=file:///dev/null <package name>
To install the package into your virtual environment. Ideally, $SDIST_CACHE would be committed under your source control. When deploying to production, you would run only the second pip command to install the packages without downloading them.
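For a whole project, the same two steps can be driven from a requirements file. A sketch reusing the flags above (which only exist in the old pip versions this answer targets), assuming a requirements.txt:
pip install --no-install --use-mirrors -I --download="$SDIST_CACHE" -r requirements.txt
pip install --find-links="file://$SDIST_CACHE" --no-index -r requirements.txt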
Starting in version 6.0, pip now does its own caching:
DEPRECATION: pip install --download-cache and pip wheel --download-cache command line flags have been deprecated and the functionality removed. Since pip now automatically configures and uses its internal HTTP cache, which supplants the --download-cache, the existing options have been made non-functional but will still be accepted until their removal in pip v8.0. For more information please see https://pip.pypa.io/en/latest/reference/pip_install.html#caching
More information from the above link:
Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.
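If you want that built-in cache to live somewhere specific (a bigger disk, a shared location), you can point pip at it. A sketch, where /path/to/pip-cache is just a placeholder directory:
$ pip install --cache-dir=/path/to/pip-cache SomePackage
or, persistently, in pip.conf:
[global]
cache-dir = /path/to/pip-cache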
pip wheel is an excellent option that does what you want with the extra feature of pre-compiling the packages. From the official docs:
Build wheels for a requirement (and all its dependencies):
$ pip wheel --wheel-dir=/tmp/wheelhouse SomePackage
Now your /tmp/wheelhouse directory has all your dependencies precompiled, so you can copy the folder to another server and install everything with this command:
$ pip install --no-index --find-links=/tmp/wheelhouse SomePackage
Note that not all packages will be completely portable across machines. Some packages are built specifically for the Python version, OS distribution, and/or hardware architecture that you're using. That will be specified in the file name, like -cp27-none-linux_x86_64 for CPython 2.7 on 64-bit Linux, etc.
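In practice you would usually drive both steps from a requirements file; the same commands, sketched with an assumed requirements.txt:
$ pip wheel --wheel-dir=/tmp/wheelhouse -r requirements.txt
$ pip install --no-index --find-links=/tmp/wheelhouse -r requirements.txt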
Using pip only (my version is 1.2.1), you can also build up a local repository like this:
if ! pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>; then
    # Not in the local repository yet: download it there (without installing) ...
    pip install --download="$PIP_SDIST_INDEX" <package>
    # ... then install from the local repository.
    pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>
fi
In the first call of pip, the requested packages are looked up in the local repository (only) and installed from there. If that fails, pip retrieves the packages from its usual location (e.g. PyPI) and downloads them to $PIP_SDIST_INDEX (but does not install anything!). The first call is then repeated to properly install the packages from the local index.
(--download-cache creates a local file name which is the complete escaped URL, and pip cannot use this as an index with --find-links. --download-cache will use the cached file, if found. We could add this option to the second call of pip, but since the index already functions as a kind of cache, it does not bring much. It would help if your index gets emptied, for instance.)
A simpler option is basket.
Given a package name, it will download it and all of its dependencies to a central location, without any of the drawbacks of pip's cache. This is great for offline use.
You can then use this directory as a source for pip:
pip install --no-index -f file:///path/to/basket package
Or easy_install:
easy_install -f ~/path/to/basket -H None package
You can also use it to update the basket whenever you are online.
There is a new solution to this called pip-accel, a drop-in replacement for pip with caching built in.
The pip-accel program is a wrapper for pip, the Python package manager. It accelerates the usage of pip to initialize Python virtual environments given one or more requirements files. It does so by combining the following two approaches:
Source distribution downloads are cached and used to generate a local index of source distribution archives.
Binary distributions are used to speed up the process of installing dependencies with binary components (like M2Crypto and LXML). Instead of recompiling these dependencies again for every virtual environment we compile them once and cache the result as a binary *.tar.gz distribution.
Paylogic uses pip-accel to quickly and reliably initialize virtual environments on its farm of continuous integration slaves which are constantly running unit tests (this was one of the original use cases for which pip-accel was developed). We also use it on our build servers.
We've seen around a 10x speedup from switching from pip to pip-accel.
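Since pip-accel is a wrapper, its command line is meant to mirror pip's. A sketch of typical usage, assuming a requirements.txt exists (check pip-accel's own docs for the exact invocation):
$ pip install pip-accel
$ pip-accel install -r requirements.txt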
I found the following to be useful for downloading packages and then installing from those downloads:
pip download -d "$SOME_DIRECTORY" some-package
Then to install:
pip install --no-index --no-cache-dir --find-links="$SOME_DIRECTORY" some-package
Where $SOME_DIRECTORY is the path to the directory that the packages were downloaded to.
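The same pattern scales to a whole requirements file:
pip download -d "$SOME_DIRECTORY" -r requirements.txt
pip install --no-index --find-links="$SOME_DIRECTORY" -r requirements.txt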
I think the pip-accel package mentioned above is a good choice.