How do I install from a local cache with pip?
M

11

156

I install a lot of the same packages in different virtualenv environments. Is there a way that I can download a package once and then have pip install from a local cache?

This would reduce download bandwidth and time.

Magically answered 26/1, 2011 at 15:38 Comment(2)
Note that as of pip 6.0 (2014-12-22), pip will cache by default. See pip.pypa.io/en/stable/reference/pip_install.html#caching for details.Amontillado
It doesn't just reduce download bandwidth and time; it can also eliminate the time spent crawling the PyPI index to check available versions of packages, and if you are caching wheels, it can eliminate the time spent building wheels for packages that don't provide them. It adds up to a very substantial speed boost.Shoplifter
M
131

Updated Answer 19-Nov-15

According to the Pip documentation:

Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.

Therefore, the updated answer is to just use pip with its defaults if you want a download cache.
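
With a recent pip, the default cache can be inspected or bypassed from the command line. The sketch below wraps the relevant commands in two small helpers. The function names are hypothetical, and the `pip cache` subcommand assumes pip 20.1 or newer.

```shell
# Hypothetical helpers; `pip cache` assumes pip 20.1 or newer.

# Print the cache location and a summary of what it holds.
show_pip_cache() {
    pip cache dir
    pip cache info
}

# Install the given packages without reading or writing the cache.
install_uncached() {
    pip install --no-cache-dir "$@"
}
```

For example, `show_pip_cache` reports where the cache lives, and `install_uncached some-package` forces a fresh fetch for a single install.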

Original Answer

From the pip news, version 0.1.4:

Added support for an environmental variable $PIP_DOWNLOAD_CACHE which will cache package downloads, so future installations won’t require large downloads. Network access is still required, but just some downloads will be avoided when using this.

To take advantage of this, I've added the following to my ~/.bash_profile:

export PIP_DOWNLOAD_CACHE=$HOME/.pip_download_cache

or, if you are on a Mac:

export PIP_DOWNLOAD_CACHE=$HOME/Library/Caches/pip-downloads

Notes

  1. If a newer version of a package is detected, it will be downloaded and added to the PIP_DOWNLOAD_CACHE directory. For instance, I now have quite a few Django packages.
  2. This doesn't remove the need for network access, as stated in the pip news, so it's not the answer for creating new virtualenvs on the airplane, but it's still great.
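
For a pre-6.0 pip, the two exports above could be folded into one snippet that picks a directory per platform. This is only a sketch: `legacy_pip_cache_dir` is a hypothetical helper name, and the non-Mac branch assumes you want to follow the XDG cache convention rather than `~/.pip_download_cache`.

```shell
# Only relevant to pip older than 6.0, which still honours PIP_DOWNLOAD_CACHE.
# legacy_pip_cache_dir is a hypothetical helper name.
legacy_pip_cache_dir() {
    # An optional first argument overrides the detected OS name.
    case "${1:-$(uname -s)}" in
        Darwin) printf '%s\n' "$HOME/Library/Caches/pip-downloads" ;;
        *)      printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/pip" ;;
    esac
}

export PIP_DOWNLOAD_CACHE="$(legacy_pip_cache_dir)"
```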
Magically answered 26/1, 2011 at 15:38 Comment(6)
Maybe a better idea is to put it into .bashrc, because .bash_profile is executed only during login. That's up to you, and anyway it's good advice :)Ritualist
On macs it is loaded at the beginning of any shell.Exemplify
PIP_DOWNLOAD_CACHE is seriously flawed and I wouldn't recommend using it for things like getting packages out to your deployment machines. It also still relies on pypi.python.org being reachable. Great for a local development cache, but not suitable for heavier uses.Boneblack
@Boneblack Could you comment on why it is seriously flawed? If you don't want PyPI to be reachable, that's what --no-index is for; a download cache is surely orthogonal to reaching PyPI or not!Monition
@Monition slacy's answer below explains why Pip's download cache is flawed. I've also seen pip install taking longer with cache enabled, bizarrely. pip-accel and basket appear to be better options.Debbiedebbra
If you are using this method, please use the XDG cache dir, ie, export PIP_DOWNLOAD_CACHE="${XDG_CACHE_HOME:-$HOME/.cache}/pip"Parlin
S
54

In my opinion, pip2pi is a much more elegant and reliable solution for this problem.

From the docs:

pip2pi builds a PyPI-compatible package repository from pip requirements

pip2pi allows you to create your own PyPI index by using two simple commands:

  1. To mirror a package and all of its requirements, use pip2tgz:

    $ cd /tmp/; mkdir packages/
    $ pip2tgz packages/ httpie==0.2
    ...
    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    
  2. To build a package index from the previous directory:

    $ ls packages/
    bar-0.8.tar.gz
    baz-0.3.tar.gz
    foo-1.2.tar.gz
    $ dir2pi packages/
    $ find packages/
    /httpie-0.2.0.tar.gz
    /Pygments-1.5.tar.gz
    /requests-0.14.0.tar.gz
    /simple
    /simple/httpie
    /simple/httpie/httpie-0.2.0.tar.gz
    /simple/Pygments
    /simple/Pygments/Pygments-1.5.tar.gz
    /simple/requests
    /simple/requests/requests-0.14.0.tar.gz
    
  3. To install from the index you built in step 2., you can simply use:

    pip install --index-url=file:///tmp/packages/simple/ httpie==0.2
    

You can even mirror your own index to a remote host with pip2pi.

Shortsighted answered 21/9, 2012 at 6:40 Comment(3)
+1 pip2pi works great!! I don't like relying on network connectivity that much. It fails when you most need it.Styria
This works great. It answers my question #18052717; can you answer there as well?Sleekit
Maybe it was implied, but it's worth mentioning explicitly: pip2tgz detects if you have already downloaded the package to the designated directory, so if you run the same install line or several install lines that have overlapping dependencies, it will only download each package once.Elastic
L
35

For newer Pip versions:

Newer Pip versions now cache downloads by default. See this documentation:

https://pip.pypa.io/en/stable/topics/caching/

For older Pip versions:

Create a configuration file named ~/.pip/pip.conf, and add the following contents:

[global]
download_cache = ~/.cache/pip

On OS X, a better path to choose would be ~/Library/Caches/pip since it follows the convention other OS X programs use.

Landry answered 11/4, 2013 at 12:22 Comment(4)
And If I wanted to store them globally for other users of the same PC to access? How would I do that? I figure the config file would have to be placed in /etc or something.Strow
@batandwa: That might work. If not, you could try this: make sure that all the users have a pip.conf with a download_cache setting that points to the same system-wide directory.Landry
The link in the answer is not directly working anymore. This is probably the new location: pip.pypa.io/en/stable/topics/cachingCarding
@Carding Thanks, I've edited the answer. (Feel free to edit the answer yourself in the future, for edits like this.)Landry
B
32

PIP_DOWNLOAD_CACHE has some serious problems. Most importantly, it encodes the hostname of the download into the cache, so using mirrors becomes impossible.

The better way to manage a cache of pip downloads is to separate the "download the package" step from the "install the package" step. The downloaded files are commonly referred to as "sdist files" (source distributions) and I'm going to store them in a directory $SDIST_CACHE.

The two steps end up being:

pip install --no-install --use-mirrors -I --download=$SDIST_CACHE <package name>

Which will download the package and place it in the directory pointed to by $SDIST_CACHE. It will not install the package. And then you run:

pip install --find-links=file://$SDIST_CACHE --no-index --index-url=file:///dev/null <package name> 

To install the package into your virtual environment. Ideally, $SDIST_CACHE would be committed under your source control. When deploying to production, you would run only the second pip command to install the packages without downloading them.
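
The `--no-install` and `--download` flags used above were removed from later pip releases. A rough modern equivalent, assuming a pip that provides the `pip download` subcommand (8.0 or later), might look like the sketch below. `SDIST_CACHE` is the variable from this answer; the function names and the requirements-file argument are illustrative.

```shell
# Step 1 (online): fetch everything listed in a requirements file, plus
# dependencies, into $SDIST_CACHE without installing anything.
fetch_to_cache() {
    pip download --dest "$SDIST_CACHE" -r "$1"
}

# Step 2 (offline): install using only the files already in $SDIST_CACHE.
install_from_cache() {
    pip install --no-index --find-links="$SDIST_CACHE" -r "$1"
}
```

Note that `pip download` may fetch wheels as well as source distributions, so the directory is not strictly an sdist cache any more.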

Boneblack answered 27/8, 2012 at 18:23 Comment(5)
Gabriel -- It's not downloaded twice, just once in the first step and then installed from local cache in the second. What are you seeing?Boneblack
If I run the first step twice, it'll download it twice, right? At least it happened here. I'll need to know that the first step has been executed for this package at least once before executing it, otherwise it'll download the same file twice. How can I check either if I need to execute it or it has been downloaded before?Ligon
You probably just want to use pip2pi as the other answer suggests. :)Boneblack
does this download the dependencies as well?Aviculture
I use pip 18.1 and option --no-install is not present. Any idea on how to update this answer?Scorper
B
14

Starting in version 6.0, pip now does its own caching:

  • DEPRECATION pip install --download-cache and pip wheel --download-cache command line flags have been deprecated and the functionality removed. Since pip now automatically configures and uses its internal HTTP cache, which supplants the --download-cache, the existing options have been made non-functional but will still be accepted until their removal in pip v8.0. For more information please see https://pip.pypa.io/en/latest/reference/pip_install.html#caching

More information from the above link:

Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.

Borderline answered 29/12, 2014 at 4:40 Comment(0)
S
9

pip wheel is an excellent option that does what you want with the extra feature of pre-compiling the packages. From the official docs:

Build wheels for a requirement (and all its dependencies):

$ pip wheel --wheel-dir=/tmp/wheelhouse SomePackage

Now your /tmp/wheelhouse directory has all your dependencies precompiled, so you can copy the folder to another server and install everything with this command:

$ pip install --no-index --find-links=/tmp/wheelhouse SomePackage

Note that not all the packages will be completely portable across machines. Some packages will be built specifically for the Python version, OS distribution and/or hardware architecture that you're using. That will be specified in the file name, like -cp27-none-linux_x86_64 for CPython 2.7 on a 64-bit Linux, etc.
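
To check which wheels in a wheelhouse are tied to a particular interpreter or platform, the tags can be read straight off the file name. The sketch below assumes the standard wheel naming scheme (name-version[-build]-python-abi-platform.whl); `wheel_tags` is a hypothetical helper and the file name in the usage line is made up.

```shell
# Print the python, abi and platform tags of a wheel file name.
# The body runs in a subshell so its variables do not leak.
wheel_tags() (
    name=$(basename "$1" .whl)
    platform=${name##*-}     # text after the last dash
    rest=${name%-*}          # drop the platform tag
    abi=${rest##*-}
    rest=${rest%-*}          # drop the abi tag
    python=${rest##*-}
    printf '%s %s %s\n' "$python" "$abi" "$platform"
)

wheel_tags /tmp/wheelhouse/SomePackage-1.0-cp27-none-linux_x86_64.whl
```

A wheel that reports `py2.py3 none any` is pure Python and should be portable; anything with a specific platform tag generally has to be rebuilt for each target.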

Shout answered 29/12, 2014 at 21:27 Comment(0)
S
5

Using pip only (my version is 1.2.1), you can also build up a local repository like this:

if ! pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>; then
    pip install --download-directory="$PIP_SDIST_INDEX" <package>
    pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>
fi

In the first call, pip looks the package up in the local repository only and installs it from there. If that fails, pip retrieves the package from its usual location (e.g. PyPI) and downloads it to PIP_SDIST_INDEX without installing anything. The first call is then repeated to install the package from the local index.

(--download-cache creates a local file name which is the complete (escaped) URL, and pip cannot use this as an index with --find-links. --download-cache will use the cached file, if found. We could add this option to the second call of pip, but since the index already functions as a kind of cache, it does not necessarily bring a lot. It would help if your index is emptied, for instance.)
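
On a current pip the same try-local-first pattern can be written with `pip download`, since `--download-directory` no longer exists. This is a sketch under that assumption, not a tested drop-in replacement; `install_with_local_index` is a hypothetical name.

```shell
# Usage: install_with_local_index <index-dir> <requirement>...
# Try the local directory first; on failure, download into it and retry.
install_with_local_index() (
    index_dir=$1
    shift
    if ! pip install --no-index --find-links="$index_dir" "$@"; then
        pip download --dest "$index_dir" "$@" &&
        pip install --no-index --find-links="$index_dir" "$@"
    fi
)
```

Called as `install_with_local_index "$PIP_SDIST_INDEX" some-package`, it only touches the network when the local directory cannot satisfy the requirement.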

Sosna answered 10/12, 2012 at 12:21 Comment(0)
C
3

A simpler option is basket.

Given a package name, it will download it and all of its dependencies to a central location, without any of the drawbacks of the pip cache. This is great for offline use.

You can then use this directory as a source for pip:

pip install --no-index -f file:///path/to/basket package

Or easy_install:

easy_install -f ~/path/to/basket -H None package

You can also use it to update the basket whenever you are online.

Committee answered 22/3, 2014 at 11:5 Comment(1)
Limitations (from the official page): Basket downloads source distributions only, it cannot download packages that are not hosted on PyPI and it ignores version requirements (e.g. "nose>=1.1.2"), always downloading the latest version.Shout
D
3

There is a new solution to this called pip-accel, a drop-in replacement for pip with caching built in.

The pip-accel program is a wrapper for pip, the Python package manager. It accelerates the usage of pip to initialize Python virtual environments given one or more requirements files. It does so by combining the following two approaches:

  • Source distribution downloads are cached and used to generate a local index of source distribution archives.

  • Binary distributions are used to speed up the process of installing dependencies with binary components (like M2Crypto and LXML). Instead of recompiling these dependencies again for every virtual environment we compile them once and cache the result as a binary *.tar.gz distribution.

Paylogic uses pip-accel to quickly and reliably initialize virtual environments on its farm of continuous integration slaves which are constantly running unit tests (this was one of the original use cases for which pip-accel was developed). We also use it on our build servers.

We've seen around 10x speedup from switching from pip to pip-accel.

Debbiedebbra answered 2/10, 2014 at 13:50 Comment(0)
M
0

I found the following to be useful for downloading packages and then installing from those downloads:

pip download -d "$SOME_DIRECTORY" some-package

Then to install:

pip install --no-index --no-cache-dir --find-links="$SOME_DIRECTORY" some-package

Where $SOME_DIRECTORY is the path to the directory that the packages are to be downloaded to.

Military answered 25/5, 2021 at 22:52 Comment(0)
A
-1

I think the package "pip-accel" must be a good choice.

Allegiance answered 11/6, 2018 at 2:3 Comment(0)
