Python: Should I save PyPi packages offline as a backup?
My Python projects depend heavily on PyPI packages.
I want to make sure that, at any time in the future, the packages my apps require will still be available on PyPI.
For example:
I found a project on GitHub that requires PyQt4.
When I tried to run it on my Linux machine,
it crashed on startup because it couldn't find the PyQt4 package on PyPI.

NB: I know that PyQt4 is deprecated.

I searched a lot for a PyPI archive that still holds the PyQt4 package, but I couldn't find one anywhere.

So I had to rewrite that app to make it work with PyQt5.
I only changed the UI-related code (i.e. the PyQt4 parts);
the other functions still worked.

So the only problem with that app was that the PyQt4 package had been removed from PyPI.



So, my question is: should I keep a backup of the PyPI packages I use?
Chewy answered 13/5, 2022 at 16:41 Comment(7)
pypi.org/project/PyQt4 ?Tolliver
This might be a stupid question, but did you try pip install PyQt4?Tolliver
yes, I had tried that. PyQt4 is not installable that way. i.sstatic.net/otZF9.pngChewy
It's not a bad idea to use a local Nexus repository for something like thisSnowshoe
Ah, my mistake, it's listed on PyPi even though the downloadable files are elsewhere. You could try installing those. Although, submitting a pull request to the Github project with your updates is probably the best option, long-term.Tolliver
What's wrong with keeping a backup? After all, it only consumes some space on your hard disk/server. Also, even after making the backup, you're going to keep using the already-installed version anyway. So I think it depends on the individual's preference. I have a friend who always avoids the elevator and climbs the stairs even to the 15th floor, just because he was once inside an elevator during a power failure. But we can't say he is wrong. What's wrong with being that cautious if it makes him comfortable?Hickson
Maybe a Docker image?Silicosis

Short version:

YES, if you want availability... The next big question is how best to keep backup copies of the dependencies. There are some suggestions at the end of this answer.

Long version:

Your question touches on the concept of "availability", which is one of the three pillars of Information Assurance (or Information Security). The other two pillars are confidentiality and integrity... the CIA triad.

PyPI packages are maintained by their owners. A project that depends on a package and lists it as a dependency must take into account the possibility that the owner will pull the package, or a version of it, from PyPI at any moment.

Important Python packages with many dependents are usually maintained by foundations or organizations that deal more responsibly with downstream packages and projects. However, keeping support for old packages is costly and requires extra effort, so maintainers usually set an end-of-support date, or publish a package lifecycle stating when a specific version will be removed from the public PyPI server.

Once that happens, the dependents have to update their code (as you did), or provide the original dependency via alternative means.

This topic is very important for procurement in libraries, universities, laboratories, companies, and government agencies, where a software tool might depend on other software packages (or an entire ecosystem), and where "availability" should be addressed adequately.

Addressing this risk might mean anything from ensuring high availability at all costs to living with the risk of losing one or more dependencies... A risk-management approach should be used to make informed choices affecting the "security" of your project.

It should also be noted that some packages require binary executables, binary libraries, or access to an online API service, all of which must likewise be available for the package to work properly; this complicates both the risk analysis and the activities needed to address availability.

Now, to make sure that dependencies are always available... I quickly compiled the following list. Note that each option has pros and cons; you should evaluate these and other options based on your needs:

  1. Store the virtual environment along with the code. Once you create a virtual environment and install the packages the project requires into it, you can keep the virtual environment as part of your repository, for posterity.
  2. Host your own PyPI instance (or mirror) and keep a copy of the packages you depend on hosted on it: https://packaging.python.org/en/latest/guides/hosting-your-own-index/
  3. Use an "artifact management tool" such as Artifactory from https://jfrog.com/artifact-management/, where you can host not only Python packages but also Docker images, npm packages, and other kinds of artifacts.
  4. Get the source code of all dependencies, and always build from source.
  5. Create a Docker image in which the project works properly and keep backups of the image.
  6. If the package requires an online API service, think about replacing that service, or mocking it with one you control.
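A minimal sketch of the spirit of options 1 and 2: vendor the package files locally with `pip download`, then install from the local copies with no index access. The `requirements.txt` file and `./vendor` directory are assumed names, not something from the question:

```shell
# Download wheels/sdists for every dependency listed in requirements.txt
# into a local ./vendor directory (both names are hypothetical)
python -m pip download -r requirements.txt -d ./vendor

# Later, install entirely from the local directory, never contacting PyPI
python -m pip install --no-index --find-links ./vendor -r requirements.txt
```

Committing or archiving that directory alongside the code gives you an offline copy that survives a package being pulled from PyPI.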
Pistol answered 5/2 at 19:0 Comment(2)
A HACK that works is to create a virtual environment for all your libraries; you activate it for use as usual. Also, you can check it into your repo for keeps. This is NOT ideal, but if you want to be able to run it on a device without the ability to download the libraries, that is what I would do.Fluff
Thank you Alessandro, I believe your method is covered in #1. right?Pistol

Given that the package files are available on PyPI, you can use pip to download the *.whl files compiled for a specific OS via:

pip download --only-binary=:all: package_name

or the source distribution files *.tar.gz via:

pip download --no-binary=:all: package_name

By default, both commands download the files matching your current platform and Python version; to fetch wheels for other platforms, combine --only-binary=:all: with options such as --platform, --python-version, and --implementation.

Alternatively, if the package files are not directly available on PyPI but on some archive, as is the case for PyQt4, you can manually download those files.

Once you have the package files (either *.whl binaries or *.tar.gz), you should be able to install them without any internet connection from your local package files by:

pip install /path/to/local/package.whl # for *.whl files
pip install /path/to/local/package.tar.gz # for source *.tar.gz files

However, if you decide to back up your package files to a network storage location, e.g. Google Drive, you'd need an internet connection at install time, since pip has to retrieve the file from the URL before installing the package via:

pip install https://drive.google.com/drive/home/package_name.tar.gz
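If you do keep such backup files, pip's hash-checking mode can also cover the integrity side. A sketch; the ./backup directory and the wheel filename are hypothetical:

```shell
# Compute the sha256 hash of a locally stored wheel (hypothetical filename)
python -m pip hash ./backup/package-1.0-py3-none-any.whl

# Pin "package==1.0 --hash=sha256:..." in requirements.txt, then install in
# hash-checking mode, using only the local backup directory
python -m pip install --require-hashes --no-index --find-links ./backup -r requirements.txt
```

With the hashes pinned, a corrupted or tampered backup file fails the install instead of silently slipping in.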
Sarcous answered 5/2 at 18:34 Comment(0)

I use Nexus Repository Manager OSS: https://help.sonatype.com/en/download.html

You can proxy PyPI and host your own internal PyPI repositories; it can be set up directly from a Docker image.

And you get the speedup and benefit of local pip installs.

You use environment variables and pip parameters to set it up: create and use a PyPI proxy repository on Nexus.

And Nexus can serve lots of other formats: Docker, NuGet, Gems, Maven, etc. So you have everything in one place, with a REST API for everything.
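As an illustrative sketch (the hostname, port, and repository name below are hypothetical, not values from this answer), pointing pip at a Nexus PyPI proxy usually amounts to setting the index URL:

```shell
# One-off install through the proxy (hypothetical Nexus URL)
python -m pip install --index-url http://nexus.example.local:8081/repository/pypi-proxy/simple requests

# Or persist the index URL in pip's own configuration
python -m pip config set global.index-url http://nexus.example.local:8081/repository/pypi-proxy/simple
```

After that, every pip install goes through the proxy, which caches each package it fetches from PyPI.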

Nephogram answered 6/2 at 9:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.