I am trying to create a Docker image with a bunch of Python libraries.
Running pip install --no-cache-dir -r requirements.txt keeps failing with mismatched hashes for libraries, especially pyspark.
The failure message looks like this:
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
pyspark==2.4.6 from https://files.pythonhosted.org/packages/e9/e4/5c15ab8d354c4e3528510821865e6748209a9b0ff6a1788f4cd36cc2a5dc/pyspark-2.4.6.tar.gz#sha256=b4b319a3ffd187a3019f654ae1c8ac38048bcec2940f8cecdef829302d166feb (from -r requirements.txt (line 4)):
Expected sha256 b4b319a3ffd187a3019f654ae1c8ac38048bcec2940f8cecdef829302d166feb
Got e15b72fe55a366df7329932882c56328874152cf618950c7ce45e11f1c9dc5d1
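For reference, the "Expected" value in this message is the same string that appears after #sha256= in the download URL. A minimal sketch of pulling that value out of such a URL, so it can be compared with a digest computed locally, might look like the following. The helper name expected_hash_from_url is hypothetical, not part of pip.

```python
from urllib.parse import urlsplit


def expected_hash_from_url(url):
    """Return (algorithm, hex_digest) from an '#algo=digest' URL fragment.

    Hypothetical helper for illustration; returns None if the URL has no
    fragment of that shape.
    """
    fragment = urlsplit(url).fragment
    if "=" not in fragment:
        return None
    algorithm, _, digest = fragment.partition("=")
    return algorithm, digest


# URL copied from the error message above.
url = (
    "https://files.pythonhosted.org/packages/e9/e4/"
    "5c15ab8d354c4e3528510821865e6748209a9b0ff6a1788f4cd36cc2a5dc/"
    "pyspark-2.4.6.tar.gz"
    "#sha256=b4b319a3ffd187a3019f654ae1c8ac38048bcec2940f8cecdef829302d166feb"
)
print(expected_hash_from_url(url))
```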
Some things I have already tried, after consulting other Stack Overflow threads:
- Remove the __pycache__ and ~/.cache/ directories.
- Downgrade pip to 20.0.2, since some threads said this was caused by pip 20.1.0.
- Use --no-cache-dir, as you can see above.
- Build the Docker image without caching, using:
docker build --no-cache . -t labydock-image
The Dockerfile looks something like this:
FROM python:3.7.7-stretch
USER root
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-8-jdk
WORKDIR /labyrinth
COPY ./dodo.py .
COPY ./requirements.txt .
RUN pip install pip==20.0.2
RUN rm -rf ~/.cache
RUN rm -rf __pycache__
RUN pip install --no-cache-dir -r requirements.txt
And the corresponding requirements.txt file looks like this:
mockito==1.2.1
py4j==0.10.7
pypandoc==1.5
pyspark==2.4.6
pytest
pandas
doit
koalas
requests
presto-client
mysql-connector-python
Any help would be highly appreciated. Please let me know if you need more details.
FROM? What does your requirements.txt file look like? – Effulgent
docker build --no-cache . , maybe one of the intermediate layers is corrupt – Effulgent
docker build -t --no-cache my_image . – Chelicera
curl https://files.pythonhosted.org/packages/e9/e4/5c15ab8d354c4e3528510821865e6748209a9b0ff6a1788f4cd36cc2a5dc/pyspark-2.4.6.tar.gz > temp.bin then run sha256sum temp.bin. What do you get? – Effulgent
shasum temp.bin gives the following output - fd8707942264863bcc635f3ab7a6a1ec9b47784d temp.bin (On a Mac, hence had to use shasum.) – Taveras
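A note on comparing these values: the digest in the last comment is 40 hex characters, which is the length of a SHA-1 digest (the default for shasum), while pip's message reports SHA-256 digests of 64 hex characters. The two cannot be compared directly; shasum -a 256 temp.bin would give a comparable value. As a platform-independent alternative, here is a minimal sketch that computes the digest with Python's hashlib. The function name file_digest and the file name temp.bin are just illustrative choices.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    Reads the file in chunks so a large archive does not have to fit
    in memory. `algorithm` is any name accepted by hashlib.new().
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_expected(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals `expected_hex` (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Calling file_digest("temp.bin") on the downloaded archive and comparing the result with the "Expected" and "Got" values from the error would show whether the download on the host matches what PyPI advertises, or whether the file is being altered somewhere between PyPI and the machine doing the download.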