Sha256 Mismatch for pip install on fresh docker image [closed]

I am trying to build a Docker image with a number of Python libraries.

Running pip install --no-cache-dir -r requirements.txt keeps failing with mismatched hashes for libraries, especially pyspark.

The failure message looks like this:

ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    pyspark==2.4.6 from https://files.pythonhosted.org/packages/e9/e4/5c15ab8d354c4e3528510821865e6748209a9b0ff6a1788f4cd36cc2a5dc/pyspark-2.4.6.tar.gz#sha256=b4b319a3ffd187a3019f654ae1c8ac38048bcec2940f8cecdef829302d166feb (from -r requirements.txt (line 4)):
        Expected sha256 b4b319a3ffd187a3019f654ae1c8ac38048bcec2940f8cecdef829302d166feb
             Got        e15b72fe55a366df7329932882c56328874152cf618950c7ce45e11f1c9dc5d1

Things I have already tried, based on other Stack Overflow threads:

  1. Removed the __pycache__ and ~/.cache/ directories.
  2. Downgraded pip to 20.0.2, since some threads said this was caused by pip 20.1.0.
  3. Used --no-cache-dir, as you can see.
  4. Built the Docker image without caching, using docker build --no-cache . -t labydock-image

The Dockerfile looks something like this:

FROM python:3.7.7-stretch

USER root
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-8-jdk

WORKDIR /labyrinth
COPY ./dodo.py .
COPY ./requirements.txt .

RUN pip install pip==20.0.2

RUN rm -rf ~/.cache
RUN rm -rf __pycache__

RUN pip install --no-cache-dir -r requirements.txt

And the corresponding requirements.txt file looks like this:

mockito==1.2.1
py4j==0.10.7
pypandoc==1.5
pyspark==2.4.6
pytest
pandas
doit
koalas
requests
presto-client
mysql-connector-python

Any help would be highly appreciated. Please let me know if you need more details.

Taveras answered 4/7, 2020 at 7:14 Comment(8)
What Docker image are you inheriting FROM? What does your requirements.txt file look like? - Effulgent
@Omer Added these details. - Taveras
Interesting, the Dockerfile builds successfully on my machine. - Effulgent
Try docker build --no-cache . since one of the intermediate layers may be corrupt. - Effulgent
Can you try docker build --no-cache -t my_image . instead? - Chelicera
Tried building the Docker image without cache. Did not work. :( - Taveras
On your host machine, try running curl https://files.pythonhosted.org/packages/e9/e4/5c15ab8d354c4e3528510821865e6748209a9b0ff6a1788f4cd36cc2a5dc/pyspark-2.4.6.tar.gz > temp.bin and then sha256sum temp.bin. What do you get? - Effulgent
shasum temp.bin gives the following output: fd8707942264863bcc635f3ab7a6a1ec9b47784d temp.bin (I am on a Mac, hence shasum.) - Taveras
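
Note that shasum with no options computes SHA-1, which is why the digest in the last comment is 40 characters long and cannot be compared with the 64-character SHA-256 values in pip's error; shasum -a 256 temp.bin would give a comparable value. The same check can be sketched in Python. This assumes the file was saved as temp.bin as in the comment above; the helper names sha256_of_file and matches_expected are made up for illustration.

```python
import hashlib

# Digest pip expected, copied from the error message in the question.
EXPECTED = "b4b319a3ffd187a3019f654ae1c8ac38048bcec2940f8cecdef829302d166feb"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """Hypothetical helper: True if the file hashes to the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_expected("temp.bin") on the host and inside the container would show whether the download itself arrives intact in each environment.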

After much deliberation, I figured this out with the help of a colleague.

Our root cause analysis is as follows. When pip downloads a library, the package index supplies a SHA-256 digest for the file (the #sha256=... part of the download URL), and pip verifies the downloaded file against it to guard against tampering. Because of our poor internet connection, the library was getting corrupted during download, and hence the hash mismatch. This sounds incredibly weird to me, since I assumed pip would have detected the failure and reported that the library download failed.

Anyway, we ran the same Docker image on a VM and it worked like a charm.
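
As a rough sketch of the behaviour described above (this is not pip's actual implementation), a download can be hashed after it arrives and retried when the digest does not match. The function name fetch_verified and its fetch parameter are hypothetical; fetch stands for any zero-argument callable that returns the downloaded bytes.

```python
import hashlib


def fetch_verified(fetch, expected_sha256, attempts=3):
    """Call `fetch()` until its bytes hash to `expected_sha256`.

    Raises ValueError if no attempt produces a matching digest.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    actual = None
    for _ in range(attempts):
        data = fetch()
        actual = hashlib.sha256(data).hexdigest()
        if actual == expected_sha256:
            return data  # download arrived intact
    # Every attempt was corrupted or truncated in transit.
    raise ValueError(
        f"hash mismatch after {attempts} attempts: "
        f"expected {expected_sha256}, last got {actual}"
    )
```

In practice fetch could wrap something like urllib.request.urlopen(url).read(). On a connection that corrupts every transfer, each attempt would fail in the same way, which matches what we saw until we moved the build to a VM.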

Taveras answered 5/7, 2020 at 6:31 Comment(0)
