Python Google Cloud Function won't install Pandas

I'm deploying a Python function as a Google Cloud Function. It tests fine locally and deploys to GCP without complaint. However, when I actually execute it, it crashes with...

Error: function terminated. Recommended action: inspect logs for termination reason. Details:
The pandas library is not installed, please install pandas to use the to_dataframe() function.

My requirements.txt is as follows (I have verified that it is actually uploaded when the function is deployed):

appdirs==1.4.3
APScheduler==3.6.3
beautifulsoup4==4.8.2
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
click==7.1.1
distlib==0.3.0
filelock==3.0.12
Flask==1.1.1
google-api-core==1.16.0
google-api-python-client==1.8.0
google-auth==1.12.0
google-auth-httplib2==0.0.3
google-cloud-bigquery==1.24.0
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
grpcio==1.27.2
httplib2==0.17.0
idna==2.9
itsdangerous==1.1.0
Jinja2==2.11.1
MarkupSafe==1.1.1
numpy==1.18.2
pandas==1.0.3
pipenv==2018.11.26
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.1
pytz==2019.3
requests==2.23.0
rsa==4.0
six==1.14.0
soupsieve==2.0
tzlocal==2.0.0
uritemplate==3.0.1
urllib3==1.25.8
virtualenv==20.0.15
virtualenv-clone==0.5.4
Werkzeug==1.0.1
wget==3.2

Here is some more detail from the cloud function log...

 severity: "ERROR"  
 textPayload: "Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 383, in run_background_function
    _function_handler.invoke_user_function(event_object)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
    return call_user_function(request_or_event)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 214, in call_user_function
    event_context.Context(**request_or_event.context))
  File "/user_code/main.py", line 27, in sync_nyt_counties
    macdata()
  File "/user_code/main.py", line 108, in macdata
    df = query_job.to_dataframe()
  File "/env/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3374, in to_dataframe
    create_bqstorage_client=create_bqstorage_client,
  File "/env/local/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1706, in to_dataframe
    raise ValueError(_NO_PANDAS_ERROR)
ValueError: The pandas library is not installed, please install pandas to use the to_dataframe() function.
" 

I'm pulling my hair out! Any ideas?

Thanks!

Update

To eliminate other possible influences, I created a minimal function to demonstrate the problem. Previously, I would only see the error upon execution of the function when the Google BigQuery API was attempting to use pandas. Now I've moved the problem to the fore by adding an import in my main.py. Now I get a failure when trying to deploy the function (don't have to wait until runtime anymore).

main.py

import pandas as pd

def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:
        return f'Hello World!'

requirements.txt

pandas

Deployment command...

gcloud functions deploy hello_world --runtime python37 --trigger-http

error

Deploying function (may take a while - up to 2 minutes)...failed.
ERROR: (gcloud.functions.deploy) OperationError: code=3, message=Function failed on loading user code. Error message: Code in file main.py can't be loaded.
Did you list all required modules in requirements.txt?
Detailed stack trace: Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 305, in check_or_load_user_function
    _function_handler.load_user_function()
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 184, in load_user_function
    spec.loader.exec_module(main)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/user_code/main.py", line 1, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

gcloud version

Google Cloud SDK 287.0.0
alpha 2019.05.17
beta 2019.05.17
bq 2.0.56
core 2020.03.30
gsutil 4.49

Additional Notes:

  • I'm running this on Windows 10
  • I thought maybe I had something screwy in my GCP project, so I tried deploying this to a different project. Same result!

Is it possible it's a problem related to the client-side? I wouldn't think so, but I'm brand new to Python and I feel like I did some weird things to my Python installation at some point and I don't know if the cloud SDK uses some of it behind the scenes when deploying.
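
One way to narrow this down is to have the function itself report what its runtime can see, rather than importing pandas at module level (which aborts the deployment). The following is only a minimal diagnostic sketch, not a confirmed fix. `describe_module` and `runtime_report` are hypothetical names, and only the standard library is used.

```python
import importlib
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable and, if so, its version and location."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The interpreter cannot locate the package at all.
        return {"name": name, "found": False}
    try:
        module = importlib.import_module(name)
    except Exception as exc:
        # The package exists on disk but fails while importing.
        return {"name": name, "found": True, "import_error": repr(exc)}
    return {
        "name": name,
        "found": True,
        "version": getattr(module, "__version__", "unknown"),
        "location": spec.origin,
    }


def runtime_report(request=None):
    """Hypothetical HTTP entry point: returns interpreter and pandas details as text."""
    info = describe_module("pandas")
    return "python={} pandas={}".format(sys.version.split()[0], info)
```

Deployed as the entry point, this would show whether the runtime's interpreter can find pandas at all, without the deployment failing on a top-level import.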

Hawkie answered 4/4, 2020 at 18:3 Comment(14)
What do you see in the logs of that Cloud Function? – Garcon
Pretty much the same thing, but it just gives a line number in the Google library that is attempting to use pandas. I'll update my post with some of that log output in case I'm missing something. – Hawkie
Can you show us the code that's attempting to use pandas? Is there a stacktrace? – Bertabertasi
The issue must be related to the usage. I've just deployed a hello-world Python Cloud Function adding the listed dependencies to requirements.txt and importing pandas successfully. – Garcon
By the way, how did you generate your requirements.txt file? It looks like it might be the output of pip freeze, which means that you probably don't need all these dependencies for your application. For example, Flask shouldn't be necessary here. – Bertabertasi
Keep in mind that Cloud Functions are best suited for single-purpose tasks. If you need complex functionality, take a look at alternative compute options such as GAE, Cloud Run, GCE or GKE. – Garcon
Thanks for looking into this. I agree that it must be something I'm doing. Maybe I should do like you and start fresh with a minimal function using pandas and go from there. And yes, you're right, I did use pip freeze. I planned on cleaning all that up at a later point (I also tried with pandas as the only dependency, same result). – Hawkie
@DustinIngram I'm not using it directly; it is the Google BigQuery API that is calling it: query_job.to_dataframe(). I've added additional detail to my original post. – Hawkie
I don't see any reason that the BigQuery client library would be failing, it's just doing a try: import pandas; except ImportError: pandas = None and then later if pandas is None: raise ValueError(_NO_PANDAS_ERROR): github.com/googleapis/python-bigquery/blob/master/google/cloud/… – Bertabertasi
Sure, in that case can you show us the code attempting to use the client library? Without some example we can't try to reproduce this. – Bertabertasi
@DustinIngram I gotta run for a little bit. When I get back, I'm going to start from scratch and create a minimal test case. That may lead me down the road to fixing the problem (which I'll post here) or I may still have the problem and then can provide you guys with everything to reproduce. Thanks for your help!! – Hawkie
Now I've tried using the basic hello_world function from the GCP doc and if I 'import pandas' it fails upon deployment. I must be doing something really stupid somewhere :( I've updated my original post with lots of new details. – Hawkie
Is this still an issue? Did you attempt a clean install? Have you tried the latest pandas version (pandas 1.2.4)? – Pennoncel
Does this answer your question? Unable to install pandas for python – Hetrick
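
The optional-import pattern described in the comments explains why the original failure surfaced as a ValueError at call time instead of a ModuleNotFoundError at import time. Below is a simplified illustration of that pattern, not the BigQuery library's actual source. `rows_to_dataframe` and its `pandas_module` parameter are hypothetical, added so the failure branch can be exercised whether or not pandas is installed.

```python
try:
    import pandas
except ImportError:
    # Degrade gracefully: remember that pandas is missing instead of failing here.
    pandas = None

_NO_PANDAS_ERROR = (
    "The pandas library is not installed, please install pandas "
    "to use the to_dataframe() function."
)


def rows_to_dataframe(rows, pandas_module=pandas):
    """Build a DataFrame from `rows`, or raise if pandas could not be imported."""
    if pandas_module is None:
        raise ValueError(_NO_PANDAS_ERROR)
    return pandas_module.DataFrame(rows)
```

Both symptoms in the question therefore point to the same underlying condition: the runtime's interpreter could not import pandas.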

Since the OP could not get even a minimal pandas function working inside Cloud Functions, I think it is worth posting a complete working walkthrough for anyone attempting the same thing. This question's generic title brings in a steady stream of visitors, so this should clear up doubts for them as well.

Installing/Configuring Google Cloud SDK

As OP was trying to deploy their function from their local machine, it is required to install the GCP Cloud SDK. This can be done following the documentation. Since I did this from scratch (in Debian 10, with Python 3.7.3), these are the commands I used from the doc:

Adding the Cloud SDK package sources:

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

Importing the GCP public key

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

Installing the Cloud SDK

sudo apt-get update && sudo apt-get install google-cloud-sdk

The installed utilities can then be listed with gcloud components list.

Once the SDK is installed, you must authenticate using your account credentials running gcloud auth login. This will open a browser window for you to grant access. Once authenticated, choose the project to which you want to deploy your Cloud Function with gcloud config set project <PROJECT_NAME>

Writing your function

I kept this function rather simple, to make this answer less complex for general users and since the OP also attempted a minimal example without success. I used sample code from the GCP documentation and pandas examples. The only dependency required is pandas, but you can also install flask and functions-framework to test the functions locally.

main.py

import pandas as pd
from markupsafe import escape  # flask.escape is gone in newer Flask releases; markupsafe is installed with Flask

def pandas_http(request):
    df = pd.DataFrame(
        {
            "Name": [
                "Braund, Mr. Owen Harris",
                "Allen, Mr. William Henry",
                "Bonnell, Miss. Elizabeth",
            ],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )

    request_json = request.get_json(silent=True)
    request_args = request.args

    if request_json and 'name' in request_json:
        name = request_json['name']
    elif request_args and 'name' in request_args:
        name = request_args['name']
    else:
        name = 'World'

    return 'Hello {}! here is your data: \n{}'.format(escape(name), df)

requirements.txt

pandas
flask
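
Before deploying, the name-lookup branch of pandas_http can be exercised locally without Flask or pandas installed. This is a minimal sketch under stated assumptions: `extract_name` copies the lookup logic from main.py above, and `fake_request` is a hypothetical stand-in exposing only the two attributes that logic uses. It is not a real flask.Request.

```python
from types import SimpleNamespace


def extract_name(request):
    """Mirror the lookup in pandas_http: JSON body first, then query args, then a default."""
    request_json = request.get_json(silent=True)
    request_args = request.args
    if request_json and "name" in request_json:
        return request_json["name"]
    if request_args and "name" in request_args:
        return request_args["name"]
    return "World"


def fake_request(json_body=None, args=None):
    """Hypothetical stand-in for a request: provides get_json() and args only."""
    return SimpleNamespace(
        get_json=lambda silent=False: json_body,
        args=args or {},
    )


if __name__ == "__main__":
    print(extract_name(fake_request(json_body={"name": "Ada"})))
    print(extract_name(fake_request(args={"name": "Bob"})))
    print(extract_name(fake_request()))
```

For an end-to-end local run, functions-framework (mentioned above) is the more faithful option. This stub only covers the branching logic.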

Finally, I used the following deploy command from within the same directory containing the main.py:

gcloud functions deploy pandas-gcp-test --entry-point pandas_http --runtime python37 --trigger-http

Once deployed, it works as expected: the built-in function tester in the Cloud Console shows the function returning the DataFrame correctly, so pandas does in fact run within the function.

Tanganyika answered 21/1, 2022 at 22:6 Comment(0)

I don't like this answer: showing that the documented way works doesn't explain what the OP did wrong. What solved the issue for me was removing the version number from pandas in my requirements.txt.
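
To see which entries in a requirements.txt are pinned, and so which ones could be tried unpinned, a small helper along these lines may be useful. It is only a sketch: `find_pinned` is a hypothetical name, it handles only plain name==version lines (no extras, environment markers, or other specifiers), and whether a pin is actually the cause in any given case is not established here.

```python
def find_pinned(requirements_text):
    """Return (name, version) pairs for lines of the form name==version."""
    pinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pinned.append((name.strip(), version.strip()))
    return pinned


if __name__ == "__main__":
    sample = "numpy==1.18.2\npandas==1.0.3\nflask\n"
    for name, version in find_pinned(sample):
        print(name, version)
```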

Marriage answered 22/8 at 21:38 Comment(2)
Please comment on the answer you do not like, instead of posting another answer yourself (if this is your answer). – Antimasque

© 2022 - 2024 — McMap. All rights reserved.