Using psycopg2 with Lambda to Update Redshift (Python)

I am attempting to update Redshift from a Lambda function using Python. To do this, I am combining two code fragments. Both fragments are functional when I run them separately.

  1. Updating Redshift from PyDev for Eclipse

    import psycopg2
    
    conn_string = "dbname='name' port='0000' user='name' password='pwd' host='url'"
    conn = psycopg2.connect(conn_string)
    
    cursor = conn.cursor()
    
    cursor.execute("UPDATE table SET attribute='new'")
    conn.commit()
    cursor.close()
    
  2. Receiving Content Uploaded to S3 Bucket (Pre-Built Template Available on Lambda)

    from __future__ import print_function
    
    import json
    import urllib
    import boto3
    
    print('Loading function')
    
    s3 = boto3.client('s3')
    
    
    def lambda_handler(event, context):
        #print("Received event: " + json.dumps(event, indent=2))
    
        # Get the object from the event and show its content type
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
    
        try:
            response = s3.get_object(Bucket=bucket, Key=key)
            print("CONTENT TYPE: " + response['ContentType'])
            return response['ContentType']
    
        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
            raise e
    

Since both of these segments worked, I tried to combine them so that I could update Redshift when a file is uploaded to S3:

from __future__ import print_function

import json
import urllib
import boto3
import psycopg2

print('Loading function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')

    conn_string = "dbname='name' port='0000' user='name' password='pwd' host='url'"

    conn = psycopg2.connect(conn_string)

    cursor = conn.cursor()

    cursor.execute("UPDATE table SET attribute='new'")
    conn.commit()
    cursor.close()

    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['Body'].read())
        return response['Body'].read()
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

Since I am using an outside library, I need to create a deployment package. I created a new folder (lambda_function1) and moved my .py file (lambda_function1.py) to that folder. I ran the following command to install psycopg2 in that folder:

pip install psycopg2 -t \lambda_function1

I receive the following feedback:

Collecting psycopg2
  Using cached psycopg2-2.6.1-cp34-none-win_amd64.whl
Installing collected packages: psycopg2
Successfully installed psycopg2-2.6.1 

I then zipped the contents of the directory and uploaded that zip to my Lambda function. When I upload a document to the bucket the function monitors, I receive the following error in my CloudWatch log:

Unable to import module 'lambda_function1': No module named _psycopg 

When I look in the library, the only thing named "_psycopg" is "_psycopg.pyd".

What is causing this problem? Does it matter that Lambda uses Python 2.7 when I use 3.4? Does it matter that I zipped the contents of my folder on a Windows machine? Has anyone been able to successfully connect to Redshift from Lambda?

Jodoin answered 13/4, 2016 at 19:35 Comment(1)
Sadly, you will not be able to use the Windows-created wheel file on Lambda. You may be able to use pg8000, which is pure Python, with Redshift on Lambda. – Coagulum
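
For anyone who wants to try that route, here is a minimal sketch of a pg8000-based handler. The host, credentials, and table name are placeholders (5439 is Redshift's default port), and since pg8000 is pure Python it can simply be zipped into the deployment package:

    import pg8000

    # Placeholder connection settings for the Redshift cluster.
    DB_SETTINGS = dict(
        database='name',
        user='name',
        password='pwd',
        host='url',   # cluster endpoint
        port=5439,    # Redshift's default port
    )

    def lambda_handler(event, context):
        conn = pg8000.connect(**DB_SETTINGS)
        try:
            cursor = conn.cursor()
            cursor.execute("UPDATE my_table SET attribute = 'new'")
            conn.commit()
            cursor.close()
        finally:
            conn.close()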

In order for this to work, you need to build psycopg2 with a statically linked libpq.so library. Check out this repo: https://github.com/jkehler/awslambda-psycopg2. It contains an already-built psycopg2 package and instructions on how to build it yourself.

Back to your questions:

What is causing this problem?

psycopg2 needs to be built and compiled with statically linked libraries for Linux.

Does it matter that Lambda uses Python 2.7 when I use 3.4?

Yes, it does; Lambda only supports the 2.7 version. Just create a virtual environment and install all the necessary packages in there.

Does it matter that I zipped the contents of my file on a Windows machine?

As long as all the libraries you zipped can run on Linux, it doesn't.

Has anyone been able to successfully connect to Redshift from lambda?

Yes.
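
For reference, a minimal sketch of the combined handler once a Linux-compatible psycopg2 build (such as the one from that repo) is zipped into the deployment package; the connection string and table/column names are placeholders, and the S3 part follows the Python 2.7 blueprint from the question:

    from __future__ import print_function

    import urllib
    import boto3
    import psycopg2  # the statically linked build bundled alongside this file

    s3 = boto3.client('s3')

    # Placeholder connection string; 5439 is Redshift's default port.
    CONN_STRING = "dbname='name' port='5439' user='name' password='pwd' host='url'"

    def lambda_handler(event, context):
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')

        conn = psycopg2.connect(CONN_STRING)
        try:
            cursor = conn.cursor()
            cursor.execute("UPDATE my_table SET attribute = 'new'")
            conn.commit()
            cursor.close()
        finally:
            conn.close()

        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['ContentType'])
        return response['ContentType']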

Pyuria answered 13/4, 2016 at 20:28 Comment(5)
Thank you SO much! I downloaded and extracted the zip from GitHub and pasted it into my deployment package, and it worked immediately (no need to alter the version of Python). Very grateful for your help! – Jodoin
When I try an update similar to part 1 of the question (import psycopg2, conn_string = "dbname='............ etc., conn = psycopg2.connect(conn_string), cursor = conn.cursor(), cursor.execute("Update Query")), it gives an error: DatabaseError: SSL SYSCALL error: Operation timed out – Boyse
@Pyuria I copied the psycopg2 folder from the Git repo and pasted it into the Lambda zip, but I'm still getting the error below: Unable to import module 'core_handler': No module named psycopg2._psycopg – Ashleyashli
Update: Lambda now supports both Python 2.7 and Python 3.6. – Ramble
I forked the repo in the answer and compiled it with Python 3.7 + PostgreSQL 10.7; here it is: github.com/pzmosquito/awslambda-psycopg2 – Valkyrie

I just came across this same problem. I stumbled across the same GitHub project that was noted in the other answer, which explained the problem as follows:

Due to AWS Lambda missing the required PostgreSQL libraries in the AMI image, we needed to compile psycopg2 with the PostgreSQL libpq.so library statically linked instead of the default dynamic link.

This has been noted in the previous answer, and I started to follow the instructions to build myself a version of psycopg2 with a statically linked PostgreSQL library. I found a much easier option, though. I noticed the following on the psycopg2 GitHub page:

You can also obtain a stand-alone package, not requiring a compiler or external libraries, by installing the psycopg2-binary package from PyPI:

$ pip install psycopg2-binary

The binary package is a practical choice for development and testing but in production it is advised to use the package built from sources.

When I pip installed the psycopg2-binary package and included it in my requirements.txt file, I was able to connect to my PostgreSQL database from a Lambda function flawlessly. I am using Chalice, which I highly recommend. I realize that psycopg2 recommends not using the binary version in production, but I don't see a huge difference between using the binary version and compiling and statically linking it yourself. Someone please correct me if I'm wrong on that.
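
As an illustration only (the app name, bucket, and connection settings are placeholders), a Chalice app built on psycopg2-binary might look roughly like this, with the connection opened at module level so warm invocations can reuse it:

    # requirements.txt would list: psycopg2-binary (plus chalice for the tooling)
    from chalice import Chalice
    import psycopg2

    app = Chalice(app_name='redshift-updater')

    # Placeholder settings; the connection is created once per container,
    # so warm invocations reuse it instead of reconnecting every time.
    CONN_STRING = "dbname='name' port='5439' user='name' password='pwd' host='url'"
    conn = psycopg2.connect(CONN_STRING)

    @app.on_s3_event(bucket='my-bucket', events=['s3:ObjectCreated:*'])
    def handle_upload(event):
        # event.bucket and event.key identify the uploaded object.
        with conn.cursor() as cursor:
            cursor.execute("UPDATE my_table SET attribute = 'new'")
        conn.commit()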

Morgenthaler answered 28/4, 2019 at 0:36 Comment(2)
This worked for me. – Haphazard
I'm curious what OS you're using. My guess is that this would only work if the platform on which you're running pip install psycopg2-binary is binary-compatible with AWS Lambda. – Secco

To use psycopg2 with AWS Lambda, use the aws-psycopg2 package.

AWS supports psycopg2, but the way to import it is a bit different: AWS itself has a compiled library for psycopg2, so we need to bring it in using aws-psycopg2.

Rivy answered 1/8, 2019 at 15:6 Comment(3)
The only solution. – Frangible
What are you installing though? This? – Molal
Only thing that worked for me. – Inunction

Oh boy! While some of the answers may be really great and working, I just stumbled upon https://pypi.org/project/aws-psycopg2/ and it worked like a charm for me. Steps:

mkdir aws-psycopg2

cd aws-psycopg2

vi get_layer_packages.sh

    export PKG_DIR="python"
    
    rm -rf ${PKG_DIR} && mkdir -p ${PKG_DIR}
    
    docker run --rm -v $(pwd):/foo -w /foo lambci/lambda:build-python3.6 \
        pip install -r requirements.txt --no-deps -t ${PKG_DIR}

vi requirements.txt

    aws-psycopg2

Then do:
chmod +x get_layer_packages.sh

./get_layer_packages.sh

zip -r aws-psycopg2.zip .

Upload this zip as an AWS Lambda layer!
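
With that layer attached to the function, the handler itself stays plain. A rough sketch, assuming the layer exposes the usual psycopg2 module name and using hypothetical environment variables for the connection details:

    import os
    import psycopg2  # resolved from the attached layer at runtime

    def lambda_handler(event, context):
        # Hypothetical environment variables configured on the function.
        conn = psycopg2.connect(
            dbname=os.environ['DB_NAME'],
            user=os.environ['DB_USER'],
            password=os.environ['DB_PASSWORD'],
            host=os.environ['DB_HOST'],
            port=os.environ.get('DB_PORT', '5439'),  # Redshift default
        )
        try:
            with conn.cursor() as cursor:
                cursor.execute("UPDATE my_table SET attribute = 'new'")
            conn.commit()
        finally:
            conn.close()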

Isoelectronic answered 23/3, 2020 at 17:2 Comment(2)
You, sir, saved me A LOT of time and trouble, thank you! – Scrubby
Clean. I also got to set up my local Docker for the first time. Thx – Coelostat

This is the simplest way I found of all the methods I tried:

I added a Lambda layer to my Lambda functions that needed psycopg2. Here is a list of available Lambda layers: https://github.com/jetbridge/psycopg2-lambda-layer

I'm using the Serverless Framework, and this is what my Lambda function looks like (a sketch of the matching handler follows the config):

functions:
  example:
    handler: handler.example
    layers:
      - arn:aws:lambda:us-east-1:898466741470:layer:psycopg2-py37:3
    events:
      - http:
          path: example
          method: post
          authorizer: aws_iam
          cors: true
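
A sketch of what the handler.example function referenced above might look like (handler.py); the connection string and table name are placeholders, and psycopg2 is resolved from the attached layer:

    # handler.py -- matches the "handler: handler.example" entry in the config above.
    import json
    import psycopg2  # provided by the psycopg2 Lambda layer

    # Placeholder connection string; 5439 is Redshift's default port.
    CONN_STRING = "dbname='name' port='5439' user='name' password='pwd' host='url'"

    def example(event, context):
        conn = psycopg2.connect(CONN_STRING)
        try:
            with conn.cursor() as cursor:
                cursor.execute("UPDATE my_table SET attribute = 'new'")
            conn.commit()
        finally:
            conn.close()
        # The http event means API Gateway expects a statusCode/body response.
        return {'statusCode': 200, 'body': json.dumps({'updated': True})}
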
Funda answered 8/8, 2021 at 19:27 Comment(1)
How did you generate the ARN for that Python package? – Conversazione

Another way to use psycopg2 on Lambda (if you are programming on Windows and using Python 3.6 on Lambda):

  1. On your machine, create a directory called python
  2. Download the psycopg2 library from https://pypi.org/project/psycopg2-binary/ and look for the correct package (Amazon Linux is x86_64): psycopg2_binary-2.8.4-cp36-cp36m-manylinux1_x86_64.whl
  3. Unzip this into the python directory that you created (you can use 7-Zip to open the .whl)
  4. Zip the python directory
  5. Now, in the AWS Lambda console, create a layer with the python.zip that you have
  6. Finally, add the layer to your Lambda function (a quick way to verify the import is sketched below)
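
Once the layer is attached, a throwaway handler like the one below can confirm that psycopg2 is being loaded from the layer; if everything is wired up correctly, the reported path should sit under /opt/python (where Lambda exposes a layer's python/ folder):

    import psycopg2

    def lambda_handler(event, context):
        # Layers are extracted under /opt, and their python/ folder is added to the import path.
        return {
            'psycopg2_version': psycopg2.__version__,
            'loaded_from': psycopg2.__file__,
        }
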
Noctambulism answered 22/1, 2020 at 22:4 Comment(3)
This solution worked! I had to go to the releases and get the latest manylinux x86_64 package for Python 3.8, and it worked like a charm! – Dissonant
This is 100% the solution! Also mentioned here: https://github.com/jkehler/awslambda-psycopg2/issues/51 – Sarazen
I tried the following and it worked, thanks: psycopg2_binary-2.9.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl – Despondency

Assuming your packaging is correct, the "no module named psycopg2" error typically indicates that the binary file(s) of your psycopg2 deployment are incorrect for your target OS or Python version.

For Lambdas, we have found that the psycopg2-binary package works (using manylinux_x86_64). There is a reported risk of a segfault due to the presence of competing libssl binaries, though we haven't hit that yet. (This is basically a +1 for the psycopg2-binary answer above.)

The "proper solution" is probably jkehlers recompile specifically for Lambda missing only lib_pq.so, but it doesn't currently support ssl+py3.7 and we are too Windows to recompile it ourselves.

Flin answered 8/7, 2019 at 13:24 Comment(0)

So many answers, but they didn't work for me! Please note that AWS Lambda doesn't have most DB-related libraries preinstalled, so you need to add a zip file with your code and libraries to get it working. Please follow the steps here: Link

Sophistication answered 30/9, 2020 at 18:51 Comment(0)

I have read a few solutions, including the ones where users customised the psycopg2 module on GitHub, and the issue still persisted.

I recently solved it by uninstalling the current psycopg2 on my MacBook and then going to the pip site to install it.

  1. Run pip install psycopg2 in your terminal.
  2. Run pip freeze (to capture the modules that you have in your directory before you wrap it).
  3. Upload it to Lambda. It worked for me, although I have to say this is just a band-aid solution.
Aubree answered 11/8, 2022 at 4:6 Comment(0)
