Connecting Python/pandas to Redshift when SSL is required

My company recently changed our Redshift cluster and now they require an SSL connection. In the past I've connected Python/pandas to Redshift via the method I've detailed here: http://measureallthethin.gs/blog/connect-python-and-pandas-to-redshift/

From the SQLAlchemy documentation, it looks like all I need to do is add connect_args={'sslmode': 'require'} to the create_engine() call, as this thread pointed out: How do I connect to Postgresql using SSL from SqlAchemy+pg8000?
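
For reference, this is roughly what I'm doing (the cluster endpoint, credentials, and table name below are placeholders):

from sqlalchemy import create_engine
import pandas as pd

# Placeholder Redshift endpoint and credentials; sslmode is passed through to psycopg2
engine = create_engine(
    'postgresql+psycopg2://awsuser:my_password@examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com:5439/dev',
    connect_args={'sslmode': 'require'}
)

df = pd.read_sql_query('select * from some_table limit 10', engine)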

However, I now get this error:

OperationalError: (psycopg2.OperationalError) sslmode value "require" invalid when SSL support is not compiled in

I use the Anaconda distribution for a number of packages, and found I needed to update my psycopg2 package per these instructions: https://groups.google.com/a/continuum.io/d/msg/conda/Fqv93VKQXAc/mHqfNK8xZWsJ

However, even after updating psycopg2 I'm still getting the same error and am at a loss at this point on how to further debug. I'd like to figure this out so I can get our Redshift data directly into pandas.

Hitherto answered 24/6, 2015 at 15:32 Comment(1)
Try using either "verify-full" or "verify-ca". When I had previously tried connecting using any other sslmode, I received the message: "[Amazon](500155) The value for property sslmode is invalid. Valid values are: verify-full, verify-ca." – Priory

AWS has developed an Amazon Redshift connector for Python (here is the GitHub repo) that helps with this.

You can install it from source:

git clone https://github.com/aws/amazon-redshift-python-driver.git
cd amazon-redshift-python-driver
pip install .

Or install the prebuilt package from PyPI:

pip install redshift_connector

Or with conda:

conda install -c conda-forge redshift_connector

Here is an example:

import redshift_connector

# Connect to the Redshift cluster using database credentials; SSL is enabled by default
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password'
 )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])

Note that one of the connection parameters you can pass is ssl (whether SSL is enabled); its default value is True.
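
If you want to be explicit about SSL, here is a minimal sketch using the same placeholder cluster and credentials as above (sslmode accepts verify-ca or verify-full):

import redshift_connector

# ssl defaults to True and sslmode to 'verify-ca'; shown explicitly here
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password',
    ssl=True,
    sslmode='verify-ca'
)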

Narcosynthesis answered 22/5, 2021 at 7:39 Comment(2)
For anyone considering this for AWS Lambda... :thumbs-down: redshift_connector didn't work for my Lambda use. I bundled it but Lambda wanted scramp and then asn1crypto, and then I stopped. Know how to do this for AWS Lambda? – Icky
Is this of help to you? Is the scramp issue similar to this? – Criticism

How about something like this?

pip3 install pg8000


import uuid

import pg8000

# Placeholder cluster endpoint and credentials
host = "***.amazonaws.com"
port = 5439

db_name = "dev"
master_username = "dev"
master_user_password = 'TestPassword1'

connection = pg8000.connect(
    host=host,
    port=port,
    database=db_name,
    user=master_username,
    password=master_user_password,
)
connection.autocommit = True

prefix = 'table_'
with connection.cursor() as cursor:
    # Step 1: create a table with a random suffix
    table_name = f"{prefix}{str(uuid.uuid4())[:4]}"
    tbl_create_sql = f"CREATE TABLE {table_name} (a int, b int);"
    cursor.execute(tbl_create_sql)
    # Step 2: insert a number of rows (illustrative values)
    ROWS_TO_INSERT = 5
    cursor.executemany(
        f"INSERT INTO {table_name} (a, b) VALUES (%s, %s);",
        [(i, i * 2) for i in range(ROWS_TO_INSERT)],
    )
    # Step 3: read the rows back
    cursor.execute(f"SELECT * FROM {table_name};")
    print(cursor.fetchall())
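
Since the original question is about a cluster that requires SSL, note that pg8000 also accepts an ssl_context argument. A minimal sketch, assuming a recent pg8000 release and that the default CA bundle can verify your cluster's certificate (otherwise you may need to load Amazon's Redshift CA bundle into the context):

import ssl

# Reuses the host/port/credential variables from above; TLS is enabled by passing an SSLContext
ssl_connection = pg8000.connect(
    host=host,
    port=port,
    database=db_name,
    user=master_username,
    password=master_user_password,
    ssl_context=ssl.create_default_context(),
)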
Alfrediaalfredo answered 29/5, 2022 at 11:33 Comment(0)
