How to connect Amazon Redshift to Python

This is my python code and I want to connect my Amazon Redshift database to Python, but it is showing error in host.

Can anyone tell me the correct syntax? Am I passing all the parameters correctly?

con=psycopg2.connect("dbname = pg_table_def, host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com, port= 5439, user=me, password= secret")

This is the error:

OperationalError: could not translate host name "redshift://redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com," to address: Unknown host

Picaresque answered 20/7, 2017 at 10:34 Comment(2)
A better solution is already present in #44243669 – Curtcurtail
Please include the module name you are using to establish a connection – Eidolon

It appears that you wish to run Amazon Redshift queries from Python code.

The parameters you would want to use are:

  • dbname: The name of the database you entered in the Database name field when the cluster was created.
  • user: The value you entered in the Master user name field when the cluster was created.
  • password: The value you entered in the Master user password field when the cluster was created.
  • host: The Endpoint provided in the Redshift management console (without the port at the end): redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com
  • port: 5439

For example:

con=psycopg2.connect("dbname=sales host=redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com port=5439 user=master password=secret")
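To spell out the format: psycopg2's connection string is a series of space-separated key=value pairs, with no commas between them (commas are what produce the "could not translate host name" error in the question). A minimal sketch with placeholder credentials, not a live cluster:

```python
# Placeholder connection parameters; substitute your own cluster's values.
params = {
    "dbname": "sales",
    "host": "redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com",
    "port": 5439,
    "user": "master",
    "password": "secret",
}

# Join into a libpq-style DSN: space-separated key=value pairs, no commas.
dsn = " ".join(f"{key}={value}" for key, value in params.items())
# psycopg2.connect(dsn) accepts this string; equivalently,
# psycopg2.connect(**params) passes the same values as keyword arguments.
print(dsn)
```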
Syntactics answered 20/7, 2017 at 11:41 Comment(20)
Hello sir, I did the same, but it is giving me an operational error in the host name. I have mentioned the error, please look at it and help me – Picaresque
Your host name is different from my host name. It is just the DNS name, not the extra stuff. Your dbname is also different. It should just be the name you entered when the cluster was created. – Syntactics
Yes, I know, but my host name is correct and I am sure about it, though it's giving me the error – Picaresque
The host should be set to a name that can be resolved by DNS. A hostname starting with redshift:// is invalid. Your dbname is also invalid. – Syntactics
Yes, I removed that and corrected both and then ran the code, but the error is still the same – Picaresque
Then you should edit your question, because it is still showing the old values. – Syntactics
The dbname should be the name of the database table which I have in Redshift, right? – Picaresque
No, as per my answer above, the dbname is the name of the database you entered in the "Database name" field when the cluster was created. It is not the name of a table. – Syntactics
Yes, I have changed it to the database name, and the username and password are what I have for Amazon Redshift. I have now edited the question, but I am getting the same error – Picaresque
Are you still receiving the error "Unknown host"? If you are using redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com as your host, then the reason is that this DNS name does not resolve, probably because it does not exist. You should use the Endpoint listed in your Amazon Redshift console. You can test it by doing a PING against the name, which should return an IP address. – Syntactics
Actually, the correct host name is host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com and this does exist; I didn't post the correct host name at first – Picaresque
Yes, that name works. Please update the question to show your current error message. – Syntactics
Ah! It's the commas. I had my example wrong. See The psycopg2 module content: there are two formats, one as a long string with no commas, and another with comma-separated individual keyword arguments. So remove the commas from your string and see what happens. – Syntactics
Hello sir, I want to approach you as I am having some problem with AWS VPC. Can you help me resolve it? – Picaresque
This appears to be a different topic, so please create a new question. – Syntactics
I have posted that question – Picaresque
Can you tell me how to approach you? – Picaresque
Thank you sir for your help – Picaresque
What if my Redshift URL is a JDBC connection string like: jdbc:redshift://<database_name>.<corporation>.com:5439/<database_name>?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory – Arezzini
@Arezzini Please create a new question rather than asking via a comment on an old question. – Syntactics

Old question but I just arrived here from Google.

The accepted answer doesn't work with SQLAlchemy, even though SQLAlchemy is powered by psycopg2:

sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'dbname=... host=... port=... user=... password=...'

What worked:

create_engine(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Which works with psycopg2 directly too:

psycopg2.connect(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Using the postgresql dialect works because Amazon Redshift is based on PostgreSQL.

Hope it can help other people!
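One caveat worth noting with this URL form: if the password contains URL-special characters such as @ or /, it must be percent-encoded before being interpolated, or the URL will parse incorrectly. A minimal sketch with placeholder credentials, using only the standard library:

```python
from urllib.parse import quote_plus

# Placeholder credentials; the password deliberately contains
# characters that would break an unencoded URL.
REDSHIFT_USER = "master"
REDSHIFT_PASSWORD = "p@ss/word"
REDSHIFT_HOST = "redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com"
REDSHIFT_PORT = 5439
REDSHIFT_DATABASE = "sales"

# Percent-encode only the password; the rest of the URL is safe to interpolate.
url = (
    f"postgresql://{REDSHIFT_USER}:{quote_plus(REDSHIFT_PASSWORD)}"
    f"@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}"
)
print(url)
```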

Protector answered 14/9, 2021 at 16:41 Comment(0)

To connect to Redshift, you need the postgresql+psycopg2 dialect backed by the psycopg2 driver. For Python 3.x, install it with:

pip3 install psycopg2-binary

And then use (urlquote here is assumed to be urllib.parse.quote_plus, which percent-encodes any special characters in the password):

from urllib.parse import quote_plus as urlquote

from sqlalchemy import create_engine

return create_engine(
        "postgresql+psycopg2://%s:%s@%s:%s/%s"
        % (REDSHIFT_USERNAME, urlquote(REDSHIFT_PASSWORD), REDSHIFT_HOST, REDSHIFT_PORT,
           REDSHIFT_DB)
    )
Subordinary answered 25/7, 2022 at 6:4 Comment(0)

Well, for Redshift the idea is to COPY from S3, which is faster than any other approach, but here is an example of how to do it:

First, you must install some dependencies.

For Linux users: sudo apt-get install libpq-dev

For Mac users: brew install libpq

Then install these dependencies with pip:

pip3 install psycopg2-binary
pip3 install sqlalchemy
pip3 install sqlalchemy-redshift

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker


#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dwtest"
USER = "youruser"
PASSWORD = "yourpassword"
HOST = "dwtest.awsexample.com"
PORT = "5439"
SCHEMA = "public"

S3_FULL_PATH = 's3://yourbucket/category_pipe.txt'
ARN_CREDENTIALS = 'arn:aws:iam::YOURARN:YOURROLE'
REGION = 'us-east-1'

############ CONNECTING AND CREATING SESSIONS ############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###########################################################



############ RUNNING COPY ############
copy_command = '''
copy category from '%s'
credentials 'aws_iam_role=%s'
delimiter '|' region '%s';
''' % (S3_FULL_PATH, ARN_CREDENTIALS, REGION)
s.execute(copy_command)
s.commit()
######################################



############ GETTING DATA ############
query = "SELECT * FROM category;"
rr = s.execute(query)
all_results =  rr.fetchall()

def pretty(all_results):
    for row in all_results :
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row :
            print(" ---- %s" % r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")

pretty(all_results)
s.close()
######################################
Platform answered 23/1, 2019 at 17:1 Comment(0)

The easiest way to query AWS Redshift from Python is through this Jupyter extension: Jupyter Redshift

Not only can you query and save your results but also write them back to the database from within the notebook environment.

Sankhya answered 17/12, 2018 at 10:43 Comment(1)
Cool idea. But premature to start publicizing it.Almira
